100+ datasets found
  1. store-sales-time-series-forecasting

    • huggingface.co
    Cite
    Tiana, store-sales-time-series-forecasting [Dataset]. https://huggingface.co/datasets/t4tiana/store-sales-time-series-forecasting
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Tiana
    Description

    Taken from this Kaggle competition:

      Dataset Description
    

    In this competition, you will predict sales for the thousands of product families sold at Favorita stores located in Ecuador. The training data includes dates, store and product information, whether that item was being promoted, as well as the sales numbers. Additional files include supplementary information that may be useful in building your models.

      File Descriptions and Data Field Information
    

    train.csv… See the full description on the dataset page: https://huggingface.co/datasets/t4tiana/store-sales-time-series-forecasting.

  2. Hourly Sensor Data for Time Series Forecasting

    • kaggle.com
    Updated Jul 4, 2024
    Cite
    SudhanvaHG (2024). Hourly Sensor Data for Time Series Forecasting [Dataset]. https://www.kaggle.com/datasets/sudhanvahg/hourly-sensor-data-for-forecasting
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    SudhanvaHG
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains hourly sensor data collected over a period of time. The primary objective is to forecast future sensor values using various time series forecasting methods, such as SARIMA, Prophet, and machine learning models. The dataset includes an ID column, a Datetime column and a Count column, where the Count represents the sensor reading at each timestamp.
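The ID/Datetime/Count layout lends itself to a quick baseline before reaching for SARIMA or Prophet. Below is a minimal sketch of a seasonal-naive forecast (repeat the value observed 24 hours earlier); the synthetic history is illustrative, not drawn from the dataset:

```python
# Seasonal-naive baseline for an hourly Count series: forecast each hour
# with the value observed one season (24 hours) earlier. Column names
# (ID, Datetime, Count) follow the dataset description; data is synthetic.

def seasonal_naive(counts, season=24, horizon=24):
    """Repeat the last full season of observations as the forecast."""
    if len(counts) < season:
        raise ValueError("need at least one full season of history")
    last_season = counts[-season:]
    return [last_season[h % season] for h in range(horizon)]

# Synthetic hourly counts with a clean daily cycle (illustration only).
history = [10 + (h % 24) for h in range(24 * 7)]
forecast = seasonal_naive(history, season=24, horizon=24)
```

Such a baseline is useful as a floor: a SARIMA or Prophet model that cannot beat it on a held-out window is not capturing anything beyond the daily cycle.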

  3. Timeseries-PILE

    • huggingface.co
    Updated May 11, 2024
    Cite
    Auton Lab (2024). Timeseries-PILE [Dataset]. https://huggingface.co/datasets/AutonLab/Timeseries-PILE
    Explore at:
    Dataset updated
    May 11, 2024
    Dataset authored and provided by
    Auton Lab
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Time Series PILE

    The Time-series Pile is a large collection of publicly available data from diverse domains, ranging from healthcare to engineering and finance. It comprises over 5 public time-series databases from diverse domains, intended for time series foundation model pre-training and evaluation.

      Time Series PILE Description
    

    We compiled a large collection of publicly available datasets from diverse domains into the Time Series Pile. It has 13 unique domains of data… See the full description on the dataset page: https://huggingface.co/datasets/AutonLab/Timeseries-PILE.

  4. Weather Long-term Time Series Forecasting

    • kaggle.com
    Updated Nov 3, 2024
    Cite
    Alistair King (2024). Weather Long-term Time Series Forecasting [Dataset]. https://www.kaggle.com/datasets/alistairking/weather-long-term-time-series-forecasting
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 3, 2024
    Dataset provided by
    Kaggle
    Authors
    Alistair King
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Weather Long-term Time Series Forecasting (2020)


    Dataset Description

    Weather is recorded every 10 minutes throughout the entire year of 2020, comprising 20 meteorological indicators measured at a Max Planck Institute weather station. The dataset provides comprehensive atmospheric measurements including air temperature, humidity, wind patterns, radiation, and precipitation. With over 52,560 data points per variable (365 days × 24 hours × 6 measurements per hour), this high-frequency sampling offers detailed insights into weather patterns and atmospheric conditions. The measurements include both basic weather parameters and derived quantities such as vapor pressure deficit and potential temperature, making it suitable for both meteorological research and practical applications. You can find some initial analysis using this dataset here: "Weather Long-term Time Series Forecasting Analysis".

    File Structure

    The dataset is provided in a CSV format with the following columns:

    • date: Date and time of the observation.
    • p: Atmospheric pressure in millibars (mbar).
    • T: Air temperature in degrees Celsius (°C).
    • Tpot: Potential temperature in Kelvin (K), representing the temperature an air parcel would have if moved to a standard pressure level.
    • Tdew: Dew point temperature in degrees Celsius (°C), indicating the temperature at which air becomes saturated with moisture.
    • rh: Relative humidity as a percentage (%), showing the amount of moisture in the air relative to the maximum it can hold at that temperature.
    • VPmax: Maximum vapor pressure in millibars (mbar), representing the maximum pressure exerted by water vapor at the given temperature.
    • VPact: Actual vapor pressure in millibars (mbar), indicating the current water vapor pressure in the air.
    • VPdef: Vapor pressure deficit in millibars (mbar), measuring the difference between maximum and actual vapor pressure, used to gauge drying potential.
    • sh: Specific humidity in grams per kilogram (g/kg), showing the mass of water vapor per kilogram of air.
    • H2OC: Concentration of water vapor in millimoles per mole (mmol/mol) of dry air.
    • rho: Air density in grams per cubic meter (g/m³), reflecting the mass of air per unit volume.
    • wv: Wind speed in meters per second (m/s), measuring the horizontal motion of air.
    • max. wv: Maximum wind speed in meters per second (m/s), indicating the highest recorded wind speed over the period.
    • wd: Wind direction in degrees (°), representing the direction from which the wind is blowing.
    • rain: Total rainfall in millimeters (mm), showing the amount of precipitation over the observation period.
    • raining: Duration of rainfall in seconds (s), recording the time for which rain occurred during the observation period.
    • SWDR: Short-wave downward radiation in watts per square meter (W/m²), measuring incoming solar radiation.
    • PAR: Photosynthetically active radiation in micromoles per square meter per second (µmol/m²/s), indicating the amount of light available for photosynthesis.
    • max. PAR: Maximum photosynthetically active radiation recorded in the observation period, in µmol/m²/s.
    • Tlog: Temperature logged in degrees Celsius (°C), potentially from a secondary sensor or logger.
    • OT: Likely refers to an "operational timestamp" or an offset in time, but may need clarification depending on the dataset's context.
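A quick structural check after loading the CSV: the vapor-pressure columns should satisfy VPdef ≈ VPmax − VPact. A minimal sketch using a synthetic two-row snippet with the column names above (the values are made up, not from the dataset):

```python
# Sanity-check the vapor-pressure columns: VPdef should equal VPmax - VPact
# (all in mbar). The CSV snippet is synthetic; only the headers follow the
# dataset's column names.
import csv
import io

sample = io.StringIO(
    "date,p,T,VPmax,VPact,VPdef\n"
    "2020-01-01 00:00:00,996.5,-0.1,6.1,5.6,0.5\n"
    "2020-01-01 00:10:00,996.6,0.2,6.2,5.7,0.5\n"
)

rows = list(csv.DictReader(sample))
for row in rows:
    vpdef = float(row["VPmax"]) - float(row["VPact"])
    # Allow for floating-point rounding in the stored column.
    assert abs(vpdef - float(row["VPdef"])) < 1e-6
```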

    Potential Use Cases

    This high-resolution meteorological dataset enables applications across multiple domains. For weather forecasting, the frequent measurements support development of prediction models, while climate researchers can study microclimate variations and seasonal patterns. In agriculture, temperature and vapor pressure deficit data aids crop modeling and irrigation planning. The wind and radiation measurements benefit renewable energy planning, while the comprehensive atmospheric data supports environmental monitoring. The dataset's detailed nature makes it particularly suitable for machine learning applications and educational purposes in meteorology and data science.

    Credits

    • This data was provided by the Max Planck Institute, and acc...
  5. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, csv
    Updated Feb 26, 2025
    Cite
    Josef Koumar; Karel Hynek; Tomáš Čejka; Pavel Šiška (2025). CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting [Dataset]. http://doi.org/10.5281/zenodo.13382427
    Explore at:
    Available download formats: csv, application/gzip
    Dataset updated
    Feb 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Josef Koumar; Karel Hynek; Tomáš Čejka; Pavel Šiška
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    CESNET-TimeSeries24: The dataset for network traffic forecasting and anomaly detection

    The dataset called CESNET-TimeSeries24 was collected by long-term monitoring of selected statistical metrics over 40 weeks for each IP address on the ISP network CESNET3 (Czech Education and Science Network). It encompasses network traffic from more than 275,000 active IP addresses assigned to a wide variety of devices, including office computers, NATs, servers, WiFi routers, honeypots, and video-game consoles found in dormitories. The dataset is also rich in network anomalies, covering a broad range of anomaly types, which enables comprehensive evaluation of anomaly detection methods.

    Last but not least, the CESNET-TimeSeries24 dataset provides traffic time series at the institutional and IP-subnet levels to cover all possible anomaly detection or forecasting scopes. Overall, the time series were created from 66 billion IP flows comprising 4 trillion packets that carry approximately 3.7 petabytes of data. CESNET-TimeSeries24 is a complex real-world dataset that offers insights into the evaluation of forecasting models in realistic environments.

    Please cite the usage of our dataset as:

    Koumar, J., Hynek, K., Čejka, T. et al. CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting. Sci Data 12, 338 (2025). https://doi.org/10.1038/s41597-025-04603-x

    @Article{cesnettimeseries24,
    author={Koumar, Josef and Hynek, Karel and {\v{C}}ejka, Tom{\'a}{\v{s}} and {\v{S}}i{\v{s}}ka, Pavel},
    title={CESNET-TimeSeries24: Time Series Dataset for Network Traffic Anomaly Detection and Forecasting},
    journal={Scientific Data},
    year={2025},
    month={Feb},
    day={26},
    volume={12},
    number={1},
    pages={338},
    issn={2052-4463},
    doi={10.1038/s41597-025-04603-x},
    url={https://doi.org/10.1038/s41597-025-04603-x}
    }

    Time series

    We create evenly spaced time series for each IP address by aggregating IP flow records into time series datapoints. The created datapoints represent the behavior of IP addresses within a defined time window of 10 minutes. The vector of time-series metrics v_{ip, i} describes the IP address ip in the i-th time window. Thus, IP flows for vector v_{ip, i} are captured in time windows starting at t_i and ending at t_{i+1}. The time series are built from these datapoints.
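The windowing scheme above amounts to bucketing flow records by window index i = floor(t / 600 s) per IP address. A minimal sketch; the record fields (ts, ip, packets, bytes) are illustrative stand-ins for the raw IP-flow format, not the dataset's actual schema:

```python
# Aggregate IP-flow records into 10-minute datapoints keyed by (ip, window).
# Each datapoint accumulates the simple volumetric metrics: flow count,
# packet count, and transmitted bytes.
from collections import defaultdict

WINDOW = 600  # window length in seconds (10 minutes)

# Synthetic flow records for illustration.
flows = [
    {"ts": 0,   "ip": "10.0.0.1", "packets": 5, "bytes": 500},
    {"ts": 120, "ip": "10.0.0.1", "packets": 3, "bytes": 300},
    {"ts": 650, "ip": "10.0.0.1", "packets": 7, "bytes": 700},
]

datapoints = defaultdict(lambda: {"n_flows": 0, "n_packets": 0, "n_bytes": 0})
for f in flows:
    key = (f["ip"], f["ts"] // WINDOW)   # (ip, i-th 10-minute window)
    dp = datapoints[key]
    dp["n_flows"] += 1
    dp["n_packets"] += f["packets"]
    dp["n_bytes"] += f["bytes"]
```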

    Datapoints created by the aggregation of IP flows contain the following time-series metrics:

    • Simple volumetric metrics: the number of IP flows, the number of packets, and the transmitted data size (i.e. number of bytes)
    • Unique volumetric metrics: the number of unique destination IP addresses, the number of unique destination Autonomous System Numbers (ASNs), and the number of unique destination transport layer ports. The aggregation of unique volumetric metrics is memory intensive, since all unique values must be stored in an array. We used a server with 41 GB of RAM, which was enough for 10-minute aggregation on the ISP network.
    • Ratio metrics: the ratio of UDP/TCP packets, the ratio of UDP/TCP transmitted data size, the direction ratio of packets, and the direction ratio of transmitted data size
    • Average metrics: the average flow duration, and the average Time To Live (TTL)

    Multiple time aggregation: The original datapoints in the dataset are aggregated over 10 minutes of network traffic. The size of the aggregation interval influences anomaly detection procedures, mainly the training speed of the detection model. However, 10-minute intervals can be too short for longitudinal anomaly detection methods. Therefore, we added two more aggregation intervals to the datasets: 1 hour and 1 day.

    Time series of institutions: We identified 283 institutions inside the CESNET3 network. Aggregating the time series per institution ID provides a view of each institution's traffic.

    Time series of institutional subnets: We identified 548 institution subnets inside the CESNET3 network. Aggregating the time series per subnet provides a view of each subnet's traffic.

    Data Records

    The file hierarchy is described below:

    cesnet-timeseries24/
    |- institution_subnets/
    |  |- agg_10_minutes/
    |  |- agg_1_hour/
    |  |- agg_1_day/
    |  |- identifiers.csv
    |- institutions/
    |  |- agg_10_minutes/
    |  |- agg_1_hour/
    |  |- agg_1_day/
    |  |- identifiers.csv
    |- ip_addresses_full/
    |  |- agg_10_minutes/
    |  |- agg_1_hour/
    |  |- agg_1_day/
    |  |- identifiers.csv
    |- ip_addresses_sample/
    |  |- agg_10_minutes/
    |  |- agg_1_hour/
    |  |- agg_1_day/
    |  |- identifiers.csv
    |- times/
    |  |- times_10_minutes.csv
    |  |- times_1_hour.csv
    |  |- times_1_day.csv
    |- ids_relationship.csv
    |- weekends_and_holidays.csv

    The following list describes time series data fields in CSV files:

    • id_time: Unique identifier for each aggregation interval within the time series, used to segment the dataset into specific time periods for analysis.
    • n_flows: Total number of flows observed in the aggregation interval, indicating the volume of distinct sessions or connections for the IP address.
    • n_packets: Total number of packets transmitted during the aggregation interval, reflecting the packet-level traffic volume for the IP address.
    • n_bytes: Total number of bytes transmitted during the aggregation interval, representing the data volume for the IP address.
    • n_dest_ip: Number of unique destination IP addresses contacted by the IP address during the aggregation interval, showing the diversity of endpoints reached.
    • n_dest_asn: Number of unique destination Autonomous System Numbers (ASNs) contacted by the IP address during the aggregation interval, indicating the diversity of networks reached.
    • n_dest_port: Number of unique destination transport layer ports contacted by the IP address during the aggregation interval, representing the variety of services accessed.
    • tcp_udp_ratio_packets: Ratio of packets sent using TCP versus UDP by the IP address during the aggregation interval, providing insight into the transport protocol usage pattern. This metric belongs to the interval <0, 1> where 1 is when all packets are sent over TCP, and 0 is when all packets are sent over UDP.
    • tcp_udp_ratio_bytes: Ratio of bytes sent using TCP versus UDP by the IP address during the aggregation interval, highlighting the data volume distribution between protocols. This metric belongs to the interval <0, 1>, with the same rule as tcp_udp_ratio_packets.
    • dir_ratio_packets: Ratio of packet directions (inbound versus outbound) for the IP address during the aggregation interval, indicating the balance of traffic flow directions. This metric belongs to the interval <0, 1>, where 1 is when all packets are sent in the outgoing direction from the monitored IP address, and 0 is when all packets are sent in the incoming direction to the monitored IP address.
    • dir_ratio_bytes: Ratio of byte directions (inbound versus outbound) for the IP address during the aggregation interval, showing the data volume distribution in traffic flows. This metric belongs to the interval <0, 1> with the same rule as dir_ratio_packets.
    • avg_duration: Average duration of IP flows for the IP address during the aggregation interval, measuring the typical session length.
    • avg_ttl: Average Time To Live (TTL) of IP flows for the IP address during the aggregation interval, providing insight into the lifespan of packets.
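The protocol-ratio fields above can be computed per window as sketched below. The behaviour for a window with no traffic is an assumption for illustration; the dataset documentation does not specify a convention:

```python
# Compute tcp_udp_ratio-style metrics for one aggregation window:
# 1.0 means all traffic was TCP, 0.0 means all traffic was UDP.

def tcp_udp_ratio(tcp, udp):
    total = tcp + udp
    # Empty-window convention (assumption): return 0.0 when there is no traffic.
    return tcp / total if total else 0.0

tcp_udp_ratio_packets = tcp_udp_ratio(tcp=80, udp=20)          # 80% of packets over TCP
tcp_udp_ratio_bytes = tcp_udp_ratio(tcp=60_000, udp=60_000)    # bytes split evenly
```

The direction ratios (dir_ratio_packets, dir_ratio_bytes) follow the same shape, with outbound counts in place of TCP and inbound counts in place of UDP.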

    Moreover, the time series created by re-aggregation contains following time series metrics instead of n_dest_ip, n_dest_asn, and n_dest_port:

    • sum_n_dest_ip: Sum of numbers of unique destination IP addresses.
    • avg_n_dest_ip: The average number of unique destination IP addresses.
    • std_n_dest_ip: Standard deviation of numbers of unique destination IP addresses.
    • sum_n_dest_asn: Sum of numbers of unique destination ASNs.
    • avg_n_dest_asn: The average number of unique destination ASNs.
    • std_n_dest_asn: Standard deviation of numbers of unique destination ASNs.
    • sum_n_dest_port: Sum of numbers of unique destination transport layer ports.
    • avg_n_dest_port: The average number of unique destination transport layer ports.
    • std_n_dest_port: Standard deviation of numbers of unique destination transport layer ports.
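The re-aggregation step above (six 10-minute windows into one 1-hour datapoint) can be sketched as follows; the unique-destination counts are illustrative values, not taken from the dataset:

```python
# Re-aggregate six 10-minute n_dest_ip values into one 1-hour datapoint.
# Because unique counts cannot simply be summed without the raw sets,
# the re-aggregated series keeps sum/avg/std instead, as described above.
import statistics

ten_min_n_dest_ip = [4, 6, 5, 7, 6, 8]   # one hour = six 10-minute windows

hour_dp = {
    "sum_n_dest_ip": sum(ten_min_n_dest_ip),
    "avg_n_dest_ip": statistics.mean(ten_min_n_dest_ip),
    "std_n_dest_ip": statistics.stdev(ten_min_n_dest_ip),
}
```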

  6. monash_tsf

    • huggingface.co
    Updated Jun 7, 2022
    Cite
    Monash University (2022). monash_tsf [Dataset]. https://huggingface.co/datasets/Monash-University/monash_tsf
    Explore at:
    Dataset updated
    Jun 7, 2022
    Dataset authored and provided by
    Monash University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Monash Time Series Forecasting Repository which contains 30+ datasets of related time series for global forecasting research. This repository includes both real-world and competition time series datasets covering varied domains.

  7. COVID-19: Time Series Datasets India versus World

    • data.mendeley.com
    • narcis.nl
    Updated Aug 30, 2020
    + more versions
    Cite
    Rohit Salgotra (2020). COVID-19: Time Series Datasets India versus World [Dataset]. http://doi.org/10.17632/tmrs92j7pv.25
    Explore at:
    Dataset updated
    Aug 30, 2020
    Authors
    Rohit Salgotra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    World, India
    Description

    This dataset consists of COVID-19 time series data of India since 24th March 2020. The data set covers all the States and Union Territories of India and is divided into five parts: i) confirmed cases; ii) death count; iii) recovered cases; iv) temperature of the place; and v) percentage humidity in the region. The data set also provides basic details of confirmed cases and death count for all the countries of the world, updated daily since 30 January 2020. The end user can contact the corresponding author (Rohit Salgotra: nicresearchgroup@gmail.com) for more details. (The dataset is updated twice a week.)

    Please refer to and cite our latest papers on COVID-19:
    1. Rohit Salgotra, Mostafa Gandomi, Amir H. Gandomi. "Evolutionary Modelling of the COVID-19 Pandemic in Fifteen Most Affected Countries". Chaos, Solitons & Fractals (2020). https://doi.org/10.1016/j.chaos.2020.110118
    2. Rohit Salgotra, Mostafa Gandomi, Amir H. Gandomi. "Time Series Analysis and Forecast of the COVID-19 Pandemic in India using Genetic Programming". Chaos, Solitons & Fractals (2020). https://doi.org/10.1016/j.chaos.2020.109945

  8. Data from: Renewable Energy and Electricity Demand Time Series Dataset with...

    • data.mendeley.com
    • openenergyhub.ornl.gov
    Updated Apr 26, 2023
    Cite
    Sebastian Rojas Ortega (2023). Renewable Energy and Electricity Demand Time Series Dataset with Exogenous Variables at 5-minute Interval [Dataset]. http://doi.org/10.17632/fdfftr3tc2.1
    Explore at:
    Dataset updated
    Apr 26, 2023
    Authors
    Sebastian Rojas Ortega
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The described database was created using data obtained from the California Independent System Operator (CAISO) and the National Renewable Energy Laboratory (NREL). All data was collected at five-minute intervals, and subsequently cleaned and modified to create a database comprising three time series: solar energy production, wind energy production, and electricity demand. The database contains 12 columns, including date, station (1: Winter, 2: Spring, 3: Summer, 4: Autumn), day of the week (0: Monday, ... , 6: Sunday), DHI (W/m2), DNI (W/m2), GHI (W/m2), wind speed (m/s), humidity (%), temperature (degrees), solar energy production (MW), wind energy production (MW), and electricity demand (MW).

  9. Dynamical System Multivariate Time Series

    • zenodo.org
    • data.niaid.nih.gov
    csv
    Updated Jun 8, 2024
    Cite
    Patrick Fleith (2024). Dynamical System Multivariate Time Series [Dataset]. http://doi.org/10.5281/zenodo.11526904
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 8, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Patrick Fleith
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Dynamical System Multivariate Time Series (DSMTS) Dataset consists of commands, external stimuli, and telemetry readings of a simulated complex dynamical system under fully nominal conditions (no outliers or anomalies).

    The DSMTS Dataset exhibits a set of desirable properties that make it very suitable for benchmarking Multivariate Time Series Forecasting especially for industrial processes of complex systems:

    • Multivariate (17 variables), including sensor readings and control signals. It simulates the operational behaviour of an arbitrary complex system, including:
      • 4 Deliberate Actuations / Control Commands sent by a simulated operator / controller, for instance, commands of an operator to turn ON/OFF some equipment.
      • 3 Environmental Stimuli / External Forces acting on the system and affecting its behaviour, for instance, the wind affecting the orientation of a large ground antenna.
      • 10 Telemetry Readings representing the observable states of the complex system by means of sensors, for instance, a position, a temperature, a pressure, a voltage, a current, humidity, velocity, acceleration, etc.
    • 5 million timestamps. Sensor readings are sampled at 1 Hz.
    • Pure signal ideal for robustness-to-noise analysis. The simulated signals are provided without noise: while this may seem unrealistic at first, it is an advantage since users of the dataset can decide to add on top of the provided series any type of noise and choose an amplitude. This makes it well suited to test how sensitive and robust detection algorithms are against various levels of noise.
    • No missing data. You can drop whatever data you want to assess the impact of missing values on your detector with respect to a clean baseline.
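Since the signals are provided noise-free, a robustness study adds its own noise on top. A minimal stdlib-only sketch with Gaussian noise of a chosen amplitude; the sine wave stands in for one of the 10 telemetry channels:

```python
# Add Gaussian noise of a chosen amplitude to a clean telemetry channel.
# The clean signal here is a synthetic 1 Hz sine wave standing in for one
# of the DSMTS telemetry readings; sigma controls the noise level under test.
import math
import random

random.seed(0)                      # reproducible noise realisation
clean = [math.sin(2 * math.pi * t / 60) for t in range(300)]  # 1 Hz samples
sigma = 0.05                        # noise amplitude to evaluate
noisy = [x + random.gauss(0.0, sigma) for x in clean]
```

Sweeping sigma over several values and re-running a detector or forecaster on each noisy copy gives the sensitivity-to-noise curve the dataset authors describe.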
  10. lotsa_data

    • huggingface.co
    Updated Aug 3, 2025
    + more versions
    Cite
    Salesforce (2025). lotsa_data [Dataset]. https://huggingface.co/datasets/Salesforce/lotsa_data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 3, 2025
    Dataset provided by
    Salesforce Inc (http://salesforce.com/)
    Authors
    Salesforce
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    LOTSA Data

    The Large-scale Open Time Series Archive (LOTSA) is a collection of open time series datasets for time series forecasting. It was collected for the purpose of pre-training Large Time Series Models. See the paper and codebase for more information.

      Citation
    

    If you're using LOTSA data in your research or applications, please cite it using this BibTeX: @article{woo2024unified, title={Unified Training of Universal Time Series Forecasting Transformers}… See the full description on the dataset page: https://huggingface.co/datasets/Salesforce/lotsa_data.

  11. Monash Time Series Forecasting Repository - Dataset - LDM

    • service.tib.eu
    Updated Oct 17, 2023
    Cite
    (2023). Monash Time Series Forecasting Repository - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/monash-time-series-forecasting-repository
    Explore at:
    Dataset updated
    Oct 17, 2023
    Area covered
    City of Monash
    Description

    All datasets contain univariate time series, and they are available in a new format we call .tsf, inspired by the sktime .ts format.

  12. Beam-Level-Traffic-Timeseries-Dataset

    • huggingface.co
    Updated Jun 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HUAWEI Netop Team (2025). Beam-Level-Traffic-Timeseries-Dataset [Dataset]. https://huggingface.co/datasets/netop/Beam-Level-Traffic-Timeseries-Dataset
    Explore at:
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    HUAWEI Netop Team
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    📶 Beam-Level (5G) Time-Series Dataset

    This dataset introduces a novel multivariate time series specifically curated to support research on accurate prediction of KPIs across communication networks.

    Precise forecasting of network traffic is critical for optimizing network management and enhancing resource allocation efficiency. This task is of both practical and theoretical importance to researchers in networking and machine learning, offering a… See the full description on the dataset page: https://huggingface.co/datasets/netop/Beam-Level-Traffic-Timeseries-Dataset.

  13. Introduction to Time Series Analysis for Hydrologic Data

    • dataone.org
    • beta.hydroshare.org
    • +1more
    Updated Dec 5, 2021
    Cite
    Gabriela Garcia; Kateri Salk (2021). Introduction to Time Series Analysis for Hydrologic Data [Dataset]. https://dataone.org/datasets/sha256%3Abeb9302f6cb5eee6fa9269c97b1b0f404cdfecd6b4b4767b2e3bd96919e2ad54
    Explore at:
    Dataset updated
    Dec 5, 2021
    Dataset provided by
    Hydroshare
    Authors
    Gabriela Garcia; Kateri Salk
    Time period covered
    Oct 1, 1974 - Jan 27, 2021
    Area covered
    Description

    This lesson was adapted from educational material written by Dr. Kateri Salk for her Fall 2019 Hydrologic Data Analysis course at Duke University. This is the first part of a two-part exercise focusing on time series analysis.

    Introduction

    Time series are a special class of dataset, where a response variable is tracked over time. The frequency of measurement and the timespan of the dataset can vary widely. At its simplest, a time series model includes an explanatory time component and a response variable. Mixed models can include additional explanatory variables (check out the nlme and lme4 R packages). We will be covering a few simple applications of time series analysis in these lessons.

    Opportunities

    Analysis of time series presents several opportunities. In aquatic sciences, some of the most common questions we can answer with time series modeling are:

    • Has there been an increasing or decreasing trend in the response variable over time?
    • Can we forecast conditions in the future?

      Challenges

    Time series datasets come with several caveats, which need to be addressed in order to effectively model the system. A few common challenges that arise (and can occur together within a single dataset) are:

    • Autocorrelation: Data points are not independent from one another (i.e., the measurement at a given time point is dependent on previous time point(s)).

    • Data gaps: Data are not collected at regular intervals, necessitating interpolation between measurements. There are often gaps between monitoring periods. For many time series analyses, we need equally spaced points.

    • Seasonality: Cyclic patterns in variables occur at regular intervals, impeding clear interpretation of a monotonic (unidirectional) trend. For example, summer temperatures are predictably higher than winter temperatures.

    • Heteroscedasticity: The variance of the time series is not constant over time.

    • Covariance: The covariance of the time series is not constant over time. Many time series models assume that both the variance and the covariance remain constant over time (compare heteroscedasticity above).
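The autocorrelation challenge above can be probed directly with a lag-1 autocorrelation estimate. A stdlib-only sketch for illustration (real analyses would typically use acf/pacf from a statistics package such as statsmodels):

```python
# Estimate lag-1 autocorrelation: values near 1 indicate strong dependence
# of each point on the previous one; values near 0 indicate independence.

def lag1_autocorr(x):
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

# A trending series is strongly autocorrelated, so ordinary regression
# residual assumptions (independent errors) would be violated.
trend = [0.1 * t for t in range(100)]
assert lag1_autocorr(trend) > 0.9
```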

      Learning Objectives

    After successfully completing this notebook, you will be able to:

    1. Choose appropriate time series analyses for trend detection and forecasting

    2. Discuss the influence of seasonality on time series analysis

    3. Interpret and communicate results of time series analyses

  14. One-step ahead forecasting of geophysical processes within a purely...

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Georgia Papacharalampous; Hristos Tyralis (2023). One-step ahead forecasting of geophysical processes within a purely statistical framework: Supplementary material [Dataset]. http://doi.org/10.6084/m9.figshare.5357359.v1
    Explore at:
    Available download formats: pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Georgia Papacharalampous; Hristos Tyralis
    License

    GNU General Public License v3.0 (https://www.gnu.org/licenses/gpl-3.0.html)

    Description

    Supplementary material for the paper entitled "One-step ahead forecasting of geophysical processes within a purely statistical framework".

    Abstract: The simplest way to forecast geophysical processes, an engineering problem with a widely recognised challenging character, is the so-called "univariate time series forecasting", which can be implemented using stochastic or machine learning regression models within a purely statistical framework. Regression models are in general fast to implement, in contrast to the computationally intensive Global Circulation Models, which constitute the most frequently used alternative for precipitation and temperature forecasting. For their simplicity and easy applicability, the former have been proposed as benchmarks for the latter by forecasting scientists. Herein, we assess the one-step ahead forecasting performance of 20 univariate time series forecasting methods when applied to a large number of geophysical and simulated time series of 91 values. We use two real-world annual datasets, one composed of 112 time series of precipitation and another composed of 185 time series of temperature, as well as their respective standardized datasets, to conduct several real-world experiments. We further conduct large-scale experiments using 12 simulated datasets. These datasets contain 24 000 time series in total, simulated using stochastic models from the families of Autoregressive Moving Average and Autoregressive Fractionally Integrated Moving Average. We use the first 50, 60, 70, 80 and 90 data points for model fitting and model validation, and make predictions corresponding to the 51st, 61st, 71st, 81st and 91st points respectively. The total number of forecasts produced herein is 2 177 520, among which 47 520 are obtained using the real-world datasets. The assessment is based on eight error metrics and accuracy statistics.
The simulation experiments reveal the most and least accurate methods for long-term forecasting applications, also suggesting that the simple methods may be competitive in specific cases. Regarding the results of the real-world experiments using the original (standardized) time series, the minimum and maximum medians of the absolute errors are found to be 68 mm (0.55) and 189 mm (1.42) respectively for precipitation, and 0.23 °C (0.33) and 1.10 °C (1.46) respectively for temperature. Since there is an absence of relevant information in the literature, the numerical results obtained using the standardized real-world datasets could be used as rough benchmarks for the one-step ahead predictability of annual precipitation and temperature.
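
The evaluation scheme above (fit on a prefix of the series, forecast the next point, score with absolute errors) can be sketched for the simplest of the benchmark methods, the naive last-value forecast. This is a minimal illustration, not the paper's code: the 10-value precipitation series and the split points are invented for the example.

```python
from statistics import median

def one_step_naive(series):
    """Naive one-step-ahead forecast: the next value equals the last observed value."""
    return series[-1]

def abs_errors(series, fit_len, forecaster):
    """For each split point from fit_len onward, fit on the prefix and
    forecast the next value, collecting absolute errors."""
    errors = []
    for split in range(fit_len, len(series)):
        forecast = forecaster(series[:split])
        errors.append(abs(series[split] - forecast))
    return errors

# Hypothetical annual precipitation series (mm), 10 values for illustration.
precip = [812, 790, 905, 760, 840, 910, 785, 830, 875, 800]
errs = abs_errors(precip, fit_len=5, forecaster=one_step_naive)
print(median(errs))  # → 70
```

The median of the absolute errors is the same summary statistic the paper reports for its real-world experiments.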

  15. h

    Multimodal-Dataset-Image_Text_Table_TimeSeries-for-Financial-Time-Series-Forecasting...

    • huggingface.co
    Updated May 28, 2025
    Xu (2025). Multimodal-Dataset-Image_Text_Table_TimeSeries-for-Financial-Time-Series-Forecasting [Dataset]. https://huggingface.co/datasets/Wenyan0110/Multimodal-Dataset-Image_Text_Table_TimeSeries-for-Financial-Time-Series-Forecasting
    Explore at:
    Dataset updated
    May 28, 2025
    Authors
    Xu
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The sp500stock_data_description.csv file provides detailed information on the existence of four modalities (text, image, time series, and table) for 4,213 S&P 500 stocks. The hs300stock_data_description.csv file provides detailed information on the existence of four modalities (text, image, time series, and table) for 858 HS 300 stocks.

      If you find our research helpful, please cite our paper:
    

    @article{xu2025finmultitime, title={FinMultiTime: A Four-Modal Bilingual Dataset for… See the full description on the dataset page: https://huggingface.co/datasets/Wenyan0110/Multimodal-Dataset-Image_Text_Table_TimeSeries-for-Financial-Time-Series-Forecasting.

  16. (for simple exercises) Time Series Forecasting

    • kaggle.com
    Updated Apr 30, 2020
    Bulent Siyah (2020). (for simple exercises) Time Series Forecasting [Dataset]. https://www.kaggle.com/bulentsiyah/for-simple-exercises-time-series-forecasting/tasks
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 30, 2020
    Dataset provided by
    Kaggle
    Authors
    Bulent Siyah
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Alcohol_Sales.csv: This dataset was taken from https://fred.stlouisfed.org/series/S4248SM144NCEN (old url https://fred.stlouisfed.org/series/.)

    energydata_complete.csv: Experimental data used to create regression models of appliances energy use in a low-energy building. Data Set Information: The data set is sampled at 10-minute intervals for about 4.5 months. The house temperature and humidity conditions were monitored with a ZigBee wireless sensor network. Each wireless node transmitted the temperature and humidity conditions around every 3.3 minutes. The wireless data was then averaged over 10-minute periods. The energy data was logged every 10 minutes with m-bus energy meters. Weather from the nearest airport weather station (Chievres Airport, Belgium) was downloaded from a public data set from Reliable Prognosis (rp5.ru) and merged with the experimental data sets using the date and time column. Two random variables have been included in the data set for testing the regression models and for filtering out non-predictive attributes (parameters). The original source of the dataset: http://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction
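
The 10-minute averaging step described above can be sketched with a plain-Python bucketing pass (a stdlib stand-in for what a pandas `resample('10min').mean()` would do); the timestamps and temperature values here are invented for illustration.

```python
from datetime import datetime, timedelta

def average_10min(samples):
    """Average (timestamp, value) samples into 10-minute buckets,
    mirroring the dataset's averaging of ~3.3-minute wireless readings."""
    buckets = {}
    for ts, value in samples:
        # Floor the timestamp to the start of its 10-minute bucket.
        bucket = ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)
        buckets.setdefault(bucket, []).append(value)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}

start = datetime(2016, 1, 11, 17, 0)
# Simulated temperature readings roughly every 3.3 minutes (values are made up).
samples = [(start + timedelta(minutes=3.3 * i), 19.0 + 0.1 * i) for i in range(6)]
print(average_10min(samples))
```

With these invented samples, the first four readings land in the 17:00 bucket and the last two in the 17:10 bucket.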

  17. Forecasting Book Sales

    • kaggle.com
    Updated May 27, 2023
    Oscar Aguilar (2023). Forecasting Book Sales [Dataset]. https://www.kaggle.com/datasets/oscarm524/forecasting-book-sales
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 27, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Oscar Aguilar
    Description

    Because of the sheer number of products available, the German book market is one of the largest businesses trading today. In order to present a highly individual profile to customers and, at the same time, keep the effort involved in selecting and ordering as low as possible, the key to success for a bookshop lies in effective purchasing from a choice of roughly 96,000 new titles each year. The challenge for the bookseller is to buy the right amount of the right books at the right time.

    It is with this in mind that this year’s DATA MINING CUP Competition will be held in cooperation with Libri, Germany’s leading book wholesaler. Among Libri’s many successful support measures for booksellers, purchase recommendations give the bookshop a competitive advantage. Accordingly, the DATA MINING CUP 2009 challenge will be to forecast purchase quantities of a clearly defined title portfolio per location, using simulated data.

    The Task

    The task of the DATA MINING CUP Competition 2009 is to forecast purchase quantities of 8 titles for 2,418 different locations. In order to create the model, simulated purchase data from an additional 2,394 locations is supplied. All data refers to a fixed period of time. The objective is to forecast the purchase quantities of these 8 titles for the 2,418 locations as exactly as possible.

    The Data

    There are two text files available to assist in solving the problem: dmc2009_train.txt (train data file) and dmc2009_forecast.txt (data for the 2,418 locations for which a prediction is to be made).

    Acknowledgement

    This data is publicly available on the data mining website.

  18. u

    Time series decomposition

    • researchdata.up.ac.za
    txt
    Updated Nov 17, 2021
    Simamkele Mtsengu (2021). Time series decomposition [Dataset]. http://doi.org/10.25403/UPresearchdata.16883317.v1
    Explore at:
    txtAvailable download formats
    Dataset updated
    Nov 17, 2021
    Dataset provided by
    University of Pretoria
    Authors
    Simamkele Mtsengu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The research conducted using this univariate data set is on time series decomposition and reviews how to implement four decomposition methods, namely: classical decomposition, X11, Signal Extraction in ARIMA Time Series (SEATS), and Seasonal-Trend decomposition based on Loess (STL). Following decomposition, forecasting with decomposition is implemented on the monthly electricity available for distribution to South Africa by Eskom time series data set. RStudio was used for the research. Other data sets, as well as R built-in ones, were used in the second section of the work to illustrate the components of a time series and moving averages. The Eskom monthly electricity series was then used for the third and fourth sections of the research: to implement the time series decomposition methods, analyze the random component of each method, forecast with decomposition, and compute the forecast accuracy of four different forecasting methods.
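
One of the four reviewed methods, classical additive decomposition, estimates the trend component with a centered moving average (a 2×m-MA when the period m is even). A minimal sketch of that trend step on a toy series, not the Eskom data:

```python
def centered_ma(series, period):
    """Centered moving average used as the trend estimate in classical
    additive decomposition (a 2x m-MA when the period is even)."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2 == 0:
            # Even period: halve the weight of the two endpoint values.
            total = window[0] / 2 + sum(window[1:-1]) + window[-1] / 2
        else:
            total = sum(window)
        trend[i] = total / period
    return trend

# Toy series: linear trend (10 + i) plus a fixed seasonal pattern of period 4.
period = 4
season = [5, -2, -4, 1]
series = [10 + i + season[i % period] for i in range(12)]
trend = centered_ma(series, period)
# Detrending recovers the seasonal pattern where the trend is defined.
detrended = [x - t for x, t in zip(series, trend) if t is not None]
print(trend)  # → [None, None, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, None, None]
```

Because the seasonal pattern sums to zero over each period, the moving average recovers the underlying linear trend exactly on this toy example.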

  19. Customer Sales Forecasting Dataset

    • kaggle.com
    Updated Jun 15, 2025
    Sahil Islam007 (2025). Customer Sales Forecasting Dataset [Dataset]. https://www.kaggle.com/datasets/sahilislam007/custom-sales-forecasting-dataset/discussion
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 15, 2025
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sahil Islam007
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🗂 Dataset Description Title: Custom Sales Forecasting Dataset

    This dataset contains a synthetic yet realistic representation of product sales across multiple stores and time periods. It is designed for use in time series forecasting, retail analytics, or machine learning experiments focusing on demand prediction and inventory planning. Each row corresponds to daily sales data for a given product at a particular store, enriched with contextual information like promotions and holidays.

    This dataset is ideal for:

    Building and testing time series models (ARIMA, Prophet, LSTM, etc.)

    Forecasting product demand

    Evaluating store-level sales trends

    Training machine learning models with tabular time series data

    Column Name: Description
    order_id: Unique identifier for the order placed by a customer.
    customer_id: Unique identifier for the customer making the purchase.
    order_date: Date on which the order was placed (YYYY-MM-DD).
    product_category: Category of the product purchased (e.g., Sports, Home, Beauty).
    product_price: Original price of a single unit of the product (before discount).
    quantity: Number of units of the product ordered.
    payment_method: Method used for payment (e.g., PayPal, Cash on Delivery).
    delivery_status: Current delivery status of the order (e.g., Delivered, Pending).
    city: City to which the order was delivered.
    state: U.S. state where the customer is located.
    zipcode: Postal code of the delivery location.
    product_id: Unique identifier for the purchased product.
    discount_applied: Fractional discount applied to the order (e.g., 0.20 for 20% off).
    order_value: Total value of the order after discount (product_price * quantity * (1 - discount_applied)).
    review_rating: Customer’s review rating of the order on a 1–5 scale.
    return_requested: Boolean value indicating if the customer requested a return (True/False).
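
The order_value column is derived from three of the other columns; the stated formula can be sketched in a few lines (the prices and discount below are made-up examples, not rows from the dataset):

```python
def order_value(product_price, quantity, discount_applied):
    """Total order value after discount, per the dataset's definition:
    product_price * quantity * (1 - discount_applied), rounded to cents."""
    return round(product_price * quantity * (1 - discount_applied), 2)

# e.g. two units at $25.00 with a 20% discount
print(order_value(25.00, 2, 0.20))  # → 40.0
```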
  20. f

    Data from: Enriching time series datasets using Nonparametric kernel...

    • figshare.com
    pdf
    Updated May 31, 2023
    Mohamad Ivan Fanany (2023). Enriching time series datasets using Nonparametric kernel regression to improve forecasting accuracy [Dataset]. http://doi.org/10.6084/m9.figshare.1609661.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Mohamad Ivan Fanany
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Improving the accuracy of predictions of future values based on past and current observations has been pursued by enhancing the prediction methods, combining those methods, or performing data pre-processing. In this paper, another approach is taken, namely increasing the number of inputs in the dataset. This approach is useful especially for shorter time series data. By filling in the in-between values of the time series, the size of the training set can be increased, thus increasing the generalization capability of the predictor. The algorithm used to make predictions is a neural network, as it is widely used in the literature for time series tasks. For comparison, Support Vector Regression is also employed. The dataset used in the experiment is the frequency of USPTO patents and PubMed scientific publications in the field of health, namely on Apnea, Arrhythmia, and Sleep Stages. Another time series dataset, designated for the NN3 Competition in the field of transportation, is also used for benchmarking. The experimental results show that prediction performance can be significantly increased by filling in in-between data in the time series. Furthermore, the use of detrending and deseasonalization, which separate the data into trend, seasonal and stationary time series, also improves prediction performance on both the original and filled datasets. The optimal enlargement of the dataset in this experiment is about five times the length of the original dataset.
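
The core idea, filling in-between values to enlarge a short training series, can be illustrated with plain linear interpolation; the paper's actual filling uses nonparametric kernel regression, so this is a deliberately simplified stand-in on an invented three-point series.

```python
def fill_inbetween(series, factor=2):
    """Enlarge a series by linearly interpolating factor - 1 points
    between each pair of consecutive observations (a simplified
    stand-in for the paper's kernel-regression filling)."""
    enriched = []
    for a, b in zip(series, series[1:]):
        for k in range(factor):
            enriched.append(a + (b - a) * k / factor)
    enriched.append(series[-1])  # keep the final observation
    return enriched

series = [2.0, 4.0, 3.0]
print(fill_inbetween(series))  # → [2.0, 3.0, 4.0, 3.5, 3.0]
```

A length-n series becomes roughly factor times longer, which is the mechanism behind the paper's reported ~5x optimal enlargement.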
