35 datasets found

ECMWF ERA5: 10 ensemble member surface level analysis parameter data
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Dec 8, 2023
Cite
(2023). ECMWF ERA5: 10 ensemble member surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ensemble%20runs
Explore at:
Dataset updated
Dec 8, 2023
Description
This dataset contains ERA5 surface level analysis parameter data from 10 ensemble runs. ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble members were used to derive means and spread data (see linked datasets). Ensemble means and spreads were calculated from the ERA5t 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim re-analysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These will be subsequently reviewed ahead of being released by ECMWF as quality assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs were performed to address this issue, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
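The spread definition in the description above (dividing by 10 rather than 9) corresponds to a population standard deviation. A minimal sketch of the difference, assuming NumPy and a made-up set of 10 member values for a single grid point (the numbers are illustrative, not taken from the dataset):

```python
import numpy as np

# Hypothetical 2 m temperature values (K) from 10 ensemble members at one grid point.
members = np.array([281.2, 280.9, 281.5, 281.0, 281.3,
                    280.8, 281.1, 281.4, 281.0, 281.2])

ensemble_mean = members.mean()

# Spread as described in the record: divide by N = 10 (population standard deviation).
spread = members.std(ddof=0)

# Sample standard deviation (divide by N - 1 = 9), shown only for comparison.
sample_std = members.std(ddof=1)

# For any data with non-zero spread the two differ by a factor sqrt(N / (N - 1)).
ratio = sample_std / spread
print(ensemble_mean, spread, sample_std, ratio)
```

With 10 members the sample standard deviation is larger than the spread by a factor of sqrt(10/9), roughly 5%, which matters when comparing these spreads with statistics computed by tools that default to N-1.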
ECMWF ERA5t: ensemble spreads of surface level analysis parameter data
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Jul 28, 2021
Cite
(2021). ECMWF ERA5t: ensemble spreads of surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?format=Data%20are%20netCDF%20formatted%20with%20internal%20compression.
Explore at:
Dataset updated
Jul 28, 2021
Description
This dataset contains ensemble spreads for the ERA5 initial release (ERA5t) surface level analysis parameter data ensemble means (see linked dataset). ERA5t is the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project initial release, available up to 5 days behind the present date. CEDA will maintain a 6 month rolling archive of these data with overlap to the verified ERA5 data - see linked datasets on this record. The ensemble means and spreads are calculated from the ERA5t 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim re-analysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These will be subsequently reviewed and, if required, amended before the full ERA5 release. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
ERA5 hourly data on pressure levels from 1940 to present
cds.climate.copernicus.eu
grib
Updated Jun 9, 2025
Cite
ECMWF (2025). ERA5 hourly data on pressure levels from 1940 to present [Dataset]. http://doi.org/10.24381/cds.bd0915c6
Explore at:
Available download formats: grib
Unique identifier
https://doi.org/10.24381/cds.bd0915c6
Dataset updated
Jun 9, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
Authors
ECMWF
License
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
Time period covered
Jan 1, 1940 - Jun 3, 2025
Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. 
If this occurs, users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main subsets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 hourly data on pressure levels from 1940 to present".
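To get a feel for what the grid spacings quoted above mean in practice, the following sketch computes the array shape of a global regular lat-lon grid. It assumes latitudes run from -90 to 90 inclusive and longitudes cover 0 to 360 without repeating the wrap-around point; the function name is hypothetical and the record itself does not state these conventions.

```python
def regular_grid_shape(step_deg: float) -> tuple[int, int]:
    """Return (n_lat, n_lon) for a global regular lat-lon grid.

    Assumption: both poles are grid rows (hence the +1) and the
    longitude axis does not repeat the 0/360 meridian.
    """
    n_lat = round(180 / step_deg) + 1
    n_lon = round(360 / step_deg)
    return n_lat, n_lon


# Reanalysis grid (0.25 degrees) and uncertainty-estimate grid (0.5 degrees).
print(regular_grid_shape(0.25))
print(regular_grid_shape(0.5))
```

Under these assumptions, halving the resolution from 0.25 to 0.5 degrees cuts the number of grid points per field by roughly a factor of four.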
NCEP GEFS Mean Spread West Atlantic Forecast Products Imagery
data.ucar.edu
image
Updated Dec 26, 2024
Cite
National Centers for Environmental Prediction (NCEP) (2024). NCEP GEFS Mean Spread West Atlantic Forecast Products Imagery [Dataset]. http://doi.org/10.26023/PZ24-3M8N-TA0G
Explore at:
Available download formats: image
Unique identifier
https://doi.org/10.26023/PZ24-3M8N-TA0G
Dataset updated
Dec 26, 2024
Dataset provided by
University Corporation for Atmospheric Research
Authors
National Centers for Environmental Prediction (NCEP)
Time period covered
Jun 15, 2011 - Aug 1, 2011
Description
This dataset contains gif images from the National Weather Service - National Centers for Environmental Prediction (NCEP) Global Ensemble Forecast System (GEFS) Mean Spread West Atlantic forecasts during the Ice in Clouds Experiment - Tropical (ICE-T) project. Note: There are no data available for 20110715-20110722.
Data from: Data and code from: Environmental influences on drying rate of...
catalog.data.gov
agdatacommons.nal.usda.gov
Updated May 31, 2024
Cite
Agricultural Research Service (2024). Data and code from: Environmental influences on drying rate of spray applied disinfestants from horticultural production services [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-environmental-influences-on-drying-rate-of-spray-applied-disinfestants-
Explore at:
Dataset updated
May 31, 2024
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
This dataset includes all the data and R code needed to reproduce the analyses in a forthcoming manuscript:
Copes, W. E., Q. D. Read, and B. J. Smith. Environmental influences on drying rate of spray applied disinfestants from horticultural production services. PhytoFrontiers, DOI pending.
Study description: Instructions for disinfestants typically specify a dose and a contact time to kill plant pathogens on production surfaces. A problem occurs when disinfestants are applied to large production areas where the evaporation rate is affected by weather conditions. The common contact time recommendation of 10 min may not be achieved under hot, sunny conditions that promote fast drying. This study is an investigation into how the evaporation rates of six commercial disinfestants vary when applied to six types of substrate materials under cool to hot and cloudy to sunny weather conditions. Initially, disinfestants with low surface tension spread out to provide 100% coverage and disinfestants with high surface tension beaded up to provide about 60% coverage when applied to hard smooth surfaces. Disinfestants applied to porous materials were quickly absorbed into the body of the material, such as wood and concrete. Even though disinfestants evaporated faster under hot sunny conditions than under cool cloudy conditions, coverage was reduced considerably in the first 2.5 min under most weather conditions and reduced to less than or equal to 50% coverage by 5 min.
Dataset contents: This dataset includes R code to import the data and fit Bayesian statistical models using the model fitting software CmdStan, interfaced with R using the packages brms and cmdstanr. The models (one for 2022 and one for 2023) compare how quickly different spray-applied disinfestants dry, depending on what chemical was sprayed, what surface material it was sprayed onto, and what the weather conditions were at the time. Next, the statistical models are used to generate predictions and compare mean drying rates between the disinfestants, surface materials, and weather conditions. Finally, tables and figures are created. These files are included:
Drying2022.csv: drying rate data for the 2022 experimental run
Weather2022.csv: weather data for the 2022 experimental run
Drying2023.csv: drying rate data for the 2023 experimental run
Weather2023.csv: weather data for the 2023 experimental run
disinfestant_drying_analysis.Rmd: RMarkdown notebook with all data processing, analysis, and table creation code
disinfestant_drying_analysis.html: rendered output of notebook
MS_figures.R: additional R code to create figures formatted for journal requirements
fit2022_discretetime_weather_solar.rds: fitted brms model object for 2022. This will allow users to reproduce the model prediction results without having to refit the model, which was originally fit on a high-performance computing cluster
fit2023_discretetime_weather_solar.rds: fitted brms model object for 2023
data_dictionary.xlsx: descriptions of each column in the CSV data files
The SPARC Data Initiative CFC-11, CFC-12, HF and SF6 climatologies from...
doi.pangaea.de
search.dataone.org
html, tsv
Updated 2016
Cite
Michaela I Hegglin; Bernd Funke; Susann Tegtmeier; John Anderson; John C Gille; Ashley Jones; Lesley Smith; Thomas von Clarmann; Kaley A Walker (2016). The SPARC Data Initiative CFC-11, CFC-12, HF and SF6 climatologies from international satellite limb sounders [Dataset]. http://doi.org/10.1594/PANGAEA.849223
Explore at:
Available download formats: tsv, html
Unique identifier
https://doi.org/10.1594/PANGAEA.849223
Dataset updated
2016
Dataset provided by
PANGAEA
Authors
Michaela I Hegglin; Bernd Funke; Susann Tegtmeier; John Anderson; John C Gille; Ashley Jones; Lesley Smith; Thomas von Clarmann; Kaley A Walker
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Variables measured
File name, Uniform resource locator/link to file
Description
A quality assessment of the CFC-11 (CCl3F), CFC-12 (CCl2F2), HF, and SF6 products from limb-viewing satellite instruments is provided by means of a detailed intercomparison. The climatologies in the form of monthly zonal mean time series are obtained from HALOE, MIPAS, ACE-FTS, and HIRDLS within the time period 1991-2010. The intercomparisons focus on the mean biases of the monthly and annual zonal mean fields and aim to identify their vertical, latitudinal and temporal structure. The CFC evaluations (based on MIPAS, ACE-FTS and HIRDLS) reveal that the uncertainty in our knowledge of the atmospheric CFC-11 and CFC-12 mean state, as given by satellite data sets, is smallest in the tropics and mid-latitudes at altitudes below 50 and 20 hPa, respectively, with a 1sigma multi-instrument spread of up to ±5 %. For HF, the situation is reversed. The two available data sets (HALOE and ACE-FTS) agree well above 100 hPa, with a spread in this region of ±5 to ±10 %, while at altitudes below 100 hPa the HF annual mean state is less well known, with a spread ±30 % and larger. The atmospheric SF6 annual mean states derived from two satellite data sets (MIPAS and ACE-FTS) show only very small differences with a spread of less than ±5 % and often below ±2.5 %. While the overall agreement among the climatological data sets is very good for large parts of the upper troposphere and lower stratosphere (CFCs, SF6) or middle stratosphere (HF), individual discrepancies have been identified. Pronounced deviations between the instrument climatologies exist for particular atmospheric regions which differ from gas to gas. Notable features are differently shaped isopleths in the subtropics, deviations in the vertical gradients in the lower stratosphere and in the meridional gradients in the upper troposphere, and inconsistencies in the seasonal cycle. Additionally, long-term drifts between the instruments have been identified for the CFC-11 and CFC-12 time series. 
The evaluations as a whole provide guidance on what data sets are the most reliable for applications such as studies of atmospheric transport and variability, model-measurement comparisons and detection of long-term trends.
ECMWF ERA5: ensemble spreads of surface level analysis parameter data
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Jul 28, 2021
Cite
(2021). ECMWF ERA5: ensemble spreads of surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?orgName=European%20Centre%20for%20Medium-Range%20Weather%20Forecasts%20(ECMWF)
Explore at:
Dataset updated
Jul 28, 2021
Description
This dataset contains ensemble spreads for the ERA5 surface level analysis parameter data ensemble means (see linked dataset). ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF) - see linked documentation for further details. The ensemble means and spreads are calculated from the ERA5 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim re-analysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These will be subsequently reviewed ahead of being released by ECMWF as quality assured data within 3 months. CEDA holds a 6 month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs were performed to address this issue, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.
Statistics of ‘diffrate’.
figshare.com
xls
Updated Mar 13, 2024
Cite
Yuancheng Si; Saralees Nadarajah; Zongxin Zhang; Chunmin Xu (2024). Statistics of ‘diffrate’. [Dataset]. http://doi.org/10.1371/journal.pone.0299164.t002
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0299164.t002
Dataset updated
Mar 13, 2024
Dataset provided by
PLOS ONE
Authors
Yuancheng Si; Saralees Nadarajah; Zongxin Zhang; Chunmin Xu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In the dynamic landscape of financial markets, accurate forecasting of stock indices remains a pivotal yet challenging task, essential for investors and policymakers alike. This study is motivated by the need to enhance the precision of predicting the Shanghai Composite Index’s opening price spread, a critical measure reflecting market volatility and investor sentiment. Traditional time series models like ARIMA have shown limitations in capturing the complex, nonlinear patterns inherent in stock price movements, prompting the exploration of advanced methodologies. The aim of this research is to bridge the gap in forecasting accuracy by developing a hybrid model that integrates the strengths of ARIMA with deep learning techniques, specifically Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. This novel approach leverages the ARIMA model’s proficiency in linear trend analysis and the deep learning models’ capability in modeling nonlinear dependencies, aiming to provide a comprehensive tool for market prediction. Utilizing a comprehensive dataset covering the period from December 20, 1990, to June 2, 2023, the study develops and assesses the efficacy of ARIMA, LSTM, GRU, ARIMA-LSTM, and ARIMA-GRU models in forecasting the Shanghai Composite Index’s opening price spread. The evaluation of these models is based on key statistical metrics, including Mean Squared Error (MSE) and Mean Absolute Error (MAE), to gauge their predictive accuracy. The findings indicate that the hybrid models, ARIMA-LSTM and ARIMA-GRU, perform better in forecasting the opening price spread of the Shanghai Composite Index than their standalone counterparts. This outcome suggests that combining traditional statistical methods with advanced deep learning algorithms can enhance stock market prediction. 
The research contributes to the field by providing evidence of the potential benefits of integrating different modeling approaches for financial forecasting, offering insights that could inform investment strategies and financial decision-making.
Heat index at 2 m above ground: A globally gridded dataset based on...
service.tib.eu
Updated Nov 30, 2024
Cite
(2024). Heat index at 2 m above ground: A globally gridded dataset based on reanalysis data from 1979-2013, links to GeoTIFFs - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/png-doi-10-1594-pangaea-841057
Explore at:
Dataset updated
Nov 30, 2024
License
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
The increase in global mean temperatures resulting from climate change has wide reaching consequences for the earth's ecosystems and other natural systems. Many studies have been devoted to evaluating the distribution and effects of these changes. We go a step further and evaluate global changes to the heat index, a measure of temperature as perceived by humans. Heat index, which is computed from temperature and relative humidity, is more important than temperature for the health of humans and other animals. Even in cases where the heat index does not reach dangerous levels from a health perspective, it has been shown to be an important factor in worker productivity and thus in economic productivity. We compute heat index from dewpoint temperature and absolute temperature 2 m above ground from the ERA-Interim reanalysis dataset for the years 1979-2013. The data is provided aggregated to daily minima, means and maxima. Furthermore, the data is temporally aggregated to monthly and yearly values and spatially aggregated to the level of countries after being weighted by population density in order to demonstrate its usefulness for the analysis of its impact on human health and productivity. The resulting data deliver insights into the spatiotemporal development of near-ground heat index during the course of the past 3 decades. It is shown that the impact of changing heat index is unevenly distributed through space and time, affecting some areas differently than others. The likelihood of dangerous heat index events has increased globally. Also, heat index climate groups that would formerly be expected closer to the tropics have spread latitudinally to include areas closer to the poles. The data can serve in future studies as a basis for evaluating and understanding the evolution of heat index in the course of climate change, as well as its impact on human health and productivity.
Network traffic datasets with novel extended IP flow called NetTiSA flow
data.niaid.nih.gov
Updated Apr 18, 2024
Cite
Josef Koumar (2024). Network traffic datasets with novel extended IP flow called NetTiSA flow [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8301042
Explore at:
Dataset updated
Apr 18, 2024
Dataset provided by
Josef Koumar
Jaroslav Pešek
Tomáš Čejka
Karel Hynek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Network traffic datasets with novel extended IP flow called NetTiSA flow

Datasets were created for the paper: NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification -- Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka -- which is published in The International Journal of Computer and Telecommunications Networking https://doi.org/10.1016/j.comnet.2023.110147

Please cite the usage of our datasets as:

Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286

@article{KOUMAR2024110147,
  title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification},
  journal = {Computer Networks},
  volume = {240},
  pages = {110147},
  year = {2024},
  issn = {1389-1286},
  doi = {https://doi.org/10.1016/j.comnet.2023.110147},
  url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
  author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}

This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.

NetTiSA flow feature vector

The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups by how they are computed. The first group of features is based on classical bidirectional flow information: the number of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group of features can be computed from the previous groups (i.e., on the flow collector) and improves the classification performance without any impact on the telemetry bandwidth.

Flow features

The flow features are:

Packets is the number of packets in the direction from the source to the destination IP address.

Packets in reverse order is the number of packets in the direction from the destination to the source IP address.

Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.

Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.

Statistical and Time-based features

These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) by stream-wise computation, which is necessary for keeping memory requirements low. This second feature set contains the following features:

Mean represents mean of the payload lengths of packets

Min is the minimal value from payload lengths of all packets in a flow

Max is the maximum value from payload lengths of all packets in a flow

Standard deviation is a measure of the variation of payload lengths from the mean payload length

Root mean square is the measure of the magnitude of payload lengths of packets

Average dispersion is the average absolute difference between each payload length of the packet and the mean value

Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution

Mean of relative times is the mean of the relative times, which is a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)

Mean of time differences is the mean of the time differences, which is a sequence defined as \(dt = \{ t_j - t_i \mid j = i + 1, i \in \{1, 2, \dots, n - 1\} \}\)

Min from time differences is the minimal value from all time differences, i.e., min space between packets.

Max from time differences is the maximum value from all time differences, i.e., max space between packets.

Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right| }{ \frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right) }\)

Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)

where \(s_n\) is the number of switches.
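The time-based definitions above can be sketched in a few lines of Python. This is an illustrative batch computation over a complete packet list, not the streaming implementation used by the exporter. The function name and dictionary keys are hypothetical, and two points are assumptions rather than statements from the dataset description: a "switch" is taken to be any change between consecutive payload lengths, and tdist is reported as 0.0 when all inter-packet gaps are equal (the formula is 0/0 in that case).

```python
def time_features(timestamps, payload_lengths):
    """Compute a few NetTiSA-style time-series features for one flow.

    timestamps: packet arrival times t_1..t_n in ascending order.
    payload_lengths: payload length of each packet, in the same order.
    """
    n = len(timestamps)
    if n < 2 or len(payload_lengths) != n:
        raise ValueError("need at least two packets and matching lengths")

    # Relative times st and consecutive time differences dt.
    st = [t - timestamps[0] for t in timestamps]
    dt = [timestamps[i + 1] - timestamps[i] for i in range(n - 1)]
    mean_dt = sum(dt) / (n - 1)

    # tdist: mean absolute deviation of dt, normalised by half the range of dt.
    half_range = 0.5 * (max(dt) - min(dt))
    mean_abs_dev = sum(abs(mean_dt - d) for d in dt) / (n - 1)
    # Assumption: return 0.0 when every gap is identical.
    tdist = mean_abs_dev / half_range if half_range > 0 else 0.0

    # Assumption: a switch is any change between consecutive payload lengths.
    s_n = sum(1 for a, b in zip(payload_lengths, payload_lengths[1:]) if a != b)
    sr = s_n / (0.5 * (n - 1))

    return {
        "mean_relative_time": sum(st) / n,
        "mean_dt": mean_dt,
        "min_dt": min(dt),
        "max_dt": max(dt),
        "tdist": tdist,
        "switching_ratio": sr,
    }


# Four packets with gaps of 1, 2 and 1 seconds and two payload-length changes.
features = time_features([0.0, 1.0, 3.0, 4.0], [100, 100, 200, 100])
print(features)
```

Note that with this reading of \(s_n\) the switching ratio can reach 2 when every consecutive pair of payload lengths differs, since the denominator is half the number of packet transitions.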

Features computed at the collector

The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size and their computation does not put additional load to resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:

Max minus min is the difference between minimum and maximum payload lengths

Percent deviation is the dispersion of the average absolute difference to the mean value

Variance is the spread measure of the data from its mean

Burstiness is the degree of peakedness in the central part of the distribution

Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement

Directions describe a percentage ratio of packet direction computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is a number of packets in a direction from source to destination IP address and \(d_0\) the opposite direction. Both \(d_1\) and \(d_0\) are inside the classical bidirectional flow.

Duration is the duration of the flow
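Because the collector-side features depend only on values already exported by the probe, they can be derived with simple arithmetic. The sketch below is one plausible reading of the list above, not the authors' implementation: the function name and field names are hypothetical, percent deviation is interpreted as average dispersion divided by the mean, and burstiness is omitted because the description does not give a formula for it.

```python
def collector_features(flow):
    """Derive collector-side features from exported flow values.

    `flow` is a dict with hypothetical keys: packets, packets_rev,
    mean, min, max, std, average_dispersion.
    """
    mean = flow["mean"]
    if mean == 0:
        raise ValueError("mean payload length must be non-zero")

    d1 = flow["packets"]       # source -> destination
    d0 = flow["packets_rev"]   # destination -> source

    return {
        "max_minus_min": flow["max"] - flow["min"],
        # Assumption: ratio of average dispersion to the mean.
        "percent_deviation": flow["average_dispersion"] / mean,
        "variance": flow["std"] ** 2,
        "coefficient_of_variation": flow["std"] / mean,
        # d1 / (d1 + d0), as given in the description.
        "directions": d1 / (d1 + d0),
    }


example = collector_features({
    "packets": 3, "packets_rev": 1,
    "mean": 100.0, "min": 60, "max": 140,
    "std": 20.0, "average_dispersion": 15.0,
})
print(example)
```

Nothing here requires per-packet data, which is consistent with the statement that these features add no load to the monitoring probes.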

The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.

Description of dataset files

The following table describes each dataset file:

File name	Detection problem	Citation of the original raw dataset
botnet_binary.csv	Binary detection of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv	Multi-class classification of botnet	S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv	Binary detection of cryptomining; the design part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv	Binary detection of cryptomining; the evaluation part	Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv	Binary detection of malware DNS	Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv	Binary detection of DoH	Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv	Binary detection of DoH	Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv	Binary detection of DoS	Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv	Binary detection of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv	Multi-class classification of IoT malware	Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv	Binary detection of HTTPS Brute Force	Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv	Binary detection of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_cic_multiclass.csv	Multi-class classification of intrusion in IDS	Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
unsw_binary.csv	Binary detection of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
unsw_multiclass.csv	Multi-class classification of intrusion in IDS	Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv	Binary detection of IoT malware	Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv	Binary detection of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv	Multi-class classification of IoT malware	Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets.
Copernicus
sextant.ifremer.fr
pigma.org
www:link +1
Updated Sep 6, 2021
Cite
ERA5 monthly averaged data on single levels from 1979 to present (2021). Copernicus [Dataset]. https://sextant.ifremer.fr/geonetwork/srv/api/records/ff2cd349-ecab-48e1-817a-1ed87dc0c4be
Explore at:
Available download formats: www:link-1.0-http--publication-url, www:link
Dataset updated
Sep 6, 2021
Dataset provided by
ERA5 monthly averaged data on single levels from 1979 to present
Area covered

Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 4 to 7 decades. Currently data is available from 1950, split into Climate Data Store entries for 1950-1978 (preliminary back extension) and from 1979 onwards (final release plus timely updates, this page). ERA5 replaces the ERA-Interim reanalysis.

Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product.

ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread.

ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. So far this has not been the case and when this does occur users will be notified.

The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications.

An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines.

Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities).

The present entry is "ERA5 monthly mean data on single levels from 1979 to present".
ERA5 hourly time-series data on single levels from 1940 to present
cds-stable-bopen.copernicus-climate.eu
cds.climate.copernicus.eu
netcdf
Updated Apr 8, 2025
Cite
ECMWF (2025). ERA5 hourly time-series data on single levels from 1940 to present [Dataset]. https://cds-stable-bopen.copernicus-climate.eu/datasets/reanalysis-era5-single-levels-timeseries
Explore at:
Available download formats: netcdf
Dataset updated
Apr 8, 2025
Dataset provided by
European Centre for Medium-Range Weather Forecasts (http://ecmwf.int/)
Authors
ECMWF
License
https://object-store.os-api.cci2.ecmwf.int:443/bopen-cds2-stable-catalogue/licences/licence-to-use-copernicus-products/licence-to-use-copernicus-products_b4b9451f54cffa16ecef5c912c9cebd6979925a956e3fa677976e0cf198c2c18.pdf
Time period covered
Jan 1, 1940 - Dec 6, 2024
Description
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days. In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. 
If this occurs, users are notified. The dataset presented here is a regridded subset of the full ERA5 dataset on native resolution, stored in a format designed for retrieving long time series for a single point. When the requested location does not match the exact location of a grid point, the nearest grid point is used instead. This is the source of ERA5 data used by the ERA-Explorer to ensure the response times required for the interactive web application. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines.
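
The nearest-grid-point behaviour described above can be pictured with a small Python sketch. It is an illustration only, not the service's actual lookup: the function name is hypothetical, and the 0.25 degree spacing is an assumption borrowed from the regridded reanalysis products rather than something stated for this time-series dataset.

```python
def nearest_grid_point(lat, lon, step=0.25):
    """Snap a requested location to the nearest point of a regular grid.

    `step` is the assumed grid spacing in degrees (hypothetical default).
    """
    grid_lat = round(lat / step) * step
    grid_lon = round(lon / step) * step
    # Keep the latitude inside the valid range after rounding.
    grid_lat = max(-90.0, min(90.0, grid_lat))
    return grid_lat, grid_lon


# A request near Dublin resolves to a neighbouring grid point.
point = nearest_grid_point(53.3, -6.2)
```

Note that Python's `round` resolves exact ties to the nearest even integer, so a location exactly halfway between two grid points may snap differently than in another implementation.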
NOAA/CIRES Twentieth Century Global Reanalysis Version 2c
rda.ucar.edu
oidc.rda.ucar.edu
+1more
Updated Mar 16, 2015
Cite
Gilbert Compo; Jeffrey Whitaker; Prashant Sardeshmukh; Robert Allan; Chesley McColl; Xungang Yin; Benjamin Giese; Russell Vose; Nobuki Matsui; Linden Ashcroft; Renate Auchmann; Mac Benoy; Pierre Bessemoulin; Theo Brandsma; Philip Brohan; Manola Brunet; Joseph Comeaux; Thomas Cram; Richard Crouthamel; Pavel Groisman; Hans Hersbach; Philip Jones; Trausti Jonsson; Sylvie Jourdain; Gail Kelly; Kenneth Knapp; Andries Kruger; Hisayuki Kubota; Gianluca Lentini; Andrew Lorrey; Neal Lott; Sandra Lubker; Jurg Luterbacher; Gareth Marshall; Maurizio Maugeri; Cary Mock; Hing Mok; Oyvind Nordli; Rajmund Przybylak; Mark Rodwell; Thomas Ross; Douglas Schuster; Lidija Srnec; Maria Valente; Zsuzsanna Vizi; Xiaolan Wang; Nancy Westcott; John Woollen; Steven Worley (2015). NOAA/CIRES Twentieth Century Global Reanalysis Version 2c [Dataset]. http://doi.org/10.5065/D6N877TW
Explore at:
Unique identifier
https://doi.org/10.5065/D6N877TW
Dataset updated
Mar 16, 2015
Dataset provided by
University Corporation for Atmospheric Research
Authors
Gilbert Compo; Jeffrey Whitaker; Prashant Sardeshmukh; Robert Allan; Chesley McColl; Xungang Yin; Benjamin Giese; Russell Vose; Nobuki Matsui; Linden Ashcroft; Renate Auchmann; Mac Benoy; Pierre Bessemoulin; Theo Brandsma; Philip Brohan; Manola Brunet; Joseph Comeaux; Thomas Cram; Richard Crouthamel; Pavel Groisman; Hans Hersbach; Philip Jones; Trausti Jonsson; Sylvie Jourdain; Gail Kelly; Kenneth Knapp; Andries Kruger; Hisayuki Kubota; Gianluca Lentini; Andrew Lorrey; Neal Lott; Sandra Lubker; Jurg Luterbacher; Gareth Marshall; Maurizio Maugeri; Cary Mock; Hing Mok; Oyvind Nordli; Rajmund Przybylak; Mark Rodwell; Thomas Ross; Douglas Schuster; Lidija Srnec; Maria Valente; Zsuzsanna Vizi; Xiaolan Wang; Nancy Westcott; John Woollen; Steven Worley
Time period covered
Dec 31, 1850 - Dec 31, 2014
Area covered
Earth
Description
The Twentieth Century Reanalysis Project, produced by the Earth System Research Laboratory Physical Sciences Division from NOAA and the University of Colorado Cooperative Institute for Research in Environmental Sciences, is an effort to produce a global reanalysis dataset spanning a portion of the nineteenth century and the entire twentieth century (1851 - near present), assimilating only surface observations of synoptic pressure. Boundary conditions of pentad sea surface temperature and monthly sea ice concentration and time-varying solar, volcanic, and carbon dioxide radiative forcings are prescribed. Products include 6-hourly ensemble mean and spread analysis fields on a 2 by 2 degree global latitude-longitude grid, and 3 and 6-hourly ensemble mean and spread forecast (first guess) fields on a global Gaussian T62 grid. Fields are accessible in yearly time series (1 file per parameter) and monthly synoptic time (all parameters per synoptic hour) files. This dataset provides the first estimates of global tropospheric variability spanning 1851 to 2012 at six-hourly resolution. Fields from 1851 to 1860 are a first attempt at this period and will be improved in future versions. Fields from 1861 to 2011 are most relevant for climate and weather studies. 20th Century Reanalysis Version 2c uses the same model as version 2 with new sea ice boundary conditions from the COBE-SST2 (Hirahara et al. 2014), new pentad Simple Ocean Data Assimilation with sparse input (SODAsi.2, Giese et al. 2015) sea surface temperature fields from through 2012, Daily High-Resolution-Blended Analyses for Sea Surface Temperature starting with 2013, and additional observations from ISPD version 3.2.9.

A low pressure bias in marine pressures from the US Maury Collection (Woodruff et al. 2005, Wallbrink et al. 2009) appears to have affected the resultant 20CR version 2c mass-related fields (e.g., pressure, geopotential height) from 1851 to about 1865. Please see opportunities for improvement [https://www.esrl.noaa.gov/psd/data/gridded/20thC_ReanV2c/opportunities.html] for additional information.

The Twentieth Century Reanalysis Project version 2c used resources of the National Energy Research Scientific Computing Center managed by Lawrence Berkeley National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Version 2c is a contribution to the international Atmospheric Circulation Reconstructions over the Earth initiative. Support for the Twentieth Century Reanalysis Project is provided by the U.S. Department of Energy Office of Science (BER) and the NOAA Climate Program Office MAPP program.
MEDSEA_CH1_Product_1 / Wind and wave data set from MARINA project
sextant.ifremer.fr
pigma.org
doi, ogc:ows-c +1
Updated Nov 21, 2016
Cite
University of Athens (2016). MEDSEA_CH1_Product_1 / Wind and wave data set from MARINA project [Dataset]. https://sextant.ifremer.fr/geonetwork/srv/api/records/c669abf9-31a3-40d0-9954-3e8a31f2bf73
Explore at:
Available download formats: ogc:ows-c, www:link, doi
Dataset updated
Nov 21, 2016
Dataset provided by
EMODnet Medsea Checkpoint
University of Athens
Time period covered
Jan 1, 2001 - Dec 31, 2010
Area covered

Description
Today's normative and regulatory requirements for assessing the producible energy from wind rely on in situ measurements (masts with anemometric sensors), which are extremely costly to implement offshore. However, proof should be provided that hindcast model results are highly reliable, in order to provide an equivalent assessment. Very high resolution models are also a key issue in decision making for proper siting, which relies on the consistency of all datasets provided in the assessment. In this tender the products of the FP7 MARINA project will be used. 10-year (2001-2010) high-resolution atmospheric, wave, tidal and ocean current simulations will be used. The model outputs are at high resolution (0.05x0.05 degree horizontal resolution, 1-hour time resolution, 5 vertical levels at 10, 40, 80, 120 and 180 m). The wave parameters are co-located with the meteorological output fields. Satellite altimetry data from the ENVISAT and JASON satellites have been assimilated in the system. Other wind and wave satellite data sets will also be analyzed (Synthetic Aperture Radar, SAR, for example). At the same co-located points the tidal and ocean current data together with bathymetry are available. For preselected points in the North Western Mediterranean (Spain-France-Italy areas) directional wave spectra data have been saved and are available.

From the SKIRON meteorological model, the available parameters are: WIND SPEED (m/s), WIND DIRECTION (deg), AIR PRESSURE (hPa), AIR DENSITY (kg/m3), TEMPERATURE (K), MODEL SEAMASK.

From the wave model, the available parameters are: SIGNIFICANT WAVE HEIGHT (m), MEAN WAVE DIRECTION (deg), WAVE MEAN PERIOD (s), PEAK WAVE PERIOD (s), SWELL WAVE HEIGHT (m), MEAN SWELL PERIOD (s), MEAN DIRECTIONAL SPREAD, WINDSEA MEAN DIRECTIONAL SPREAD, SWELL MEAN DIRECTIONAL SPREAD, MAXIMUM WAVE HEIGHT (m).
SmartBay Ireland Galway Bay Buoy Wave - Dataset - data.gov.ie
data.gov.ie
Updated Nov 2, 2016
Cite
data.gov.ie (2016). SmartBay Ireland Galway Bay Buoy Wave - Dataset - data.gov.ie [Dataset]. https://data.gov.ie/dataset/smartbay-ireland-galway-bay-buoy-wave
Explore at:
Dataset updated
Nov 2, 2016
Dataset provided by
data.gov.ie
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Ireland, Galway
Description
This data comprises wave data collected from the SmartBay buoy moored in Galway Bay. The TRIAXYS Directional Wave Sensor collects wave data and returns the following parameters:

- No Zero Crossings (Number)
- HAvg (Average Wave Height) (Meters)
- Tz (Mean Spectral Period) (Seconds)
- HMax (Max Wave Height) (Meters)
- HSig (Significant Wave Height) (Meters)
- TSig (Significant Period) (Seconds)
- H10 (Meters)
- T10 (Seconds)
- TAvg (Mean Wave Period) (Seconds)
- TP (Peak Period) (Seconds)
- TP5 (Seconds)
- HMO (Meters)
- Mean Direction (Degrees)
- Mean Spread (Degrees)

The TRIAXYS Directional Wave Sensor is comprised of three accelerometers and three rate sensors that ultimately measure the total displacement along the three orthogonal axes of the floating platform. In addition, this sensor is equipped with a gimballed fluxgate compass to measure true magnetic direction.
Trace-Share Dataset for Evaluation of Trace Meaning Preservation
zenodo.org
data.niaid.nih.gov
csv, zip
Updated May 7, 2020
Cite
Milan Cermak; Milan Cermak; Tomas Madeja; Tomas Madeja (2020). Trace-Share Dataset for Evaluation of Trace Meaning Preservation [Dataset]. http://doi.org/10.5281/zenodo.3547528
Explore at:
Available download formats: csv, zip
Unique identifier
https://doi.org/10.5281/zenodo.3547528
Dataset updated
May 7, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Milan Cermak; Milan Cermak; Tomas Madeja; Tomas Madeja
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The dataset contains all data used during the evaluation of trace meaning preservation. Archives are protected by password "trace-share" to avoid false detection by antivirus software.

For more information, see the project repository at https://github.com/Trace-Share.

Selected Attack Traces

The following list contains trace datasets used for evaluation. Each attack was chosen to have not only a different meaning but also different statistical properties.

dos_http_flood — the capture of GET and POST requests sent to one server by one attacker (HTTP traffic);

ftp_bruteforce — short and unsuccessful attempt to guess a user’s password for FTP service (FTP traffic);

ponyloader_botnet — Pony Loader botnet used for stealing of credentials from 3 target devices reporting to single IP with a large number of intermediate addresses (DNS and HTTP traffic);

scan — the capture of nmap tool that scans given subnet using ICMP echo and TCP SYN requests (consist of ARP, ICMP, and TCP traffic);

wannacry_ransomware — the capture of WannaCry ransomware that spreads in a domain with three workstations, a domain controller, and a file-sharing server (SMB and SMBv2 traffic).

Background Traffic Data

The publicly available dataset CSE-CIC-IDS-2018 was used as background traffic data. The evaluation uses data from the day Thursday-01-03-2018, which contains a sufficient proportion of regular traffic without any statistically significant attacks. Only traffic aimed at victim machines (range 172.31.69.0/24) is used to reduce less significant traffic.
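
The restriction to the victim range can be pictured with a short Python sketch using the standard `ipaddress` module. The flow tuples and the helper name are made up for illustration; this is not the filtering code used by the authors.

```python
import ipaddress

# Victim range named in the dataset description.
VICTIM_NET = ipaddress.ip_network("172.31.69.0/24")


def targets_victim(dst_ip: str) -> bool:
    """Return True when the destination address lies in the victim range."""
    return ipaddress.ip_address(dst_ip) in VICTIM_NET


# Hypothetical (source, destination) pairs standing in for parsed packets.
flows = [
    ("10.0.0.5", "172.31.69.25"),
    ("172.31.69.25", "8.8.8.8"),
    ("10.0.0.7", "172.31.70.1"),
]

# Keep only traffic aimed at the victim machines.
kept = [flow for flow in flows if targets_victim(flow[1])]
```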

Evaluation Results and Dataset Structure

Traces variants (traces.zip)

./traces-original/ — trace PCAP files and crawled details in YAML format;

./traces-normalized/ — normalized PCAP files and details in YAML format;

./traces-adjusted/ — adjusted PCAP files using various timestamp generation settings, combination configuration in YAML format, and labels provided by ID2T in XML format.

Extracted alerts (alerts.zip)

./alerts-original/ — extracted Suricata alerts, Suricata log, and full Suricata output for all original trace files;

./alerts-normalized/ — extracted Suricata alerts, Suricata log, and full Suricata output for all normalized trace files;

./alerts-adjusted/ — extracted Suricata alerts, Suricata log, and full Suricata output for all adjusted trace files.

Evaluation results

*.csv files in the root directory — data contains extracted alert signatures and their count per each trace variant.
Datasheet1_Mobility data shows effectiveness of control strategies for...
frontiersin.figshare.com
figshare.com
pdf
Updated Mar 7, 2024
Cite
Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small (2024). Datasheet1_Mobility data shows effectiveness of control strategies for COVID-19 in remote, sparse and diffuse populations.pdf [Dataset]. http://doi.org/10.3389/fepid.2023.1201810.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fepid.2023.1201810.s001
Dataset updated
Mar 7, 2024
Dataset provided by
Frontiers
Authors
Yuval Berman; Shannon D. Algar; David M. Walker; Michael Small
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data that is collected at the individual-level from mobile phones is typically aggregated to the population-level for privacy reasons. If we are interested in answering questions regarding the mean, or working with groups appropriately modeled by a continuum, then this data is immediately informative. However, coupling such data regarding a population to a model that requires information at the individual-level raises a number of complexities. This is the case if we aim to characterize human mobility and simulate the spatial and geographical spread of a disease by dealing in discrete, absolute numbers. In this work, we highlight the hurdles faced and outline how they can be overcome to effectively leverage the specific dataset: Google COVID-19 Aggregated Mobility Research Dataset (GAMRD). Using a case study of Western Australia, which has many sparsely populated regions with incomplete data, we firstly demonstrate how to overcome these challenges to approximate absolute flow of people around a transport network from the aggregated data. Overlaying this evolving mobility network with a compartmental model for disease that incorporated vaccination status we run simulations and draw meaningful conclusions about the spread of COVID-19 throughout the state without de-anonymizing the data. We can see that towns in the Pilbara region are highly vulnerable to an outbreak originating in Perth. Further, we show that regional restrictions on travel are not enough to stop the spread of the virus from reaching regional Western Australia. The methods explained in this paper can be therefore used to analyze disease outbreaks in similarly sparse populations. We demonstrate that using this data appropriately can be used to inform public health policies and have an impact in pandemic responses.

E-commerce Sales Prediction Dataset

kaggle.com

Updated Dec 14, 2024

Cite

Nevil Dhinoja (2024). E-commerce Sales Prediction Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10197264

Explore at:

Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)

Unique identifier

https://doi.org/10.34740/kaggle/dsv/10197264

Dataset updated

Dec 14, 2024

Dataset provided by

Kaggle

Authors

Nevil Dhinoja

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

E-commerce Sales Prediction Dataset

This repository contains a comprehensive and clean dataset for predicting e-commerce sales, tailored for data scientists, machine learning enthusiasts, and researchers. The dataset is crafted to analyze sales trends, optimize pricing strategies, and develop predictive models for sales forecasting.

📂 Dataset Overview

The dataset includes 1,000 records across the following features:

Column Name	Description
Date	The date of the sale (01-01-2023 onward).
Product_Category	Category of the product (e.g., Electronics, Sports, Other).
Price	Price of the product (numerical).
Discount	Discount applied to the product (numerical).
Customer_Segment	Buyer segment (e.g., Regular, Occasional, Other).
Marketing_Spend	Marketing budget allocated for sales (numerical).
Units_Sold	Number of units sold per transaction (numerical).

📊 Data Summary

General Properties

Date:
- Range: 01-01-2023 to 12-31-2023.
- Contains 1,000 unique values without missing data.

Product_Category:
- Categories: Electronics (21%), Sports (21%), Other (58%).
- Most common category: Electronics (21%).

Price:
- Range: From 244 to 999.
- Mean: 505, Standard Deviation: 290.
- Most common price range: 14.59 - 113.07.

Discount:
- Range: From 0.01% to 49.92%.
- Mean: 24.9%, Standard Deviation: 14.4%.
- Most common discount range: 0.01 - 5.00%.

Customer_Segment:
- Segments: Regular (35%), Occasional (34%), Other (31%).
- Most common segment: Regular.

Marketing_Spend:
- Range: From 2.41k to 10k.
- Mean: 4.91k, Standard Deviation: 2.84k.

Units_Sold:
- Range: From 5 to 57.
- Mean: 29.6, Standard Deviation: 7.26.
- Most common range: 24 - 34 units sold.

📈 Data Visualizations

The dataset is suitable for creating the following visualizations:
1. Price Distribution: Histogram to show the spread of prices.
2. Discount Distribution: Histogram to analyze promotional offers.
3. Marketing Spend Distribution: Histogram to understand marketing investment patterns.
4. Customer Segment Distribution: Bar plot of customer segments.
5. Price vs Units Sold: Scatter plot to show pricing effects on sales.
6. Discount vs Units Sold: Scatter plot to explore the impact of discounts.
7. Marketing Spend vs Units Sold: Scatter plot for marketing effectiveness.
8. Correlation Heatmap: Identify relationships between features.
9. Pairplot: Visualize pairwise feature interactions.

💡 How the Data Was Created

The dataset is synthetically generated to mimic realistic e-commerce sales trends. Below are the steps taken for data generation:

Feature Engineering:
- Identified key attributes such as product category, price, discount, and marketing spend, typically observed in e-commerce data.
- Generated dependent features like units sold based on logical relationships.
Data Simulation:
- Python Libraries: Used NumPy and Pandas to generate and distribute values.
- Statistical Modeling: Ensured feature distributions aligned with real-world sales data patterns.
Validation:
- Verified data consistency with no missing or invalid values.
- Ensured logical correlations (e.g., higher discounts → increased units sold).

Note: The dataset is synthetic and not sourced from any real-world e-commerce platform.
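
The generation steps above could look roughly like the following sketch. It is not the author's script: the function name, value ranges, category probabilities and the coefficients linking discount and marketing spend to units sold are all illustrative assumptions, chosen only to show the NumPy and Pandas approach.

```python
import numpy as np
import pandas as pd


def make_synthetic_sales(n_rows=1000, seed=0):
    """Generate a toy sales table with the same columns as the dataset."""
    rng = np.random.default_rng(seed)

    # Independent features drawn from simple, assumed distributions.
    day_offsets = rng.integers(0, 365, size=n_rows)
    discount = rng.uniform(0, 50, size=n_rows)
    marketing = rng.uniform(0, 10_000, size=n_rows)

    # Dependent feature: higher discounts and spend raise units sold
    # (hypothetical coefficients, plus Gaussian noise).
    units = 20 + 0.2 * discount + 0.001 * marketing
    units = units + rng.normal(0, 5, size=n_rows)

    return pd.DataFrame({
        "Date": pd.Timestamp("2023-01-01") + pd.to_timedelta(day_offsets, unit="D"),
        "Product_Category": rng.choice(
            ["Electronics", "Sports", "Other"], size=n_rows, p=[0.21, 0.21, 0.58]
        ),
        "Price": rng.uniform(10, 1000, size=n_rows).round(2),
        "Discount": discount.round(2),
        "Customer_Segment": rng.choice(
            ["Regular", "Occasional", "Other"], size=n_rows
        ),
        "Marketing_Spend": marketing.round(2),
        "Units_Sold": np.clip(units, 0, None).round().astype(int),
    })


df_synthetic = make_synthetic_sales()
```

Fixing the seed keeps the table reproducible, which makes the validation step (no missing values, plausible correlations) easy to repeat.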

🛠 Example Usage: Sales Prediction Model

Here’s an example of building a predictive model using Linear Regression:

Written in Python

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset
df = pd.read_csv('ecommerce_sales.csv')

# Feature selection
X = df[['Price', 'Discount', 'Marketing_Spend']]
y = df['Units_Sold']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Model training
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared: {r2:.2f}')

Corona data donation - Partial data set Vital data
gimi9.com
Cite
Corona data donation - Partial data set Vital data [Dataset]. https://gimi9.com/dataset/eu_https-zenodo-org-record-8229284/
Explore at:
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The data from fitness wristbands and smartwatches, so-called wearables, can provide indications of symptoms of COVID-19 disease. With the help of the Corona data donation app (CDA), citizens were able to make such data available to the Robert Koch Institute for scientific purposes. Together with information from other sources, e.g. official reporting data on case numbers, these data help scientists to better record and understand the spread of the coronavirus. The data points provided in this repository contain spatially and temporally aggregated information on the mean resting heart rate, the mean daily step count and the mean sleep duration per day and per county and district. A visual and interactive preparation of the data can already be found in the Vitaldaten-Explorer, which was provided by the CDA team. The data points provided here serve the further use in science and the interested public. They cover the full CDA survey period from April 2020 to December 2022. Since the data provided are spatial averages, it is not possible to draw conclusions about individuals.
ECMWF ERA5.1: ensemble spreads of surface level analysis parameter data for...
data-search.nerc.ac.uk
catalogue.ceda.ac.uk
Updated Sep 18, 2021
Cite
(2021). ECMWF ERA5.1: ensemble spreads of surface level analysis parameter data for 2000-2006 [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ensemble%20run
Explore at:
Dataset updated
Sep 18, 2021
Description
This dataset contains spreads for the ERA5.1 surface level analysis parameter data ensemble means (see linked dataset) over the period 2000-2006. ERA5.1 is the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis project re-run for 2000-2006 to improve upon the cold bias in the lower stratosphere seen in ERA5 (see technical memorandum 859 in the linked documentation section for further details). The ensemble means and spreads are calculated from the ERA5.1 10 member ensemble, run at a reduced resolution compared with the single high resolution (hourly output at 31 km grid spacing) 'HRES' realisation, for which these data have been produced to provide an uncertainty estimate. This dataset contains a limited selection of all available variables and has been converted to netCDF from the original GRIB files held on the ECMWF system. The data have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables please see the linked Copernicus Data Store (CDS) data tool, linked to from this record. Note, ensemble standard deviation is often referred to as ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). The main ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-interim re-analysis projects. An initial release of ERA5 data, ERA5t, is also available up to 5 days behind the present. A limited selection of data from these runs is also available via CEDA, whilst full access is available via the Copernicus Data Store.
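
To make the divide-by-10 convention concrete, the following NumPy sketch computes a spread field from a stack of 10 member fields. The array is random illustrative data, not ERA5 output, and the function name is hypothetical; the only point is the `ddof=0` setting, which divides by N rather than N-1.

```python
import numpy as np


def ensemble_spread(members):
    """Population standard deviation across the member axis (divide by N)."""
    return members.std(axis=0, ddof=0)


# Illustrative stack of 10 members on a tiny 2 x 3 grid (made-up values).
rng = np.random.default_rng(0)
members = 280.0 + rng.normal(0.0, 0.5, size=(10, 2, 3))

spread = ensemble_spread(members)
# Sample standard deviation (divide by N-1 = 9), shown for comparison only.
sample_std = members.std(axis=0, ddof=1)
```

For 10 members the two estimates differ by a constant factor of sqrt(9/10), so the spread defined this way is always slightly smaller than the sample standard deviation.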

(2023). ECMWF ERA5: 10 ensemble member surface level analysis parameter data [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=ensemble%20runs

ECMWF ERA5: 10 ensemble member surface level analysis parameter data

Explore at:

Dataset updated

Dec 8, 2023

Description

This dataset contains ERA5 surface level analysis parameter data from 10 ensemble runs. ERA5 is the 5th generation reanalysis project from the European Centre for Medium-Range Weather Forecasts (ECMWF); see the linked documentation for further details. The ensemble members were used to derive means and spread data (see linked datasets). Ensemble means and spreads were calculated from the ERA5t 10-member ensemble, which is run at a reduced resolution compared with the single high-resolution (hourly output at 31 km grid spacing) 'HRES' realisation; the ensemble data have been produced to provide an uncertainty estimate for that realisation. This dataset contains a limited selection of the available variables, which have been converted to netCDF from the original GRIB files held on the ECMWF system. They have also been translated onto a regular latitude-longitude grid during the extraction process from the ECMWF holdings. For a fuller set of variables, please see the Copernicus Data Store (CDS) data tool linked to from this record. Note that the ensemble standard deviation is often referred to as the ensemble spread and is calculated as the standard deviation of the 10 members in the ensemble (i.e., including the control). It is not the sample standard deviation, and was thus calculated by dividing by 10 rather than 9 (N-1). See linked datasets for ensemble member and ensemble mean data. The ERA5 global atmospheric reanalysis covers 1979 to 2 months behind the present month. This follows on from the ERA-15, ERA-40 and ERA-Interim reanalysis projects. An initial release of ERA5 data (ERA5t) is made roughly 5 days behind the present date. These data will subsequently be reviewed ahead of being released by ECMWF as quality-assured data within 3 months. CEDA holds a 6-month rolling copy of the latest ERA5t data. See related datasets linked to from this record.
However, for the period 2000-2006 the initial ERA5 release was found to suffer from stratospheric temperature biases, so new runs were performed to address this issue, resulting in the ERA5.1 release (see linked datasets). Note, though, that Simmons et al. 2020 (technical memo 859) report that "ERA5.1 is very close to ERA5 in the lower and middle troposphere", but users of data from this period should read technical memo 859 for further details.

ECMWF ERA5: 10 ensemble member surface level analysis parameter data

ECMWF ERA5t: ensemble spreads of surface level analysis parameter data

ERA5 hourly data on pressure levels from 1940 to present

NCEP GEFS Mean Spread West Atlantic Forecast Products Imagery

Data from: Data and code from: Environmental influences on drying rate of...

The SPARC Data Initiative CFC-11, CFC-12, HF and SF6 climatologies from...

ECMWF ERA5: ensemble spreads of surface level analysis parameter data

Statistics of ‘diffrate’.

Heat index at 2 m above ground: A globally gridded dataset based on...

Network traffic datasets with novel extended IP flow called NetTiSA flow

Copernicus

ERA5 hourly time-series data on single levels from 1940 to present

NOAA/CIRES Twentieth Century Global Reanalysis Version 2c

MEDSEA_CH1_Product_1 / Wind and wave data set from MARINA project

SmartBay Ireland Galway Bay Buoy Wave - Dataset - data.gov.ie

Trace-Share Dataset for Evaluation of Trace Meaning Preservation

Datasheet1_Mobility data shows effectiveness of control strategies for...

E-commerce Sales Prediction Dataset

📂 Dataset Overview

📊 Data Summary

General Properties

📈 Data Visualizations

💡 How the Data Was Created

🛠 Example Usage: Sales Prediction Model

Written in python

Corona data donation - Partial data set Vital data

ECMWF ERA5.1: ensemble spreads of surface level analysis parameter data for...
