Dataset used for quantitative evaluation in the paper:
Y. Meidan, D. Avraham, H. Libhaber and A. Shabtai, "CADeSH: Collaborative Anomaly Detection for Smart Homes," in IEEE Internet of Things Journal, 2022, doi: 10.1109/JIOT.2022.3194813.
This is a table of flow-level traffic data that was continuously captured over a period of 21 days from five real home networks subscribed to a smart home security service, and from our lab at Ben-Gurion University of the Negev. The security service provider shared with us these network traffic flows, plus the related DNS requests and responses, and reputation intelligence of the destination IP addresses. Each instance in this dataset represents an outbound network traffic flow (in the form of an IPFIX record) which emanated from an instance of the IoT model streamer.Amazon.Fire_TV_Gen_3.
In our lab, we infected our streamer.Amazon.Fire_TV_Gen_3 with a cryptominer and executed cryptomining from this device. To imitate the scanning activity typically performed by some botnets, we also scanned the network using Nmap. Accordingly, we labeled these malicious activities as (1) 'is executing cryptomining' or (2) 'being scanned by Nmap'. All of the remaining IPFIX records captured in our lab or on the home networks were labeled as 'assumed benign'.
The multitude of real home networks, and the multitude of identical source devices, enable using this dataset for quantitative evaluation of (collaborative) anomaly/attack detection methods, especially for the IoT.
The Development of an Internet of Things (IoT) Network Traffic Dataset with Simulated Attack Data.
Abstract: This research focuses on the requirements for and the creation of an intrusion detection system (IDS) dataset for an Internet of Things (IoT) network domain.
A minimal-requirements IoT network system was built to produce a dataset according to IDS testing needs for IoT security. Testing was performed with 12 scenarios and resulted in 24 datasets consisting of normal, attack, and combined normal-attack traffic data. Testing covered two communication protocols, IEEE 802.11 (WiFi) and IEEE 802.15.4 (ZigBee), and focused on three denial of service (DoS) and distributed denial of service (DDoS) attacks: "finish" (FIN) flood, User Datagram Protocol (UDP) flood, and Zbassocflood/association flood. Preprocessing yielded 95 attributes for the WiFi datasets and 64 attributes for the Xbee datasets.
TCP FIN Flood Attack Pattern Recognition on Internet of Things with Rule Based Signature Analysis
Abstract: The focus of this research is TCP FIN flood attack pattern recognition in an Internet of Things (IoT) network using a rule-based signature analysis method. The dataset was collected under three scenarios: normal, attack, and normal-attack. The TCP FIN flood attack pattern is identified and recognized by observing and analyzing packet attributes from the raw data (pcap) using feature extraction and feature selection methods. Further testing was conducted using Snort as an IDS. The confusion-matrix evaluation of Snort's detection rate shows the average percentage of the precision level.
Citing
Citation data : "TCP FIN Flood Attack Pattern Recognition on Internet of Things with Rule Based Signature Analysis" - https://online-journals.org/index.php/i-joe/article/view/9848
@article{article,
author = {Stiawan, Deris and Wahyudi, Dimas and Heryanto, Ahmad and Sahmin, Samsuryadi and Idris, Yazid and Muchtar, Farkhana and Alzahrani, Mohammed and Budiarto, Rahmat},
year = {2019},
month = {04},
pages = {124},
title = {TCP FIN Flood Attack Pattern Recognition on Internet of Things with Rule Based Signature Analysis},
volume = {15},
journal = {International Journal of Online and Biomedical Engineering (iJOE)},
doi = {10.3991/ijoe.v15i07.9848}
}
Features Extraction on IoT Intrusion Detection System Using Principal Components Analysis (PCA)
Feature extraction solves the problem of finding the most efficient and comprehensive set of features. A Principal Component Analysis (PCA) feature extraction algorithm is applied to optimize the effectiveness of feature extraction and build an effective intrusion detection method. This paper uses PCA for feature extraction in an intrusion detection system with the aim of improving the accuracy and precision of the detection. The impact of feature extraction on attack detection was examined. Experiments were conducted on a network traffic dataset created from an Internet of Things (IoT) testbed network topology, and the results show that the detection accuracy reaches 100 percent.
Citing
Citation data : "Features Extraction on IoT Intrusion Detection System Using Principal Components Analysis (PCA)" - https://ieeexplore.ieee.org/document/9251292
@inproceedings{inproceedings,
author = {Sharipuddin, and Purnama, Benni and Kurniabudi, Kurniabudi and Winanto, Eko and Stiawan, Deris and Hanapi, Darmawiiovo and Idris, Mohd and Budiarto, Rahmat},
year = {2020},
month = {10},
pages = {114-118},
title = {Features Extraction on IoT Intrusion Detection System Using Principal Components Analysis (PCA)},
doi = {10.23919/EECSI50503.2020.9251292}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
With the rapid development of Internet of Things (IoT) technology, security has become increasingly important because IoT devices collect a lot of personal information. IoT devices have resource constraints, which makes traditional cryptographic algorithms ineffective for IoT security. Lightweight cryptographic algorithms are needed to overcome the limitations of IoT devices. Because the Raspberry Pi is popular and widely used in IoT applications, its security plays an important role in those applications. Analyzing existing works and understanding leading countries, keywords, authors, journals, and citations is crucial to identifying research trends and patterns in Raspberry Pi security. To find this information, a bibliometric analysis was conducted using performance mapping, science mapping, and enrichment techniques. Our analysis included 979 Scopus articles, 214 WOS articles, and 144 IEEE Xplore articles published during 2015-2023, all of which were integrated and cleansed using the methods described in the methods section. Using R, VOSviewer, and the bibliometrix library, we analyzed and visualized the bibliometric data. We found that India is the leading research country; Archarya.B and Bansod. G. are the most relevant authors; Internet of Things, light-weight cryptography, and cryptography are the most relevant terms; and IEEE Access is the most significant journal. Developing a lightweight cryptographic algorithm for Raspberry Pi boards was identified as a significant future research focus.
https://creativecommons.org/publicdomain/zero/1.0/
Dr. Daqing Chen, Course Director: MSc Data Science. chend '@' lsbu.ac.uk, School of Engineering, London South Bank University, London SE1 0AA, UK.
This Online Retail II data set contains all the transactions of a UK-based and registered non-store online retailer between 01/12/2009 and 09/12/2011. The company mainly sells unique all-occasion giftware. Many customers of the company are wholesalers.
InvoiceNo: Invoice number. Nominal. A 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode: Product (item) code. Nominal. A 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric. The day and time when a transaction was generated.
UnitPrice: Unit price. Numeric. Product price per unit in sterling (£).
CustomerID: Customer number. Nominal. A 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal. The name of the country where a customer resides.
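The cancellation rule in the attribute list can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's documentation: the three sample rows, the product names, the timestamp format, and the helper names `is_cancellation` and `split_transactions` are all made up for the example, and the check accepts both 'c' and 'C' as an assumption.

```python
import csv
import io

# Hypothetical sample rows that follow the column names listed above; not real data.
SAMPLE = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
489434,85048,GLASS CANDLE HOLDER,12,2009-12-01 07:45:00,6.95,13085,United Kingdom
C489449,22087,PAPER BUNTING,-12,2009-12-01 10:33:00,2.95,16321,Australia
489435,22350,CAT BOWL,3,2009-12-01 07:46:00,2.55,13085,United Kingdom
"""


def is_cancellation(invoice_no: str) -> bool:
    """An invoice number starting with 'c' marks a cancellation (case ignored by assumption)."""
    return invoice_no.strip().lower().startswith("c")


def split_transactions(rows):
    """Separate regular rows from cancellations and total the regular revenue."""
    regular, cancelled = [], []
    for row in rows:
        (cancelled if is_cancellation(row["InvoiceNo"]) else regular).append(row)
    # Revenue per row is Quantity * UnitPrice, in sterling.
    revenue = sum(int(r["Quantity"]) * float(r["UnitPrice"]) for r in regular)
    return regular, cancelled, revenue


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
regular, cancelled, revenue = split_transactions(rows)
print(len(regular), len(cancelled), round(revenue, 2))
```

Filtering on InvoiceNo rather than on negative quantities keeps the check tied to the documented rule; whether every cancellation also carries a negative Quantity is not stated in the description.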
Chen, D., Sain, S.L., and Guo, K. (2012), Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208.
Chen, D., Guo, K. and Ubakanma, G. (2015), Predicting customer profitability over time based on RFM time series, International Journal of Business Forecasting and Marketing Intelligence, Vol. 2, No. 1, pp. 1-18.
Chen, D., Guo, K., and Li, Bo (2019), Predicting Customer Profitability Dynamically over Time: An Experimental Comparative Study, 24th Iberoamerican Congress on Pattern Recognition (CIARP 2019), Havana, Cuba, 28-31 Oct, 2019.
Laha Ale, Ning Zhang, Huici Wu, Dajiang Chen, and Tao Han, Online Proactive Caching in Mobile Edge Computing Using Bidirectional Deep Recurrent Neural Network, IEEE Internet of Things Journal, Vol. 6, Issue 3, pp. 5520-5530, 2019.
Rina Singh, Jeffrey A. Graves, Douglas A. Talbert, William Eberle, Prefix and Suffix Sequential Pattern Mining, Industrial Conference on Data Mining 2018: Advances in Data Mining. Applications and Theoretical Aspects, pp. 309-324, 2018.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The radio map, or spectrum environment map (SEM), visualizes the otherwise invisible electromagnetic spectrum and is vital for the monitoring, management, and security of spectrum resources in cognitive radio (CR) networks. It is useful for abnormal spectral activity detection, radiation source localization, spectrum resource management, etc. This project presents a measured radio map dataset for an urban scenario with multiple radiation sources, aiming to address the lack of open radio map datasets for realistic multi-source dynamic scenarios. We used a spectral signal receiving system to measure the signal intensity of multiple radiation sources in the urban scene. This project includes two datasets: 1) Raw radio map measurement data (30 MHz, 115 MHz, and 2 GHz) in .csv format, with entries such as longitude, latitude, altitude, start and end frequencies, frequency interval, number of acquisition points, and signal strength. 2) Raw spectrum tensor data (30 MHz, 115 MHz, and 2 GHz) in .mat format.
More details about the construction of the spectrum map and dataset can be found in the following references.
[1] Q. Zhu et al., "DEMO Abstract: An UAV-based 3D Spectrum Real-time Mapping System," 2022 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), New York, NY, USA, 2022, pp. 1-2.
[2] J. Wang et al., "Sparse Bayesian Learning-Based Hierarchical Construction for 3D Radio Environment Maps Incorporating Channel Shadowing," IEEE Transactions on Wireless Communications, vol. 23, no. 10, pp. 14560-14574, Oct. 2024.
[3] Q. Gao et al., "Time-Variant Radio Map Reconstruction with Optimized Distributed Sensors in Dynamic Spectrum Environments," IEEE Internet of Things Journal, early access, Feb. 2025, doi: 10.1109/JIOT.2025.3545542.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
persons and vehicles in rural environments.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Network traffic datasets with novel extended IP flow called NetTiSA flow
Datasets were created for the paper "NetTiSA: Extended IP Flow with Time-series Features for Universal Bandwidth-constrained High-speed Network Traffic Classification" by Josef Koumar, Karel Hynek, Jaroslav Pešek, and Tomáš Čejka, which is published in The International Journal of Computer and Telecommunications Networking: https://doi.org/10.1016/j.comnet.2023.110147
Please cite the usage of our datasets as:
Josef Koumar, Karel Hynek, Jaroslav Pešek, Tomáš Čejka, "NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification", Computer Networks, Volume 240, 2024, 110147, ISSN 1389-1286
@article{KOUMAR2024110147,
title = {NetTiSA: Extended IP flow with time-series features for universal bandwidth-constrained high-speed network traffic classification},
journal = {Computer Networks},
volume = {240},
pages = {110147},
year = {2024},
issn = {1389-1286},
doi = {https://doi.org/10.1016/j.comnet.2023.110147},
url = {https://www.sciencedirect.com/science/article/pii/S1389128623005923},
author = {Josef Koumar and Karel Hynek and Jaroslav Pešek and Tomáš Čejka}
}
This Zenodo repository contains 23 datasets created from 15 well-known published datasets, which are cited in the table below. Each dataset contains the NetTiSA flow feature vector.
NetTiSA flow feature vector
The novel extended IP flow called NetTiSA (Network Time Series Analysed) flow contains a universal bandwidth-constrained feature vector consisting of 20 features. We divide the NetTiSA flow classification features into three groups according to where and how they are computed. The first group is based on classical bidirectional flow information: the number of transferred bytes and packets. The second group contains statistical and time-based features calculated using time-series analysis of the packet sequences. The third group can be computed from the previous two (i.e., on the flow collector) and improves the classification performance without any impact on the telemetry bandwidth.
Flow features
The flow features are:
Packets is the number of packets in the direction from the source to the destination IP address.
Packets in reverse order is the number of packets in the direction from the destination to the source IP address.
Bytes is the size of the payload in bytes transferred in the direction from the source to the destination IP address.
Bytes in reverse order is the size of the payload in bytes transferred in the direction from the destination to the source IP address.
Statistical and Time-based features
These features are exported in the extended part of the flow. All of them can be computed (exactly or approximately) in a stream-wise fashion, which is necessary for keeping memory requirements low. The second feature set contains the following features:
Mean represents mean of the payload lengths of packets
Min is the minimal value from payload lengths of all packets in a flow
Max is the maximum value from payload lengths of all packets in a flow
Standard deviation is a measure of the variation of payload lengths from the mean payload length
Root mean square is the measure of the magnitude of payload lengths of packets
Average dispersion is the average absolute difference between each payload length of the packet and the mean value
Kurtosis is the measure describing the extent to which the tails of a distribution differ from the tails of a normal distribution
Mean of relative times is the mean of the relative times, a sequence defined as \(st = \{t_1 - t_1, t_2 - t_1, \dots, t_n - t_1\}\)
Mean of time differences is the mean of the time differences, a sequence defined as \(dt = \{ t_j - t_i \mid j = i + 1,\ i \in \{1, 2, \dots, n - 1\} \}\)
Min from time differences is the minimal value from all time differences, i.e., min space between packets.
Max from time differences is the maximum value from all time differences, i.e., max space between packets.
Time distribution describes the deviation of time differences between individual packets within the time series. The feature is computed by the following equation: \(tdist = \frac{ \frac{1}{n-1} \sum_{i=1}^{n-1} \left| \mu_{dt_{n-1}} - dt_i \right| }{ \frac{1}{2} \left( \max\left(dt_{n-1}\right) - \min\left(dt_{n-1}\right) \right) }\)
Switching ratio represents a value change ratio (switching) between payload lengths. The switching ratio is computed by the equation: \(sr = \frac{s_n}{\frac{1}{2} (n - 1)}\)
where \(s_n\) is the number of switches.
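The definitions above can be sketched in plain Python. This is an illustrative batch computation, not the stream-wise implementation in the exporter, and it rests on a few assumptions that the description does not settle: the standard deviation is the population form, a "switch" is counted whenever two consecutive payload lengths differ, and tdist is set to 0.0 when all time differences are equal. Kurtosis is left out because its exact estimator is not specified here. The function names are hypothetical.

```python
import math


def payload_stats(lengths):
    """Mean, min, max, std, RMS and average dispersion of payload lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    variance = sum((x - mean) ** 2 for x in lengths) / n  # population form (assumption)
    return {
        "mean": mean,
        "min": min(lengths),
        "max": max(lengths),
        "std": math.sqrt(variance),
        "rms": math.sqrt(sum(x * x for x in lengths) / n),
        "avg_dispersion": sum(abs(x - mean) for x in lengths) / n,
    }


def time_features(timestamps):
    """Features of relative times st and time differences dt (needs >= 2 packets)."""
    n = len(timestamps)
    st = [t - timestamps[0] for t in timestamps]
    dt = [timestamps[i + 1] - timestamps[i] for i in range(n - 1)]
    mu_dt = sum(dt) / len(dt)
    spread = max(dt) - min(dt)
    # tdist: mean absolute deviation of dt divided by half of its range.
    # Returning 0.0 for a zero range is an assumption made for this sketch.
    tdist = (sum(abs(mu_dt - d) for d in dt) / len(dt)) / (0.5 * spread) if spread else 0.0
    return {
        "mean_st": sum(st) / n,
        "mean_dt": mu_dt,
        "min_dt": min(dt),
        "max_dt": max(dt),
        "tdist": tdist,
    }


def switching_ratio(lengths):
    """sr = s_n / (0.5 * (n - 1)), with s_n counted as changes between neighbours."""
    n = len(lengths)
    s_n = sum(1 for a, b in zip(lengths, lengths[1:]) if a != b)
    return s_n / (0.5 * (n - 1))


# Made-up flow with four packets.
lengths = [100, 100, 200, 100]
timestamps = [0.0, 1.0, 3.0, 4.0]
stats = payload_stats(lengths)
times = time_features(timestamps)
sr = switching_ratio(lengths)
print(stats, times, sr)
```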
Features computed at the collector
The third set contains features that are computed from the previous two groups prior to classification. Therefore, they do not influence the network telemetry size, and their computation does not put additional load on resource-constrained flow monitoring probes. The NetTiSA flow combined with this feature set is called the Enhanced NetTiSA flow and contains the following features:
Max minus min is the difference between minimum and maximum payload lengths
Percent deviation is the dispersion of the average absolute difference to the mean value
Variance is the spread measure of the data from its mean
Burstiness is the degree of peakedness in the central part of the distribution
Coefficient of variation is a dimensionless quantity that compares the dispersion of a time series to its mean value and is often used to compare the variability of different time series that have different units of measurement
Directions describe a percentage ratio of packet direction computed as \(\frac{d_1}{d_1 + d_0}\), where \(d_1\) is the number of packets in the direction from the source to the destination IP address and \(d_0\) is the number in the opposite direction. Both \(d_1\) and \(d_0\) are part of the classical bidirectional flow.
Duration is the duration of the flow
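Because these features are derived from values already present in the flow, the collector-side step can be sketched as a pure function. The sketch below is a rough illustration under stated assumptions: percent deviation is taken as average dispersion divided by the mean (left unscaled), variance is the square of the exported standard deviation, and zero-mean flows map to 0.0. Burstiness and duration are omitted because their formulas are not given in the description. The function name, the dictionary keys, and the sample values are hypothetical.

```python
def enhanced_features(stats, packets_fwd, packets_rev):
    """Derive collector-side features from values already exported in the flow.

    stats holds the exported payload statistics; packets_fwd is d_1 (source to
    destination) and packets_rev is d_0 (destination to source).
    """
    mean, std = stats["mean"], stats["std"]
    return {
        "max_minus_min": stats["max"] - stats["min"],
        "variance": std ** 2,
        # Dimensionless dispersion relative to the mean; 0.0 for a zero mean (assumption).
        "coefficient_of_variation": std / mean if mean else 0.0,
        # Assumed form: average dispersion relative to the mean, not multiplied by 100.
        "percent_deviation": stats["avg_dispersion"] / mean if mean else 0.0,
        # d_1 / (d_1 + d_0)
        "directions": packets_fwd / (packets_fwd + packets_rev),
    }


# Made-up exported values for a flow with 3 forward packets and 1 reverse packet.
exported = {"mean": 125.0, "min": 100, "max": 200, "std": 1875 ** 0.5, "avg_dispersion": 37.5}
enhanced = enhanced_features(exported, packets_fwd=3, packets_rev=1)
print(enhanced)
```

Nothing in this step needs per-packet data, which is why it adds no load on the probe and no bytes to the telemetry.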
The NetTiSA flow is implemented in the IP flow exporter ipfixprobe.
Description of dataset files
The following table describes each dataset file:
File name | Detection problem | Citation of the original raw dataset
botnet_binary.csv | Binary detection of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
botnet_multiclass.csv | Multi-class classification of botnet | S. García et al. An Empirical Comparison of Botnet Detection Methods. Computers & Security, 45:100–123, 2014.
cryptomining_design.csv | Binary detection of cryptomining; the design part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
cryptomining_evaluation.csv | Binary detection of cryptomining; the evaluation part | Richard Plný et al. Datasets of Cryptomining Communication. Zenodo, October 2022
dns_malware.csv | Binary detection of malware DNS | Samaneh Mahdavifar et al. Classifying Malicious Domains using DNS Traffic Analysis. In DASC/PiCom/CBDCom/CyberSciTech 2021, pages 60–67. IEEE, 2021.
doh_cic.csv | Binary detection of DoH | Mohammadreza MontazeriShatoori et al. Detection of doh tunnels using time-series classification of encrypted traffic. In DASC/PiCom/CBDCom/CyberSciTech 2020, pages 63–70. IEEE, 2020
doh_real_world.csv | Binary detection of DoH | Kamil Jeřábek et al. Collection of datasets with DNS over HTTPS traffic. Data in Brief, 42:108310, 2022
dos.csv | Binary detection of DoS | Nickolaos Koroniotis et al. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst., 100:779–796, 2019.
edge_iiot_binary.csv | Binary detection of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
edge_iiot_multiclass.csv | Multi-class classification of IoT malware | Mohamed Amine Ferrag et al. Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications: Centralized and federated learning, 2022.
https_brute_force.csv | Binary detection of HTTPS Brute Force | Jan Luxemburk et al. HTTPS Brute-force dataset with extended network flows, November 2020
ids_cic_binary.csv | Binary detection of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
ids_cic_multiclass.csv | Multi-class classification of intrusion in IDS | Iman Sharafaldin et al. Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp, 1:108–116, 2018.
unsw_binary.csv | Binary detection of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
unsw_multiclass.csv | Multi-class classification of intrusion in IDS | Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In 2015 military communications and information systems conference (MilCIS), pages 1–6. IEEE, 2015.
iot_23.csv | Binary detection of IoT malware | Sebastian Garcia et al. IoT-23: A labeled dataset with malicious and benign IoT network traffic, January 2020. More details here: https://www.stratosphereips.org/datasets-iot23
ton_iot_binary.csv | Binary detection of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets. Sustainable Cities and Society, 72:102994, 2021
ton_iot_multiclass.csv | Multi-class classification of IoT malware | Nour Moustafa. A new distributed architecture for evaluating ai-based security systems at the edge: Network ton iot datasets.
The BED dataset
Version 1.0.0
Please cite as: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Disclaimer
While every care has been taken to ensure the accuracy of the data included in the BED dataset, the authors and the University of the West of Scotland, Durham University, and Universitat de València do not provide any guaranties and disclaim all responsibility and all liability (including without limitation, liability in negligence) for all expenses, losses, damages (including indirect or consequential damage) and costs which you might incur as a result of the provided data being inaccurate or incomplete in any way and for any reason. 2020, University of the West of Scotland, Scotland, United Kingdom.
Contact
For inquiries regarding the BED dataset, please contact:
Dataset summary
BED (Biometric EEG Dataset) is a dataset specifically designed to test EEG-based biometric approaches that use relatively inexpensive consumer-grade devices, more specifically the Emotiv EPOC+ in this case. This dataset includes EEG responses from 21 subjects to 12 different stimuli, across 3 different chronologically disjointed sessions. We have also considered stimuli aimed to elicit different affective states, so as to facilitate future research on the influence of emotions on EEG-based biometric tasks. In addition, we provide a baseline performance analysis to outline the potential of consumer-grade EEG devices for subject identification and verification. It must be noted that, in this work, EEG data were acquired in a controlled environment in order to reduce the variability in the acquired data stemming from external conditions.
The stimuli include:
For more details regarding the experimental protocol and the design of the dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Dataset structure and contents
The BED dataset contains EEG recordings from 21 subjects, acquired during 3 similar sessions for each subject. The sessions were spaced one week apart from each other.
The BED dataset includes:
The dataset is organised in 3 folders:
RAW/ Contains the RAW files
RAW/sN/ Contains the RAW files associated with subject N
Each folder sN is composed of the following files:
- sN_s1.csv, sN_s2.csv, sN_s3.csv -- Files containing the EEG recordings for subject N and session 1, 2, and 3, respectively. These files contain 39 columns:
COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 ...UNUSED DATA... UNIX_TIMESTAMP
- subject_N_session_1_time_X.log, subject_N_session_2_time_X.log, subject_N_session_3_time_X.log -- Log files containing the sequence of events for the subject N and the session 1,2, and 3 respectively.
RAW_PARSED/
Contains Matlab files named sN_sM.mat. Each file contains the recordings for subject N in session M and is composed of two variables:
- recording: size (time@256Hz x 17), Columns: COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 UNIX_TIMESTAMP
- events: cell array with size (events x 3) START_UNIX END_UNIX ADDITIONAL_INFO
START_UNIX is the UNIX timestamp in which the event starts
END_UNIX is the UNIX timestamp in which the event ends
ADDITIONAL_INFO contains a struct with additional information about the specific event: for the images, the expected score and the voted score; for the cognitive task, the input; for the VEP, the pattern and the frequency; etc.
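The relationship between the two variables can be sketched as follows: each event's START_UNIX and END_UNIX select the rows of the recording whose UNIX_TIMESTAMP falls inside that interval. The sketch assumes the recording has already been loaded into a list of 17-value rows in the column order given above; loading the .mat file itself is not shown. The inclusive interval bounds, the helper name `slice_event`, and the synthetic data are assumptions made for illustration.

```python
# Column order of the "recording" variable as listed above (17 columns).
COLUMNS = [
    "COUNTER", "INTERPOLATED",
    "F3", "FC5", "AF3", "F7", "T7", "P7", "O1", "O2", "P8", "T8", "F8", "AF4", "FC6", "F4",
    "UNIX_TIMESTAMP",
]
EEG_CHANNELS = COLUMNS[2:16]          # the 14 electrode columns
TS = COLUMNS.index("UNIX_TIMESTAMP")  # index of the timestamp column


def slice_event(recording, start_unix, end_unix):
    """Return the EEG-channel samples recorded between start_unix and end_unix.

    The bounds are treated as inclusive, which is an assumption of this sketch.
    """
    return [row[2:16] for row in recording if start_unix <= row[TS] <= end_unix]


# Synthetic stand-in for a recording: 5 samples at 256 Hz starting at t = 1000 s.
recording = [[i, 0] + [float(i)] * 14 + [1000.0 + i / 256] for i in range(5)]

# Synthetic stand-in for one row of "events": START_UNIX, END_UNIX, ADDITIONAL_INFO.
event = (1000.0 + 1 / 256, 1000.0 + 3 / 256, {})

segment = slice_event(recording, event[0], event[1])
print(len(segment), len(segment[0]))
```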
Features/
Features/Identification
Features/Identification/[ARRC|MFCC|SPEC]/: Each of these folders contains the extracted features, ready for classification, for each of the stimuli. Each file is composed of two variables: "feat", the feature matrix, and "Y", the label matrix.
- feat: N x number of features
- Y: N x 2 (the #subject and the #session)
- INFO: Contains details about the event same as the ADDITIONAL INFO
Features/Verification: This folder is composed of 3 files, each containing a different set of extracted features. Each file is composed of one cstruct array with the following fields:
- data: the time-series features, as described in the paper
- y: the #subject
- stimuli: the stimuli by name
- session: the #session
- INFO: Contains details about the event
The features provided are in sequential order, so index 1 and index 2, etc. are sequential in time if they belong to the same stimulus.
Additional information
For additional information regarding the creation of the BED dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ESC Dataset presented in the paper Elastic Smart Contracts in Blockchains (10.1109/JAS.2021.1004222)
IEEE/CAA Journal of Automatica Sinica (Volume: 8, Issue: 12, December 2021)
In this paper, we deal with questions related to blockchains in complex Internet of Things (IoT)-based ecosystems. Such ecosystems are typically composed of IoT devices, edge devices, cloud computing software services, as well as people, who are decision makers in scenarios such as smart cities. Many decisions related to analytics can be based on data coming from IoT sensors, software services, and people. However, they are typically based on different levels of abstraction and granularity. This poses a number of challenges when multiple blockchains are used together with smart contracts. This work proposes to apply our concept of elasticity to smart contracts and thereby enabling analytics in and between multiple blockchains in the context of IoT. We propose a reference architecture for Elastic Smart Contracts and evaluate the approach in a smart city scenario, discussing the benefits in terms of performance and self-adaptability of our solution.