By 2025, forecasts suggest that there will be more than ** billion Internet of Things (IoT) connected devices in use, a nearly threefold increase over the IoT installed base in 2019.
What is the Internet of Things?
The IoT is a network of devices that are connected to the internet and can “communicate” with each other. Such devices include everyday gadgets such as smartphones and wearables, smart home devices such as smart meters, and industrial devices such as smart machines. These connected devices can gather, share, and analyze information and act on it. By 2023, global spending on IoT was forecast to reach *** trillion U.S. dollars.
How does the Internet of Things work?
IoT devices use sensors and processors to collect and analyze data from their environments. The sensor data is shared by sending it to a gateway or to other IoT devices, and is then either sent to the cloud for analysis or analyzed locally. By 2025, the data volume created by IoT connections is projected to reach **** zettabytes.
Privacy and security concerns
Given the amount of data IoT devices generate, data privacy and security are among the major concerns in IoT adoption. Once devices are connected to the internet, they become vulnerable to security breaches such as hacking and phishing. Frequent data leaks from social media already raise serious concerns about information security standards; if the IoT is to become the next reality, efforts to create strict security standards need to be prioritized.
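As a rough illustration of the sensor, gateway, and cloud-or-local flow described above, here is a minimal Python sketch. The `Reading` and `Gateway` classes, the `cloud_upload` callback, and the per-sensor averaging step are all hypothetical; real gateways typically speak protocols such as MQTT or HTTP and run far richer analytics.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class Reading:
    device_id: str
    sensor: str
    value: float


class Gateway:
    """Hypothetical gateway: buffers sensor readings, then either summarizes
    them locally or forwards the raw batch to a cloud callback."""

    def __init__(self, cloud_upload: Optional[Callable[[List[Reading]], None]] = None):
        self.buffer: List[Reading] = []
        self.cloud_upload = cloud_upload  # None means "analyze locally"

    def receive(self, reading: Reading) -> None:
        self.buffer.append(reading)

    def flush(self) -> Optional[Dict[str, float]]:
        batch, self.buffer = self.buffer, []
        if self.cloud_upload is not None:
            self.cloud_upload(batch)  # cloud path: ship the raw batch upstream
            return None
        # Local path: average each sensor type across all devices in the batch.
        by_sensor: Dict[str, List[float]] = {}
        for r in batch:
            by_sensor.setdefault(r.sensor, []).append(r.value)
        return {sensor: mean(values) for sensor, values in by_sensor.items()}


gw = Gateway()
gw.receive(Reading("meter-1", "power_w", 120.0))
gw.receive(Reading("meter-2", "power_w", 80.0))
print(gw.flush())  # {'power_w': 100.0}
```

Passing a callback instead of hard-coding a cloud client keeps the local and cloud paths interchangeable, which mirrors the "either sent to the cloud or analyzed locally" choice in the text.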
License (derived automatically): Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
This archive contains the files submitted to the 4th International Workshop on Data: Acquisition To Analysis (DATA) at SenSys. Files provided in this package are associated with the paper titled "Dataset: Analysis of IFTTT Recipes to Study How Humans Use Internet-of-Things (IoT) Devices"
With the rapid development and adoption of Internet-of-Things (IoT) and smart-home devices, researchers continue to work on improving the "smartness" of these devices to address people's daily needs. Such efforts usually begin with understanding evolving user behavior: how people use the devices and how they expect the devices to behave. However, although research efforts abound, very few datasets are available that researchers can use both to understand how people use IoT devices and to evaluate algorithms or systems for smart spaces. In this paper, we collect and characterize more than 50,000 recipes from the online If-This-Then-That (IFTTT) service to answer a seemingly straightforward but complicated question: "What kinds of behaviors do humans expect from their IoT devices?" For each IFTTT rule, the dataset contains its basic information, its trigger and action events, and the number of people using it.
For more detail about this dataset, please refer to the paper listed above.
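IFTTT recipes are trigger-action rules ("if this, then that"), and the dataset records how many people use each one. A minimal sketch of how such rules might be represented and ranked by usage is shown below; the `Recipe` fields and the service/event names in the example are hypothetical and do not reflect the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Recipe:
    """One trigger-action rule; field names are hypothetical, not the dataset schema."""
    name: str
    trigger_service: str  # the "this" part, e.g. a weather or location service
    trigger_event: str
    action_service: str   # the "that" part, e.g. a smart light
    action_event: str
    users: int            # how many people use this rule


def top_service_pairs(recipes: Iterable[Recipe], k: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Rank (trigger service, action service) pairs by total number of users."""
    totals: Counter = Counter()
    for r in recipes:
        totals[(r.trigger_service, r.action_service)] += r.users
    return totals.most_common(k)


recipes = [
    Recipe("Lights on at sunset", "weather", "sunset", "hue", "turn_on", 1200),
    Recipe("Log daily steps", "fitbit", "daily_summary", "sheets", "add_row", 300),
    Recipe("Lights on when home", "location", "enter_area", "hue", "turn_on", 900),
]
print(top_service_pairs(recipes, k=2))
# [(('weather', 'hue'), 1200), (('location', 'hue'), 900)]
```

Aggregating by service pair rather than by individual recipe is one way to ask which kinds of devices people connect, which is close to the question the paper poses.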
License (derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
The CIC IoT Dataset 2023 is a comprehensive benchmark developed by the Canadian Institute for Cybersecurity (CIC) to advance intrusion detection research in real-world Internet of Things (IoT) environments. This dataset was created using a network of 105 actual IoT devices, encompassing smart home gadgets, sensors, and cameras, to simulate authentic IoT traffic and attack scenarios.
Key Features:
Diverse Attack Scenarios: The dataset includes 33 distinct attacks grouped into seven classes: DDoS, DoS, Reconnaissance, Web-based, Brute Force, Spoofing, and Mirai. These attacks were executed by compromised IoT devices targeting other IoT devices, reflecting realistic threat vectors (University of New Brunswick).
Extensive Data Collection: Network traffic was captured in real time, resulting in over 46 million records. The data is available in several formats, including raw PCAP files and pre-extracted CSV features, to suit different research needs.
Realistic IoT Topology: Unlike many datasets that rely on simulations, this dataset was generated on a large-scale IoT testbed with devices from multiple vendors, providing a heterogeneous and realistic network environment.
Benchmarking and Evaluation: The dataset has been used to evaluate how well machine learning and deep learning algorithms classify and detect malicious versus benign IoT network traffic (University of New Brunswick).
This dataset is a valuable resource for researchers and practitioners developing and testing security analytics applications, intrusion detection systems, and other cybersecurity solutions tailored to IoT ecosystems (University of New Brunswick).
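Because the 33 attacks roll up into seven classes, a common preprocessing step is to map each fine-grained label to its class before training a 7-class (or 8-class, with benign) model. The sketch below assumes label strings such as `DDoS-ICMP_Flood`, `MITM-ArpSpoofing`, and `BenignTraffic`; these are assumptions to verify against the label column of the CSV files actually downloaded.

```python
# Assumed CICIoT2023 fine-grained label strings; verify against the CSV's label column.
WEB_BASED = {"SqlInjection", "CommandInjection", "XSS", "Backdoor_Malware",
             "BrowserHijacking", "Uploading_Attack"}
SPOOFING = {"DNS_Spoofing", "MITM-ArpSpoofing"}
BRUTE_FORCE = {"DictionaryBruteForce"}
RECON_EXTRA = {"VulnerabilityScan"}
# Prefix rules for families whose labels share a common prefix (assumed naming).
PREFIX_CLASSES = (("DDoS-", "DDoS"), ("DoS-", "DoS"),
                  ("Recon-", "Reconnaissance"), ("Mirai-", "Mirai"))


def attack_class(label: str) -> str:
    """Map a fine-grained attack label to one of the seven classes, or 'Benign'."""
    if label == "BenignTraffic":
        return "Benign"
    for prefix, cls in PREFIX_CLASSES:
        if label.startswith(prefix):
            return cls
    if label in RECON_EXTRA:
        return "Reconnaissance"
    if label in WEB_BASED:
        return "Web-based"
    if label in BRUTE_FORCE:
        return "Brute Force"
    if label in SPOOFING:
        return "Spoofing"
    raise ValueError(f"unknown label: {label!r}")


print(attack_class("DDoS-ICMP_Flood"))   # DDoS
print(attack_class("MITM-ArpSpoofing"))  # Spoofing
```

Raising on unknown labels, instead of silently bucketing them, surfaces any mismatch between these assumed strings and the real files.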
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License (derived automatically): Attribution-ShareAlike 4.0 (CC BY-SA 4.0), https://creativecommons.org/licenses/by-sa/4.0/
This dataset contains a country-level breakdown of infected and exposed IoT devices detected through sinkholes, honeypots, and darknets operated by The Shadowserver Foundation and its partners. The data is grouped by IoT-related threat. In some cases, a vulnerability ID is given as the threat name; this happens when a honeypot saw an IP attempting to exploit an IoT-related vulnerability but acquired no other threat information. This dataset was created as part of the EU CEF VARIoT project https://variot.eu
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
The CICIoT2023 dataset is a large-scale, realistic intrusion detection dataset designed to support security analytics and machine learning research in the Internet of Things (IoT) domain. Created by the Canadian Institute for Cybersecurity (CIC), the dataset captures 33 different types of attacks (including DDoS, DoS, Recon, Web-based, Brute Force, Spoofing, and Mirai) executed by malicious IoT devices against other IoT targets.
The testbed consists of 105 real IoT devices of different types and manufacturers, including smart home devices and industrial equipment, configured in a complex network topology to emulate real-world conditions. The dataset includes benign and malicious traffic in various formats and supports feature extraction for both traditional ML and deep learning models.
This dataset aims to address the lack of diversity and scale in previous IoT security datasets, offering a robust benchmark for evaluating intrusion detection systems (IDS) and enabling research in IoT cybersecurity, anomaly detection, and network forensics.
Aposemat IoT-23 - a Labeled Dataset with Malicious and Benign IoT Network Traffic
Homepage: https://www.stratosphereips.org/datasets-iot23
This dataset contains a subset of the data from 20 captures of malicious network traffic and 3 captures of live benign traffic on Internet of Things (IoT) devices. Created by Sebastian Garcia, Agustin Parmisano, & Maria Jose Erquiaga at the Avast AIC laboratory with funding from Avast Software, this dataset is one of the best in the field for… See the full description on the dataset page: https://huggingface.co/datasets/19kmunz/iot-23-preprocessed-minimumcolumns.
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This dataset comprises network traffic collected from 24 Internet of Things (IoT) devices over 119 days, totaling more than 110 million packets. The devices represent 19 distinct types and were monitored in a controlled environment under normal operating conditions, reflecting the variety of functions and behaviors typical of consumer IoT products (pcapIoT). The packet capture (pcap) files preserve complete packet information across all protocol layers, including ARP, TCP, HTTP, and various application-layer protocols. Raw pcap files (pcapFull) are also provided, which include traffic from 36 non-IoT devices present in the network. To facilitate device-specific analysis, a CSV file maps each IoT device to its unique MAC address, which simplifies identifying and filtering the packets that belong to each device. Three additional CSV files (CSVs) provide metadata about the states the devices were in at different times. Python scripts (Scripts) are also provided to help extract and process packets; they include packet filtering by MAC address and protocol-specific data extraction, and serve as practical examples of data manipulation and analysis techniques. This dataset is valuable for researchers interested in network behavior analysis, anomaly detection, and the development of IoT-specific network policies. It enables the study and differentiation of network behaviors by device function and supports behavior-based profiling to identify irregular activities or potential security threats.
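The MAC-based filtering mentioned above can be sketched with the Python standard library alone. This is not the dataset's own script: it assumes classic microsecond-resolution libpcap files (not pcapng) with an Ethernet link layer, and the `device`/`mac` column names in the mapping CSV are hypothetical.

```python
import csv
import struct
from typing import Dict, Iterator, List, Tuple


def load_device_macs(csv_path: str) -> Dict[str, str]:
    """Read a device -> MAC mapping; the 'device'/'mac' column names are assumed."""
    with open(csv_path, newline="") as f:
        return {row["device"]: row["mac"].strip().lower() for row in csv.DictReader(f)}


def read_pcap(path: str) -> Iterator[Tuple[float, bytes]]:
    """Yield (timestamp, frame) pairs from a classic microsecond libpcap file.

    pcapng and nanosecond-resolution pcap files are not handled in this sketch.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # global header: magic, version, zone, sigfigs, snaplen, linktype
        if header[:4] == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # written on a little-endian host
        elif header[:4] == b"\xa1\xb2\xc3\xd4":
            endian = ">"
        else:
            raise ValueError("not a classic microsecond pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != 1:
            raise ValueError("expected Ethernet link type (1)")
        while True:
            record = f.read(16)  # per-packet header
            if len(record) < 16:
                break  # end of file
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
            yield ts_sec + ts_usec / 1_000_000, f.read(incl_len)


def mac_to_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def packets_for_mac(path: str, mac: str) -> List[Tuple[float, bytes]]:
    """Keep frames whose Ethernet destination or source address equals `mac`."""
    mac = mac.lower()
    return [
        (ts, frame)
        for ts, frame in read_pcap(path)
        if len(frame) >= 12 and mac in (mac_to_str(frame[0:6]), mac_to_str(frame[6:12]))
    ]
```

Matching on either the destination (bytes 0 to 5) or source (bytes 6 to 11) address keeps both directions of a device's traffic; for very large captures, a dedicated tool such as tshark or scapy would be the more practical choice.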
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
IoT-23 is a dataset of network traffic from Internet of Things (IoT) devices. It contains 20 captures of malware executed on IoT devices and 3 captures of benign IoT device traffic. It was first published in January 2020, with captures ranging from 2018 to 2019. The traffic was captured in the Stratosphere Laboratory, AIC group, FEL, CTU University, Czech Republic. Its goal is to offer a large dataset of real, labeled IoT malware infections and benign IoT traffic for researchers developing machine learning algorithms. The dataset and its research were funded by Avast Software. The malware was allowed to connect to the Internet.
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
IoT-FSCIT is a dataset collected for research purposes. It contains statistical features of data from five distinct IoT devices, collected over six weeks in a laboratory at Universiti Malaya.
This dataset consists of NetFlow records capturing the outbound network traffic of 8 commercial IoT devices and 5 non-IoT devices, collected over 37 days in a lab at Ben-Gurion University of the Negev. It was collected to develop a method that lets telecommunication providers detect vulnerable IoT models behind home NATs. Each NetFlow record is labeled with the model of the device that produced it; for research reproducibility, each record is also assigned to either the "training" or "test" set, following the partitioning described in:
Y. Meidan, V. Sachidananda, H. Peng, R. Sagron, Y. Elovici, and A. Shabtai, A novel approach for detecting vulnerable IoT devices connected behind a home NAT, Computers & Security, Volume 97, 2020, 101968, ISSN 0167-4048, https://doi.org/10.1016/j.cose.2020.101968. (http://www.sciencedirect.com/science/article/pii/S0167404820302418)
Please note:
# NetFlow features, used in the related paper for analysis
'FIRST_SWITCHED': System uptime at which the first packet of this flow was switched
'IN_BYTES': Incoming counter for the number of bytes associated with an IP Flow
'IN_PKTS': Incoming counter for the number of packets associated with an IP Flow
'IPV4_DST_ADDR': IPv4 destination address
'L4_DST_PORT': TCP/UDP destination port number
'L4_SRC_PORT': TCP/UDP source port number
'LAST_SWITCHED': System uptime at which the last packet of this flow was switched
'PROTOCOL': IP protocol byte (6: TCP, 17: UDP)
'SRC_TOS': Type of Service byte setting when entering the incoming interface
'TCP_FLAGS': Cumulative of all the TCP flags seen for this flow
# Features added by the authors
'IP': Prefix of the destination IP address, representing the network (without the host)
'DURATION': Time (seconds) between first/last packet switching
# Label
'device_model': Model of the device that produced the flow
# Partition
'partition': Training or test
# Additional NetFlow features (mostly zero-variance)
'SRC_AS': Source BGP autonomous system number
'DST_AS': Destination BGP autonomous system number
'INPUT_SNMP': Input interface index
'OUTPUT_SNMP': Output interface index
'IPV4_SRC_ADDR': IPv4 source address
'MAC': MAC address of the source
# Additional data
'category': IoT or non-IoT
'type': IoT, access_point, smartphone, laptop
'date': Datepart of FIRST_SWITCHED
'inter_arrival_time': Time (seconds) between successive flows of the same device (identified by its MAC address)
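Several of the author-added and additional columns listed above are derived from raw NetFlow fields. The sketch below shows one way to compute DURATION, IP, date, and inter_arrival_time from a list of flow dictionaries. It rests on three assumptions that are not the authors' documented procedure: FIRST_SWITCHED and LAST_SWITCHED are already parsed into `datetime` objects (the raw NetFlow fields are system uptime), the "network" prefix for IP is a /24, and inter-arrival time is measured between the FIRST_SWITCHED values of consecutive flows from the same MAC.

```python
import ipaddress
from datetime import datetime
from typing import Dict, List


def add_derived_features(flows: List[dict], prefix_len: int = 24) -> List[dict]:
    """Return copies of `flows` with DURATION, IP, date and inter_arrival_time added.

    Assumes datetime-valued FIRST_SWITCHED/LAST_SWITCHED and a /24 prefix by default.
    """
    out: List[dict] = []
    last_seen: Dict[str, datetime] = {}  # MAC -> FIRST_SWITCHED of its previous flow
    for flow in sorted(flows, key=lambda f: (f["MAC"], f["FIRST_SWITCHED"])):
        row = dict(flow)
        row["DURATION"] = (row["LAST_SWITCHED"] - row["FIRST_SWITCHED"]).total_seconds()
        # Keep only the network part of the destination address (host bits zeroed).
        net = ipaddress.ip_network(f"{row['IPV4_DST_ADDR']}/{prefix_len}", strict=False)
        row["IP"] = str(net.network_address)
        row["date"] = row["FIRST_SWITCHED"].date()
        prev = last_seen.get(row["MAC"])
        row["inter_arrival_time"] = (
            None if prev is None else (row["FIRST_SWITCHED"] - prev).total_seconds()
        )
        last_seen[row["MAC"]] = row["FIRST_SWITCHED"]
        out.append(row)
    return out
```

The first flow of each device has no predecessor, so its inter_arrival_time is left as `None` here; dropping or imputing those rows is a separate modeling choice.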
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Smart homes contain programmable electronic devices (mostly IoT) that enable home automation. People who live in smart homes benefit from interconnected devices by controlling them remotely, manually, or autonomously. However, high interconnectivity increases the attack surface, making the smart home an attractive target for adversaries. NCC Group and the Global Cyber Alliance recorded over 12,000 malicious attempts to log into smart home devices. Recent statistics show that over 200 million smart homes could be subjected to such attacks. Conventional security systems focus either on network traffic (e.g., firewalls) or on the physical environment (e.g., CCTV or basic motion sensors), but not both. A key challenge in developing cyber-physical security systems is the lack of datasets and testbeds. To be meaningful, cyber-physical datasets need to be collected in real smart home environments. Because of the inherent difficulties (e.g., effort, cost, testbed availability), such cyber-physical smart home datasets are quite rare. This paper aims to fill this gap by contributing an annotated dataset collected in a real smart home. It explains the process we followed to collect the data and how we organised it to facilitate wider use within research communities. A related article can be found at https://doi.org/10.3389/friot.2023.1275080
License (derived automatically): Attribution 3.0 (CC BY 3.0), https://creativecommons.org/licenses/by/3.0/
With growing interest in Internet of Things (IoT) devices, a number of communication protocols have been developed to support a variety of IoT use cases. One promising communication paradigm widely adopted in the IoT is the publish-subscribe pattern, which is supported by messaging protocols such as MQTT, AMQP, and XMPP. Because IoT devices are so diverse, an IoT application may communicate with them using a variety of messaging protocols, software frameworks, and strategies. It is therefore critical to determine the robustness of the components responsible for message delivery (i.e., message brokers). In this paper, we conduct a comparative study of MQTT performance across a range of payload sizes and security levels. Preliminary results indicate that when payloads remain small, higher security levels do not cause significant latency overheads. Additionally, we found that mutual authentication via Transport Layer Security (TLS) has no effect on MQTT response times over persistent connections compared with the default security level, which authenticates only the server.
This dataset presents the IoT network traffic generated by connected objects. To understand and characterise the legitimate behaviour of network traffic, we created a platform that generates IoT traffic under realistic conditions. The platform contains different IoT devices: voice assistants, smart cameras, connected printers, connected light bulbs, motion sensors, etc. A set of interactions with these objects is then performed to generate real traffic. This data is used to identify anomalies and intrusions with machine learning algorithms and to improve existing detection models. The dataset is available in two formats, PCAP and CSV, and was created as part of the EU CEF VARIoT project https://variot.eu. To download the data in PCAP format and for more information, visit the project's web portal: https://www.variot.telecom-sudparis.eu/
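To make the publish-subscribe pattern concrete, the sketch below implements MQTT-style topic filter matching, where `+` matches exactly one topic level and `#` (only as the last level) matches any number of remaining levels, plus a toy in-memory broker. It is a teaching sketch only: `InMemoryBroker` is hypothetical and has no networking, QoS, retained messages, TLS, or the special handling of `$`-prefixed system topics.

```python
from typing import Callable, List, Tuple

Callback = Callable[[str, bytes], None]


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT-style matching: '+' matches one level, a trailing '#' matches the rest."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard: everything from here down matches
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


class InMemoryBroker:
    """Toy broker that delivers each published message to every matching subscriber."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[str, Callback]] = []

    def subscribe(self, topic_filter: str, callback: Callback) -> None:
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic: str, payload: bytes) -> int:
        delivered = 0
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


broker = InMemoryBroker()
broker.subscribe("home/+/temperature", lambda t, p: print("temp", t, p))
broker.subscribe("home/#", lambda t, p: print("all", t, p))
print(broker.publish("home/kitchen/temperature", b"21.5"))  # both subscribers fire; prints 2
```

Decoupling publishers from subscribers through topic filters is what lets a broker fan messages out to heterogeneous devices, and it is also why broker robustness matters for the latency measurements described above.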
License (derived automatically): CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
The data was collected by IoT sensors that transmit measurements to a local server over Wi-Fi every 5 seconds. The system integrates temperature, humidity, light, pH, and electrical conductivity (EC) sensors connected to an ESP Arduino microcontroller, which was programmed to transmit the data via HTTP. The server automatically saves the received data in CSV format, split by day: for each day of one week there is a separate file containing 17,280 lines (one record every 5 seconds).
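The per-day file size follows directly from the sampling period: 24 × 60 × 60 = 86,400 seconds per day, divided by 5 seconds, gives 17,280 records. The sketch below shows how a server might append each received reading to the CSV file for its calendar day; the file naming, column names, and the `append_reading` helper are hypothetical, and the HTTP receiving side is omitted.

```python
import csv
import os
from datetime import datetime
from typing import Dict

SAMPLE_PERIOD_S = 5
ROWS_PER_DAY = 24 * 60 * 60 // SAMPLE_PERIOD_S  # 86,400 s / 5 s = 17,280 rows

# Hypothetical column names for the five sensors plus a timestamp.
FIELDS = ["timestamp", "temperature", "humidity", "light", "ph", "ec"]


def append_reading(out_dir: str, ts: datetime, reading: Dict[str, float]) -> str:
    """Append one reading to <out_dir>/<YYYY-MM-DD>.csv, writing a header for new files."""
    path = os.path.join(out_dir, f"{ts.date().isoformat()}.csv")
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"timestamp": ts.isoformat(), **reading})
    return path
```

Keying the file name on the reading's own timestamp, rather than on the time the server happens to write it, keeps records near midnight in the correct day's file.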
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
as they are connected on a large scale with high-value data content
License: CC BY-SA (http://www.opendefinition.org/licenses/cc-by-sa)
This dataset describes the installation and removal dates, asset serial numbers, and location data for smart city assets in Ballarat. The information is recorded as assets are moved, and the dataset is typically updated manually when that happens. Its intended use is to inform the public of the location of Ballarat's smart city assets.
License (derived automatically): Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This is a real-time dataset created for monitoring an aquatic environment using an IoT framework. Three sensors (pH, temperature, and turbidity) and an Arduino controller are used to monitor the water quality of 5 ponds. The dataset has 4 columns (pH, Temperature, Turbidity, and Fish) and 40,280 rows. Fish is the target variable and the others are independent variables. There are 11 fish categories with the following row counts: tilapia 8,830; rui 6,336; pangas 5,314; silverCup 3,906; katla 3,786; sing 3,776; shrimp 3,204; karpio 2,112; prawn 1,348; koi 964; magur 704.
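The stated class counts add up to exactly the 40,280 rows, and they are imbalanced: tilapia has about 12.5 times as many rows as magur. One common way to account for this when training a classifier is inverse-frequency ("balanced") class weights, computed as n_samples / (n_classes × n_samples_in_class), the same formula scikit-learn uses for `class_weight="balanced"`. This is a general technique, not something the dataset authors prescribe.

```python
from typing import Dict

# Per-class row counts as listed in the dataset description.
COUNTS: Dict[str, int] = {
    "tilapia": 8830, "rui": 6336, "pangas": 5314, "silverCup": 3906,
    "katla": 3786, "sing": 3776, "shrimp": 3204, "karpio": 2112,
    "prawn": 1348, "koi": 964, "magur": 704,
}


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


assert sum(COUNTS.values()) == 40280  # matches the stated number of rows
weights = balanced_class_weights(COUNTS)
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:10s} {w:.3f}")
```

With these weights, every class contributes the same total weight (40,280 / 11) to the loss, so rare species such as magur and koi are not drowned out by tilapia.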
License (derived automatically): Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
This is a real-world radio frequency (RF) fingerprinting dataset for commercial off-the-shelf (COTS) Bluetooth emitters under challenging testbed setups. It includes emissions from 10 COTS IoT emitters (2 laptops and 8 commercial chips), captured with a National Instruments Ettus USRP X300 radio fitted with a UBX160 daughterboard and a VERT2450 antenna. The receiver is tuned to record a 2 MHz bandwidth of the spectrum centered at 2.414 GHz.