The government has surveyed UK businesses, charities and educational institutions to find out how they approach cyber security and gain insight into the cyber security issues they face. The research informs government policy on cyber security and how government works with industry to build a prosperous and resilient digital UK.
19 April 2023
Respondents were asked about their approach to cyber security and any breaches or attacks over the 12 months before the interview. Main survey interviews took place between October 2022 and January 2023. Qualitative follow up interviews took place in December 2022 and January 2023.
UK
The survey is part of the government’s National Cyber Strategy 2022.
There is a wide range of free government cyber security guidance and information for businesses, including details of free online training and support.
The survey was carried out by Ipsos UK. The report has been produced by Ipsos on behalf of the Department for Science, Innovation and Technology.
This release is published in accordance with the Code of Practice for Statistics (2018), as produced by the UK Statistics Authority. The UKSA has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.
The document above contains a list of ministers and officials who have received privileged early access to this release. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.
The Lead Analyst for this release is Emma Johns. For any queries please contact cybersurveys@dsit.gov.uk.
For media enquiries only, please contact the press office on 020 7215 1000.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
On average, 37% of organisations globally were victims of a ransomware attack between January and February 2021. The top 15 countries that were affected the most were...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
1.Introduction
In the digital era of the Industrial Internet of Things (IIoT), conventional Critical Infrastructures (CIs) are being transformed into smart environments with multiple benefits, such as pervasive control, self-monitoring and self-healing. However, this evolution introduces several cyberthreats due to the necessary presence of insecure technologies. DNP3 is an industrial communication protocol which is widely adopted in the CIs of the US. In particular, DNP3 allows remote communication between Industrial Control Systems (ICS) and Supervisory Control and Data Acquisition (SCADA) systems. It can support various topologies, such as Master-Slave, Multi-Drop, Hierarchical and Multiple-Server. Originally, the architectural model of DNP3 consisted of three layers: (a) Application Layer, (b) Transport Layer and (c) Data Link Layer. DNP3 can now also be incorporated into the Transmission Control Protocol/Internet Protocol (TCP/IP) stack as an application-layer protocol. Similarly to other industrial protocols (e.g., Modbus and IEC 60870-5-104), DNP3 is characterised by severe security issues since it does not include any authentication or authorisation mechanisms. More information about the DNP3 security issues is provided in [1-3]. This dataset contains labelled TCP/IP network flow statistics (Comma-Separated Values - CSV format) and DNP3 flow statistics (CSV format) related to 9 DNP3 cyberattacks. These cyberattacks are focused on DNP3 unauthorised commands and Denial of Service (DoS). The network traffic data are provided through Packet Capture (PCAP) files. Consequently, this dataset can be used to implement Artificial Intelligence (AI)-powered Intrusion Detection and Prevention Systems (IDPS) that rely on Machine Learning (ML) and Deep Learning (DL) techniques.
2.Instructions
This DNP3 Intrusion Detection Dataset was implemented following the methodological frameworks of A. Gharib et al. in [4] and S. Dadkhah et al. in [5], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) eight industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to implement this DNP3 Intrusion Detection Dataset. In particular, the following cyberattacks were implemented.
On Thursday, May 14, 2020, the DNP3 Disable Unsolicited Messages Attack was executed for 4 hours.
On Friday, May 15, 2020, the DNP3 Cold Restart Message Attack was executed for 4 hours.
On Friday, May 15, 2020, the DNP3 Warm Restart Message Attack was executed for 4 hours.
On Saturday, May 16, 2020, the DNP3 Enumerate Attack was executed for 4 hours.
On Saturday, May 16, 2020, the DNP3 Info Attack was executed for 4 hours.
On Monday, May 18, 2020, the DNP3 Initialisation Attack was executed for 4 hours.
On Monday, May 18, 2020, the Man In The Middle (MITM)-DoS Attack was executed for 4 hours.
On Monday, May 18, 2020, the DNP3 Replay Attack was executed for 4 hours.
On Tuesday, May 19, 2020, the DNP3 Stop Application Attack was executed for 4 hours.
The aforementioned DNP3 cyberattacks were executed utilising penetration testing tools, such as Nmap and Scapy. For each attack, a relevant folder is provided, including the network traffic and the network flow statistics for each entity. In particular, each folder provides (a) the pcap files for each entity, (b) the Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics for a flow timeout of 120 seconds in CSV format and (c) the DNP3 flow statistics for each entity, using different flow timeout values (such as 45, 60, 75, 90, 120 and 240 seconds). The TCP/IP network flow statistics were produced by CICFlowMeter, while the DNP3 flow statistics were generated by a Custom DNP3 Python Parser, taking full advantage of Scapy.
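To illustrate what a flow timeout value means in practice, the following is a minimal sketch in Python. It assumes a simplified rule in which a flow is closed once it has been open for longer than the timeout; the function name `split_into_flows` and the packet representation are hypothetical, and neither CICFlowMeter nor the Custom DNP3 Python Parser is claimed to work exactly this way.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical packet record: (timestamp in seconds, flow key such as "src-dst").
Packet = Tuple[float, str]


def split_into_flows(packets: Iterable[Packet], timeout: float) -> List[List[Packet]]:
    """Group packets by flow key, closing a flow once it is older than `timeout`."""
    open_flows: Dict[str, List[Packet]] = {}  # flow key -> packets of the open flow
    finished: List[List[Packet]] = []
    for ts, key in sorted(packets):
        current = open_flows.get(key)
        # Assumption: the timeout is measured from the first packet of the flow.
        if current is not None and ts - current[0][0] > timeout:
            finished.append(current)
            current = None
        if current is None:
            current = []
            open_flows[key] = current
        current.append((ts, key))
    # Flows still open at the end of the capture are emitted as they are.
    finished.extend(open_flows.values())
    return finished


packets = [(0.0, "a"), (50.0, "a"), (130.0, "a"), (10.0, "b")]
print(len(split_into_flows(packets, timeout=120)))
print(len(split_into_flows(packets, timeout=240)))
```

Under this rule, a shorter timeout yields more and shorter flows from the same capture, which is why the DNP3 flow statistics are provided separately for each timeout value.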
3.Dataset Structure
The dataset consists of the following folders:
20200514_DNP3_Disable_Unsolicited_Messages_Attack: It includes the pcap and CSV files related to the DNP3 Disable Unsolicited Message attack.
20200515_DNP3_Cold_Restart_Attack: It includes the pcap and CSV files related to the DNP3 Cold Restart attack.
20200515_DNP3_Warm_Restart_Attack: It includes the pcap and CSV files related to the DNP3 Warm Restart attack.
20200516_DNP3_Enumerate: It includes the pcap and CSV files related to the DNP3 Enumerate attack.
20200516_DNP3_Ιnfo: It includes the pcap and CSV files related to the DNP3 Info attack.
20200518_DNP3_Initialize_Data_Attack: It includes the pcap and CSV files related to the DNP3 Data Initialisation attack.
20200518_DNP3_MITM_DoS: It includes the pcap and CSV files related to the DNP3 MITM-DoS attack.
20200518_DNP3_Replay_Attack: It includes the pcap and CSV files related to the DNP3 replay attack.
20200519_DNP3_Stop_Application_Attack: It includes the pcap and CSV files related to the DNP3 Stop Application attack.
Training_Testing_Balanced_CSV_Files: It includes balanced CSV files from CICFlowMeter and the Custom DNP3 Python Parser that could be utilised for training ML and DL methods. Each folder includes different sub-folders for the corresponding flow timeout values used by the Custom DNP3 Python Parser. For CICFlowMeter, only the timeout value of 120 seconds was used.
Each folder includes respective subfolders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the DNP3 network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) generated by CICFlowMeter for the timeout value of 120 seconds and finally (c) the DNP3 flow statistics (CSV file) from the Custom DNP3 Python Parser. Finally, it is noteworthy that the network flows from both CICFlowMeter and Custom DNP3 Python Parser in each CSV file are labelled based on the DNP3 cyberattacks executed for the generation of this dataset. The description of these attacks is provided in the following section, while the various features from CICFlowMeter and Custom DNP3 Python Parser are presented in Section 5.
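The folder layout described above can be indexed with a few lines of Python. This is a minimal sketch that assumes one sub-folder per entity/device inside each attack folder; the function name `index_dataset` and the root path passed to it are hypothetical and not part of the dataset.

```python
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Map attack folder -> entity folder -> sorted pcap/CSV file names."""
    index = defaultdict(dict)
    for attack_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for entity_dir in sorted(p for p in attack_dir.iterdir() if p.is_dir()):
            # Search recursively, since timeout-specific files may sit in sub-folders.
            files = sorted(
                f.name
                for f in entity_dir.rglob("*")
                if f.is_file() and f.suffix.lower() in {".pcap", ".csv"}
            )
            index[attack_dir.name][entity_dir.name] = files
    return dict(index)
```

Calling `index_dataset("path/to/extracted/dataset")` (a placeholder path) returns a nested dictionary that can be used to check which captures and flow statistics are available for each entity before any training step.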
4.Testbed & DNP3 Attacks
The following figure shows the testbed utilised for the generation of this dataset. It is composed of eight industrial entities that play the role of the DNP3 outstations/slaves, such as Remote Terminal Units (RTUs) and Intelligent Electronic Devices (IEDs). Moreover, there is another workstation which plays the role of the master station, like a Master Terminal Unit (MTU). For the communication between the DNP3 outstations/slaves and the master station, opendnp3 was used.
Table 1: DNP3 Attacks Description
DNP3 Attack | Description | Dataset Folder |
---|---|---|
DNP3 Disable Unsolicited Message Attack | This attack targets a DNP3 outstation/slave, establishing a connection with it, while acting as a master station. The false master then transmits a packet with the DNP3 Function Code 21, which requests to disable all the unsolicited messages on the target. | 20200514_DNP3_Disable_Unsolicited_Messages_Attack |
DNP3 Cold Restart Attack | The malicious entity acts as a master station and sends a DNP3 packet that includes the “Cold Restart” function code. When the target receives this message, it initiates a complete restart and sends back a reply with the time window before the restart process. | 20200515_DNP3_Cold_Restart_Attack |
DNP3 Warm Restart Attack | This attack is quite similar to the “Cold Restart Message”, but aims to trigger a partial restart, re-initiating a DNP3 service on the target outstation. | 20200515_DNP3_Warm_Restart_Attack |
DNP3 Enumerate Attack | This reconnaissance attack aims to discover which DNP3 services and function codes are used by the target system. | 20200516_DNP3_Enumerate |
DNP3 Info Attack | This attack constitutes another reconnaissance attempt, aggregating various DNP3 diagnostic information related to the DNP3 usage. | 20200516_DNP3_Ιnfo |
Data Initialisation Attack | This cyberattack is related to Function Code 15 (Initialize Data). It is an unauthorised access attack, which demands that the slave re-initialise possible configurations to their initial values, thus changing potential values defined by legitimate masters. | 20200518_Initialize_Data_Attack |
MITM-DoS Attack | In this cyberattack, the cyberattacker is placed between a DNP3 master and a DNP3 slave device, dropping all the messages coming from the DNP3 master or the DNP3 slave. | 20200518_MITM_DoS |
DNP3 Replay Attack | This cyberattack replays DNP3 packets coming from a legitimate DNP3 master or DNP3 slave. | 20200518_DNP3_Replay_Attack |
DNP3 Stop Application Attack | This attack is related to Function Code 18 (Stop Application) and demands that the slave stop its function so that the slave cannot receive messages from the master. | 20200519_DNP3_Stop_Application_Attack |
The TCP/IP network flow statistics generated by CICFlowMeter are summarised below. The TCP/IP network flows and their statistics generated by CICFlowMeter are labelled based on the DNP3 attacks described above, thus allowing the training of ML/DL models. Finally, it is worth mentioning that these statistics are generated when the flow timeout value is equal to 120 seconds.
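Before training a model on the labelled flows, it is usually worth checking how the labels are distributed. The following is a minimal sketch using only the Python standard library; it assumes that the label is stored in a column named `Label`, which is an assumption about the CSV header rather than something stated here, and `label_counts` is a hypothetical helper name.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column="Label"):
    """Count how many flows carry each label in a flow-statistics CSV file."""
    counts = Counter()
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Header names may be padded with spaces, so strip them before lookup.
            cleaned = {name.strip(): value for name, value in row.items() if name}
            counts[cleaned[label_column].strip()] += 1
    return counts
```

A strongly skewed result for a per-entity file would be one reason to prefer the balanced CSV files provided for training and testing.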
In 2022, the state of Telangana in India had the highest number of reported cybercrimes compared to the rest of the country, with over ****** cases registered with the authorities. The country recorded over ****** cases of cybercrime that year, marking a significant increase compared to about ****** cases in 2016.
Cybercrime in India
The growing digital economy has created new opportunities for cybercriminals by introducing higher complexity or widening the scope of digital aspects in our daily lives. India is no exception: for example, the number of people arrested and charged for cybercrime across India in 2021 showed a wide spectrum of criminal charges including, but not limited to, blackmailing, forgery, sexual exploitation and counterfeiting. Studies also indicated that small businesses were likely targets of such crimes.
Combating cybercrime
The country led in the encounter rate of cybercrimes, with **************** internet users reporting having experienced a cybercrime, compared to the world average of about four out of ten internet users in 2022. As the government pushes for a digital India, cybersecurity has become the need of the hour. The government has funded special initiatives such as the Indian Cyber Crime Coordination Centre, which helps to coordinate the efforts in combating cybercrime, as well as initiatives to raise public awareness and build institutional capacity to cope with it.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
While every industry is affected by ransomware attacks, the truth is that some industries are more susceptible than others. This is the full breakdown of the top 15 sectors most targeted by malware.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Wget provenance data in edge-list format, parsed from CamFlow provenance data. This dataset contains the wget attack base graph data. Experiments were run for over an hour, with recurrent wget commands issued throughout the experiments (one every 120 seconds). Background activities were also captured, as CamFlow whole-system provenance was turned on. Several malicious URLs were run during each experimental session. Five attack experiments were recorded, each with a different mixture of normal (benign) wget operations. Provenance data was in JSON format and was converted into edge-list format for the Unicorn IDS research project. Conversion time was Sept. 26th, 2018. Each experiment consists of a base and a streaming graph component.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
IEC 60870-5-104
Intrusion Detection Dataset
Readme File
ITHACA – University of Western Macedonia - https://ithaca.ece.uowm.gr/
Authors: Panagiotis Radoglou-Grammatikis, Thomas Lagkas, Vasileios Argyriou, Panagiotis Sarigiannidis
Publication Date: September 23, 2022
1.Introduction
The evolution of the Industrial Internet of Things (IIoT) introduces several benefits, such as real-time monitoring, pervasive control and self-healing. However, despite these valuable services, security and privacy issues remain, given the presence of legacy and insecure communication protocols like IEC 60870-5-104. IEC 60870-5-104 is an industrial protocol widely applied in critical infrastructures, such as the smart electrical grid and industrial healthcare systems. The IEC 60870-5-104 Intrusion Detection Dataset was implemented in the context of the research paper entitled "Modeling, Detecting, and Mitigating Threats Against Industrial Healthcare Systems: A Combined Software Defined Networking and Reinforcement Learning Approach" [1] and of two H2020 projects: ELECTRON: rEsilient and seLf-healed EleCTRical pOwer Nanogrid (101021936) and SDN-microSENSE: SDN - microgrid reSilient Electrical eNergy SystEm (833955). This dataset includes labelled Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics (Comma-Separated Values (CSV) format) and IEC 60870-5-104 flow statistics (CSV format) related to twelve IEC 60870-5-104 cyberattacks. In particular, the cyberattacks are related to unauthorised commands and Denial of Service (DoS) activities against IEC 60870-5-104. Moreover, the relevant Packet Capture (PCAP) files are available. The dataset can be utilised for Artificial Intelligence (AI)-based Intrusion Detection Systems (IDS), taking full advantage of Machine Learning (ML) and Deep Learning (DL).
2.Instructions
The IEC 60870-5-104 dataset was implemented following the methodology of A. Gharib et al. in [2], including eleven features: (a) Complete Network Configuration, (b) Complete Traffic, (c) Labelled Dataset, (d) Complete Interaction, (e) Complete Capture, (f) Available Protocols, (g) Attack Diversity, (h) Heterogeneity, (i) Feature Set and (j) Metadata.
A network topology consisting of (a) seven industrial entities, (b) one Human Machine Interface (HMI) and (c) three cyberattackers was used to construct the IEC 60870-5-104 Intrusion Detection Dataset. The industrial entities use IEC TestServer[1], while the HMI uses QTester104[2]. On the other hand, the cyberattackers use Kali Linux[3] equipped with Metasploit[4], OpenMUC j60870[5] and Ettercap[6]. The cyberattacks were performed during the following days.
On Saturday, April 25, 2020, a DoS cyberattack (M_SP_NA_1_DoS) was executed for 2 hours, using the M_SP_NA_1 command.
On Sunday, April 26, 2020, two cyberattacks were executed, namely (a) DoS (C_CI_NA_1_DoS) and (b) unauthorised injection (C_CI_NA_1), using the C_CI_NA_1 command for 2 hours.
On Monday, April 27, 2020, one unauthorised injection attack (C_SE_NA_1) was executed for 4 hours, using the C_SE_NA_1 command.
On Tuesday, April 28, 2020, two cyberattacks were executed, namely (a) unauthorised injection (C_SC_NA_1) and (b) DoS (C_SE_NA_1_DoS), using the C_SC_NA_1 and C_SE_NA_1 commands for 2 hours and 4 hours, respectively.
On Wednesday, April 29, 2020, one DoS (C_SC_NA_1_DoS) cyberattack was performed for 2 hours, using the C_SC_NA_1 command.
On Friday, June 05, 2020, two cyberattacks were executed, namely (a) DoS (C_RD_NA_1_DoS) and (b) unauthorised injection (C_RD_NA_1), using the C_RD_NA_1 command for 2 and 4 hours, respectively.
On Saturday, June 06, 2020, two cyberattacks were executed, namely (a) DoS (C_RP_NA_1_DoS) and (b) unauthorised injection (C_RP_NA_1), using the C_RP_NA_1 command for 2 and 4 hours, respectively.
On Monday, June 08, 2020, a Man In The Middle (MITM) cyberattack was executed for 2 hours, filtering and dropping the IEC 60870-5-104 packets.
For each attack, a 7zip file is provided, including the network traffic and the network flow statistics for each entity. Moreover, a relevant diagram is provided, illustrating the corresponding cyberattack. In particular, for each entity, a folder is given, including (a) the relevant pcap file, (b) Transmission Control Protocol (TCP)/Internet Protocol (IP) network flow statistics in Comma-Separated Values (CSV) format and (c) IEC 60870-5-104 flow statistics in CSV format. The TCP/IP network flow statistics were generated by CICFlowMeter[7], while the IEC 60870-5-104 flow statistics were generated by a Custom IEC 60870-5-104 Python Parser[8], taking full advantage of Scapy[9].
3.Dataset Structure
The dataset consists of the following files:
20200425_UOWM_IEC104_Dataset_m_sp_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the M_SP_NA_1_DoS attack.
20200426_UOWM_IEC104_Dataset_c_ci_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_CI_NA_1_DoS attack.
20200426_UOWM_IEC104_Dataset_c_ci_na_1.7z: A 7zip file including the pcap and CSV files related to the C_CI_NA_1 attack.
20200427_UOWM_IEC104_Dataset_c_se_na_1.7z: A 7zip file including the pcap and CSV files related to the C_SE_NA_1 attack.
20200428_UOWM_IEC104_Dataset_c_sc_na_1.7z: A 7zip file including the pcap and CSV files related to the C_SC_NA_1 attack.
20200428_UOWM_IEC104_Dataset_c_se_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_SE_NA_1_DoS attack.
20200429_UOWM_IEC104_Dataset_c_sc_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_SC_NA_1_DoS attack.
20200605_UOWM_IEC104_Dataset_c_rd_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_RD_NA_1_DoS attack.
20200605_UOWM_IEC104_Dataset_c_rd_na_1.7z: A 7zip file including the pcap and CSV files related to the C_RD_NA_1 attack.
20200606_UOWM_IEC104_Dataset_c_rp_na_1_DoS.7z: A 7zip file including the pcap and CSV files related to the C_RP_NA_1_DoS attack.
20200606_UOWM_IEC104_Dataset_c_rp_na_1.7z: A 7zip file including the pcap and CSV files related to the C_RP_NA_1 attack.
20200608_UOWM_IEC104_Dataset_mitm_drop.7z: A 7zip file including the pcap and CSV files related to the MITM attack.
Balanced_IEC104_Train_Test_CSV_Files.zip: This zip file includes balanced CSV files from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser that could be utilised for training ML and DL methods. The zip file includes different folders for the corresponding flow timeout values used for CICFlowMeter and IEC 60870-5-104 Python Parser, respectively.
Each 7zip file includes respective folders related to the entities/devices (described in the following section) participating in each attack. In particular, for each entity/device, there is a folder including (a) the overall network traffic (pcap file) related to this entity/device during each attack, (b) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the overall network traffic, (c) the IEC 60870-5-104 network traffic (pcap file) related to this entity/device during each attack, (d) the TCP/IP network flow statistics (CSV file) from CICFlowMeter for the IEC 60870-5-104 network traffic, (e) the IEC 60870-5-104 flow statistics (CSV file) from the Custom IEC 60870-5-104 Python Parser for the IEC 60870-5-104 network traffic and finally, (f) an image showing how the attack was executed. It is noteworthy that the network flows from both CICFlowMeter and the Custom IEC 60870-5-104 Python Parser in each CSV file are labelled based on the IEC 60870-5-104 cyberattacks executed for the generation of this dataset. The description of these attacks is given in the following section, while the various features from CICFlowMeter and the Custom IEC 60870-5-104 Python Parser are presented in Section 5.
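The archive names listed above follow a regular pattern (capture date, a fixed `UOWM_IEC104_Dataset` prefix, the attack name and an optional `_DoS` suffix). The following is a minimal sketch of how that pattern could be parsed in Python; the function name `parse_archive_name` and the returned keys are hypothetical, and the pattern is inferred only from the file names in this list.

```python
import re
from datetime import datetime

# Pattern inferred from the listed archive names; not an official naming specification.
ARCHIVE_PATTERN = re.compile(
    r"^(?P<day>\d{8})_UOWM_IEC104_Dataset_(?P<attack>.+?)(?P<dos>_DoS)?\.7z$"
)


def parse_archive_name(name):
    """Split an archive name into capture date, attack name and DoS flag."""
    match = ARCHIVE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected archive name: {name}")
    return {
        "date": datetime.strptime(match["day"], "%Y%m%d").date(),
        "attack": match["attack"],
        "is_dos": match["dos"] is not None,
    }


print(parse_archive_name("20200425_UOWM_IEC104_Dataset_m_sp_na_1_DoS.7z"))
print(parse_archive_name("20200608_UOWM_IEC104_Dataset_mitm_drop.7z"))
```

The balanced train/test zip file does not follow this pattern, so the sketch raises a `ValueError` for it instead of guessing.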
4.Testbed & IEC 60870-5-104 Attacks
The testbed created for generating this dataset is composed of five virtual RTU devices emulated by IEC TestServer and two real RTU devices. Moreover, there is another workstation which plays the role of Master Terminal Unit (MTU) and HMI, sending legitimate IEC 60870-5-104 commands to the corresponding RTUs. For this purpose, the workstation uses QTester104. In addition, there are three attackers that act as malicious insiders executing the following cyberattacks against the aforementioned RTUs. Finally, the network traffic data of each entity/device was captured through tshark.
Table 1: IEC 60870-5-104 Cyberattacks Description
IEC 60870-5-104 Cyberattack | Description | Dataset Files |
---|---|---|
MITM Drop | During this attack, the cyberattacker is placed between two endpoints, thus monitoring and dropping the network traffic exchanged. | 20200608_UOWM_IEC104_Dataset_mitm_drop.7z |
C_CI_NA_1 | The C_CI_NA_1 is a Counter Interrogation command in the control direction. This cyberattack sends unauthorised IEC 60870-5-104 C_CI_NA_1 packets to the target system. | 20200426_UOWM_IEC104_Dataset_c_ci_na_1.7z |
C_SC_NA_1 | The C_SC_NA_1 command is a single command. This cyberattack sends unauthorised IEC 60870-5-104 C_SC_NA_1 packets to the target system. | 20200428_UOWM_IEC104_Dataset_c_sc_na_1.7z |
C_SE_NA_1 | The C_SE_NA_1 command is a set-point command with normalised values. This cyberattack sends unauthorised IEC 60870-5-104 C_SE_NA_1 packets to the target system. | 20200427_UOWM_IEC104_Dataset_c_se_na_1.7z |
C_RD_NA_1 | The C_RD_NA_1 command is a read command. This cyberattack sends unauthorised IEC 60870-5-104 C_RD_NA_1 packets to the target | |
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Description
The datasets demonstrate the malware economy and the value chain described in our paper, Malware Finances and Operations: a Data-Driven Study of the Value Chain for Infections and Compromised Access, presented at the 12th International Workshop on Cyber Crime (IWCC 2023), part of the ARES Conference, and published in the ACM International Conference Proceeding Series (ICPS).
Using the well-documented scripts, it is straightforward to reproduce our findings. It takes an estimated 1 hour of human time and 3 hours of computing time to duplicate our key findings from MalwareInfectionSet; around one hour with VictimAccessSet; and minutes to replicate the price calculations using AccountAccessSet. See the included README.md files and Python scripts.
We choose to represent each victim by a single JavaScript Object Notation (JSON) data file. Data sources provide sets of victim JSON data files from which we've extracted the essential information and omitted Personally Identifiable Information (PII). We collected, curated, and modelled three datasets, which we publish under the Creative Commons Attribution 4.0 International License.
1. MalwareInfectionSet
We discover (and, to the best of our knowledge, document scientifically for the first time) that malware networks appear to dump their data collections online. We collected these infostealer malware logs available for free. We utilise 245 malware log dumps from 2019 and 2020 originating from 14 malware networks. The dataset contains 1.8 million victim files, with a dataset size of 15 GB.
2. VictimAccessSet
We demonstrate how infostealer malware networks sell access to infected victims. Genesis Market focuses on user-friendliness and a continuous supply of compromised data. Marketplace listings include everything necessary to gain access to the victim's online accounts, including passwords and usernames, but also a detailed collection of information which provides a clone of the victim's browser session. Indeed, Genesis Market simplifies the import of compromised victim authentication data into a web browser session. We measure the prices on Genesis Market and how compromised device prices are determined. We crawled the website between April 2019 and May 2022, collecting the web pages offering the resources for sale. The dataset contains 0.5 million victim files, with a dataset size of 3.5 GB.
3. AccountAccessSet
The Database marketplace operates inside the anonymous Tor network. Vendors offer their goods for sale, and customers can purchase them with Bitcoins. The marketplace sells online accounts, such as PayPal and Spotify, as well as private datasets, such as driver's licence photographs and tax forms. We then collect data from Database Market, where vendors sell online credentials, and investigate similarly. To build our dataset, we crawled the website between November 2021 and June 2022, collecting the web pages offering the credentials for sale. The dataset contains 33,896 victim files, with a dataset size of 400 MB.
Credits Authors
Funding
This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under project numbers 804476 (SCARE) and 952622 (SPIRS).
Alternative links to download: AccountAccessSet, MalwareInfectionSet, and VictimAccessSet.
We used the following Internet-facing environment: 01 switch, 01 IP camera, 01 server for monitoring, 01 server for the honeypot and no firewall. This environment is directly connected to the Internet. We installed a server functioning as the Monitoring Environment. The network traffic was obtained via port mirroring on the switch to the Monitoring Environment server. We added 08 virtual machines and performed the following tests with a denial of service (DoS) attack:
01 virtual machine from 04:00 pm to 23:55 on 2019-12-04, at an interval of 01 hour;
02 virtual machines from 23:55 on 2019-12-04 to 08:50 am on 2019-12-05, at an interval of 01 hour;
04 virtual machines from 08:55 am on 2019-12-05 to 05:25 pm on 2019-12-06, at an interval of 5 minutes;
08 virtual machines from 05:30 pm on 2019-12-06 to 23:59 on 2019-12-06, at an interval of 5 minutes;
end of tests, with shutdown of the virtual machines, at 23:59 on 2019-12-06.
The results were obtained from Suricata and from Telegraf collections from the TICK stack. All evidence was gathered through queries via EveBox, which received data from Suricata, or through Grafana graphs with information extracted from the InfluxDB (Grafana) and PostgreSQL (EveBox) databases.
events.csv.gz - Suricata / EveBox collections
net.csv.gz - Telegraf collections from the TICK stack
netstat.csv.gz - Telegraf collections from the TICK stack
For correlation purposes, use the events.csv.gz file as a basis. The key to correlation is the 'timestamp' column of events.csv.gz together with the 'time' column in the net.csv.gz and netstat.csv.gz files. The collection period, non-consecutive, ran from 2019-12-04 to 2019-12-06.
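The correlation between the event file and the Telegraf files can be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions: both time columns are assumed to hold ISO 8601 strings that `datetime.fromisoformat` can parse and that share the same timezone convention, and each event is paired with the most recent sample taken at or before it. The helper names `read_rows` and `correlate` are hypothetical.

```python
import bisect
import csv
import gzip
from datetime import datetime


def read_rows(path):
    """Read a gzip-compressed CSV file into a list of dictionaries."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle))


def correlate(events, samples):
    """Pair each event with the latest sample whose 'time' is not after its 'timestamp'."""
    ordered = sorted(samples, key=lambda row: datetime.fromisoformat(row["time"]))
    times = [datetime.fromisoformat(row["time"]) for row in ordered]
    pairs = []
    for event in events:
        when = datetime.fromisoformat(event["timestamp"])
        position = bisect.bisect_right(times, when) - 1
        # Events that precede every sample have nothing to be paired with.
        pairs.append((event, ordered[position] if position >= 0 else None))
    return pairs
```

For example, `correlate(read_rows("events.csv.gz"), read_rows("net.csv.gz"))` would attach a network sample to every Suricata event; if the real timestamp formats differ from the assumption above, only the parsing calls need to change.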
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
These latest ransomware statistics show how much damage is caused by attacks and the emerging trends you need to be aware of.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
The HAI dataset was collected from a realistic industrial control system (ICS) testbed augmented with a Hardware-In-the-Loop (HIL) simulator that emulates steam-turbine power generation and pumped-storage hydropower generation.
In 2017, three laboratory-scale CPS testbeds were initially launched, namely GE’s turbine testbed, Emerson’s boiler testbed, and FESTO’s modular production system (MPS) water-treatment testbed. These testbeds are related to relatively simple processes, and were operated independently of each other.
In 2018, a complex process system was built to combine the three systems using a HIL simulator, where generation of thermal power and pumped-storage hydropower was simulated. This ensured that the variables were highly coupled and correlated for a richer dataset. In addition, an open platform communications unified architecture (OPC-UA) gateway was installed to facilitate data collection from heterogeneous devices.
The first version of HAI dataset, HAI 1.0, was made available on GitHub and Kaggle in February 2020. This dataset included ICS operational data from normal and anomalous situations for 38 attacks. Subsequently, a debugged version of HAI 1.0, namely HAI 20.07, was released for the HAICon 2020 competition in August 2020.
HAI 21.03 was released in 2021, and was based on a more tightly coupled HIL simulator to produce clearer attack effects with additional attacks. This version provides more quantitative information and covers a variety of operational situations, and provides better insights into the dynamic changes of the physical system.
HAI 22.04 contained more sophisticated attacks that are significantly more difficult to detect than those in the previous versions. Comparing only the baseline TaPRs of HAICon 2020 and HAICon 2021, detection difficulty in HAI 22.04 is approximately four times higher than HAI 21.03.
The testbed consists of four different processes: boiler process, turbine process, water treatment process and HIL simulation:
Water treatment Process (P3): This process includes pumping water to the upper reservoir and releasing it back into the lower reservoir. It is controlled by Siemens's S7-300 PLC.
HIL Simulation(P4): Both the boiler and turbine processes are interconnected to synchronize with the rotating speed of the virtual steam-turbine power generation model. The pump and value in the water-treatment process are controlled by the pumped-storage hydropower generation model. The dSPACE's SCALEXIO system is used for the HIL simulations and is interconnected with the real-world processes through a Siemens S7-1500 PLC and ET200 remote IO devices for data-acquisition system based on the OPC gateway.
Three major versions of the HAI dataset have been released thus far. Each dataset consists of several CSV files, and each file satisfies time continuity. The quantitative summary of each version is as follows:
Note: The version numbering follows a date-based scheme, where the version number indicates the release date of the HAI dataset. HAI 20.07 is the bug-fixed version of HAI v1.0 released in February 2020.
Version | Data Points (points/sec) | Normal Dataset Files (interval, size) | Attack Dataset Files (interval, size, attack count) |
---|---|---|---|
HAI 22.04 | 86 | train1.csv (26 hours, 51 MB); train2.csv (56 hours, 109 MB); train3.csv (35 hours, 67 MB); train4.csv (24 hours, 46 MB); train5.csv (66 hours, 125 MB); train6.csv (72 hours, 137 MB) | test1.csv (24 hours, 48 MB, 7 attacks); test2.csv (23 hours, 45 MB, 17 attacks); test3.csv (17 hours, 33 MB, 10 attacks); test4.csv (36 hours, 70 MB, 24 attacks) |
HAI 21.03 | 78 | train1.csv (60 hours, 100 MB); train2.csv (63 hours, 116 MB); train3.csv (229 hours, 246 MB) | test1.csv (12 hours, 22 MB, 5 attacks); test2.csv (33 hours, 62 MB, 20 attacks); test3.csv (30 hours, 56 MB, 8 attacks); test4.csv (11 hours, 20 MB, 5 attacks); test5.csv (26 hours, 48 MB, 12 attacks) |
HAI 20.07 (HAI 1.0) | 59 | train1.csv (86 hours, 127 MB); train2.csv (91 hours, 98 MB) | test1.csv (81 hours, 119 MB); test2.csv (42 hours, 62 MB) |
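As a quick check on the table above, the per-file figures for the two versions that report attack counts can be totalled. This is a minimal sketch: the dictionary layout and the `summarize` helper are illustrative names, not part of the HAI repository, and the numbers are copied from the table.

```python
# Hours per training file, and (hours, attack count) per test file,
# copied from the summary table. The structure is an illustrative assumption.
HAI_22_04 = {
    "train": [26, 56, 35, 24, 66, 72],
    "test": [(24, 7), (23, 17), (17, 10), (36, 24)],
}
HAI_21_03 = {
    "train": [60, 63, 229],
    "test": [(12, 5), (33, 20), (30, 8), (11, 5), (26, 12)],
}


def summarize(version):
    """Return (total training hours, total test hours, total attacks)."""
    train_hours = sum(version["train"])
    test_hours = sum(hours for hours, _ in version["test"])
    attacks = sum(count for _, count in version["test"])
    return train_hours, test_hours, attacks


print("HAI 22.04:", summarize(HAI_22_04))  # (279, 100, 58)
print("HAI 21.03:", summarize(HAI_21_03))  # (352, 112, 50)
```

On these figures, HAI 22.04 has less normal data than HAI 21.03 (279 versus 352 hours) but slightly more attacks (58 versus 50).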
The time-series data in each CSV file satisfies time continuity. The first column represents the observed time as “yyyy-MM-dd hh:mm:ss,” while the remaining columns provide the recorded SCADA data points. The last four columns are data labels indicating whether an attack occurred: the attack column applies to all processes, and the other three columns apply to the corresponding control processes.
Refer to the latest technical manual for details of each column.
time | P1_B2004 | P2_B2016 | ... | P4_HT_LD | attack | attack_P1 | ... | attack_P3 |
---|---|---|---|---|---|---|---|---|
20190926 13:00:00 | 0.09830 | 1.07370 | ... | 0 | 0 | 0 | ... | 0 |
20190926 13:00:01 | 0.09830 | 1.07410 | ... | 0 | 1 | 0 | ... | 1 |
20190926 13:00:02 | 0.09830 | 1.07380 | ... | 0 | 1 | 0 | ... | 1 |
20190926 13:00:03 | 0.09830 | 1.07360 | ... | 0 | 1 | 1 | ... | 1 |
20190926 13:00:04 | 0.09830 | 1.07430 | ... | 0 | 1 | 1 | ... | 1 |
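The column layout described above can be read with Python's standard `csv` module. The sketch below is a minimal illustration. It assumes a comma-separated file with a header row, a first column named `time`, and the four label columns `attack`, `attack_P1`, `attack_P2`, `attack_P3`. The helper name `load_hai` and the sample rows are hypothetical. The timestamp format is a parameter because it may differ between files, so check the technical manual for the version you use.

```python
import csv
import io
from datetime import datetime

# Hypothetical miniature of a HAI CSV file; real files have many more columns.
SAMPLE = """time,P1_B2004,P2_B2016,P4_HT_LD,attack,attack_P1,attack_P2,attack_P3
2019-09-26 13:00:00,0.09830,1.07370,0,0,0,0,0
2019-09-26 13:00:01,0.09830,1.07410,0,1,0,0,1
2019-09-26 13:00:02,0.09830,1.07380,0,1,0,0,1
"""

# Assumed label column names, following the description of the last four columns.
LABEL_COLUMNS = ("attack", "attack_P1", "attack_P2", "attack_P3")


def load_hai(stream, time_format="%Y-%m-%d %H:%M:%S"):
    """Split each row into a timestamp, sensor values, and attack labels."""
    times, features, labels = [], [], []
    for row in csv.DictReader(stream):
        times.append(datetime.strptime(row["time"], time_format))
        # int(float(...)) tolerates labels written as "1" or "1.0".
        labels.append(
            {name: int(float(row[name])) for name in LABEL_COLUMNS if name in row}
        )
        features.append(
            {
                name: float(value)
                for name, value in row.items()
                if name != "time" and name not in LABEL_COLUMNS
            }
        )
    return times, features, labels


times, features, labels = load_hai(io.StringIO(SAMPLE))
print(len(times), labels[1]["attack"], features[0]["P2_B2016"])
```

Keeping the labels separate from the sensor values makes it harder to feed the attack columns to a model as input features by accident.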
Type git clone, and then paste the URL below.
$ git clone https://github.com/icsdataset/hai
To unzip multiple gzip files, you can use:
$ gunzip *.gz
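Where `gunzip` is not available (for example on Windows), the same step can be sketched with Python's standard library. The function name `gunzip_all` is illustrative. Unlike `gunzip`, this version keeps the original `.gz` files.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(directory):
    """Decompress every *.gz file in `directory`, keeping the archives."""
    written = []
    for archive in sorted(Path(directory).glob("*.gz")):
        target = archive.with_suffix("")  # train1.csv.gz -> train1.csv
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams the data in chunks
        written.append(target)
    return written
```

Calling `gunzip_all("hai")` would decompress the archives directly inside a directory named `hai`. That directory name is only an example, and subdirectories are not searched.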
We strongly recommend evaluating your anomaly detection model with the eTaPR (Enhanced Time-series Aware Precision and Recall) metric, which allows fair performance comparisons with other studies. Got something to suggest? Let us know!
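Range-aware metrics such as eTaPR score detections against attack intervals rather than individual samples. The sketch below is not eTaPR. It only shows a common preparatory step, grouping a 0/1 `attack` label sequence into intervals, alongside plain point-wise precision and recall as a rough sanity check. The function names and sample sequences are illustrative. For reported results, use the official eTaPR implementation.

```python
def attack_ranges(labels):
    """Return (start, end) index pairs, end inclusive, for each run of 1s."""
    ranges, start = [], None
    for i, value in enumerate(labels):
        if value and start is None:
            start = i  # a new attack interval begins
        elif not value and start is not None:
            ranges.append((start, i - 1))  # the interval ended at the previous sample
            start = None
    if start is not None:  # the sequence ended while an attack was ongoing
        ranges.append((start, len(labels) - 1))
    return ranges


def pointwise_precision_recall(truth, predicted):
    """Sample-by-sample precision and recall (NOT a range-aware metric)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


truth = [0, 1, 1, 1, 0, 0, 1, 1]
predicted = [0, 0, 1, 1, 1, 0, 0, 0]
print(attack_ranges(truth))  # [(1, 3), (6, 7)]
print(pointwise_precision_recall(truth, predicted))
```

In this toy example the detector finds the first attack interval but misses the second entirely. The point-wise recall of 0.4 does not show that one whole interval was missed, which is the kind of information a range-aware metric is designed to capture.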
Here are some projects and experiments that are using or featuring the dataset in interesting ways. Got something to add? Let us know!
The related projects so far are as follows.
Variational restricted Boltzmann machines to automated anomaly detection
Research on improvement of anomaly detection performance in industrial control systems
E-sfd: Explainable sensor fault detection in the ics anomaly detection system
Stacked-autoencoder based anomaly detection with industrial control system
Improved mitigation of cyber threats in iiot for smart cities: A new-era approach and scheme
Towards building intrusion detection systems for multivariate time-series data
Revitalizing self-organizing map: Anomaly detection using forecasting error patterns
Cluster-based deep one-class classification model for anomaly detection
Measurement data intrusion detection in industrial control systems based on unsupervised learning
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the most important ransomware statistics you need to know about the attacks, demands, payments and consequences that can occur.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The main goal of any ransomware attacker is to hold people to ransom by not releasing their data until they get paid. But is it actually a good idea to pay the ransom? Here’s what the ransomware statistics tell us about organisations that paid up.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Some types of ransomware are more common than others and more likely to affect your cybersecurity. The top 5 most common ransomware strains are...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are the leading causes of ransomware attacks today.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following ransomware statistics detail which industries get attacked the most and which countries are most likely to be targeted.
https://www.marketresearchforecast.com/privacy-policy
The data masking market, valued at $397.4 million in 2025, is experiencing robust growth, projected to expand at a compound annual growth rate (CAGR) of 8.9% from 2025 to 2033. This significant expansion is driven by increasing concerns surrounding data privacy regulations like GDPR and CCPA, coupled with the rising adoption of cloud computing and the burgeoning need for secure data sharing across diverse organizational functions. The dynamic nature of data masking solutions, offering real-time protection and adaptability to evolving security threats, further fuels market growth. Key segments contributing to this expansion include the finance sector, heavily regulated and requiring stringent data protection, and the human resources (HR) sector, where sensitive employee information demands robust security measures. The market's growth trajectory is also influenced by the increasing sophistication of cyber threats and the escalating costs associated with data breaches, prompting organizations to invest proactively in data masking technologies. Further fueling market growth is the increasing adoption of data masking across various applications beyond traditional finance and HR. Operations, legal, and even support and R&D departments are increasingly recognizing the value of data masking in protecting sensitive business information and maintaining compliance. While the market faces certain restraints, such as the complexity of implementing data masking solutions and the potential for high initial investment costs, the long-term benefits of enhanced data security and regulatory compliance significantly outweigh these challenges. Leading players like IBM, Informatica, and Oracle are continuously innovating their offerings, incorporating advanced techniques such as tokenization and pseudonymization, driving market consolidation and further stimulating growth within the data masking landscape. 
The geographical distribution of the market reflects a strong presence in North America, driven by stringent regulations and advanced technological adoption, with Europe and Asia-Pacific also exhibiting considerable growth potential.
http://data.europa.eu/eli/dec/2011/833/oj
The current threat taxonomy is an initial version developed on the basis of available ENISA material. This material was used to support the structuring of ENISA's internal organisation for information collection and threat consolidation purposes. It was produced in the period 2012 to 2015. The consolidated threat taxonomy is an initial version: in 2016, ENISA plans to update and expand it with additional details, such as definitions of the various threats mentioned.