Anomaly Detection for Simulated IEC-60870-5-104 Traffic

The Substation Security project aims to detect anomalies in energy distribution networks.

Motivation: In recent years, the number of attacks on automation systems has increased, yet the protection of such networks has not received enough attention. This project introduces a novel machine-learning-based intrusion detection system targeted at substation automation networks based on the IEC 60870-5-104 protocol. The novelty of the proposed approach compared with the state of the art is that it monitors several features on multiple protocol layers, which enables the identification of multiple types of attacks.

Suggested Approach:

  1. Simulate the communication between the substation slave and the server based on data gained from real substations, and also simulate the system's behavior under attack.
  2. Observe the system's normal behavior and its behavior under attack in order to extract the features needed for building an anomaly detection system.
  3. Based on these features, propose an anomaly detection system for the asynchronous IEC 60870-5-104 protocol.
  4. Design the anomaly detection model by applying machine learning to the acquired IEC 60870-5-104 protocol data.
  5. Choose the classifier with the highest performance by comparing 7 different classification algorithms. A rule learner, the Decision Table classifier, turned out to be the best.

IEC 104 Protocol

IEC 60870 is a standard for interoperability among compatible telecontrol equipment. IEC 60870-5-104 combines the application layer of IEC 60870-5-101 with TCP/IP as the transport. IEC 60870-5-101 is a standard for electric power systems that supports balanced and unbalanced modes of data transfer, classification of data into high and low priority, and both cyclic and spontaneous data transmission.
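To make the framing concrete, here is a minimal Python sketch of how the 6-octet APCI header at the start of each IEC 60870-5-104 APDU can be decoded. It is not part of the project's tooling; the names `Apci` and `parse_apci` are hypothetical, and the sketch only distinguishes the three frame formats (I, S, U) and reads the sequence numbers without interpreting the ASDU.

```python
from dataclasses import dataclass
from typing import Optional

START_BYTE = 0x68  # every IEC 104 APDU begins with this octet


@dataclass
class Apci:
    frame_type: str            # "I", "S" or "U"
    send_seq: Optional[int]    # N(S), only present in I-format frames
    recv_seq: Optional[int]    # N(R), present in I- and S-format frames


def parse_apci(apdu: bytes) -> Apci:
    """Decode the APCI header (start, length, 4 control octets) of one APDU."""
    if len(apdu) < 6 or apdu[0] != START_BYTE:
        raise ValueError("not an IEC 104 APDU")
    # The length octet counts everything after itself (control field + ASDU).
    if apdu[1] != len(apdu) - 2:
        raise ValueError("length octet does not match APDU size")
    c1, c2, c3, c4 = apdu[2:6]
    recv_seq = (c3 >> 1) | (c4 << 7)
    if c1 & 0x01 == 0:           # I-format: numbered information transfer
        return Apci("I", (c1 >> 1) | (c2 << 7), recv_seq)
    if c1 & 0x03 == 0x01:        # S-format: numbered supervisory frame
        return Apci("S", None, recv_seq)
    return Apci("U", None, None)  # U-format: unnumbered control frame


# Example: a STARTDT act frame (U-format) as sent when a connection starts.
startdt = parse_apci(bytes([0x68, 0x04, 0x07, 0x00, 0x00, 0x00]))
```

Only I-format frames carry an ASDU with process data, which is why the later filtering step concentrates on them.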

Data Generation

Traffic is generated with a simulation tool for bi-directional IEC 60870-5-104 communication. The simulation uses real-world data packets. The figure shows the working principle of the tool:

On one side of the communication we have a client and on the other side a server/router. Once the communication between the client (e.g. an intelligent electronic device on the process layer) and the server is established, we use Wireshark as a network analysis tool to capture all the 104 packets and their features (such as frame number, time stamp, delta time between packets, round trip times, etc.).
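As an illustration of two of the timing features mentioned here, the following Python sketch derives the delta time to the previous captured frame and a packets-per-second count from a list of capture timestamps. How the project itself computed these values is not stated; the function names and the choice of whole-second bins are assumptions made for this example.

```python
from collections import Counter
from typing import List


def delta_times(timestamps: List[float]) -> List[float]:
    """Time since the previous captured frame; 0.0 for the first frame."""
    if not timestamps:
        return []
    return [0.0] + [b - a for a, b in zip(timestamps, timestamps[1:])]


def packets_per_second(timestamps: List[float]) -> List[int]:
    """For each packet, how many packets fall into the same whole second.

    Assumption: one-second bins aligned to integer timestamps.
    """
    counts = Counter(int(t) for t in timestamps)
    return [counts[int(t)] for t in timestamps]


# Synthetic capture times in seconds, not taken from the project's data.
capture_times = [0.0, 0.5, 1.25, 1.5, 1.75]
deltas = delta_times(capture_times)
rates = packets_per_second(capture_times)
```

A sudden drop in delta time together with a jump in the per-second count is the kind of pattern a flooding attack would be expected to produce.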

Attack Modelling

In this project three attacks were used.

  1. First, we consider an attack model limited to passive eavesdropping and information gathering. The attacker's goal is to gather as much information as possible by observing the network traffic and to map the structure of the substation for subsequent active attacks.
  2. The second attack manipulates the network traffic by replacing legitimate packets with forged ones (here the attacker must know both the 104 protocol and the structure of the substation). The goal is to feed the industrial control system with false measurement data, to issue arbitrary commands to the IEDs, or to desynchronize system time.
  3. The third attack model is a plain DoS attack aiming at the disruption of network functionality.

These three attacks were chosen with reference to multiple state-of-the-art papers that used similar attacks to verify the functionality of their intrusion detection systems.

Machine Learning Approach

Intrusion Detection Systems (IDS) are systems that monitor inbound and outbound network traffic in order to detect abnormal activities. The working strategy of an IDS is to collect data from the network in order to detect suspicious activities and anomalies in the network.

In this project, a machine learning approach is followed to build an IDS customized for IEC 60870-5-104 traffic. The scheme is illustrated in the figure below:

  1. Traffic generation phase: This step gathers the training data from which the model is built. The data is combined into a single CSV file.
  2. Data exploration and preparation: During the preprocessing phase, the raw PCAP data was reduced by keeping only the features useful for our case. With filtering operations we removed all unnecessary packets (TCP, UDP, ARP, etc.) and concentrated only on 104 protocol ASDU packets. As presented in the section above, our data consists of normal and anomalous traffic.
  3. Feature Extraction: From pcapng packets, 9 features were extracted as follows: Length (frame length on wire), Total_Len (total length of packet), Window size value (window size value from the TCP header), DeltaTime_prev_capt_frame (time difference from previous capture), Frame_Length_on_wire (number of bytes sent/received by the adapter), RTT (round trip time), Value (normalized value), Month (month when packets are sent/received), Packets_s (number of packets sent per second).
  4. Labeling: The status attribute (the class label) is divided into two classes: target and outlier. Target represents packets from normal traffic and outlier represents packets from attack traffic.
  5. Feature selection: There are 903 instances in our dataset. We selected the 9 features that have the best correlation with the status (label). Automatic feature selection methods were used, such as CorrelationAttributeEval, which measures the correlation of each feature with the class (target or outlier). Moreover, reducing the number of features improves the performance of our predictive model.
  6. Classification Model: Machine learning algorithms are categorized into groups according to their purpose. Predictive models aim at predicting a value, called the target feature, based on the other features and their values in the dataset. In supervised learning, the target feature (called the class) tells the learner what it is expected to learn. Categorical features are divided into levels. Classification predicts to which category an instance belongs.
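The labeling and feature selection steps above can be sketched in plain Python. The project used Weka's CorrelationAttributeEval; the code below is only an approximation of the idea, computing Pearson's correlation between each feature column and a 0/1 encoding of the class. The function names, the encoding (target = 0, outlier = 1), the ranking by absolute value, and the toy data are all assumptions for illustration.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features: Dict[str, List[float]],
                  labels: List[str]) -> List[Tuple[str, float]]:
    """Rank features by absolute correlation with the class label."""
    y = [1.0 if label == "outlier" else 0.0 for label in labels]
    scores = {name: abs(pearson(column, y)) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data, not taken from the project's dataset.
toy_labels = ["target", "target", "outlier", "outlier"]
toy_features = {
    "Month": [5.0, 5.0, 5.0, 5.0],        # constant, carries no information
    "Packets_s": [1.0, 2.0, 9.0, 10.0],   # clearly higher for outliers
}
ranking = rank_features(toy_features, toy_labels)
```

In this toy example `Packets_s` ranks first and the constant `Month` column scores zero, which is the behavior a correlation-based ranker is meant to show.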

Implementation and Performance Evaluation

After building the classifier, we evaluated its performance with cross-validation, taking misclassification errors into account as well as accuracy. Cross-validation is the standard way of evaluating the performance of machine learning algorithms. It splits the data into a fixed number of folds. With ten folds, the entire dataset is split into ten partitions; in each round one partition (1/10) is used for testing and the remaining nine (9/10) for training. This procedure is repeated 10 times, so every partition is used for testing exactly once. The table below reports the results of 10-fold cross-validation, where Acc stands for accuracy, CC for correctly classified instances and IC for incorrectly classified instances.
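The fold procedure can be sketched in a few lines of Python. This is not the Weka implementation used in the project: the splitting is unstratified, the function names are hypothetical, and a majority-class baseline stands in for a real classifier. Only the class counts (183 target, 720 outlier) come from the dataset description; the label list itself is synthetic.

```python
import random
from collections import Counter
from typing import List


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate_majority(labels: List[str], k: int = 10) -> float:
    """Accuracy of a majority-class baseline under k-fold cross-validation."""
    correct = 0
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train = [lab for i, lab in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]  # "training" step
        correct += sum(labels[i] == majority for i in test_idx)
    return correct / len(labels)


# Synthetic labels with the class counts reported for the dataset.
labels = ["target"] * 183 + ["outlier"] * 720
baseline_accuracy = cross_validate_majority(labels)
```

Because the classes are imbalanced, always predicting outlier already yields 720/903, roughly 79.7% accuracy, so classifier accuracies are best read against that baseline.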

Taking into account the TPR, FPR, precision, F-Measure and ROC area, we can conclude that the best performance is achieved by decision trees and rule learners. In our case, the Decision Table classifier is the best machine learning approach, because it achieves the best accuracy, TPR, FPR, precision, F-Measure and ROC area. To the best of the authors' knowledge, these are the first results obtained for a behaviour-based IDS for 104 protocol data acquired from a real substation.
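For reference, the measures named above follow directly from the confusion matrix of a binary classifier. The Python sketch below shows the standard definitions; the counts in the example are made up for illustration and are not results from this project, and which class is treated as positive is left open. ROC area is omitted because it requires per-instance scores rather than counts.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from true/false positive and negative counts."""
    tpr = _ratio(tp, tp + fn)            # true positive rate (recall)
    fpr = _ratio(fp, fp + tn)            # false positive rate
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * tpr, precision + tpr)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    return {"TPR": tpr, "FPR": fpr, "precision": precision,
            "F-Measure": f_measure, "accuracy": accuracy}


# Hypothetical counts for illustration only.
example = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

Note that a good classifier has a high TPR, precision, F-Measure and accuracy but a low FPR.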

Conclusion

A solution based on machine learning for detecting anomalies in electrical substations was proposed. First, data was collected using a replay tool; then three different attacks were performed (passive eavesdropping, reprogramming, DoS). The Weka tool was used to compare the accuracy and effectiveness of different classifiers. We used a dataset of 903 instances, with 183 target instances and 720 outlier instances. The following classifiers were used: NaiveBayes, IBk, J48, RandomForest, RandomTree, Decision Table and OneR. The results were analyzed based on accuracy, F1 measure and ROC value in order to find the best approach for our problem. The Decision Table turned out to be the best solution for our problem, with an accuracy of 91.69%. Future work aims at improving classification accuracy by using larger data sets and refined feature selection.

 
