International Journal of Network Security, Vol.15, No.3, PP.190-204, May 2013

Using Artificial Immune System and Fuzzy Logic for Alert Correlation

Mehdi Bateni1, Ahmad Baraani1, and Ali Akbar Ghorbani2
(Corresponding author: Mehdi Bateni)

Department of Computer Engineering, University of Isfahan1
Hezar Jerib St., Isfahan, 81746-73441, Iran
Faculty of Computer Science, University of New Brunswick2
550 Windsor Street, Fredericton, New Brunswick, Canada
(Email: Bateni@eng.ui.ac.ir)
(Received Feb. 25, 2011; accepted July 13, 2011)

Abstract

One of the most important challenges facing intrusion detection systems (IDSs) is the huge number of generated alerts. A system administrator can be so overwhelmed by these alerts that she/he cannot manage and use them. The best-known solution is to correlate low-level alerts into a higher-level attack and then produce a high-level alert for them. In this paper a new automated alert correlation approach is presented. It employs Fuzzy Logic and Artificial Immune System (AIS) to discover and learn the degree of correlation between two alerts and uses this knowledge to extract the attack scenarios. The proposed system doesn’t need vast domain knowledge or rule definition efforts. To correlate each new alert with previous alerts, the system first tries to find the correlation probability based on its fuzzy rules. Then, if there is no matching rule with the required matching threshold, it uses the AIRS algorithm. The system is evaluated using the DARPA 2000 dataset and netForensics honeynet data. The completeness, soundness and false alert rate are calculated. The average completeness values for LLDoS1.0 and LLDoS2.0 are 0.957 and 0.745, respectively. The system generates the attack graphs with an acceptable accuracy, and the computational complexity of the probability assignment algorithm is linear.
Keywords: Alert correlation, artificial immune system, fuzzy logic, intrusion detection system

1 Introduction

Intrusion detection is a rapidly growing field that deals with detecting and responding to malicious network traffic and computer misuse. Intrusion detection is the process of identifying and (possibly) responding to malicious activities targeted at computing and network resources [7]. Based on their functionality, IDSs are divided into two categories: misuse detection and anomaly detection systems. Misuse detection systems use a database of known attack signatures, compare any new activity against this database and decide about its safety status. On the other hand, anomaly detection systems use a profile of normal behavior for each user or system, and compare each new activity with the normal profile. Any notable changes or anomalies could be considered as a possible attack [1]. Misuse detection systems produce fewer false alerts, but they cannot identify new attacks. Anomaly detection systems can detect some new attacks, but their rate of false alarms is higher.

Both types of IDSs have a more serious common problem: the huge and unmanageable number of produced alerts. In most cases the large number of low-level alerts confuses the system administrator. Each alert carries little information, and when there are many such alerts the system administrator may ignore them because she/he cannot handle them all. The best known solution for this problem is to correlate alerts with each other and create higher level scenarios. Alert correlation is the process of analyzing alerts that are produced by one or more IDSs to provide a more succinct and concise high-level view of the occurring or attempted intrusions [28].
The most important goal of alert correlation is to reduce the number of alerts that the administrator should investigate manually to find the signs of attacks. The administrator prefers to have a high-level scenario of an attack instead of a large number of low-level alerts. An alert correlator usually carries out its job by removing false alerts, aggregating related alerts and prioritizing alerts. Most correlators use a complex knowledge base of rules that define the relationship between alerts and store metadata about the protected network. Thus experts are needed to enter the proper knowledge in the knowledge base. This is hard work and needs a deep knowledge of networks and security. Also, considering the changes in network configuration and the new attacks that appear every day, this knowledge has to be kept up-to-date, which is also hard, time-consuming and error-prone work.

In this paper we propose an alert correlator, which uses a combination of predefined fuzzy rules and a dynamic learning-based solution. To facilitate the rule definition process a limited number of general rules are used. The number of rules in our implementation is limited to 31, and the rules are not related to any specific network configuration, any specific attack type or any predefined scenario. Besides the predefined rules there is a learning subsystem, which uses the Artificial Immune Recognition System (AIRS) algorithm. AIRS is trained using the predefined fuzzy rules in order to discover and remember the correlation relationship between each two alert types. For a new alert, the system finds its correlation with the previous alerts. Firstly, it tries to find a rule in its predefined rule set with a matching value higher than a tuneable threshold (rule selection threshold). If it cannot find a proper rule, then it uses the AIRS. AIRS uses the same fuzzy rules as input to generate a collection of memory cells.
The correlation system uses these memory cells to find the correlation probability. After correlating two alerts, the system stores its experiences about these two alert types and their correlation in three matrices: the Alert Correlation Matrix (ACM), the forward strength correlation matrix (Πf) and the backward strength correlation matrix (Πb). The new values of these three matrices affect the calculation of the probability of correlation for future alerts. It is also possible to use the ACM and Πf to extract the attack scenario and to create the attack graph.

The proposed system needs no deep knowledge about attacks and the protected network. Also, it is a self-organizing system with the ability to adapt to changes in order to detect new attacks and scenarios. As mentioned before, the system uses AIRS as its learning algorithm. The algorithm is very fast and has a low computational cost, so the overall computational cost of the system depends on the correlation algorithm and not on the learning and correlation probability estimation algorithm. For each new alert (ai), the system searches all hyper-alerts (groups of related alerts) for an alert (aj) with the highest correlation probability with ai. If the correlation probability between ai and aj is greater than a given threshold, then ai is inserted in the hyper-alert which contains aj. Otherwise, a new hyper-alert is created and ai is inserted in it.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 presents our system. It illustrates the architecture of the system and provides the details of its components. Section 4 reports the result of running the system with the DARPA2000 and netForensics honeynet data. Finally, Section 5 provides the conclusion and some suggestions for future work.

2 Related Works

As mentioned before, alert correlation has two main goals: reducing the number of alerts and increasing the relevance and abstraction level of the produced reports [28].
Commonly used techniques for alert correlation can be categorized as follows:

• Fusion-based
• Filter-based
• Causality-based

Fusion-based correlation [10, 13, 27] is based on the similarity between two alerts. It defines a function for similarity and looks for alerts that are similar. If the similarity value is more than some threshold, alerts are placed in one cluster. Filter-based approaches [14, 15, 16, 19, 20, 29] either identify false positive and irrelevant alerts or assign a priority to each alert. For instance, an alert could be classified as irrelevant if it represents an attack against a non-existent service. Priorities are usually assigned to alerts depending on how important the attacked assets are. Causality-based approaches use the logical relationships between alerts to correlate them [2, 3, 5, 18, 21, 22, 23, 24, 30]. They either use the knowledge of experts to find related alerts or aim to infer it from statistical or machine learning analysis. Because our work is more related to the causality-based approach, we focus on the work that uses this approach.

There are several causality-based approaches that use known scenarios to find relationships among alerts. They match the sequence of incoming events with some predefined scenarios. These scenarios should be defined by an attack language (e.g., LAMBDA [4], STATL [6], ADeLe [26]) or learned using machine learning techniques [5, 30]. Specifying all scenarios in advance is time-consuming and error-prone work and needs a deep knowledge of the domain. Moreover, it has problems with new attack patterns. Wang et al. [30] proposed a multi-step attack pattern discovery method that aims at solving the problems of new attack pattern discovery and overcoming the difficulty in complex attack association rule definition and maintenance. They mine multi-step attack activity patterns with the attack sequential pattern mining method from historical aggregated high-level alerts.
Their method requires a well-integrated history database, which should include various multi-step attack instances.

Another type of causality-based correlation system uses the rule-based correlation approach [2, 3, 18, 24]. These systems rely on the fact that complex attacks are usually executed in several phases or steps, where the first step prepares for attacks executed in the later steps. Each step of the attack has its prerequisites and consequences. Thus, analyzing alerts based on the predefined rules containing prerequisites and consequences of the attack steps is sufficient to identify related alerts.

Both scenario-based and rule-based approaches rely on expert knowledge to find related alerts and cannot handle novel attacks. Statistical approaches [21, 22, 23, 33] analyze relationships among alerts based on their co-occurrence within a certain time period, and thus are generally independent of prior domain knowledge. Qin [21] presented a Bayesian correlation engine for discovering the statistical relationship between alerts. The engine analyzes statistical patterns among aggregated alerts, with the assumption that alerts are causally related if a strong statistical correlation exists among them. The degree of relevance of alerts is evaluated by calculating the conditional probability among each pair of hyper alerts. The approach builds an attack scenario by evaluating the causal relationship between each pair of hyper alerts. Because of the large number of possible combinations between hyper alerts, running the system in online mode is infeasible. Ren et al. [22] presented an approach for adaptive online alert correlation.
The approach incorporates two components: the offline module that is responsible for retrieving relevant attack information from the previously observed alerts based on the Bayesian causality mechanism; and the online component that is based on the extracted information. It correlates raw alerts and constructs attack scenarios online. There are other works that use machine learning algorithms to estimate the correlation probability among alerts and use it at correlation time. Zhu et al. [33] used Multilayer Perceptron and Support Vector Machines to estimate the alert correlation probability, and Sadoddin et al. [23] used the frequent structure mining technique. Statistical and machine learning-based approaches do not require expert knowledge and are capable of representing unknown attacks. However, their most important drawback is their high computational cost, which makes them impractical for online computation.

3 The Proposed System

The main goal of the alert correlation process is to reduce the number of alerts that the system administrator encounters and has to handle manually. It is a complex and multifaceted problem, and there are many different ways to face it. In this paper we introduce a combinatory fuzzy and AIS-based solution.

Figure 1: The architecture and components of the system

3.1 An Overview

The goal of our system is to assign a correlation probability to each pair of input alerts and to use this probability for the next correlation. In order to accomplish this, the system first creates a feature vector from each pair of input alerts and then searches among its rules for a rule that matches this vector with a value higher than the rs threshold (rule selection threshold). If it finds such a rule, it simply uses the probability in that rule as the correlation probability of the two alerts. Otherwise, it uses the AIRS algorithm to find the correlation probability. The AIRS algorithm is a supervised learning algorithm that is able to classify unseen data based on its previous training data. Here the training data is a set of fuzzy rules, and AIRS discovers and remembers the relationships between the values of the features in the rules. The correlation probability that is produced by the fuzzy rules or AIRS is stored in three matrices as the experience of the system about these two alert types.

If the correlation probability is higher than a predefined threshold (correlation threshold), then the incoming alert is added to the collection of alerts that contains the previously encountered alert. The name for this collection of related alerts is a hyper-alert. If the correlation probability for two alerts is less than the correlation threshold, then a new hyper-alert is created and the new alert is inserted into it. Figure 1 illustrates the overall structure of the correlation system. The system contains 6 main components.

• Collection of fuzzy rules;
• Feature vector (cell) generator;
• Fuzzy rule matcher;
• AIS-based classifier;
• Hyper-alert generator and acquired knowledge;
• Attack graph generator.

As mentioned before, the system uses two classifiers in correlation probability calculation: the fuzzy rule matcher and the AIS-based classifier. One of the most important parameters of our system is the rule selection threshold (rs).
By changing the value of rs, we can change our system from a completely static system that uses only fuzzy rules to a completely learning-based system that uses the AIRS algorithm. If the value of 0 is assigned to rs, then the system uses only the fuzzy rule matcher, and if the value of 1 is assigned to it, then the system uses only the AIS-based classifier. Although the static system may work rapidly and accurately in predefined cases, it cannot work properly in undefined situations. It is better to have a dynamic system that learns other cases from the presented ones. Then the learning-based algorithm can work properly in these undefined situations. By changing the value of rs from 0 to 1 we can shift the balance from a pure fuzzy correlator, through a fuzzy AIS-based correlator, to a pure AIS-based correlator. Choosing the value of rs is an important task and may affect both the accuracy and the performance of the system.

3.2 Feature Selection

Before starting its work, the correlator needs to extract some useful information from the alert stream. Suppose that a2 is the last alert generated by an IDS, and a1 is an alert that is chosen from one of the existing hyper-alerts to investigate its correlation with a2. To decide about the correlation of a2 with a1, the values of some features in the two alerts should be selected to generate a feature vector (we also call the feature vector a cell or antigen in some cases). Seven features are chosen for this purpose. Five of them are calculated directly from a1 and a2, and two of them are calculated from the history of the experience of the system about alerts similar to a1 and a2. This experience is stored in three matrices: ACM, Πb and Πf. Selected features are as below.
• F1: Similarity between source IP addresses of a1 and a2 (between 0 and 1);
• F2: Similarity between destination IP addresses of a1 and a2 (between 0 and 1);
• F3: Equality of destination port of a1 and a2 (0 or 1);
• F4: Equality of destination IP address of a1 with the source IP address of a2 (0 or 1);
• F5: Backward strength correlation between alerts of type a1 and a2 (between 0 and 1);
• F6: Correlation frequency between alerts with the same type as a1 and a2 (between 0 and 1);
• F7: Freshness of a1 in the arrival time of a2 (between 0 and 1).

F1-F6 are adopted from [33]. F7 is added to the feature vector in order to add time as an important parameter to the correlation system.

Features F1 and F2 are the similarity of IP addresses for source and destination addresses. To calculate the similarity between two addresses, the system counts the number of common higher order bits of the two addresses and divides it by 32. The values of these features are between 0 and 1. For example, the two IP addresses 192.168.10.60 and 192.168.42.25 have 18 common higher order bits, and a similarity of 0.56.

The next two features (F3 and F4) are equality tests. Because the destination port numbers of two alerts are either equal or not, the value of F3 is either 1 or 0. The value of F4 is also either 0 or 1. F4 is important, because in multi-step attacks the success of one step is the precondition of starting the next step, and usually the attacker tries to compromise one host and use it to compromise the next one. Thus, equivalence of the target IP address of a1 with the source IP address of a2 may indicate a multi-step attack.

F5 is backward correlation strength. It is the probability of correlation between two alerts a1 and a2 when a1 has been seen before a2. This value is extracted from a matrix with the same name. The matrix initially is set to zero, and during the process of correlation it is updated with proper values according to the process that will be described later in Section 3.3.

F6 is correlation frequency. If two types of alert are frequently correlated, then it is acceptable to say that there is a meaningful correlation between them. Consequently, if we have two choices of correlation with equal values in all other features, then it is acceptable to choose the alert with the higher value of F6 for correlation. Initially F6 is 0. During the correlation process, with each correlation between a1 and a2, the value of F6 for these two types of alerts is increased.

F5 and F6 together can be used to improve the process of correlation, especially when the other feature values are not strong enough. Assume that two alerts of type t1 and t2 have occurred ten times before, and seven of these occurrences have led to correlation because of strong values in the features other than F5 and F6. If two alerts of these types occur for the 11th time with weak values of the other features, then because of the strong values of F5 and F6, it is possible to correlate them without high values in the other features. In other words, after several observations of two alerts, the system learns that they are in correlation with each other even if the other features are not strong enough [33].

F7 is freshness. It is about alerts’ arrival times. When a new alert arrives, it is fresh and its freshness value is 1, which indicates a possible new attack. Over time, the level of freshness declines, and after a definable time, the freshness reaches zero. This feature is added in order to add time as an important feature to the correlation process. It increases the correlation probability of an alert with the most recently arrived alerts. Even after the freshness reaches zero, an alert can be correlated with other alerts if its other 6 features are high.
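The directly computed features can be illustrated with a short sketch. It is only a minimal illustration under stated assumptions, not the authors' implementation: the alert dictionaries and their keys (`src_ip`, `dst_ip`, `dst_port`, `time`) are hypothetical, F7 follows Equation (1) with t = 3600 seconds as stated in the text, and clamping F7 at zero after the window has elapsed is an assumption made here.

```python
import ipaddress
import math


def ip_similarity(addr_a: str, addr_b: str) -> float:
    """F1/F2: number of common high-order bits of two IPv4 addresses, divided by 32."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    # The highest set bit of the XOR marks the first position where the addresses differ.
    common_bits = 32 - (a ^ b).bit_length()
    return common_bits / 32


def freshness(time_a1: float, time_a2: float, t: float = 3600.0) -> float:
    """F7 following Equation (1); clamped at 0 (assumption) once t seconds have passed."""
    elapsed = max(time_a2 - time_a1, 0.0)
    return max(0.0, 1.0 - math.sqrt(elapsed / t))


def direct_features(a1: dict, a2: dict) -> tuple:
    """Return (F1, F2, F3, F4, F7) for an earlier alert a1 and a newer alert a2."""
    f1 = ip_similarity(a1["src_ip"], a2["src_ip"])
    f2 = ip_similarity(a1["dst_ip"], a2["dst_ip"])
    f3 = 1 if a1["dst_port"] == a2["dst_port"] else 0
    f4 = 1 if a1["dst_ip"] == a2["src_ip"] else 0
    f7 = freshness(a1["time"], a2["time"])
    return f1, f2, f3, f4, f7


if __name__ == "__main__":
    # The address pair from the text: 18 common bits out of 32.
    print(ip_similarity("192.168.10.60", "192.168.42.25"))
```

F5 and F6 are not covered by the sketch because they are read from the knowledge matrices rather than computed from the two alerts.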
A parameter, t, is defined to adjust the freshness feature. The freshness value of a1 with respect to the arrival time of a2 can be calculated by using Equation (1) [11]. We consider t=3600, so F7 reaches zero after one hour.

$$F_7 = 1 - \sqrt{\frac{a_2.time - a_1.time}{t}} \qquad (1)$$

Therefore, when two alerts a1 and a2 arrive the system delivers them to the feature vector generation module. It extracts the required features from the alerts and from the stored matrices and creates a feature vector. Suppose for alert a1 the timestamp, source address, destination address and alert type are 4:13:20, 172.16.114.50:1227, 172.16.113.50:25 and Email Ehlo, and for a2 they are 5:06:16, 172.16.113.50:1048, 172.16.112.50:21 and FTP User. Their corresponding feature vector is (F1=.6872, F2=.7187, F3=0, F4=1, F5=.2424, F6=1, F7=.06).

3.3 Knowledge Acquiring Matrices

In this section, three matrices that are used in the correlation process are introduced: the Alert Correlation Matrix (ACM), the forward correlation strength matrix (Πf) and the backward correlation strength matrix (Πb). ACM contains the correlation weights between every two alerts. For example, if the possible alerts in the system are a1 to a4, then ACM is as shown in Figure 2. The ACM elements are the correlation weights of two corresponding alerts and are the sum of correlation probabilities for two alerts during the correlation process until now. Each element is calculated by using Equation (2) [33].

      a1          a2          a3          a4
a1    Wc(a1,a1)   Wc(a1,a2)   Wc(a1,a3)   Wc(a1,a4)
a2    Wc(a2,a1)   Wc(a2,a2)   Wc(a2,a3)   Wc(a2,a4)
a3    Wc(a3,a1)   Wc(a3,a2)   Wc(a3,a3)   Wc(a3,a4)
a4    Wc(a4,a1)   Wc(a4,a2)   Wc(a4,a3)   Wc(a4,a4)

Figure 2: An Alert Correlation Matrix

$$W_{c(a_i,a_j)} = \sum_{k=1}^{n} P_{i,j}(k) \qquad (2)$$

where P_{i,j}(k) is the correlation probability for ai and aj in their kth correlation when ai occurred before aj. Because of this time ordering, the ACM is not symmetric. It encodes the temporal relationship between two alerts. P_{i,j}(k) is produced by the correlation engine, and for its calculation F5 is used from the Πb matrix. On the other hand, the calculated values in the ACM are used later for generating the two strength matrices (Πb, Πf). Consequently, ACM is updated dynamically by correlating each pair of new alerts, and the updated values cause changes in the next correlation of these two alert types. The elements of the two strength matrices are calculated by using Equation (3) and Equation (4) [33].

$$\Pi^{b}_{c(a_i,a_j)} = \frac{W_{c(a_i,a_j)}}{\sum_{k=1}^{n} W_{c(a_k,a_j)}} \qquad (3)$$

$$\Pi^{f}_{c(a_i,a_j)} = \frac{W_{c(a_i,a_j)}}{\sum_{k=1}^{n} W_{c(a_i,a_k)}} \qquad (4)$$

Unlike the correlation weights in the ACM, the values of these two matrices are between 0 and 1. Πf(ai, aj) is calculated by dividing the correlation weight of ai and aj by the sum of the correlation weights of ai with all alerts that happened after ai. It can be used to predict the correlation probability of one alert with another alert that happens after it. It will be used for generating the attack graph later. On the other hand, Πb(ai, aj) is calculated by dividing the correlation weight of ai and aj by the sum of the correlation weights of aj with all alerts that happened before aj. It can be used to find the correlation probability of one alert with another alert that happened before it. As mentioned before, Πb(ai, aj) is used as one feature (F5) in the process of feature vector generation. Both matrices initially are filled with zero, and the correlation process is done considering the other five features. After each correlation the ACM, Πb and Πf matrices are changed. After several correlations, the contents of the matrices are meaningful. These matrices play the role of some sort of memory or acquired knowledge for the correlation system.

3.4 Fuzzy Rules

We define a limited number of fuzzy rules in order to be able to assign a correlation probability to each feature vector. These rules declare the relation between the seven features (F1-F7) and the class number (correlation probability). For example, one rule says that if the similarity of source and target IP addresses in two alerts is high, the target ports for both alerts are the same, the target IP address of the first alert is not the same as the source IP address of the second alert, the frequency of previous correlation for alerts of these types is high, the backward correlation strength of these two types is high, and the freshness of the first alert in the arrival time of the second one is high, then the class of the feature vector is 20 (which means that the probability of correlation is 1). The format of each rule is as follows.

If (F1 = V1) and . . . (F7 = V7) Then (Class = C)

The antecedent of each rule contains seven features (F1-F7) and their corresponding values (V1-V7). Two features (F3 and F4) have crisp values (0 or 1). The values of the five other features are expressed by linguistic terms such as high, low and medium. These linguistic terms are defined by proper fuzzy sets. The consequent of a rule is a class number that is assigned to it. The class number is an integer value between 1 and 20. The class number of 1 is equal to the probability value of 0 and the class number of 20 is equal to the probability value of 1. Every other class number can simply be mapped to its probability value with a step length of 0.05. Table 1 shows some samples of our defined rules. To classify an input vector x with the value (v1, v2, v3, v4, v5, v6, v7), all rules are investigated and the three rules most compatible with x are determined.

To calculate the compatibility of feature vector x with rule Rj we use the average membership value of the seven features with respect to rule Rj. It is calculated by using Equation (5).
$$Compatibility(x, R_j) = \frac{1}{n}\sum_{i=1}^{n} \mu(v_i, V_i) \qquad (5)$$

where n is the number of features, vi is the value of the ith feature in x, Vi is the value of the ith feature in the antecedent part of rule Rj and µ is the membership function for the fuzzy set Vi. We calculate the compatibility of x with each rule to find the most compatible rule. If the compatibility value for the most compatible rule is more than a given rule selection threshold (rs), then the class number for x is determined by our fuzzy rule matcher. Otherwise, the system tries to calculate the correlation probability by using the AIS-based classifier.

As mentioned before, the three most compatible rules are identified by the system. In the case of using the fuzzy rule matcher, the probability value is calculated based on the class numbers in the consequent parts of these three rules. First, we determine the class number, Ci, of x and then map it to a probability value. To calculate the class number, Ci, we consider not only the class numbers of the three most compatible rules but also their compatibility values and their distances from each other. Algorithm 1 outlines the probability mapping function. After determining the class number, Ci, we need a mapping function in order to convert a class number to a probability value. Equation (6) is used to do this, where λ is the number of classes.

$$P = \frac{C_i - 1}{\lambda} + \frac{1}{2\lambda} \qquad (6)$$

Table 1: Sample predefined rules

F1     F2     F3   F4   F5     F6     F7     Class
Med    Med    1    0    High   High   Low    16
High   High   1    0    Low    Low    High   19
High   High   1    0    Low    Low    Low    18
Med    Med    0    0    Med    Low    High   4
Med    Med    0    0    Med    Low    Low    3

3.5 AIRS Algorithm

AIRS is a supervised-learning algorithm. It was first introduced in 2001 by Watkins [31]. A revised version of it was introduced later [32]. It is more efficient than the original version, but with the same level of accuracy. We refer to this new version as AIRS in this paper. The main goal of the algorithm is to produce a population of memory cells from the training data with the ability to classify new data.
The AIRS design refers to many immune system metaphors including resource competition, clonal selection, affinity maturation and memory cell retention. It also uses the resource limited artificial immune system concept. In this algorithm, the feature vectors presented for training and test are named antigens, while the system units are called B cells. Similar B cells are represented by Artificial Recognition Balls (ARBs), which compete with each other for a fixed number of resources. The ARBs with higher affinities to the training antigen improve. Each antigen in the training data is presented to the algorithm once, and the algorithm creates a memory cell for it. The memory cells formed after the presentation of all training antigens are used to classify test antigens.

Algorithm 1 Probability calculation for vector x in the fuzzy classifier
Begin
  R1, R2, R3 ← the three most compatible rules with x
  λ ← the number of classes
  if (R1.cmpt − R2.cmpt > .15) or (R1.cmpt − R3.cmpt > .25) then
    return (R1.class − 1)/λ + 1/(2∗λ)
  end if
  Sort R1, R2, R3 to Ra, Rb, Rc based on Ri.class
  if (Rb.class − Ra.class ≥ 3) and (Rc.class − Rb.class ≥ 3) then
    return (R1.class − 1)/λ + 1/(2∗λ)
  end if
  if Rb.class − Ra.class ≥ 3 then
    d ← min(Rc.cmpt, Rb.cmpt)/(Rc.cmpt + Rb.cmpt) ∗ (Rc.class − Rb.class)
    if Rb.cmpt > Rc.cmpt then
      C ← Rb.class + d
    else
      C ← Rc.class − d
    end if
    return (C − 1)/λ + 1/(2∗λ)
  end if
  if Rc.class − Rb.class ≥ 3 then
    d ← min(Rb.cmpt, Ra.cmpt)/(Rb.cmpt + Ra.cmpt) ∗ (Rb.class − Ra.class)
    if Ra.cmpt > Rb.cmpt then
      C ← Ra.class + d
    else
      C ← Rb.class − d
    end if
    return (C − 1)/λ + 1/(2∗λ)
  end if
  if (Rb.class − Ra.class = 2) and (Rc.class − Rb.class = 2) then
    return (R1.class − 1)/λ + 1/(2∗λ)
  end if
  return (Rb.class − 1)/λ + 1/(2∗λ)
End
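The probability mapping of Algorithm 1, together with the class-to-probability mapping of Equation (6), can be sketched in Python as follows. This is a minimal sketch under assumptions: each rule is represented by a hypothetical `(class_number, compatibility)` pair, the three rules are assumed to arrive sorted by compatibility with the best first, and the helper names are illustrative only.

```python
from typing import List, Tuple

Rule = Tuple[int, float]  # (class number, compatibility with x); hypothetical representation


def class_to_probability(c: float, n_classes: int = 20) -> float:
    """Equation (6): map a (possibly fractional) class number to a probability."""
    return (c - 1) / n_classes + 1 / (2 * n_classes)


def _blend(lower: Rule, upper: Rule) -> float:
    """Class value between two nearby rules, pulled toward the more compatible one."""
    (c_lo, m_lo), (c_hi, m_hi) = lower, upper
    d = min(m_lo, m_hi) / (m_lo + m_hi) * (c_hi - c_lo)
    return c_lo + d if m_lo > m_hi else c_hi - d


def correlation_probability(top3: List[Rule], n_classes: int = 20) -> float:
    """Algorithm 1 for the three most compatible rules (best compatibility first)."""
    r1, r2, r3 = top3
    # One rule clearly dominates the other two: use its class directly.
    if r1[1] - r2[1] > 0.15 or r1[1] - r3[1] > 0.25:
        return class_to_probability(r1[0], n_classes)
    ra, rb, rc = sorted(top3, key=lambda r: r[0])  # order by class number
    gap_low, gap_high = rb[0] - ra[0], rc[0] - rb[0]
    if gap_low >= 3 and gap_high >= 3:   # all three classes far apart
        return class_to_probability(r1[0], n_classes)
    if gap_low >= 3:                     # Ra is the outlier: blend Rb and Rc
        return class_to_probability(_blend(rb, rc), n_classes)
    if gap_high >= 3:                    # Rc is the outlier: blend Ra and Rb
        return class_to_probability(_blend(ra, rb), n_classes)
    if gap_low == 2 and gap_high == 2:
        return class_to_probability(r1[0], n_classes)
    return class_to_probability(rb[0], n_classes)  # otherwise take the middle class


if __name__ == "__main__":
    print(correlation_probability([(19, 0.80), (18, 0.75), (4, 0.70)]))
```

In the usage line, the rule with class 4 is treated as an outlier, so the result falls between the probabilities of classes 18 and 19.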
The AIRS has four stages: normalization and initialization; ARB generation; competition for resources and nomination of a candidate memory cell; and memory cell introduction [31, 32]. The mechanism to develop a candidate memory cell is as follows:

1) A training antigen is presented to all the memory cells belonging to the same class as the antigen. The memory cell most stimulated by the antigen is cloned. The memory cell and all the newly generated clones are stored in the ARB pool. The number of clones generated depends on the affinity between the memory cell and the antigen, and affinity in turn is determined by the Euclidean distance between the feature vectors of the memory cell and the training antigen. The smaller the Euclidean distance, the higher the affinity and the larger the number of clones allowed.

2) Next, the training antigen is presented to all the ARBs in the ARB pool. All the ARBs are rewarded based on the affinity between the ARB and the antigen. The rewards take the form of a number of resources. After all the ARBs have been rewarded, the sum of all the resources in the system typically exceeds the maximum number allowed for the system. The excess resources held by ARBs are removed, starting from the ARB of lowest affinity and moving higher, until the number of resources held does not exceed the number of resources allowed for the system. ARBs that are left without any resources are removed from the ARB pool. The remaining ARBs are tested for their affinities towards the training antigen. If the average normalized stimulation level for all instances does not meet a user-defined stimulation threshold, the ARBs are mutated and their clones are placed back in the ARB pool. The mutation range for highly stimulated ARBs is more limited than the mutation range of less stimulated ARBs. Mutation of the class label is not allowed. Step 2 is repeated until the affinity meets the stimulation threshold.

3) The most stimulated ARB is chosen as a candidate memory cell. If its affinity for the training antigen is greater than that of the original memory cell selected for cloning at step 1, the candidate memory cell is placed in the memory cell pool. If, in addition, the difference in affinity of these two memory cells is smaller than a user-defined threshold, the original memory cell is removed from the pool.

These steps are repeated for each training antigen. After training is complete, the test data are presented only to the memory cell pool, which is responsible for the actual classification. The class of a test antigen is determined by majority voting among the k most stimulated memory cells, where k is a user-defined parameter. AIRS has been applied to a wide variety of publicly available classification benchmarks. AIRS has proved to be a very good classifier: thus far it has been among the ten most accurate classifiers known in every case to which it has been applied [12].

In order to use the AIRS for alert correlation, we propose some improvements to it. The main goals of these improvements are to improve the accuracy of the algorithm for our usage and to enable AIRS to produce a real value (probability) instead of an integer value (class number) for its input antigens. In order to use the fuzzy rules as training cells for AIRS, we make a slight change to the rules shown in Table 1. We replace terms such as high and low with appropriate real values such as 1 and 0 and build a vector of real values for each rule. This vector of real values is called an antigen and is used in the training process of AIRS. Table 2 shows some sample training antigens for our algorithm and Table 3 shows some sample memory cells that are produced by the AIRS algorithm. These memory cells will be used later for probability calculation. We describe the improvements to the AIRS for alert correlation in more detail in the next three subsections.

Table 2: The training cells corresponding to rules of Table 1
F1    F2    F3   F4   F5    F6    F7    Class
0.5   0.5   1    0    1.0   1.0   0.0   16
1.0   1.0   1    0    0.0   0.0   1.0   19
1.0   1.0   1    0    0.0   0.0   0.0   18
0.5   0.5   0    0    0.5   0.0   1.0   4
0.5   0.5   0    0    0.5   0.0   0.0   3

Table 3: Sample generated memory cells
F1     F2     F3   F4   F5     F6     F7     Class
0.55   0.99   1    0    0.38   0.62   0.40   16
0.98   1.0    1    0    0.10   0.61   0.15   19
1.0    1.0    1    0    0.50   0.07   0.85   18
0.20   0.46   0    0    0.24   0.20   0.57   4
0.65   0.50   0    0    0.10   0.11   0.43   3

3.5.1 Weight Calculation

In the distance calculation of the AIRS algorithm the weights of all features are equal; therefore, the stimulation of one cell (antigen) by another is calculated by the following equation:

\[ \mathrm{Stimulation}(a_1, a_2) = 1 - \mathrm{Distance}(a_1, a_2) \]

where Distance(a1, a2) is the Euclidean distance between the two cells a1 and a2. Since the number of classes is high and the amount of training data is limited, a more accurate method for computing the distance values is needed. By examining the training data, we found that the features do not have an equal effect on the calculation of the probability values. To determine the effect of each feature, we use the notion of Symmetrical Uncertainty [8]. This score is a variation of the Information Gain measure. It compensates for the bias of Information Gain toward attributes with more values and normalizes its value to the range [0,1]. Symmetrical Uncertainty is defined by Equation (7).

\[ SU(X, Y) = 2 \times \left[ \frac{\mathrm{InformationGain}(X \mid Y)}{H(X) + H(Y)} \right] \tag{7} \]

where H(X) and H(Y) are the entropies of the random variables X and Y, respectively. Before using this score, all continuous attributes should be discretized into intervals. We discretized F1, F2, F5, F6 and F7 into 10 intervals.
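The Symmetrical Uncertainty score of Equation (7) can be computed directly from value counts. The following is a minimal sketch rather than the authors' code: the function names are ours, the information gain is computed through the standard identity IG = H(X) + H(Y) − H(X, Y), and the equal-width binning over [0, 1] is an assumption, since the paper states only that 10 intervals were used.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * IG / (H(X) + H(Y)), a value in [0, 1]."""
    h_x = entropy(xs)
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))   # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy        # equals H(X) - H(X | Y)
    if h_x + h_y == 0:                  # both variables are constant
        return 0.0
    return 2.0 * info_gain / (h_x + h_y)


def discretize(value, bins=10):
    """Assumed equal-width binning of a value in [0, 1] into `bins` intervals."""
    return min(int(value * bins), bins - 1)


# A feature that determines the class scores 1; an independent one scores 0.
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 1, 0, 1]))
```

A per-feature weight would then be obtained by calling `symmetrical_uncertainty` on the discretized feature column and the class column of the training antigens.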
Computing the Symmetrical Uncertainty between the class and each feature (F1-F7) in the training dataset produced the following coefficients, which are used for weight estimation: W1 = 0.272, W2 = 0.292, W3 = 0.253, W4 = 0.107, W5 = 0.224, W6 = 0.388, W7 = 0.160. The weighted Euclidean distance is calculated by using Equation (8).

\[ \mathrm{Distance}(a_1, a_2) = \sqrt{\sum_{i=1}^{n} W_i^2 \ast (a_1.F_i - a_2.F_i)^2} \tag{8} \]

Note that the weight estimation is done only once, in the parameter discovery phase of the system.

3.5.2 Class Selection Policy

The other improvement, which is applied to the last step of the AIRS algorithm, is a change in the class selection policy. The standard version of AIRS uses the majority vote of the KNN algorithm. Our experimental results show that replacing majority vote selection with least average distance selection improves the output of AIRS in the correlation engine. This means that, to identify the proper class label of a data vector x, it is better to choose the class label with the least average distance from x instead of the class label with the most members.

3.5.3 Probability Mapping

To map the class number to the probability value we again use Equation (6). We also consider the predecessor and successor class numbers of each class to calculate its probability value more accurately. Suppose that the class label generated for an antigen Ag is Ci, and the average distance of Ag to Ci is di. If Ci−1 is the predecessor and Ci+1 is the successor class of Ci, then the distances of Ag to Ci−1 and Ci+1 are di−1 and di+1. Note that di−1 or di+1 may not exist, because Ci−1 and Ci+1 are not necessarily among the K nearest neighbors of Ag. By using di−1 and di+1, Equation (6) is changed to Equation (9).
\[ P = \frac{C_i - 1}{\lambda} + \frac{1}{2 \ast \lambda} + \Delta \tag{9} \]

where

\[ \Delta = \begin{cases} -\dfrac{1 + d_i^2 - d_{i-1}^2}{2 \ast \lambda} & \text{if } (d_{i-1} < d_{i+1}) \text{ or } (\nexists C_{i+1}) \\ 0 & \text{if } (d_i = 0) \text{ or } (d_{i-1} = d_{i+1}) \text{ or } (\nexists C_{i-1} \text{ and } \nexists C_{i+1}) \\ \dfrac{1 + d_i^2 - d_{i+1}^2}{2 \ast \lambda} & \text{if } (d_{i+1} < d_{i-1}) \text{ or } (\nexists C_{i-1}) \end{cases} \]

In Equation (9) the initial probability is shifted toward one of the two classes, Ci−1 or Ci+1. The value of the shift is ∆, and the direction of the shift depends on the values of di−1 and di+1 (toward the one with the smaller value). By using Equation (9), the probability value changes continuously between 0 and 1, which is accurate enough for our purpose. Algorithm 2 outlines the probability assignment algorithm for an input cell.

Algorithm 2 Probability calculation for cell x in AIRS
Begin
  n ← the number of memory cells in MC
  λ ← the number of classes
  for j = 1 to n do
    dj ← WeightedEuclidean(x, MCj)
  end for
  KNN ← the K memory cells with least distances to x
  i ← the index of the memory cell with least distance to x
  Ci ← MCi.class   // class number of MCi
  if (∃j, MCj ∈ KNN) and (MCj.class = MCi.class − 1) then
    di−1 ← WeightedEuclidean(x, MCj)
  end if
  if (∃k, MCk ∈ KNN) and (MCk.class = MCi.class + 1) then
    di+1 ← WeightedEuclidean(x, MCk)
  end if
  if (di = 0) or ((∄Ci−1) and (∄Ci+1)) or (di−1 = di+1) then
    ∆ ← 0
  else if (∄Ci−1) or ((∃Ci+1) and (di−1 > di+1)) then
    ∆ ← (1 + di² − di+1²)/(2∗λ)
  else
    ∆ ← −(1 + di² − di−1²)/(2∗λ)
  end if
  return (Ci − 1)/λ + 1/(2∗λ) + ∆
End

3.6 Alert Correlation Process

After acquiring the proper accuracy in the probability calculation process, it is possible to give a stream of alerts to our correlator for processing. Input alerts first go through the feature vector generator one by one; then the correlation probability for each vector is calculated by the fuzzy matcher.
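For reference, the AIRS-side probability assignment of Section 3.5 (Equations (8) and (9) and Algorithm 2), which the correlator falls back on, might be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the function names and `lam=20` are hypothetical, the squared weights follow our reading of Equation (8), the class is chosen by least average distance as described in Section 3.5.2, and a missing neighbour class is treated as in Algorithm 2. The sample memory cells are the rows of Table 3.

```python
import math
from collections import defaultdict

# Feature weights W1..W7 as reported in Section 3.5.1.
WEIGHTS = (0.272, 0.292, 0.253, 0.107, 0.224, 0.388, 0.160)

# Sample memory cells copied from Table 3: (feature vector, class number).
MEMORY_CELLS = [
    ((0.55, 0.99, 1, 0, 0.38, 0.62, 0.40), 16),
    ((0.98, 1.0, 1, 0, 0.10, 0.61, 0.15), 19),
    ((1.0, 1.0, 1, 0, 0.50, 0.07, 0.85), 18),
    ((0.20, 0.46, 0, 0, 0.24, 0.20, 0.57), 4),
    ((0.65, 0.50, 0, 0, 0.10, 0.11, 0.43), 3),
]


def weighted_distance(a, b, weights=WEIGHTS):
    """Weighted Euclidean distance, as we read Equation (8)."""
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))


def assign_probability(x, memory_cells, lam, k=3):
    """Map input vector x to a correlation probability (Equation (9))."""
    nearest = sorted(
        (weighted_distance(x, vec), cls) for vec, cls in memory_cells
    )[:k]
    by_class = defaultdict(list)
    for dist, cls in nearest:
        by_class[cls].append(dist)
    avg = {cls: sum(ds) / len(ds) for cls, ds in by_class.items()}
    c_i = min(avg, key=avg.get)       # class with the least average distance
    d_i = avg[c_i]
    d_prev = avg.get(c_i - 1)         # None if C(i-1) is not among the K nearest
    d_next = avg.get(c_i + 1)         # None if C(i+1) is not among the K nearest
    base = (c_i - 1) / lam + 1 / (2 * lam)
    if d_i == 0 or (d_prev is None and d_next is None) or d_prev == d_next:
        delta = 0.0
    elif d_prev is None or (d_next is not None and d_next < d_prev):
        delta = (1 + d_i ** 2 - d_next ** 2) / (2 * lam)    # shift toward C(i+1)
    else:
        delta = -(1 + d_i ** 2 - d_prev ** 2) / (2 * lam)   # shift toward C(i-1)
    return base + delta


# A vector close to the class-19 cell, with class 18 also among its neighbours.
print(assign_probability((0.98, 1.0, 1, 0, 0.15, 0.55, 0.2), MEMORY_CELLS, lam=20))
```

In the example the nearest class is 19 and only its predecessor, class 18, is among the three nearest cells, so the probability is shifted slightly below the centre of the class-19 interval.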
If the matching value is less than rs (the rule selection threshold), the AIRS algorithm and its memory cells are used for probability calculation. Each alert is typically correlated with a few previous alerts and is added to a structure called a hyper-alert. Each hyper-alert contains alerts with some degree of correlation that could be placed in a possible attack scenario. When a new alert ai arrives, its correlation with all previous alerts in the existing hyper-alerts is calculated, and the hyper-alert Hj that contains the alert with the maximum probability of correlation (Cmax) with ai is identified. If this probability is higher than a predefined minimum threshold, ai is added to Hj. Otherwise, a new hyper-alert containing only ai is created. If Hj exists, all alerts in Hj are checked, and each alert whose correlation probability is near Cmax is correlated with ai. The degree of nearness is defined in the system parameters as the correlation sensitivity. Each time an alert is correlated with another one, the ACM, Πb and Πf matrices are also updated. As a result, the system changes its acquired knowledge dynamically and adapts itself incrementally to the new correlation results. The result of this process is the set of hyper-alerts. Each hyper-alert contains a number of alerts and is considered a possible attack scenario.

3.7 Attack Graph Generation

A hyper-alert is a valuable means of presenting the relationships between alerts. However, given the number of alerts generated in a real system, hyper-alerts grow very quickly, and it becomes very difficult to extract the required information from them. Each hyper-alert contains the step-by-step activities of an attacker. An attack graph, in contrast, is a directed graph that shows the overall scenario of an attack, and it contains one node for each alert type.
By using the attack graph, it is possible to have an overall and concise view of the attack scenario. As mentioned before, the Πf matrix is generated during the correlation process, and it is used in attack graph generation. The algorithm starts with an alert that represents a particular type of attack. Then it performs a horizontal search in the ACM to find the alerts that are most likely to happen after this alert. These alerts become new starting points to search for the alerts that are most likely to happen next. The process is repeated until no other alerts are found to follow any existing alerts in the graph [33].

4 Evaluation and Results

The alerts produced by Realsecure [25] on the DARPA2000 [9] dataset are employed to evaluate the accuracy of the system in extracting two attack scenarios, LLDoS1 and LLDoS2. The alerts produced by Snort on the netForensics honeynet data [17] are also employed to evaluate the performance of the system. We use 31 general rules in our rule set and use their corresponding 31 training antigens for the AIRS training part. Before the system starts its work, the AIRS algorithm is executed and the resulting memory cells are stored in the system. The initial knowledge of the system therefore consists of the fuzzy rules and the memory cells. The system uses this initial knowledge to correlate the input alert stream. To evaluate the accuracy of our system, we use three measures: completeness, soundness and false alert rate. Completeness is defined as the ratio of the correctly correlated alerts to the related alerts for a scenario. Soundness is defined as the ratio of the correctly correlated alerts to the total correlated alerts for a scenario, and false alert rate is defined as the ratio of the incorrectly correlated alerts to the related alerts for a scenario. The system has many parameters. We tried many different values for each parameter in order to test the system.
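The three accuracy measures defined in this section reduce to simple set arithmetic. The sketch below is illustrative only: the function name and the representation of alerts as sets of identifiers are our assumptions, and the numbers in the example are made up, not taken from the experiments.

```python
def correlation_metrics(correlated, related):
    """Return (completeness, soundness, false alert rate) for one scenario.

    correlated: set of alert ids the correlator placed in the scenario.
    related:    set of alert ids that truly belong to the scenario.
    """
    correct = correlated & related      # correctly correlated alerts
    incorrect = correlated - related    # incorrectly correlated alerts
    completeness = len(correct) / len(related)
    soundness = len(correct) / len(correlated)
    false_alert_rate = len(incorrect) / len(related)
    return completeness, soundness, false_alert_rate


# Illustrative data: 10 related alerts, 8 of them found, plus 1 unrelated alert.
related = set(range(1, 11))
correlated = set(range(1, 9)) | {11}
print(correlation_metrics(correlated, related))
```

Note that, with these definitions, the false alert rate is normalized by the number of related alerts rather than by the number of correlated alerts, so soundness and false alert rate do not necessarily sum to one.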
After finding the best parameter values, we run our system on the two scenarios mentioned above (LLDoS1 and LLDoS2). Each scenario is examined 10 times for each setting, and the reported results are average values. We vary the rule selection threshold (rs) from 0 to 1 to investigate the effect of each classifier (fuzzy and AIS-based) on the accuracy of the system. A value of 0 for rs means that the system works only with fuzzy rules and does not rely on the AIS-based correlator, and a value of 1 for rs means that the system relies completely on the AIS-based correlator. By changing the value of rs, the system can move from a pure fuzzy correlator, through a combined fuzzy and AIS-based correlator, to a pure AIS-based correlator. Our goal here is to use as little initial knowledge as possible and still obtain the best accuracy. As a result, we use only 31 initial rules in our rule collection. Although the pure fuzzy correlator (rs=0) may work with a larger number of rules, it is not possible to define all situations of two alerts for correlation and to declare their correlation probability. With our limited number of rules the results of the pure fuzzy correlator are weak, and we do not report them here. We therefore increase the value of rs and investigate the results. As rs increases, the accuracy of the system increases until rs reaches about 0.9. For rs=0.9 the system uses a fuzzy rule if its matching value is greater than or equal to 0.9. It is reasonable to use a rule with a matching value of 0.9 or more, because it is accurate enough and nothing needs to be learned in order to classify the data. Increasing rs beyond this point (0.9) neglects the existence of matched rules, which increases the execution time and decreases the accuracy of the results. The best results are obtained with rs between 0.9 and 1, depending on the dataset.
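The role of rs described above can be summarized as a small dispatch step. The sketch below is an illustration only: `fuzzy_match` and `airs_probability` are hypothetical callables standing in for the fuzzy rule matcher and the AIRS-based probability assignment, and the stub values in the example are invented.

```python
def correlation_probability(vector, fuzzy_match, airs_probability, rs):
    """Pick the classifier for one feature vector according to rs.

    fuzzy_match(vector)      -> (matching_value, probability) of the best rule
    airs_probability(vector) -> probability from the AIRS memory cells
    """
    matching_value, fuzzy_prob = fuzzy_match(vector)
    if matching_value >= rs:            # a sufficiently well-matched rule exists
        return fuzzy_prob, "fuzzy"
    return airs_probability(vector), "airs"


# Hypothetical stubs: the best rule matches with value 0.92.
def stub_fuzzy(vector):
    return 0.92, 0.775


def stub_airs(vector):
    return 0.60


print(correlation_probability((0.5,) * 7, stub_fuzzy, stub_airs, rs=0.9))
print(correlation_probability((0.5,) * 7, stub_fuzzy, stub_airs, rs=1.0))
```

With rs=0 every vector is handled by the fuzzy rules, while with rs=1 only an exact match would bypass the AIS-based correlator, which is why rs=1 behaves as a pure AIS-based correlator in practice.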
We report the results for rs=0.9 as the fuzzy AIS-based correlator and for rs=1 as the pure AIS-based correlator. We also vary the number of lymphocytes from 100 to 2000 for both scenarios and for both the pure AIS-based and the fuzzy AIS-based correlator, and report the results. To evaluate the performance of the system, two types of experiments are carried out. We vary the number of lymphocytes from 100 to 2000 for both rs=1 and rs=0.9 and compare the execution time of the correlator. We also vary the number of alerts in the netForensics honeynet data from 1000 to 5000 and examine the execution time for three values of rs: 0.1, 0.9 and 1.0.

4.1 Experiments with LLDoS1.0

In this experiment the alerts produced by Realsecure for the inside1 traffic are used. Realsecure produces 922 alerts of 22 types for this data. LLDoS1.0 is a five-stage attack. It consists of the following stages:

• IPsweep of the network from a remote site;
• Probe of live IPs to look for the sadmind daemon;
• Break-ins via the sadmind vulnerability;
• Installation of the trojan mstream DDoS;
• Launching the DDoS.

Figure 3: The attack graph generated for LLDoS1.0 (rs=0.9). Nodes: Sadmind_Ping, Sadmind_Amslverify_Overflow, RSH, Admind, Mstream_Zombie.

We examine this data with rs=1 and rs=0.9 (pure AIS correlator and fuzzy AIS correlator). Both correlators are run ten times for each number of lymphocytes (from 100 to 2000). Both correlators are able to extract the attack scenario almost completely. The alerts that appear in almost all extracted scenarios are Sadmind_Ping, Admind, Sadmind_Amslverify_Overflow, Rsh and Mstream_Zombie. The last step of the attack is not extracted in every experiment. Its related alert (Stream_DoS) is placed in a hyper-alert with only one alert.
There are also alert types, such as SSH_Detected, TelnetEnvAll, TelnetXdisplay and TelnetTerminaltype, that appear in some runs. We consider all these alert types to be false correlations. Figure 3 shows one sample scenario extracted by the fuzzy AIS-based correlator (rs=0.9). The results for the two values of rs differ in the number of lymphocytes required to get the best results. The fuzzy AIS-based correlator (rs=0.9) is able to extract the scenario with a small number of lymphocytes (even 100), whereas the pure AIS-based correlator (rs=1) needs a larger number of lymphocytes (about 2000) to do the same. Table 4 compares the completeness, soundness and false alert rate of the two correlators with the same parameters. It shows that, although the soundness and false alert rate of the two correlators are very close, the completeness of the fuzzy AIS-based correlator is better than that of the pure AIS-based correlator. Choosing rs=0.9 means that the system first tries to find a fuzzy rule with a matching value of 0.9 or more for the input alerts, and if it cannot find such a rule it uses the AIS-based correlator. The results show that some evidence of the attack in the inside1 dataset can be extracted with our general fuzzy rules, whereas the AIS-based correlator needs to be trained with a larger number of lymphocytes to be able to extract this evidence. Figure 4 shows the completeness of both correlators with different numbers of lymphocytes for LLDoS1.0. We take the number of related alerts for the LLDoS1.0 scenario to be 58.

4.2 Experiments with LLDoS2.0

In this experiment the alerts produced by Realsecure for the inside2 dataset are used. Realsecure produces 494 alerts of 20 different types for this data. LLDoS2.0 is also a five-stage attack.
It consists of the following stages:

• Probe of a public DNS server on the network, via the HINFO query;
• Break-in to the DNS server via the sadmind vulnerability;
• FTP upload of the mstream DDoS software and the attack script;
• Initiation of the attack on the other hosts of the network;
• Launching the DDoS.

We examine this data with rs=1 and rs=0.9 (pure AIS correlator and fuzzy AIS correlator). Both correlators are run ten times for each number of lymphocytes (from 100 to 2000). Both correlators are able to extract the attack scenario almost completely. The alerts that appear in almost all extracted scenarios are Admind, Sadmind_Amslverify_Overflow, FTP_Put and Mstream_Zombie. The last step of the attack is not extracted in every experiment. Its related alert (Stream_DoS) is placed in a hyper-alert with only one alert. There are also alert types, such as FTP_User, FTP_Pass, FTP_Syst, TelnetEnvAll, TelnetXdisplay and TelnetTerminaltype, that appear in a few runs. We consider all these alert types to be false correlations. Figure 5 shows one sample scenario extracted by the fuzzy AIS-based correlator (rs=0.9).

Table 4: Accuracy comparison of pure AIS-based and Fuzzy AIS-based method for LLDoS1.0
            rs=1 (Pure AIS)                            rs=0.9 (Fuzzy-AIS)
            Completeness  Soundness  False Alert       Completeness  Soundness  False Alert
Mean        .720          .977       .022              .957          .948       .053
Std. Dev.   .097          .017       .017              .013          .007       .008

Table 5: Accuracy comparison of pure AIS-based and Fuzzy AIS-based method for LLDoS2.0
            rs=1 (Pure AIS)                            rs=0.9 (Fuzzy-AIS)
            Completeness  Soundness  False Alert       Completeness  Soundness  False Alert
Mean        .750          .792       .296              .745          .82        .245
Std. Dev.   .056          .082       .131              .053          .082       .119

Figure 4: Comparing the completeness of the correlator for rs=1 and rs=0.9 with different number of lymphocyte for Inside1 traffic

Figure 5: The attack graph generated for LLDoS2.0 (rs=0.9). Nodes: Sadmind_Amslverify_Overflow, Admind, FTP_Put, Mstream_Zombie.

Here the two correlators behave in almost the same way, and their results are almost the same across different numbers of lymphocytes. The advantages of the fuzzy AIS-based correlator (rs=0.9) are its better execution time and its better average soundness and average false alert rate compared with the pure AIS-based correlator (rs=1.0). Table 5 compares the completeness, soundness and false alert rate of the two correlators with the same parameters. The average false alert rate for both correlators is relatively high; the reason is that we do not consider the telnet alerts to be related alerts in this scenario. If the telnet alerts are considered related alerts, the mean false alert rate decreases to 0.1 with a standard deviation of 0.05 for both correlators. The results show that the accuracy of our system is comparable with that of some more complex correlators, without the need for a complex rule definition task. Figure 6 shows the soundness and false alert rate for both correlators with different numbers of lymphocytes for LLDoS2.0.

4.3 Experiments with netForensics Honeynet Dataset

The netForensics honeynet dataset contains 35 days of traffic logs collected from February 25, 2005 to March 31, 2005 [17]. During this period, attackers issued several multi-step attacks to compromise the honeynet. Here, the word compromised is defined as a successful attack, followed by some follow-up activities.
From the honeynet owner's point of view, the most compelling evidence of compromise was the outbound IRC communication, which implies that the intrusion succeeded, that the attacker has some degree of control over the machine, and that he managed to install his own software (an IRC client or bot). The owner of the honeynet also pointed out that their victim server was first compromised on February 26 and that the compromise then continued in March. The traffic of the first two days of the netForensics honeynet data is employed to test the ability of our system to extract the attack scenarios. Snort generates 3419 alerts belonging to 43 different alert types for these two days. Results show that all 43 types of alerts in the input data are correlated with each other with different strengths. The ACM, Πb and Πf matrices are created, and the correlation information is stored in them and used for hyper-alert generation and attack graph generation. As mentioned before, the most compelling evidence of compromise is the outbound IRC communication, which implies that the intrusion succeeded. Our extracted scenario starts with three types of alerts: WEB-ATTACKS rm command attempt, BLEEDING-EDGE EXPLOIT Awstats Remote Code Execution Attempt and WEB-ATTACKS wget command attempt. The attacker uses these remote command attempts to download and install malicious software on the target machines. Then the attacker issues IRC attacks from those compromised targets to the final victim. Snort produces alerts such as CHAT IRC nick change, BLEEDING-EDGE IRC-Nick change on non-std port, BLEEDING-EDGE POLICY IRC connection and CHAT IRC message for the rest of the attack, and our system correlates these alerts with the alerts of the first step of the attack scenario. Figure 7 shows the extracted attack scenario for a rule selection value of 0.9.

Figure 6: Comparing soundness and false alert rate for rs=1 and rs=0.9 with different number of lymphocyte for inside2 traffic

Figure 7: The attack graph generated for netForensics honeynet (rs=0.9). Nodes: Awstats Remote Code Execution Attempt, rm Command Attempt, Wget Command Attempt, BLEEDING-EDGE IRC-Nick change, IRC Nick change, Chat IRC message, BLEEDING-EDGE Policy IRC connection.

4.4 Performance Analysis

To evaluate the performance of our correlator, we consider its execution time with different numbers of lymphocytes and with different values of rs. Figure 8 shows the execution time with different numbers of lymphocytes for LLDoS2.0. As expected, the execution time grows with the value of rs. For rs=0.1 the time is the lowest and for rs=1 it is the highest. The reason is that a smaller value of rs means that the fuzzy rule matcher is selected more often and the AIS-based correlator is bypassed. As rs increases, the chance of finding a matching rule decreases, the system uses AIRS more often, and as a result the execution time increases. Moreover, as the number of lymphocytes increases, the execution time increases very slowly. For example, in Figure 8, when the number of lymphocytes increases from 100 to 2000, the execution time increases from 5 to 10 seconds for rs=1. That is, a 20-fold increase in the number of lymphocytes only doubles the execution time, so the effect of the number of lymphocytes is low. Figure 9 shows the effect of the number of alerts on the execution time for the netForensics honeynet data. The number of alerts is increased from 1000 to 5000 and the execution time of the system is measured for different values of rs. Results show that the execution time of the algorithm is O(n²), where n is the number of alerts. For example, for 1000 alerts and rs=0.9 the execution time of the system is 19 seconds, and with 2000 alerts the execution time is 76 seconds, a fourfold increase.
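The growth rates quoted above can be checked with a short calculation. The sketch below assumes a power-law model t ≈ c·n^p and estimates the exponent p from two measurements; the function name is ours, and the four data points are the ones quoted in the text.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ n**p from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Alerts: 1000 -> 2000 alerts, 19 s -> 76 s (rs=0.9).
alerts_exponent = scaling_exponent(1000, 19, 2000, 76)
# Lymphocytes: 100 -> 2000 lymphocytes, 5 s -> 10 s (rs=1).
lymphocyte_exponent = scaling_exponent(100, 5, 2000, 10)
print(f"alerts: {alerts_exponent:.2f}, lymphocytes: {lymphocyte_exponent:.2f}")
# prints "alerts: 2.00, lymphocytes: 0.23"
```

An exponent of 2 for the number of alerts is consistent with the O(n²) behaviour reported here, while an exponent of about 0.23 for the number of lymphocytes reflects the much weaker dependence on that parameter. Two points per curve give only a rough estimate.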
The execution time of the correlation algorithm therefore appears unacceptable for larger numbers of alerts. In this version of the system we aim to show the ability of our correlation engine to correlate alerts accurately; to use our correlator in online mode, further work on the correlation process is needed to improve its performance. One possible improvement is to define a time window for correlation. A time window can considerably decrease the execution time of the system. It is not necessary to correlate each alert with all previous alerts. It is sufficient to correlate it with a limited number of alerts within a time window. In a real environment, it is possible to adjust the execution time of the algorithm by choosing an acceptable length for the time window.

Figure 8: Execution time of the correlator for rs=0.1, rs=0.9 and rs=1 with different number of lymphocyte for inside2 traffic

Figure 9: Execution time of the correlator for rs=0.1, rs=0.9 and rs=1 with different number of alerts for netForensics honeynet data

5 Conclusion and Future Work

We use AIS and fuzzy logic as two soft computing techniques for alert correlation. Our proposed system needs only a few general rules about the relation between seven selected features. These rules are defined using linguistic terms such as low, high and medium to simplify the rule definition and make it more general. We also need these rules as input to the AIRS algorithm. AIRS is a supervised learning algorithm; using our initial rules, it extracts more information for the correlation of previously unseen patterns and stores it in the form of memory cells. During the correlation process we use a simple fuzzy rule matcher to find the proper rule for each feature vector. The matching value is required to be greater than a threshold (rs). If it is not, the system uses the memory cells produced by the AIRS algorithm to find the correlation probability. In this way, we use both the simplicity and speed of fuzzy rules and the learning ability of AIRS to correlate input alerts. Our system is examined with two traffic datasets of DARPA2000 and with the netForensics honeynet data, and its ability to extract the attack scenarios is demonstrated. Our system is simple to run and needs no complicated initial data. It can learn and remember the correlation between different attack types. The rs parameter is an important parameter and makes our system more flexible. It is used to balance the static nature of predefined fuzzy rules against the dynamic nature of the AIS-based learning system. We get the best results for both datasets with a value of 0.9 for rs, but it is possible to define a larger number of rules and decrease the rs value. The execution time of the system increases gradually with the number of lymphocytes. It depends more strongly on the number of alerts. The results show that the execution time of the system is of the order of n² in the number of alerts, but in a real environment it is not necessary to correlate each alert with all previous alerts. It is sufficient to correlate each alert with a limited number of alerts within a time window. It is then possible to adjust the execution time of the algorithm by choosing an acceptable time window.

Acknowledgments

This work is supported by the Iranian Research Institute for Information and Communication Technology. The authors gratefully acknowledge the anonymous reviewers for their valuable comments.

References

[1] M. R. Ahmadi, "An intrusion prediction technique based on co-evolutionary immune system for network security (coco-idp)," International Journal of Network Security, vol. 9, no. 3, pp. 290–300, 2009.
[2] S. Cheung, U. Lindqvist, and M. W. Fong, "Modeling multistep cyber attacks for scenario recognition," vol. 1, pp. 284–292, Apr. 2003.
[3] F. Cuppens and A. Miege, "Alert correlation in a cooperative intrusion detection framework," in IEEE Symposium on Security and Privacy, p. 202, 2002.
[4] F. Cuppens and R. Ortalo, "Lambda: A language to model a database for detection of attacks," in Recent Advances in Intrusion Detection, vol. 1907, pp. 197–216, 2000.
[5] O. Dain and R. Cunningham, "Fusing a heterogeneous alert stream into scenarios," in Proceedings of the 2001 ACM Workshop on Data Mining for Security Applications, pp. 1–13, September 2001.
[6] S. T. Eckmann, G. Vigna, and R. A. Kemmerer, "Statl: An attack language for state-based intrusion detection," Journal of Computer Security, vol. 10, no. 1-2, p. 71, 2002.
[7] A. Ghorbani, W. Lu, and M. Tavallaee, Network Intrusion Detection and Prevention. Springer New York, 1st edition, 2010.
[8] M. A. Hall, Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, Hamilton, New Zealand, Apr. 1999.
[9] MIT Lincoln Laboratory, DARPA2000 intrusion detection scenario specific data sets, http://www.ll.mit.edu, Apr. 2012.
[10] K. Julisch, "Clustering intrusion detection alarms to support root cause analysis," ACM Transactions on Information System Security, vol. 6, pp. 443–471, Nov. 2003.
[11] S. Lee, B. Chung, H. Kim, Y. Lee, C. Park, and H. Yoon, "Real-time analysis of intrusion detection alerts via correlation," Comput Secur, vol. 25, no. 3, pp. 169–183, 2006.
[12] G. M. Lois and L. Boggess, "Artificial immune systems for classification: Some issues," in University of Kent at Canterbury, pp. 149–153, 2002.
[13] F. Maggia, M. Matteucci, and S. Zanero, "Reducing false positives in anomaly detectors through fuzzy alert aggregation," Information Fusion, vol. 10, no. 4, pp. 300–311, 2009.
[14] S. Manganaris, M. Christensen, D. Zerkle, and K. Hermiz, "A data mining analysis of rtid alarms," Comput Networks, vol. 34, no. 4, pp. 571–577, 2000. Recent Advances in Intrusion Detection Systems.
[15] B. Morin, L. Mé, H. Debar, and M. Ducassé, "A logic-based model to support alert correlation in intrusion detection," Information Fusion, vol. 10, no. 4, pp. 285–299, 2009. Special Issue on Information Fusion in Computer Security.
[16] B. Morin, L. Mé, H. Debar, and M. Ducassé, "M2d2: a formal data model for ids alert correlation," in Proceedings of the 5th international conference on Recent advances in intrusion detection, pp. 115–137, Berlin, Heidelberg, 2002. Springer-Verlag.
[17] netForensics honeynet team, Honeynet traffic logs, http://old.honeynet.org/scans/scan34.
[18] P. Ning, Y. Cui, and D. S. Reeves, "Constructing attack scenarios through correlation of intrusion alerts," in Proceedings of the 9th ACM conference on Computer and communications security, pp. 245–254, New York, NY, USA, 2002. ACM.
[19] T. Pietraszek, "Using adaptive alert classification to reduce false positives in intrusion detection," in Recent Advances in Intrusion Detection, vol. LNCS 3224, pp. 102–124. Springer-Verlag, 2004.
[20] P. A. Porras, M. W. Fong, and A. Valdes, "A mission-impact-based approach to infosec alarm correlation," in Proceedings of the 5th international conference on Recent advances in intrusion detection, pp. 95–114, Berlin, Heidelberg, 2002. Springer-Verlag.
[21] X. Qin, A Probabilistic-Based Framework for INFOSEC Alert Correlation. PhD thesis, Georgia Institute of Technology, August 2005.
[22] H. Ren, N. Stakhanova, and A. Ghorbani, "An online adaptive approach to alert correlation," in Detection of Intrusions and Malware, and Vulnerability Assessment, vol. LNCS 6201, pp. 153–172. Springer-Verlag, 2010.
[23] R. Sadoddin and A. A. Ghorbani, "An incremental frequent structure mining framework for real-time alert correlation," Computer Security, vol. 28, no. 34, pp. 153–173, 2009.
[24] S. J. Templeton and K. Levitt, "A requires/provides model for computer attacks," in Proceedings of the 2000 workshop on New security paradigms, pp. 31–38, New York, NY, USA, 2000.
[25] Lab NCSUCD, Tiaa: A toolkit for intrusion alert analysis, http://discovery.csc.ncsu.edu/software/correlator/ver0.4/index.html.
[26] E. Totel, B. Vivinis, and L. Mé, "A language driven intrusion detection system for event and alert correlation," in Security and Protection in Information Processing Systems, vol. 147 of IFIP International Federation for Information Processing, pp. 208–224. Springer Boston, 2004.
[27] A. Valdes and K. Skinner, "Probabilistic alert correlation," in Recent Advances in Intrusion Detection, pp. 54–68, October 2001.
[28] F. Valeur, G. Vigna, C. Kruegel, and R. A. Kemmerer, "A comprehensive approach to intrusion detection alert correlation," IEEE Transactions on Dependable and Secure Computing, vol. 1, pp. 146–169, 2004.
[29] J. Viinikka, H. Debar, L. Mé, A. Lehikoinen, and M. Tarvainen, "Processing intrusion detection alert aggregates with time series modeling," Information Fusion, vol. 10, no. 4, pp. 312–324, 2009. Special Issue on Information Fusion in Computer Security.
[30] L. Wang, A. Ghorbani, and Y. Li, "Automatic multi-step attack pattern discovering," Int J Netw Secur, vol. 10, no. 2, pp. 142–152, 2010.
[31] A. Watkins, "Airs: A resource limited artificial immune classifier," Master's thesis, Mississippi State University, 2001.
[32] A. Watkins, J. Timmis, and L. Boggess, "Artificial immune recognition system (airs): An immune-inspired supervised learning algorithm," Genetic Programming and Evolvable Machines, vol. 5, pp. 291–317, 2004.
[33] B. Zhu and A. Ghorbani, "Alert correlation for extracting attack strategies," International Journal of Network Security, vol. 3, no. 3, pp. 244–258, 2006.

Mehdi Bateni received his B. Sc. in Computer Engineering in 1997 from University of Isfahan, Isfahan, Iran and his M. Sc.
in Computer Engineering from Ferdowsi University of Mashhad, Mashhad, Iran in 2000. He is currently a Ph.D. student in Department of Computer Engineering at the University of Isfahan. Ahmad Baraani is an associate professor of computer engineering at the Faculty of Engineering of the University of Isfahan (UI). He got his BS in Statistics and Computing in 1977. He got his MS and PhD degrees in Computer Science from George Washington University in 1979 and University of Wollongong in 1996, respectively. He was Head of the Research Department of the Communication systems and Information Security (CSIS) and Head of the ACM International Collegiate Programming Contest (ACM/ICPC) of University of Isfahan from 2000 until 2008. He has published more than 70 papers and He coauthored three books in Persian and received an award of ”the Best e-Commerce Iranian Journal Paper” (2005). Currently, he is teaching PhD and MS courses of Advance Topics in Database, Data Protection, Advance Databases, and Machining Learning. His research interests lie in Databases, Data security, Information Systems, e-Society, e-Commerce, Security in e-Commerce, and Security in e-Society. International Journal of Network Security, Vol.15, No.3, PP.190-204, May 2013 Ali Akbar Ghorbani currently serves as Dean of the Faculty of Computer Science. His current research focus is Web Intelligence, Network and Information Security, Complex Adaptive Systems, and Critical Infrastructure Protection. He authored more than 240 reports and research papers in journals and conference proceedings and has edited 8 volumes. He served as General Chair and Program Chair/co-Chair for 8 International Conferences and 10 International Workshops. He is the co-inventor of 3 patents in the area of Web Intelligence and Network Security. He has supervised more than 120 research associates, postdoctoral fellows, and undergraduate and graduate students. Dr. 
Ghorbani is the founding Director of Information Security Centre of Excellence at UNB. He is also the coordinator of the Privacy, Security and Trust (PST) network and PST annual conferences. Dr. Ghorbani is the co-Editor-In-Chief of Computational Intelligence, an international journal, and associate editor of the International Journal of Information Technology and Web Engineering and the ISC journal of Information Security. 204
