2. BACKGROUND
2.5 Attack identifiers and attack types
One of our research goals is to associate the extracted real attacks with the Nessus plug-in(s), so that newly found attacks can then be added to the Nessus database. The CVE (http://cve.mitre.or/) attack identifier is used here to make this association. CVE (Common Vulnerabilities and Exposures) is a list of security vulnerabilities and exposures that provides common names for publicly known problems. Its goal is to make it easier to share data across separate vulnerability capabilities (tools, repositories, and services) through this "common enumeration." The CVE organization studies new attacks and defines their identifiers and descriptions. Nessus traffic carries CVE attack identifiers, and IDP products often report them as well. The CVE attack identifier can therefore be used to check whether an attack is covered by the Nessus plug-in database.
This work collected 83 attacks as samples for the extraction system. These attacks can be divided into three types according to the number of attackers and the number of connections per attack, as presented in Table 1. We assume that each attack has only one target. An attack of the first type involves one attacker and a single connection.
An example is the MySQL Authentication Bypass Exploit, which logs in to a MySQL database without the password. An attack of the second type involves one attacker and more than one connection. An example is the Blaster worm, which establishes three connections when it tries to attack a target. An attack of the third type involves multiple attackers and a single connection from each attacker; a DDoS attack belongs to this type. This classification helps in building an extraction system [3], [4].
Table 1. Three types of attack definitions
Number of attackers | Number of connections per attack | Example
1 | 1 | MySQL Authentication Bypass Exploit
1 | N | Blaster worm
N | 1 | DDoS
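Once the connections of an attack against its single target are known, the three types in Table 1 can be distinguished mechanically. The Python sketch below is illustrative only. The function name `classify_attack_type` and its input format, a list of (attacker IP, connection identifier) pairs, are hypothetical and are not part of the original system.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple


def classify_attack_type(connections: Iterable[Tuple[str, Hashable]]) -> str:
    """connections: (attacker_ip, connection_id) pairs seen for one attack
    against a single target. Returns "1-1", "1-N", "N-1" or "unclassified"."""
    per_attacker = defaultdict(set)
    for attacker_ip, conn_id in connections:
        per_attacker[attacker_ip].add(conn_id)

    n_attackers = len(per_attacker)
    conns_per_attacker = {len(c) for c in per_attacker.values()}
    if n_attackers == 1:
        # One attacker: a single connection, or several (e.g. Blaster's three)
        return "1-1" if conns_per_attacker == {1} else "1-N"
    if n_attackers > 1 and conns_per_attacker == {1}:
        return "N-1"  # e.g. DDoS: many attackers, one connection each
    return "unclassified"  # combinations not covered by Table 1
```

An attack in which several attackers each open several connections falls outside Table 1, so the sketch reports it as unclassified instead of forcing it into one of the three types.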
Chapter 3
The Session Extraction System
The proposed session extraction system combines IDP log analysis with traffic analysis. It has two parts. The first part is an algorithm that extracts sessions from the attack traffic. The second part is an analysis of the differences between Nessus traffic and the real attacks.
3.1 Overview
There are two goals in this work. First, the partial attacks emulated by Nessus can only indicate possible security breaches; they cannot show whether a real attack would actually cause harm. This work therefore extracts the complete episode of each attack to confirm the system vulnerability. For this goal, real traffic is recorded and then replayed to the IDP products to extract complete attack episodes. The logs of the IDP products readily identify the connection of each detected attack. However, an attack may span multiple connections, and the logs cannot reveal all of them, because IDP products alarm on and log only the most important connection. This work therefore proposes an algorithm to extract multiple related connections from the attack traffic, which we call the session extraction algorithm.
The second goal of this work is to insert new attacks that Nessus lacks into Nessus. This work can also add or replace attacks that the Nessus plug-in database already has, depending on whether a complete episode is needed.
This requires understanding the association between the real attacks extracted with the IDP products and the attacks emulated by Nessus, so this work also proposes a method to analyze the differences between Nessus traffic and the extracted real attacks.
3.2 Extract attack sessions from recorded traffic
One goal of this work is to extract a complete episode of attacks from a large amount of traffic. The session extraction algorithm is a three-pass algorithm designed for this goal by associating packets, connections and sessions to extract attack sessions.
Before the session extraction algorithm is described, Table 2 defines the components it uses. The algorithm consists of the following five steps. Steps (i), (ii), (iii), and (v) are straightforward, while step (iv) is the core of this work.
Table 2. The definitions of the components in the session extraction algorithm
Name | Description
Sip | Source IP address
Sport | Source port number
Dip | Destination IP address
Dport | Destination port number
Tcp/Udp | The TCP or UDP protocol flag of the packet
Payload | The content of the packet
P | A TCP or UDP packet in the IP network
Tuple(P_i) | The five-tuple of packet P_i
A | The anchor packet of the attack
PDA (Possible DoS Attacks) | The data structure that stores packets that could belong to DoS attacks
PNDA (Possible Not DoS Attacks) | The data structure that stores packets that probably do not belong to DoS attacks

(i) Replay the real traffic to the IDP products with Tcpreplay.
This algorithm relies on the domain knowledge of IDP products, including the well-known open-source tool Snort [5]. An IDP product reports through its logs which attacks have occurred.
(ii) Find out anchor packets by the first-pass scan.
This step finds the anchor packets, i.e., the critical packets that make the IDP products raise an alarm when they receive them. Two tables are used. The alarm log table records the alarms raised during the replay of attack traffic. The replay log table records the time at which Tcpreplay replays each packet. The timestamps in the replay log table are related to the alarm log table to label the attack types, and the two tables are then compared to identify the attack packets.
Clock synchronization between the replay system and the IDP products can be a problem, and even when the clocks are synchronized, IDP products may not log times accurately. Therefore, five-tuple information is also used. Many IDP products log the five-tuple of an attack (some record fewer than five fields).
Together, the five-tuple information and the timestamps in the alarm log table and the replay log table locate the anchor packets in the real traffic.
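The following minimal Python sketch shows the matching in step (ii). It is not the authors' implementation. The record layouts, the names `FiveTuple`, `tuple_matches` and `find_anchor_packets`, and the 2-second default `tolerance` for clock skew are all assumptions. A field that an IDP product did not log is represented as `None` and matches any value.

```python
from typing import List, NamedTuple, Optional, Tuple


class FiveTuple(NamedTuple):
    sip: Optional[str]
    sport: Optional[int]
    dip: Optional[str]
    dport: Optional[int]
    proto: Optional[str]  # "tcp" or "udp"


def tuple_matches(logged: FiveTuple, packet: FiveTuple) -> bool:
    # A None field in the logged tuple is a wildcard, because some IDP
    # products record fewer than five fields.
    return all(want is None or want == got for want, got in zip(logged, packet))


def find_anchor_packets(alarm_log, replay_log, tolerance: float = 2.0
                        ) -> List[Tuple[str, List[int]]]:
    """alarm_log:  list of (alarm_time, logged FiveTuple, alarm_name)
    replay_log: list of (packet_index, replay_time, packet FiveTuple)
    Returns, for each alarm, the indices of candidate anchor packets whose
    replay time lies within `tolerance` seconds and whose tuple matches."""
    anchors = []
    for alarm_time, logged, name in alarm_log:
        candidates = [
            idx for idx, replay_time, pkt in replay_log
            if abs(replay_time - alarm_time) <= tolerance
            and tuple_matches(logged, pkt)
        ]
        anchors.append((name, candidates))
    return anchors
```

The time window absorbs clock skew, while the five-tuple filter removes unrelated packets that were replayed at about the same time.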
(iii) Find out the association among attack packets within the same connection by the second-pass scan.
This step identifies the anchor connection by relating the recorded packets to the anchor packets: packets that share the anchor packet's five-tuple belong to the same connection.
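The following Python sketch shows the second-pass scan in step (iii). It assumes, although the text does not say so, that reply packets also belong to the anchor connection. A reply packet carries the anchor's five-tuple with source and destination swapped. The names `anchor_connection` and `reverse` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class FiveTuple(NamedTuple):
    sip: str
    sport: int
    dip: str
    dport: int
    proto: str  # "tcp" or "udp"


def reverse(t: FiveTuple) -> FiveTuple:
    # The same connection seen in the opposite direction.
    return FiveTuple(t.dip, t.dport, t.sip, t.sport, t.proto)


def anchor_connection(packets: List[Tuple[int, FiveTuple]],
                      anchor: FiveTuple) -> List[int]:
    """packets: (packet_index, FiveTuple) pairs in capture order.
    Returns the indices of packets in the anchor packet's connection."""
    keys = {anchor, reverse(anchor)}
    return [idx for idx, t in packets if t in keys]
```

Because `FiveTuple` is hashable, the membership test is a constant-time set lookup per packet, so one linear pass over the capture is enough.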
(iv) Find out the association among attack connections within the same session by the third-pass scan.
Attack connections can then be associated with their session. This association can be difficult because the relation among the connections is often obscure. Since such attacks span more than one connection, the five-tuple and timestamp alone are insufficient to find the other connections. The relation is most obscure for attacks with multiple attackers and a single connection from each attacker, because the packets from different attackers have different five-tuples.
Common attacks of this type are DDoS and DoS attacks, which overwhelm a server so that it can no longer provide its services. In our observations, such an attack often consists only of TCP SYN or ACK messages, together with many packets carrying the same payload. The session extraction algorithm is designed based on this observation.
The algorithm parses the recorded traffic packet by packet and extracts an attack session by analyzing the attack types.
After the anchor packets of an attack have been found, the algorithm checks each subsequent packet to see whether its source or destination IP address equals the target IP address of the anchor packet. If not, the packet is assigned to some other attack. If the packet belongs to this attack, the algorithm compares its payload with that of the anchor packet. If the similarity is high, the algorithm copies the packet into the possible DDoS attack buffer and increments the packet count by one. Similarity is defined by the longest common subsequence (LCS) of two packet payloads [6]. Formally, given a sequence $X = \langle x_1, x_2, \ldots, x_m \rangle$, another sequence $Z = \langle x_{i_1}, x_{i_2}, \ldots, x_{i_k} \rangle$ is a subsequence of $X$ if there exists a strictly increasing sequence $\langle i_1, i_2, \ldots, i_k \rangle$ of indices of $X$. Given two sequences $X$ and $Y$, a sequence $Z$ is a common subsequence of $X$ and $Y$ if $Z$ is a subsequence of both $X$ and $Y$. The longest common subsequence is the longest of all common subsequences.
Consider the payloads of two packets as two byte sequences, $S_1$ and $S_2$. Their longest common subsequence, $\mathrm{LCS}(S_1, S_2)$, is the longest byte sequence that is a subsequence of both $S_1$ and $S_2$. The similarity is defined by the equation
$$\mathrm{Similarity}(S_1, S_2) = \frac{2\,|\mathrm{LCS}(S_1, S_2)|}{|S_1| + |S_2|} \times 100\%$$
The similarity threshold in the proposed algorithm is 80% because the DDoS and DoS packets we collected are often minimum-size Ethernet frames of 64 bytes. Excluding the 14-byte MAC header, the 20-byte IP header, the 20-byte TCP header, and the 4-byte Ethernet checksum, the payload is only 6 bytes long. In our observations, the payloads of the collected DDoS and DoS packets are usually identical, and when they differ, they differ by only one byte. The similarity in this case is $2 \times 5 / (6 + 6) = 83.33\%$, so the similarity threshold is set to 80%.
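The similarity measure can be computed with the standard dynamic-programming algorithm for LCS length. The sketch below is illustrative. The function names are hypothetical, and treating two empty payloads as 100% similar is an assumption of this sketch. It reproduces the 83.33% figure for two 6-byte payloads that differ in one byte.

```python
def lcs_length(s1: bytes, s2: bytes) -> int:
    """Length of the longest common subsequence of two byte strings,
    computed row by row with O(len(s2)) extra memory."""
    prev = [0] * (len(s2) + 1)
    for a in s1:
        cur = [0]
        for j, b in enumerate(s2, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)            # extend a diagonal match
            else:
                cur.append(max(prev[j], cur[j - 1]))  # skip a byte of s1 or s2
        prev = cur
    return prev[-1]


def similarity(s1: bytes, s2: bytes) -> float:
    """Similarity in percent: 2 * |LCS(S1, S2)| / (|S1| + |S2|) * 100."""
    if not s1 and not s2:
        return 100.0  # convention assumed for two empty payloads
    return 2 * lcs_length(s1, s2) / (len(s1) + len(s2)) * 100


SIMILARITY_THRESHOLD = 80.0  # percent, as chosen in the text
```

For example, `similarity(b"abcdef", b"abcdeX")` gives about 83.33, which is above `SIMILARITY_THRESHOLD`. The quadratic cost is not a concern here because the payloads in question are only a few bytes long.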
After identifying similar packets, the session extraction algorithm examines the source and destination IP addresses together. It keeps only the packets sent from the attacker to the target and those sent in the opposite direction; all other packets are dropped. This step separates attacks that probably involve a single attacker from those that are probably DDoS attacks.
The algorithm repeats these checks up to the last packet and then returns the packet count of the possible DDoS attack buffer. If the count is larger than 200, the attack might be a DDoS attack; otherwise, it might be a 1-1 attack. Figure 2 shows the flowchart of the algorithm.
The algorithm can be written as formulas and pseudo code as follows. A packet $P_i$ is the set of its five-tuple and payload, and $\mathrm{Tuple}(P_i)$ is the five-tuple of the $i$-th packet, $i \ge 1$:
$$P_i = \{\mathit{Sip}, \mathit{Sport}, \mathit{Dip}, \mathit{Dport}, \mathit{Tcp/Udp}, \mathit{Payload}\}, \qquad \mathrm{Tuple}(P_i) = \{\mathit{Sip}, \mathit{Sport}, \mathit{Dip}, \mathit{Dport}, \mathit{Tcp/Udp}\}$$
The anchor packet $A$ is the packet, consisting of a five-tuple and a payload, that makes the IDP products raise an alarm when they receive it.
Therefore, the session extraction problem becomes the problem of finding the set of packets whose payloads are highly similar to that of the anchor packet $A$, or whose source and destination IP addresses are the same as those of $A$. Let $x$ be the sequence number of the anchor packet among all packets. The session extraction algorithm can be described as follows.
The pseudo code of the session extraction algorithm
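The Python sketch below shows one possible reading of the third-pass scan in step (iv). It combines four rules from the text: the target-IP check, the 80% payload-similarity test, the attacker-target filter, and the 200-packet DDoS threshold. It is an interpretation, not the authors' pseudo code. Several details are assumptions: the `Packet` layout, the name `third_pass`, the premise that the anchor packet travels from attacker to target, and the choice to return the PDA buffer for a possible DDoS attack and the PNDA buffer otherwise.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Packet:
    sip: str
    sport: int
    dip: str
    dport: int
    proto: str      # "tcp" or "udp"
    payload: bytes


def lcs_length(s1: bytes, s2: bytes) -> int:
    # Row-by-row dynamic programming for the LCS length.
    prev = [0] * (len(s2) + 1)
    for a in s1:
        cur = [0]
        for j, b in enumerate(s2, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(s1: bytes, s2: bytes) -> float:
    # 2 * |LCS| / (|S1| + |S2|), expressed in percent.
    if not s1 and not s2:
        return 100.0
    return 200.0 * lcs_length(s1, s2) / (len(s1) + len(s2))


def third_pass(packets: List[Packet], x: int, threshold: float = 80.0,
               ddos_count: int = 200) -> Tuple[str, List[Packet]]:
    """packets: the recorded traffic in capture order; x: index of the anchor
    packet A. Assumes A travels from the attacker to the target."""
    anchor = packets[x]
    attacker, target = anchor.sip, anchor.dip
    pda: List[Packet] = []   # PDA: packets that could be DoS attack packets
    pnda: List[Packet] = []  # PNDA: packets that are probably not DoS packets
    for p in packets[x:]:
        if target not in (p.sip, p.dip):
            continue  # unrelated to this target: left to other attacks
        if similarity(p.payload, anchor.payload) >= threshold:
            pda.append(p)  # similar payload: possible DDoS packet
        if {p.sip, p.dip} == {attacker, target}:
            pnda.append(p)  # attacker <-> target traffic, either direction
    if len(pda) > ddos_count:
        return "possible DDoS", pda
    return "possible 1-1", pnda
```

A flood of identical 6-byte payloads from many sources fills the PDA buffer past 200 packets. An ordinary exchange between one attacker and the target stays in the PNDA buffer.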
(v) Replay the extracted attack session to the IDP products to verify whether the same logs are generated. If they are, the extraction is valid.
Finally, we replay the extracted attack sessions to the IDP products to verify the correctness of the extraction. An extracted session must cause the same alarms as when the whole traffic is replayed to the same IDP product. If an IDP product cannot find the attack, the extraction is invalid.
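The validity check in step (v) amounts to comparing, for each IDP product, the alarms raised on the whole traffic with those raised on the extracted session. The following minimal sketch assumes that alarms are compared by identifier (for example, CVE IDs). The helper name `extraction_is_valid` is hypothetical.

```python
from typing import Dict, Set


def extraction_is_valid(attack_id: str,
                        full_replay_alarms: Dict[str, Set[str]],
                        session_replay_alarms: Dict[str, Set[str]]) -> bool:
    """Map each IDP product name to the alarm IDs it raised when replaying
    (a) the whole recorded traffic and (b) only the extracted session."""
    detected_by = [prod for prod, alarms in full_replay_alarms.items()
                   if attack_id in alarms]
    if not detected_by:
        return False  # nothing to reproduce: the attack was never detected
    # Every product that flagged the attack on the whole traffic must also
    # flag it on the extracted session.
    return all(attack_id in session_replay_alarms.get(prod, set())
               for prod in detected_by)
```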
3.3 Analysis of the difference between Nessus traffic and real attacks
The second part compares the extracted attack sessions with the attacks from the Nessus plug-in database. The comparison replays both the real traffic and the Nessus traffic to the IDP products in the following five steps.
(i) Use Tcpdump to record traffic at the backbone network.
(ii) Replay the real traffic to each IDP product to obtain a set of attacks, $\{C_i \mid i = 1, \ldots, N\}$, assuming we have $N$ IDP products.
(iii) Replay the Nessus plug-in traffic to each IDP product to generate another set of attacks, $\{A_i \mid i = 1, \ldots, N\}$.
Figure 2. The flowchart of the session extraction algorithm
Some IDP products label detected Nessus traffic as "Nessus attack," but others do not. The labels are also inconsistent, because the same attack may be named differently by different IDP products. We solve this problem with the CVE identifiers of the Nessus traffic.
Each Nessus attack has its own CVE identifier, and the alarm logs also tag attacks with this identifier, so an attack can be identified by its CVE identifier whenever its label is ambiguous.
(iv) For every attack record in $C_i - A_i$ ($i = 1, \ldots, N$), i.e., every detected attack that is not in the Nessus plug-in database, run the extraction algorithm to extract its attack session.
(v) These new attack sessions can be added to the Nessus plug-in database.
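CVE identifiers provide a common key across IDP products, so step (iv) reduces to a set difference. The sketch below is one interpretation of $C - A$: the CVE identifiers that any product detected in the real traffic, minus those that any product detected in the Nessus traffic. This reading, the function name, and the sample inputs are assumptions. The sample identifiers are borrowed from Table 3 purely for illustration.

```python
from typing import Iterable, List, Set


def attacks_missing_from_nessus(real_alarms: Iterable[Set[str]],
                                nessus_alarms: Iterable[Set[str]]) -> List[str]:
    """real_alarms: one set of CVE IDs per IDP product from replaying the real
    traffic (C_1..C_N); nessus_alarms: the same for the Nessus traffic
    (A_1..A_N). Returns the CVE IDs whose sessions should be extracted."""
    detected = set().union(*real_alarms)
    covered = set().union(*nessus_alarms)
    return sorted(detected - covered)
```

For example, if the real traffic yields `{"CVE-2005-2728", "CVE-2002-0649"}` and `{"CVE-2002-0649", "CAN-2002-0126"}` on two products, and the Nessus traffic yields only `CVE-2002-0649`, the function returns `["CAN-2002-0126", "CVE-2005-2728"]`.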
This work can also add or replace attacks that already exist in the Nessus plug-in database, and Nessus makes it convenient to update its database. A complete episode of a real attack is needed to confirm a real vulnerability of a system. Either the complete episode or the partial (emulated) attack can therefore be used for vulnerability assessment, depending on whether a complete episode is required.
3.4 An example of the session extraction system
Fig. 3 presents the alarm log table that records the alarms raised during the replay of attack traffic. First, we replay traffic from 13:23:52 to 13:26:12 to an IDP product, which writes two alarms (buffer overflow and LSASS) into the log table.
Second, the anchor packet is located using the logs of the IDP product. As Figure 3 shows, the anchor packet can be the 5th, 6th, or 7th packet, and we choose the 6th; if these packets share the same five-tuple, the choice makes no difference. Third, the five-tuple of the anchor packet is obtained. Fourth, the session extraction algorithm checks every packet for payload similarity and IP address match, so even if the 5th and 7th packets have different five-tuples, the algorithm still finds them. The output of the algorithm is a recorded attack file; in this example, it contains the 5th, 6th, and 7th packets. Finally, the recorded attack file is replayed to the IDP product. The extracted session must cause the same alarms as when the whole traffic is replayed to the same IDP product. If the IDP product cannot find the extracted attack, the extraction is invalid.
Figure 3. Replaying traffic to the IDP products and marking attack packets in the alarm log table
Chapter 4
Evaluation and Discussion
We used Tcpdump on a Linux PC to collect and record all traffic from the other 50 PCs in our lab during the period from 10/23/2005 to 10/29/2005. The 50 PCs belong to the same subnet and have the same gateway to the outside. Therefore, we recorded the traffic by mirroring all traffic through that gateway.
Three evaluations are designed for this work. First, we evaluate the number of real attacks that the session extraction system can extract, and we map these real attacks to the Nessus plug-in traffic. We use Tcpreplay to replay the recorded traffic to the IDP product, which is Snort in this evaluation. The CVE identifiers and Nessus identifiers are used for the mapping.
Second, we evaluate the completeness and purity of the extracted attacks. This evaluation also measures the variability of the session extraction system. For this evaluation, we replay the recorded traffic and play the Nessus traffic at the same time.
Finally, we evaluate the differences between Nessus traffic and the real attacks.
For this evaluation, we chose a Nessus attack and a real attack with the same CVE identifier and compared them using Ethereal.
4.1 The result of session extraction
Table 3 presents the 15 attacks detected by the IDP product (Snort) from the recorded traffic. They are ordered by the detection time. These attacks all have CVE identifiers when they are detected by Snort.
4.1.1 The occurrence percentage of the extracted attacks
Fig. 4 shows the occurrence percentage of each attack in the recorded traffic. The well-known worm Worm.Win32.Slammer has a high percentage, and Slammer was reported to have one of the highest occurrence rates in the world during 2005 and 2006. As Figure 4 shows, the six most frequent attacks account for 74% of all occurrences, the three least frequent attacks together account for only 3%, and the single most frequent attack accounts for 16%. This result indicates that a few common attacks make up most of the observed attacks.
Table 3. The 15 attacks detected by the IDP product, Snort
Type | Name | Nessus ID | CVE ID
DoS | Apache.CGI.Byterange.Request.DoS | no | CVE-2005-2728
Worm | Worm.Win32.Slammer | 11214 | CVE-2002-0649
Windows (1-1) | IE.File.Download.Ext.Spoof | no | CVE-2002-0875
DoS (N-1) | Smtp service logging NULL session | 11308 | CVE-2002-0054
SMTP (1-1) | smtp_decoder: unknown_cmd | 10885 | CVE-2002-0055
RPC (1-1) | RPC portmapper | 10223 | CVE-1999-0632
SMTP (1-1) | Smtp unauthorized user | 10703 | CVE-2001-0504
General (1-1) | IIS server HTR ISAPI filter mapped | 10932 | CAN-2002-0071
DoS (N-1) | DoS: FrontPage server Extensions | 11311 | CAN-2002-0692
Backdoors (1-1) | Sasser Virus Detection | 12219 | CAN-2003-0533
General (1-1) | Netbios-ssn | 10394 | CVE-2000-0222
General (1-1) | ICMP timestamp request | 10114 | CAN-1999-0524
General (1-1) | Remote OS guess | 11268 | CAN-1999-0454
FTP (1-1) | ftp: Overflow.CWD | no | CAN-2002-0126
DNS (1-1) | FTP Serv-U 2.5e Dos | 10488 | CVE-200-0837
Figure 4. The frequency of the 15 attacks
Figure 5. The file sizes of the 15 attacks
4.1.2 The file size of the extracted result
Size is defined as the file size of the extracted attack, including all packet headers and payloads. Figure 5 shows the file sizes of the attacks extracted by the session extraction system. The x-axis lists the 15 extracted attacks, ordered by occurrence frequency. The largest extracted attack is 9731 KB and the smallest is 491 KB. Note, however, that the session extraction system only excludes packets that cannot belong to the attack and keeps all packets from the real traffic that may belong to the same session of the attack.
Therefore, extracted attacks of the same type may have different sizes on different dates. For example, a normal HTTP request may be extracted together with an HTTP attack when the request and the attack occur at the same time between the same attacker and target. For this reason, we take as the attack size the size that occurred most often in our experiments. For DDoS or DoS attacks, we instead take the smallest size that the IDP product can still detect. The next section describes the variability of the session extraction system.
4.2 Variation, completeness and purity
4.2.1 The variation of the session extraction system
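In this work, variation is defined as the complement of the probability of the mode of the extracted attack sizes, where the mode is the most frequent value. Therefore, the variation is defined by the equation
$$\mathrm{Variation} = \left(1 - \frac{n_{\mathrm{mode}}}{n}\right) \times 100\%$$
where $n$ is the number of times an attack is extracted and $n_{\mathrm{mode}}$ is the number of those extractions whose size equals the mode.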
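The variation can be computed directly from the sizes obtained in repeated extractions of one attack. The following is a minimal sketch. The function name and the sample sizes in the usage note are hypothetical, not measured values.

```python
from collections import Counter
from typing import Hashable, Sequence


def variation(sizes: Sequence[Hashable]) -> float:
    """Variation in percent: (1 - n_mode / n) * 100, where n is the number of
    extractions of one attack and n_mode is how often the mode size occurs."""
    if not sizes:
        raise ValueError("need at least one extraction")
    n_mode = Counter(sizes).most_common(1)[0][1]
    return (1 - n_mode / len(sizes)) * 100
```

For example, `variation([491, 491, 491, 520])` returns 25.0, because the mode (491) occurs in three of the four extractions. A system that always extracts the same size has a variation of 0.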
In our experiments, when several extractions are classified as the same attack, each extracted size is compared with the most frequent size, and the sizes that differ from it contribute to the variation. A low variation of the session extraction system must be demonstrated before its results can be relied on. In