BotDigger: A Fuzzy Inference System for Botnet Detection

Basheer Al-Duwairi

Network Engineering and Security Department, Jordan University of Science and Technology

P.O. Box 3030, Irbid 22110, Jordan e-mail: basheer@just.edu.jo

Lina Al-Ebbini

Computer Engineering Department, Jordan University of Science and Technology

P.O. Box 3030, Irbid 22110, Jordan e-mail: leen.bme@gmail.com

Abstract—This paper proposes BotDigger, a fuzzy logic-based botnet detection system. In this system, we derive a set of logical rules based on well-known botnet characteristics. Using these rules, an adaptive logic algorithm is applied to network traffic traces, searching for botnet footprints and associating a trust level with each host present in the sampled data. Future work will focus on evaluating the proposed approach using real traffic traces.

Keywords- network security; botnet detection; fuzzy logic

I. INTRODUCTION

A botnet is a large collection of compromised machines, referred to as zombies [1]. Attackers are increasingly using these large networks of compromised machines to generate different types of attacks, including spam, distributed denial of service (DDoS), click fraud, and identity theft [2], [3], [4].

Botnets generally operate in two main planes: the command and control (C&C) plane where bots receive commands from the botmaster, and the activity plane where bots execute these commands to launch different types of attacks [5], [6]. The C&C in botnets can take several forms with different levels of sophistication and robustness. Based on the control plane topology, botnets are classified into centralized, peer-to-peer, or random. In centralized botnets [3], usually based on the Internet relay chat (IRC) protocol, the botmaster sets up a central server and instructs bots to connect to it and wait for commands. The server waits for new bots to connect, registers them in its database, tracks their status and sends them commands selected by the botmaster. Hence, all bots join a specific channel on the IRC server and interpret all the messages they receive as commands.

On the other hand, peer-to-peer botnets are a new yet rapidly growing Internet threat. There are no central servers that distribute commands in this type of botnet. To control the botnet, the botmaster needs only to join the network as another peer and send commands to other peers, which pass them along [7], [8]. Thus, if some nodes in a peer-to-peer network are taken offline, the gaps in the network are closed and the network continues to operate under the control of the botmaster.

In a decentralized botnet, such as one that uses the Hypertext Transfer Protocol (HTTP), bots do not maintain a connection with a C&C server [9]. Instead, each bot connects periodically to an HTTP server that is under the botmaster's control.

Detecting botnet traffic is a very challenging problem because [2], [6]: (1) botnets use existing application protocols, which makes their traffic hard to distinguish from normal traffic; (2) the traffic volume is low and there may be very few bots in the targeted network; and (3) classifying traffic by application becomes more challenging due to traffic content encryption and the unreliability of destination port labeling.

In this paper, we propose a fuzzy-based botnet detection system. The proposed system, called BotDigger, uses fuzzy logic, in which we derive logical rules based on defined botnet characteristics. We believe that fuzzy logic is appropriate for the botnet detection problem because many quantitative features can be viewed as fuzzy variables and because security itself involves fuzziness.

The rest of the paper is organized as follows. Section II discusses related work; Section III explains the proposed approach; Section IV presents the discussion; and finally, Section V presents the conclusions and future work.

II. RELATED WORK

There have been a lot of research efforts on botnet detection in recent years. Previous work on this topic can be classified into the following approaches:

1) Sinkhole-based detection: A sinkhole impersonates the C&C server to live bots, mimicking the botmaster while logging botnet membership information. Dagon et al. [10] use DNS sinkhole redirection to measure botnet properties and develop a diurnal model for botnet propagation. However, the sinkhole approach is limited to specific types of botnets, so it cannot support every type of botnet at any time. Moreover, this approach requires cooperation from the entities that control the botnet domains.

2) DNSBL-based detection: The authors in [11] presented several techniques for detecting DNS-based Blackhole List (DNSBL) reconnaissance activity, where botmasters perform lookups against the DNSBL to determine whether their spamming bots have been blacklisted. Although some bots perform a large number of reconnaissance queries, it appears that much of the reconnaissance activity is spread across many bots, each of which issues few queries, thus making detection more difficult.

The Fifth International Conference on Internet Monitoring and Protection

3) Traffic-traces-based detection:

• Detecting botnets with tight C&C: The authors in [12] proposed an architecture that first eliminates traffic that is unlikely to be part of a botnet, then classifies the remaining traffic into a group that is likely to be part of a botnet using the Naive Bayesian classification scheme, and finally correlates the likely traffic to find common communication patterns that would suggest botnet activity. The major problem with this approach is that it is confined to malicious IRC-type botnet flows and does not consider other protocol types.

• BotHunter: A passive bot detection system, BotHunter, is presented in [13]. It uses vertical correlation to associate IDS events with a user-defined bot infection dialog model. However, it is open to an evasion tactic that exploits its attack time threshold.

• BotSniffer: A network-anomaly-based botnet detection system [6]. It explores the spatial-temporal correlation and similarity properties of botnet command and control activities.

• BotMiner: A more recent network-anomaly-based botnet detection system [14], called BotMiner, identifies hosts that exhibit similar C&C communication patterns and similar malicious activity patterns. This approach can be evaded by injecting flow-level noise.

In this paper, we propose a new botnet detection approach. The proposed approach is similar to those presented in [12], [13], in the sense that it analyzes a large amount of traffic traces, searching for botnet footprints. The main difference is that we employ fuzzy logic for botnet detection. The main contribution of this paper is the development of a fuzzy inference system (FIS) for botnet detection. This FIS is based on certain botnet characteristics that are well known to the research community. We believe that using fuzzy logic in this context would improve the detection accuracy and allow the detection of emerging botnets. Future work will focus on an adaptive neurofuzzy inference system for automatic botnet detection.

III. THE PROPOSED APPROACH

Before going into the details of BotDigger, the proposed fuzzy inference system (FIS), we introduce basic information about fuzzy logic. Fuzzy logic is a powerful technique for dealing with human reasoning and decision-making processes. By applying fuzzy logic, we can quantify the contribution of a fuzzy membership set. A fuzzy expert system is a collection of membership functions and rules that are used to reason about data [15]. In fact, fuzzy logic has been widely used in the design and enhancement of a vast number of applications, including linear and nonlinear control, pattern recognition, financial systems, operations research and data analysis [16]. The efficient use of fuzzy logic in different applications depends on the proper selection of the number, type and parameters of the fuzzy membership functions and rules. Fuzzy techniques incorporate information sources into a fuzzy rule base that represents the knowledge of the network structure, so that structure learning techniques can easily be applied.

The operation of a fuzzy logic system depends on the nature of the problem and does not follow a fixed pattern. Therefore, each application imposes certain design requirements. In the case of BotDigger, the fuzzy logic module will perform the following steps:

• Step 1: Dividing the input and output spaces into fuzzy regions. Fig. 3 shows an example where the domain interval of input 1 is divided into three regions. Each membership function has a generalized bell (gbell) shape. Of course, other divisions of the domain regions and other membership functions are possible.

• Step 2: Generating fuzzy rules from the given input terms, based on which action(s) should be taken in which situation(s) and/or on available information that relates specific input values to the corresponding correct output.

• Step 3: Assigning a degree to each rule, since it is highly probable that there will be some conflicting rules. Among conflicting rules, the one with the maximum degree is accepted, which also reduces the number of rules.

• Step 4: Determining a mapping based on the fuzzy rule base, where centroid defuzzification is used to produce a crisp output, namely the center of the area under the aggregated output membership function.
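Step 1 can be illustrated with a minimal sketch of a generalized bell membership function, using the standard form 1 / (1 + |(x - c) / a|^(2b)). The normalized [0, 1] input domain, the term names and the (a, b, c) parameters below are assumptions chosen for illustration; they are not the values used in Fig. 3.

```python
def gbell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a| ** (2 * b)).

    a controls the width, b the steepness of the shoulders, c the center.
    """
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Hypothetical partition of a normalized input domain [0, 1] into three
# fuzzy regions. These parameters are illustrative assumptions only.
REGIONS = {
    "Low": (0.25, 2.0, 0.0),
    "Medium": (0.25, 2.0, 0.5),
    "High": (0.25, 2.0, 1.0),
}


def fuzzify(x):
    """Map a crisp input value to its membership degree in each region."""
    return {term: gbell(x, *params) for term, params in REGIONS.items()}


if __name__ == "__main__":
    # A value near the middle of the domain belongs mostly to "Medium".
    print(fuzzify(0.55))
```

With this parameterization, a value sitting exactly one width `a` away from a center has membership 0.5 in that region, which is where adjacent regions cross over.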

Fig. 1 depicts the architecture of BotDigger. BotDigger takes reduced network traffic traces as input and, for each host present in the traces, provides a score that represents the trust level of that host. The reduced network traffic traces can be obtained by following an approach similar to that used in [6], [9] and [12], where the traffic that is filtered out includes one-way traffic, traffic not originating from monitored hosts, and traffic destined to legitimate servers. In other words, the proposed approach assumes that unlikely flows are filtered out first, so that the most computationally intensive analysis is performed on a reduced traffic set. The retained flows are then correlated with each other, looking for groups of flows that may be related by being part of the same botnet. The result is a group of flows that are most likely part of one or more botnets.
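The traffic reduction step described above can be sketched as a simple flow filter. The `Flow` record, its field names, and the idea of passing the monitored hosts and legitimate servers as plain sets are assumptions made for illustration; the paper does not specify a flow format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are illustrative assumptions."""
    src: str
    dst: str
    pkts_fwd: int  # packets sent from src to dst
    pkts_rev: int  # packets sent from dst back to src


def reduce_traces(flows, monitored_hosts, legitimate_servers):
    """Drop flows that are unlikely to be botnet traffic."""
    kept = []
    for flow in flows:
        if flow.pkts_fwd == 0 or flow.pkts_rev == 0:
            continue  # one-way traffic
        if flow.src not in monitored_hosts:
            continue  # not originating from a monitored host
        if flow.dst in legitimate_servers:
            continue  # destined to a known legitimate server
        kept.append(flow)
    return kept


if __name__ == "__main__":
    flows = [
        Flow("10.0.0.5", "198.51.100.7", 12, 9),
        Flow("10.0.0.5", "203.0.113.1", 4, 0),
    ]
    print(reduce_traces(flows, {"10.0.0.5"}, set()))
```

Only the flows that survive this filter would be passed on to the more expensive correlation and fuzzy inference stages.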

In fact, the payloads of botnet packets might be encrypted, so they will not reflect the actual nature of botnet behavior. Therefore, in IRC-based botnets the considered parameters will be related to packet headers and to statistical analysis based on published facts regarding anomalous behavior. For example, the average packet size of IRC traffic is smaller than that of HTTP traffic. Moreover, in the case of IRC the variance in packet size is large.

Figure 1. Architecture overview of BotDigger.

The proposed system generates a set of rules based on the following attributes, which characterize IRC- and HTTP-based botnets:

1) Most packets are very small and appear at regular intervals (Ping/Pong communication), whereas normal IRC traffic has more and larger packets [4], [17], [18].

2) Message exchange ratio [6]: the ratio between the incoming messages (mi) and the outgoing messages (mo). In normal traffic (mi/mo) > 1, while in botnet traffic (mi/mo) ≤ 1. In the botnet case, the number of incoming messages can be close to the number of outgoing messages because a client cannot receive the messages sent by other clients. The number of incoming messages can also be smaller than the number of outgoing messages, for example, when there are several responses from a bot corresponding to one botmaster command, or when the botmaster is currently shutting down. In contrast, in the normal case there are usually multiple clients in the chat channel, and a user usually receives more messages (from all other users) than the user sends.

3) Homogeneity check (i.e., the activity response crowd) [6]: if two or more scans have a similar distribution of target IP addresses, which can be represented by entropy, this indicates the presence of malicious traffic, since it is unlikely that two or more hosts form a homogeneous activity response by chance. This attribute can be represented in fuzzy logic by first grouping hosts that have similar entropy into the same cluster. Then, if the cluster size is ≥ 2, the traffic is considered malicious; otherwise, it is considered normal.

4) Degree of Periodic Repeatability (DPR): this variable represents the degree of periodic repeatability (i.e., the variance) of the polling intervals; a user is likely to be a bot if the intervals between polling requests are regular. It is found that if DPR < 0.02 then there is malicious traffic [9].
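Attributes 2 to 4 above can be computed from per-host observations roughly as follows. This is a sketch under stated assumptions: the function names, the greedy entropy clustering with a fixed `tolerance`, and the use of the plain population variance of inter-poll gaps as a stand-in for DPR are all illustrative choices; the exact DPR formula and its scale are defined in [9] and are not reproduced here, so the 0.02 threshold should not be applied to this proxy without matching units.

```python
import math
from collections import Counter
from statistics import pvariance


def message_exchange_ratio(incoming, outgoing):
    """Return mi / mo; treated as infinite when nothing was sent."""
    if outgoing == 0:
        return math.inf
    return incoming / outgoing


def target_entropy(target_ips):
    """Shannon entropy (in bits) of the distribution of scanned target IPs."""
    total = len(target_ips)
    counts = Counter(target_ips)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def cluster_by_entropy(host_entropy, tolerance=0.1):
    """Group hosts whose entropy is within `tolerance` of the previous host.

    Greedy one-dimensional clustering over hosts sorted by entropy; the
    tolerance value is an assumption, not a figure from the paper.
    """
    clusters = []
    previous = None
    for host, value in sorted(host_entropy.items(), key=lambda kv: kv[1]):
        if previous is not None and value - previous <= tolerance:
            clusters[-1].append(host)
        else:
            clusters.append([host])
        previous = value
    return clusters


def interval_variance(timestamps):
    """Population variance of the gaps between consecutive polling times."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return pvariance(gaps)


if __name__ == "__main__":
    clusters = cluster_by_entropy({"h1": 2.0, "h2": 2.05, "h3": 5.0})
    # Clusters with two or more hosts correspond to the "Homogeneous" case.
    print([c for c in clusters if len(c) >= 2])
```

A perfectly periodic poller yields a variance of zero, which is the regularity that the DPR attribute is meant to capture.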

At first glance the problem seems hard to manage, because many variables need to be adjusted and several questions need to be answered. However, the selection of the variable types, term classification and fuzzy membership functions was based mainly on statistical data and network knowledge. The major challenge was then to define consistent and comprehensive rules, which are crucial for achieving precise detection. The proposed fuzzy model contains four input variables (Port scanning, Message exchange ratio, Homogeneity check and Degree of periodic repeatability (DPR)) and one output variable representing the Trust level.

The fuzzy inference system for the proposed fuzzy model is illustrated in Fig. 2. Each input variable is partitioned into a number of fuzzy subsets. The input and output variables are assigned membership functions as illustrated in Fig. 3. The generalized bell (gbell) membership function was used due to its smoothness and popularity in specifying fuzzy sets. Once the input and output sets are defined and the membership functions are assigned, fuzzy IF-THEN rules that link the input and output membership functions are defined based on various published approaches on this topic [4], [6], [9], [17] and [18].

Figure 2. The proposed Fuzzy Inference System (FIS)

In the proposed FIS, the knowledge base consists of a collection of belief rules defined as follows:

• IF (Port Scanning is Low) and (Message Exchange ratio is High) and (Homogeneity check is Heterogeneous) THEN (Trust Level is High)

• IF (Port Scanning is High) and (Message Exchange ratio is Low) and (Homogeneity check is Homogeneous) and (DPR is Small) THEN (Trust Level is Very Low)

• IF (Port Scanning is High) and (Message Exchange ratio is High) and (Homogeneity check is Heterogeneous) THEN (Trust Level is Medium)

• IF (Port Scanning is Low) and (Message Exchange ratio is Low) and (Homogeneity check is Homogeneous) and (DPR is Small) THEN (Trust Level is Low)

• IF (Port Scanning is Low) and (Message Exchange ratio is High) and (Homogeneity check is Homogeneous) and (DPR is Small) THEN (Trust Level is Low)

• IF (Port Scanning is Medium) and (Message Exchange ratio is Low) and (Homogeneity check is Homogeneous) and (DPR is Small) THEN (Trust Level is Very Low)

• IF (Port Scanning is Medium) and (Message Exchange ratio is Low) and (Homogeneity check is Homogeneous) THEN (Trust Level is Medium)

• IF (Port Scanning is High) and (Message Exchange ratio is High) and (Homogeneity check is Heterogeneous) and (DPR is Small) THEN (Trust Level is Very Low)

• IF (Port Scanning is Low) and (Message Exchange ratio is High) and (Homogeneity check is Heterogeneous) and (DPR is Small) THEN (Trust Level is Medium)

Note that all rules follow the pattern:

Rule $r$: IF $x_1$ is $a_1$ and $x_2$ is $a_2$ and $\ldots$ and $x_n$ is $a_n$ THEN $y$ is $b$

where $x_1, x_2, \ldots, x_n$ represent the antecedent attributes in the $r$-th rule; $a_i$ ($i = 1, 2, \ldots, n$) is the value of the $i$-th antecedent attribute; $y$ represents the output consequent, and its value is $b$.
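The rule pattern above maps naturally onto a small data structure. The sketch below encodes three of the listed rules and computes how strongly each one fires, assuming that "and" is evaluated with the minimum operator, as is common in Mamdani-style systems. The variable keys and the sample membership degrees are hypothetical.

```python
# Each rule is (antecedent, consequent): the antecedent maps an input
# variable to the linguistic term it must match. Keys are illustrative.
RULES = [
    ({"port_scanning": "Low", "message_exchange_ratio": "High",
      "homogeneity": "Heterogeneous"}, "High"),
    ({"port_scanning": "High", "message_exchange_ratio": "Low",
      "homogeneity": "Homogeneous", "dpr": "Small"}, "Very Low"),
    ({"port_scanning": "High", "message_exchange_ratio": "High",
      "homogeneity": "Heterogeneous"}, "Medium"),
]


def firing_strength(antecedent, memberships):
    """AND the antecedent clauses together using the minimum operator."""
    return min(memberships[var][term] for var, term in antecedent.items())


def evaluate_rules(rules, memberships):
    """Return, per output term, the strongest firing strength among its rules."""
    fired = {}
    for antecedent, consequent in rules:
        strength = firing_strength(antecedent, memberships)
        fired[consequent] = max(fired.get(consequent, 0.0), strength)
    return fired


if __name__ == "__main__":
    # Hypothetical fuzzified inputs for one host.
    memberships = {
        "port_scanning": {"Low": 0.9, "High": 0.1},
        "message_exchange_ratio": {"Low": 0.3, "High": 0.7},
        "homogeneity": {"Homogeneous": 0.2, "Heterogeneous": 0.8},
        "dpr": {"Small": 0.4},
    }
    print(evaluate_rules(RULES, memberships))
```

Keeping the rules as data rather than as hard-coded conditionals makes it straightforward to add rules or input variables later.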

The defined rules are intuitive; they are based on the best of our knowledge and supported by recent research studies [4], [6], [9], [17] and [18]. The rule outputs are combined using min-max inference, and a crisp output is computed using the centroid defuzzification method.
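A minimal sketch of min-max inference followed by centroid defuzzification is given below. It assumes a trust level normalized to [0, 1], four gbell-shaped output terms with hypothetical parameters, and a numerical centroid computed over an evenly sampled grid; none of these values come from Fig. 3.

```python
def gbell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a| ** (2 * b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


# Hypothetical output terms for the trust level on [0, 1] as (a, b, c).
OUTPUT_TERMS = {
    "Very Low": (0.12, 2.0, 0.0),
    "Low": (0.12, 2.0, 1.0 / 3.0),
    "Medium": (0.12, 2.0, 2.0 / 3.0),
    "High": (0.12, 2.0, 1.0),
}


def defuzzify(fired, samples=1001):
    """Centroid of the aggregated output set.

    fired maps an output term to its firing strength. Each output
    membership function is clipped at its firing strength (min), the
    clipped sets are merged point by point (max), and the centroid of the
    resulting shape is returned.
    """
    numerator = 0.0
    denominator = 0.0
    for k in range(samples):
        y = k / (samples - 1)
        mu = max(
            (min(strength, gbell(y, *OUTPUT_TERMS[term]))
             for term, strength in fired.items()),
            default=0.0,
        )
        numerator += y * mu
        denominator += mu
    if denominator == 0.0:
        raise ValueError("no rule fired; the output is undefined")
    return numerator / denominator


if __name__ == "__main__":
    # Mostly trusted host with a weak "Very Low" rule also firing.
    print(defuzzify({"High": 0.7, "Very Low": 0.1}))
```

Because the centroid weighs the whole aggregated shape, weakly firing rules still pull the crisp trust level slightly toward their own term.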

However, the rules were identified manually and may therefore be approximate. Because this method is offline in nature, it can become impractical and cannot detect attacks in real time. Future work will therefore focus on a dynamic rule generation method that learns from training traces and extracts new rules for botnet detection.

Figure 3. (a-d) Membership functions for the input variables, and (e) Membership function for the output variable.

IV. DISCUSSION

Our preliminary results demonstrate that the fuzzy data mining techniques provide an effective means to learn and alert based on patterns extracted from large amounts of data.

From a numerical perspective, it is found that the very low linguistic term in the trust level output represents 12.5%, while the high linguistic term represents 81.2%. Accordingly, we need to generate further rules to achieve reasonable and more accurate trust levels. At this point, it is worth mentioning that it would be much better to include other variables, such as time and geographical zones [19], to achieve a comprehensive, consistent and general model that has the potential to cover a wide range of botnets.

The defined rules can also be observed in a three-dimensional surface view, as illustrated in Fig. 4. The surfaces translate the rules into a numerical distribution: in Fig. 4 (a), the highest trust level appears where no port scan exists and the message exchange ratio is > 1, while malicious traffic is detected at the combination of a high level of port scanning and a message exchange ratio < 1. Thus, the values between those extremes will have a significant effect on minimizing the false positive rate (FPR) and the false negative rate (FNR).

However, according to our rules, the representation of the variables shows that port scanning and message exchange ratio are the dominant factors affecting the trust level.

Regarding HTTP traffic, we have considered the DPR, which represents the variance of the traffic. As shown in Fig. 4 (b), there is a narrow range that depicts the relation between port scanning and DPR, where malicious HTTP traffic can be detected when DPR is less than 0.02. Clearly, the parameters that describe HTTP traffic behavior are lacking: only a few research papers have examined HTTP traffic features that could be considered in our work [9], [18]. Thus, it will be our task to explore other features that might be useful to our approach.

Figure 4. Graphical representation of some of the input-output variables. (a) Port scan and message exchange ratio vs. Trust level. (b) Port scan and DPR vs. Trust level.

V. CONCLUSIONS AND FUTURE WORK

An intelligent system for botnet detection, called BotDigger, has been introduced. The proposed system uses fuzzy logic to define logical rules that are mainly based on statistical facts and important features that identify botnet activities. The key advantage of the architecture designed in this research is that it allows the integration of a wide range of traffic specifications. Most importantly, it is a more reliable and flexible approach compared to previous work, since our system can be extended to handle new input parameters and to generate more logical rules in order to support the detection of newly emerging attacks.

From the completeness point of view, further investigation is needed to add input variables, and thus more logical rules, to our model. Moreover, the proposed model should be examined, evaluated and extended into an adaptive neurofuzzy inference system, where new rules could be generated by learning.

REFERENCES

[1] Zombie computer, available at http://en.wikipedia.org/wiki/Zombie_computer, accessed on 14 Feb. 2010.

[2] W. Lu and A. Ghorbani. "Botnets Detection Based on IRC Community". IEEE Communications Society, 2008.

[3] C. Mazzariello. "IRC traffic analysis for botnet detection". IEEE Computer Society, 318-323, 2008.

[4] A. Karasaridis, B. Rexroad, and D. Hoeflin. "Widescale Botnet Detection and Characterization". In First Workshop on Hot Topics in Understanding Botnets, 2007.

[5] M. A. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. "A Multifaceted Approach to Understanding the Botnet Phenomenon". In Proceeding of IMC, 41-52, 2006.

[6] G. Gu, J. Zhang, and W. Lee. "BotSniffer: Detecting botnet command and control channels in network traffic". In Proceedings of the 15th Annual Network and Distributed System Security Symposium (NDSS'08), 2008.

[7] E. Ruitenbeek and W. Sanders. "Modeling Peer-to-Peer Botnets". IEEE Computer Society, 307-316, 2008.

[8] B. Wang, Z. Li, H. Tu, and J. Ma. "Measuring Peer-to-Peer Botnets Using Control Flow Stability". 2009 International Conference on Availability, Reliability and Security, IEEE Computer Society, 663-669, 2009.

[9] J. Lee, H. Jeong, J. Park, M. Kim, and B. Noh. "The Activity Analysis of Malicious HTTP based Botnets using Degree of Periodic Repeatability". International Conference on Security Technology, 2008.

[10] D. Dagon, C. Zou, and W. Lee. "Modeling botnet propagation using time zones". In Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS ’06), 2006.

[11] A. Ramachandran, N. Feamster, and D. Dagon. "Revealing Botnet Membership Using DNSBL Counter-Intelligence", 2007.

[12] W. Strayer, R. Walsh, C. Livadas, and D. Lapsley. "Detecting Botnets with Tight Command and Control". IEEE Conference on Local Computer Networks (LCN’06), 2006.

[13] G. Gu, P. Porras, V. Yegneswaran, M. Fong, and W. Lee. "BotHunter: Detecting malware infection through IDS-driven dialog correlation". In Proceedings of the 16th USENIX Security Symposium (Security'07), 2007.

[14] G. Gu, R. Perdisci, J. Zhang, and W. Lee. "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection". In USENIX Security Symposium, July 2008.

[15] L.A. Zadeh, Fuzzy sets, Information and Control 8, 338–353, 1965.

[16] M. Jamshidi, A. Titli, L. Zadeh, and S. Boverie, Applications of Fuzzy Logic: Towards High Machine Intelligence Quotient Systems. Upper Saddle River, NJ: Prentice-Hall, 1997.

[17] C. Livadas, R. Walsh, D. Lapsley, and W. Strayer. "Using Machine Learning Techniques to Identify Botnet Traffic". 2nd IEEE LCN Workshop on Network Security, 967-974, 2006.

[18] H. Weststrate. "Botnet detection using netflow information". 10th Twente Student Conference on IT, 23rd January, 2009.

[19] D. Dagon, C. Zou, and W. Lee. "Modeling Botnet Propagation Using Time Zones", 2006.