
Programming Schedule

Process 6: Scanning procedure

Inputs: eacwp.datastructure, pre-filter, Advancement table, Text file

Outputs: Matched Signature ID and starting position, if signature occurs in Text.

Description:

During the scanning process, we have to maintain one piece of information: Active_graph.

The procedure ends as soon as a pattern is matched or the end of the text file is reached.

The program can also be applied to network traffic. The only difference is that the program must be modified to work on packets. The original program ends when the input file comes to its end without a pattern match. In a network, however, every file is transmitted as a sequence of packets, so we have to scan each packet in turn to guarantee that the complete file is scanned. This means that the scanning process does not end until all the packets have been scanned. To continue the scanning process from one packet to the next, we must remember the full status of the scan. The status includes the state from which scanning will continue and, if the scan is in the middle of traversing a {min, max} graph, the counter and the values of min and max. Note that the process may be stopped on multiple goto graphs when the scan of a single packet finishes. Besides the state information, the pre-filter's window and its next advancement must also be saved. The program ends when a pattern is matched or when all packets have been scanned.
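The idea of carrying scan state across packet boundaries can be sketched as follows. This is only a minimal illustration under simplifying assumptions: it matches a single literal pattern with a KMP automaton rather than the eacwp goto graphs, and the names `ScanState`, `StreamScanner`, and `feed` are hypothetical. A full implementation would additionally have to save the set of active goto-graph states, the {min, max} counters, and the pre-filter window with its pending advancement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-flow state that is kept between packets.
struct ScanState {
    std::size_t matched = 0;  // automaton state: length of the current partial match
    std::size_t offset = 0;   // number of bytes of the flow consumed so far
};

class StreamScanner {
public:
    explicit StreamScanner(std::string pattern)
        : pattern_(std::move(pattern)), fail_(pattern_.size(), 0) {
        assert(!pattern_.empty());
        // Standard KMP failure table for the pattern.
        for (std::size_t i = 1, k = 0; i < pattern_.size(); ++i) {
            while (k > 0 && pattern_[i] != pattern_[k]) k = fail_[k - 1];
            if (pattern_[i] == pattern_[k]) ++k;
            fail_[i] = k;
        }
    }

    // Scans one packet, resuming from the saved state `st`.
    // Returns the starting position of the first match within the whole flow,
    // or -1 if the packet ends without a match (the state is kept for the next packet).
    long feed(ScanState& st, const std::string& packet) const {
        for (char c : packet) {
            while (st.matched > 0 && c != pattern_[st.matched]) {
                st.matched = fail_[st.matched - 1];
            }
            if (c == pattern_[st.matched]) ++st.matched;
            ++st.offset;
            if (st.matched == pattern_.size()) {
                return static_cast<long>(st.offset - pattern_.size());
            }
        }
        return -1;
    }

private:
    std::string pattern_;
    std::vector<std::size_t> fail_;
};
```

Because `ScanState` lives outside the scanner, one scanner object can serve many flows, each with its own saved state. A pattern that straddles two packets is still found, since the partial match length survives the call boundary.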

Chapter 7. Experimental Results

In this section, we compare the performance of our proposed signature matching system with that of the ClamAV implementation and its enhancement [?]. We compare both throughput and memory requirements. The programs are written in C++, and the experiments are conducted on a PC with an Intel Pentium 4 CPU running at 2.02GHz with 1.75GB of RAM.

We traced the ClamAV implementation, extracted its ideas, and rewrote the code for our experiments. In the ClamAV implementation, a trie of height two is constructed from the first two bytes of all patterns, based on the AC pattern matching machine. Effectively, patterns are grouped by their first two bytes. The failure function for non-leaf states is eliminated because the next move function $\delta$ is adopted. With $g$ denoting the goto function and $f$ the failure function, the next move function is defined as $\delta(P, \sigma) = g(P, \sigma)$ if $g(P, \sigma) \neq \mathit{fail}$, and $\delta(P, \sigma) = \delta(f(P), \sigma)$ otherwise. When the first two bytes of some group are matched, a sequential search is performed for all patterns in the group.
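The recursive definition of the next move function can be turned into a table by a breadth-first traversal of the trie, as in the classical Aho-Corasick construction. The following is a minimal sketch of that construction, not the ClamAV code; the names `AcMachine`, `go`, `fail`, `delta`, and `next` are hypothetical, and output sets are omitted for brevity.

```cpp
#include <array>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

constexpr int kAlphabet = 256;
constexpr int kFail = -1;

struct AcMachine {
    std::vector<std::array<int, kAlphabet>> go;     // g(P, sigma); kFail if undefined
    std::vector<int> fail;                          // f(P)
    std::vector<std::array<int, kAlphabet>> delta;  // next move function

    explicit AcMachine(const std::vector<std::string>& patterns) {
        new_state();  // state 0 is the root
        for (const std::string& p : patterns) {
            int s = 0;
            for (unsigned char c : p) {
                if (go[s][c] == kFail) {
                    const int t = new_state();  // may reallocate, so assign afterwards
                    go[s][c] = t;
                }
                s = go[s][c];
            }
        }
        // Breadth-first order guarantees that delta[f(P)] is complete
        // before state P is processed, because f(P) is shallower than P.
        std::queue<int> bfs;
        bfs.push(0);
        while (!bfs.empty()) {
            const int s = bfs.front();
            bfs.pop();
            for (int c = 0; c < kAlphabet; ++c) {
                // delta(f(P), sigma); the root falls back to itself.
                const int fallback = (s == 0) ? 0 : delta[fail[s]][c];
                const int t = go[s][c];
                if (t != kFail) {
                    fail[t] = fallback;  // failure link of the child
                    delta[s][c] = t;     // g(P, sigma) is defined
                    bfs.push(t);
                } else {
                    delta[s][c] = fallback;  // g(P, sigma) = fail
                }
            }
        }
    }

    // One table lookup per input byte; no failure links are followed at scan time.
    int next(int state, unsigned char c) const {
        assert(state >= 0 && state < static_cast<int>(delta.size()));
        return delta[state][c];
    }

private:
    int new_state() {
        go.emplace_back();
        go.back().fill(kFail);
        delta.emplace_back();
        delta.back().fill(0);
        fail.push_back(0);
        return static_cast<int>(go.size()) - 1;
    }
};
```

The table trades memory for speed: every state stores a full row of 256 transitions, which is why precomputing it for only the first two trie levels keeps the cost bounded.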

Different from our proposed scheme, a regular expression is fragmented by the three operators *, ?, and {min, max}. A data structure is maintained to indicate up to which fragment a regular expression has been matched and the position in the text of the last matched fragment. Consider a regular expression which consists of $k$ fragments. Assume that the first $e$ fragments have been matched and the $e$-th fragment ends at the $i$-th position of the text. Assume further that another fragment is matched at the $j$-th position. This newly matched fragment is discarded if it is not the $(e+1)$-th fragment or if $i$ and $j$ do not satisfy the condition specified by the operator which separates the $e$-th and the $(e+1)$-th fragments. As an example, consider a regular expression RE = sre1 ? sre2 {2,4} sre3 {3,5} sre4. Assume that the first fragment sre1 was matched at the $i$-th position of the text. If the second fragment sre2 is matched at the $(i + |sre2| + 1)$-th position, then the data structure is updated to indicate that the first two fragments are matched and that the second fragment is matched at the $(i + |sre2| + 1)$-th position. Assume that a fragment is further found at the $j$-th position. The data structure is then updated only if it is the third fragment sre3 and $j$ satisfies $2 \le j - i - |sre2| - |sre3| - 1 \le 4$. Otherwise, the newly matched fragment is discarded and the data structure remains intact.
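The bookkeeping described above can be sketched as a small record per regular expression. This is an illustrative sketch, not the ClamAV code: the names `Gap`, `FragmentTracker`, and `on_fragment` are hypothetical, positions are taken to be the end positions of fragments, and the ? operator is assumed to be represented as the gap {1, 1}.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Number of bytes allowed between two consecutive fragments.
struct Gap {
    long min;
    long max;
};

struct FragmentTracker {
    std::vector<long> length;  // length[k-1] = |sre_k|
    std::vector<Gap> gap;      // gap[k-1] separates fragment k and fragment k+1
    std::size_t matched = 0;   // e: number of leading fragments matched so far
    long last_end = 0;         // i: end position of the e-th fragment

    FragmentTracker(std::vector<long> len, std::vector<Gap> g)
        : length(std::move(len)), gap(std::move(g)) {
        assert(!length.empty() && gap.size() + 1 == length.size());
    }

    // Reports that fragment `index` (1-based) was found ending at position `end`.
    // Returns true if the record was updated, false if the match was discarded.
    bool on_fragment(std::size_t index, long end) {
        if (index != matched + 1 || index > length.size()) return false;
        if (matched > 0) {
            // Bytes skipped between the previous fragment and this one.
            const long skipped = end - last_end - length[index - 1];
            const Gap& allowed = gap[matched - 1];
            if (skipped < allowed.min || skipped > allowed.max) return false;
        }
        matched = index;
        last_end = end;
        return true;
    }

    bool complete() const { return matched == length.size(); }
};
```

For the example RE with sre1 ending at position i, the second fragment updates `last_end` to i + |sre2| + 1, so the check for the third fragment evaluates j - i - |sre2| - |sre3| - 1 against the range [2, 4], which is the condition stated in the text.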

As of November 2009, the ClamAV database has 30,385 signatures. Among these signatures, 1,599 are regular expressions. After converting ? operators into {min, max} operators, there are ? regular expressions which contain at least one {min, max} operator. The shortest pre-filter pattern has only two bytes. To demonstrate the potential benefit of using a pre-filter, we discard any string that generates a pre-filter pattern shorter than 6 bytes. We eliminated 217 signatures based on this criterion.

In our simulations, we select K = 6 and L = 3 with four pre-filters. Let $t_j t_{j+1} \ldots t_{j+5}$ be the string contained in the search window. Since hash functions are not the focus of this paper, we use simple ones. The $i$-th hash function used in our experiments is simply $t_{j+4-i}\,t_{j+5-i} \oplus t_{j+5-i}\,t_{j+6-i}$, where $\oplus$ represents the bitwise exclusive-OR operation.
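A possible reading of the hash function can be sketched in C++. Two points here are assumptions and not statements from the text: the i-th hash is taken to cover the three window bytes at offsets 4-i, 5-i, and 6-i, and two adjacent bytes written side by side are taken to mean their concatenation into a 16-bit value. The function name `prefilter_hash` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Computes the i-th hash (i = 1..4) over a 6-byte search window,
// where window[0..5] holds t_j .. t_{j+5}.
// Assumed reading: (t_{j+4-i} || t_{j+5-i}) XOR (t_{j+5-i} || t_{j+6-i}),
// with || denoting byte concatenation into a 16-bit value.
std::uint16_t prefilter_hash(const std::uint8_t* window, int i) {
    assert(i >= 1 && i <= 4);
    const std::uint8_t a = window[4 - i];
    const std::uint8_t b = window[5 - i];
    const std::uint8_t c = window[6 - i];
    const std::uint16_t left = static_cast<std::uint16_t>((a << 8) | b);
    const std::uint16_t right = static_cast<std::uint16_t>((b << 8) | c);
    return static_cast<std::uint16_t>(left ^ right);
}
```

Under this reading, i = 1 hashes the last three bytes of the window and i = 4 hashes the first three, so the four pre-filters together cover every block of length L = 3 inside a window of length K = 6.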

Fig. 7 compares the CPU execution time for randomly generated files of various sizes that contain no signature occurrence. We call our proposed system eacwp for short. The CPU execution time is proportional to the file size. The CPU time required by the ClamAV implementation is about 4 times that required by eacwp. We expect the performance improvement to become larger as the number of signatures increases, because in the ClamAV implementation the number of strings in a group with identical first two bytes grows with the number of signatures. Since the ClamAV implementation performs a sequential search over the strings in the same group, it consumes more CPU time to find a match in a larger group.

Figure 7. Performance comparison of ClamAV implementation and our proposed signature matching system for clean files of various sizes.

As for the memory requirement, the ClamAV implementation uses 3.57M bytes and eacwp uses about 5.7M bytes. The pre-filter requires 256K bytes and the verification module needs 5.5M bytes. We believe the amount of memory required by our proposed signature matching system is acceptable for practical systems.

We now modify the pre-filter to use K = 10 and L = 4. We also increase the number of bits in the hash value so that collisions caused by the hash function are reduced. As a result, the size of each pre-filter grows to 1M bytes (20-bit hash values, 2^20 = 1,048,576 entries). Because of the different window size, we discard any string that generates a pre-filter pattern shorter than 10 bytes. This eliminates slightly more signatures, about 377. One more difference is that we apply two pre-filters, each built with its own hash function. When the query result of the first pre-filter indicates that the verification module should be consulted, we query the second pre-filter instead. The verification module is consulted if and only if both pre-filters indicate that it should be. The memory requirement grows slightly, to 7.5M bytes: the pre-filters require 2M bytes and the verification module needs 5.5M bytes. We expect these changes to improve performance. Fig. 8 shows the result and confirms our expectation: the CPU time required by the ClamAV implementation is more than 10 times that required by the modified eacwp.
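The cascade of two pre-filters can be sketched as follows. This is a simplified illustration under stated assumptions: each table has 2^20 one-byte entries (consistent with the 1M bytes per pre-filter given above), the hash is a seeded FNV-1a folded to 20 bits rather than the hash used in the experiments, and window advancement is left out. The names `hash20`, `TwoStagePrefilter`, `add`, and `needs_verification` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr std::size_t kHashBits = 20;
constexpr std::size_t kEntries = std::size_t{1} << kHashBits;  // 1,048,576 entries

// Hypothetical hash: FNV-1a with a caller-chosen seed, folded to 20 bits.
std::uint32_t hash20(const std::string& block, std::uint32_t seed) {
    std::uint32_t h = 2166136261u ^ seed;
    for (unsigned char c : block) {
        h ^= c;
        h *= 16777619u;
    }
    const std::uint32_t mask = static_cast<std::uint32_t>(kEntries - 1);
    return (h ^ (h >> kHashBits)) & mask;
}

struct TwoStagePrefilter {
    // One byte per entry, so each table occupies 1M bytes.
    std::vector<std::uint8_t> first = std::vector<std::uint8_t>(kEntries, 0);
    std::vector<std::uint8_t> second = std::vector<std::uint8_t>(kEntries, 0);

    // Registers a pattern block in both tables, each with its own hash function.
    void add(const std::string& block) {
        assert(!block.empty());
        first[hash20(block, 0x1234u)] = 1;
        second[hash20(block, 0xabcdu)] = 1;
    }

    // The second table is queried only when the first one reports a hit;
    // verification is requested only when both tables report a hit.
    bool needs_verification(const std::string& block) const {
        if (first[hash20(block, 0x1234u)] == 0) return false;
        return second[hash20(block, 0xabcdu)] != 0;
    }
};
```

Because a text block must collide in both tables to reach the verification module, a false positive requires two independent hash collisions, which is the effect the second pre-filter is meant to achieve.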

Figure 8. Performance comparison of ClamAV implementation and our proposed signature matching system for clean files of various sizes.

References

[1] D. E. Knuth, J. H. Morris, and V. R. Pratt, “Fast pattern matching in strings,” TR CS-74-440, Stanford University, Stanford, California, 1974.

[2] R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, Vol. 20, October 1977, pp. 762-772.

[3] A. V. Aho and M. J. Corasick, “Efficient string matching: an aid to bibliographic search,” Communications of the ACM, Vol. 18, June 1975, pp. 333-340.

[4] Clam anti virus signature database, www.clamav.net.

[5] F. Yu, Z. Chen, Y. Diao, T. V. Lakshman, and R. H. Katz, “Fast and memory-efficient regular expression matching for deep packet inspection,” in Proc. of Architectures for Networking and Communications Systems (ANCS), pp. 93-102, 2006.

[6] G. Vasiliadis, S. Antonatos, M. Polychronakis, E. P. Markatos, and S. Ioannidis, “Gnort: High performance network intrusion detection using graphics processors,” in Recent Advances in Intrusion Detection (RAID), 2008.

[7] J. Rejeb and M. Srinivasan, “Extension of Aho-Corasick algorithm to detect injection attacks,” SCSS (1) 2007.

[8] S. Wu and U. Manber, “A fast algorithm for multi-pattern searching,” TR-94-17, 1994.

[9] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, 13(7): 422–426, July 1970.

[10] A. Broder and M. Mitzenmacher, “Network applications of Bloom filters: a survey,” Internet Mathematics, vol. 1, no. 4, pp. 485–509.

[11] R. Smith, C. Estan, and S. Jha, “XFA: Fast signature matching with extended automata,” In IEEE Symposium on Security and Privacy, May 2008.
