3.1 Hardware/Software partition
The first and most important step is hardware/software partitioning. Following the design principle "make the common case fast," which follows from Amdahl's law, the most time-critical part should be implemented in hardware. Because string matching is the critical operation in virus scanning and intrusion detection systems (IDS), we implement it as a hardware module to accelerate scanning. Our goal is to accelerate the open source software ClamAV and increase its throughput. Fig. 3 shows the hardware/software partition in ClamAV.
The software part of ClamAV works with BFAST* to quickly filter out files that contain no viruses. For exact string matching, ClamAV's matcher-bfast* calls the driver, which enables DMA to transfer the data and enables BFAST* to scan the file. If BFAST* finds no possible virus, the file is not infected; if BFAST* finds a possible virus, ClamAV passes the file to matcher-bm for verification.
Fig. 3 Partitioned string matching: hardware scanning and software verification.
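The partitioned flow can be sketched in software as follows. This is a minimal model and not ClamAV code: `hardware_filter` is a hypothetical stand-in for BFAST* (here it conservatively flags any text containing the first two bytes of a signature), and `verify_exact` stands in for matcher-bm using Python's built-in substring search rather than an actual Boyer-Moore implementation.

```python
def hardware_filter(text: bytes, signatures: list[bytes]) -> bool:
    """Hypothetical stand-in for BFAST*: may report false positives,
    but never misses a real occurrence."""
    # Assumption: flag the text if any signature's 2-byte prefix occurs.
    return any(sig[:2] in text for sig in signatures)


def verify_exact(text: bytes, signatures: list[bytes]) -> bool:
    """Stand-in for matcher-bm: exact matching with the built-in
    substring search (not a real Boyer-Moore implementation)."""
    return any(sig in text for sig in signatures)


def scan(text: bytes, signatures: list[bytes]) -> bool:
    """Return True if the text is infected."""
    if not hardware_filter(text, signatures):
        return False                        # no possible virus: clean
    return verify_exact(text, signatures)   # possible virus: verify
```

The point of the split is that the cheap filter decides the common no-match case alone, and the more expensive verification only runs on the rare candidates.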
3.2 Hardware interface design
BFAST* hardware has five modules: TextPoint, TextRam, HashGenerator, BloomFilterQuery and TPController. (1) TextPoint stores the TextRam address to be scanned next. (2) TextRam stores the text to be scanned. (3) HashGenerator generates the hash values of a block. (4) BloomFilterQuery reports the shift distance according to the hash values generated by HashGenerator. (5) TPController controls these modules, shifts the window and stores the status in registers.
We design a five-stage pipeline for BFAST*, so that TPController can shift the window once per cycle. Fig. 4 shows the five stages:
(1) TextPosition (TP): TPController gets an address from TextPoint.
(2) TextRead: TPController gets the text from TextRam.
(3) Hash: TPController gets the hash value from HashGenerator.
(4) ShiftDistance: TPController gets the shift value from BloomFilterQuery.
(5) WB: TPController computes the shift distance and updates the address in TextPoint.
The TextRead and ShiftDistance stages access memory, so the memory access time limits the clock rate.
Fig. 4 Five stage pipeline.
A single TPController cannot start on its next window until the WB stage has updated TextPoint, so five TPControllers are instantiated to keep all five pipeline stages busy. Each TPController scans 1600 bytes of data in TextRam. The TextPoint of each TPController initially points to TextRam address 0, 1600, 3200, 4800 or 6400, respectively: TPController0 scans TextRam addresses 0 to 1599, TPController1 scans addresses 1600 to 3199, and so on.
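The region assignment and the interleaving of the five TPControllers can be illustrated with a short sketch. The region size and controller count come from the text; the scheduling rule in `stage_of` (controller k enters the pipeline k cycles after controller 0 and re-enters right after WB) is an assumption made for illustration, and each region is treated as exactly 1600 bytes with an inclusive end address.

```python
REGION_BYTES = 1600      # bytes scanned by each TPController (from the text)
NUM_CONTROLLERS = 5      # number of TPControllers (from the text)
STAGES = ["TP", "TextRead", "Hash", "ShiftDistance", "WB"]


def regions(num=NUM_CONTROLLERS, size=REGION_BYTES):
    """Return the inclusive (start, end) TextRam addresses per controller."""
    return [(i * size, (i + 1) * size - 1) for i in range(num)]


def stage_of(controller: int, cycle: int) -> str:
    """Stage occupied by a controller in a given cycle (steady state).

    Assumption: controller k starts k cycles after controller 0 and
    re-enters TP immediately after WB.
    """
    return STAGES[(cycle - controller) % len(STAGES)]
```

Under this assumption every stage is occupied by exactly one controller in each cycle, so the pipeline completes one window shift per cycle even though each individual controller shifts once every five cycles.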
Fig. 5 shows the state machine of TPController. There are four states: INIT, SCAN, CHECK and HOLD.
1. Initially, TPController stays in the INIT state until BFAST* is enabled, and then it transitions to the SCAN state.
2. In the SCAN state, TPController gets the shift distance. If the shift distance is zero, the text needs additional checking and the state transitions to CHECK. Otherwise, TPController updates TextPoint by adding the shift distance and stays in the SCAN state until all the data in TextRam have been scanned.
3. In the CHECK state, TPController checks whether every block in the window is hit. If every block is hit at its corresponding position, the state transitions to HOLD and TPController reports a possible match.
4. The HOLD state keeps the TPController status.
Fig. 5 The state transition diagram of TPController. States: INIT, SCAN, CHECK, HOLD. Transition conditions and actions shown in the diagram: BFAST* enable; BFAST* disable; shift distance == 0; shift distance != 0; TextPoint > length; MbitVector i hit; MbitVector i no hit; i >= 8; TextPoint += shift distance; TextPoint--, i++.
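The four states can be written as a small transition function. This is a simplified sketch and not the hardware description: the per-block loop of CHECK is collapsed into a single boolean `all_blocks_hit`, and three transitions that the text does not spell out are assumptions, namely that a failed check returns to SCAN and advances TextPoint by one, that running past the end of the region moves to HOLD, and that disabling BFAST* returns HOLD to INIT.

```python
from enum import Enum


class State(Enum):
    INIT = 0
    SCAN = 1
    CHECK = 2
    HOLD = 3


def step(state, text_point, enabled, shift_distance, all_blocks_hit, length):
    """Return (next_state, next_text_point) for one TPController transition."""
    if state is State.INIT:
        # Wait until BFAST* is enabled.
        return (State.SCAN if enabled else State.INIT), text_point
    if state is State.SCAN:
        if text_point > length:
            return State.HOLD, text_point          # assumption: region done
        if shift_distance == 0:
            return State.CHECK, text_point         # needs additional checking
        return State.SCAN, text_point + shift_distance
    if state is State.CHECK:
        if all_blocks_hit:
            return State.HOLD, text_point          # possible match reported
        return State.SCAN, text_point + 1          # assumption: resume scan
    # HOLD keeps the status; assumption: disabling BFAST* returns to INIT.
    return (State.HOLD if enabled else State.INIT), text_point
```

For example, a controller in SCAN at address 3 that receives a shift distance of 0 moves to CHECK without changing TextPoint, and a successful check then moves it to HOLD.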
There are two TextRams. While one TextRam is being scanned, BFAST* stores data into the other. The two TextRams alternate roles.
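The alternation of the two TextRams is a double-buffering (ping-pong) scheme. A minimal software model, with a hypothetical class name and interface, might look like this:

```python
class PingPongTextRam:
    """Two buffers: one is scanned while the other is being filled."""

    def __init__(self, size: int):
        self.size = size
        self.rams = [bytearray(size), bytearray(size)]
        self.scan_index = 0          # buffer currently being scanned

    @property
    def fill_index(self) -> int:
        return 1 - self.scan_index

    def load(self, data: bytes) -> None:
        """Write the next chunk into the buffer that is not being scanned."""
        if len(data) > self.size:
            raise ValueError("chunk larger than TextRam")
        self.rams[self.fill_index][:len(data)] = data

    def swap(self) -> None:
        """Exchange roles once scanning and loading have both finished."""
        self.scan_index = self.fill_index

    def scan_view(self) -> bytes:
        return bytes(self.rams[self.scan_index])
```

Loading never touches the buffer being scanned, so the transfer of the next chunk can overlap with the scan of the current one.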
3.3 Software interface design
ClamAV can get information from BFAST* through two mechanisms: interrupts and polling. With interrupts, when BFAST* finds a possible virus, it interrupts the CPU and reports to ClamAV that a possible virus needs to be verified. The benefit of interrupts is that the CPU may do other tasks while BFAST* quickly filters no-match cases, until BFAST* finds a possible virus or the scan is finished. However, when an interrupt occurs the CPU must switch to ClamAV, and this context switch adds overhead to scanning. If many possible viruses appear, the context switch overhead becomes large.
With polling, on the other hand, ClamAV repeatedly checks the status of BFAST*. Because polling lets ClamAV learn of a possible virus immediately and the CPU does not need to perform a context switch, it gives higher scanning throughput than interrupts. The cost is that polling occupies the CPU, so it degrades the performance of the whole system unless the CPU has no other tasks to do. We adopt the polling mechanism for higher throughput.
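A polling loop of this kind can be sketched as follows. The status codes and the `read_status` callable are hypothetical; in the real system the status would be read from the BFAST* registers through the driver.

```python
BUSY, POSSIBLE_MATCH, DONE = 0, 1, 2   # hypothetical status codes


def poll(read_status, max_polls=1_000_000):
    """Busy-wait until the scanner leaves BUSY and return the final status.

    read_status is a hypothetical callable that returns the current
    BFAST* status.
    """
    for _ in range(max_polls):
        status = read_status()
        if status != BUSY:
            return status
    raise TimeoutError("scanner did not report a status")
```

The loop returns as soon as the status changes, which is why polling reacts without context-switch latency, and it also shows the cost: the CPU does nothing else while it spins.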
ClamAV scans a file using the Boyer-Moore algorithm and the Aho-Corasick algorithm. The Boyer-Moore algorithm is implemented in the matcher-bm library and performs exact string matching. The Aho-Corasick algorithm is implemented in the matcher-ac library and performs regular expression matching.
In our design, BFAST* performs exact string matching and filters out the no-match cases first. If BFAST* detects a possible virus, matcher-bm is invoked for verification. If no possible virus is detected, the file is not infected. Most network traffic and most files in a system belong to the no-match case, so they are scanned quickly.
3.4 Hardware/Software interface design
The driver functions cover writing memory, controlling the behavior of the scanning module and getting status. The data structures in BFAST* memory are memory mapped, so the driver simply writes the data to the corresponding memory addresses. Several data structures must be written: (1) the hash functions, (2) the blocks in the patterns and (3) the text to be scanned.
After the data are prepared, TPController in BFAST* starts to scan the text and writes the status to registers.
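The memory-mapped writes performed by the driver can be illustrated with a small model. All offsets and the 32-bit little-endian word format below are hypothetical and do not come from the BFAST* design; a `bytearray` stands in for the mapped device memory.

```python
import struct

# Hypothetical layout: none of these offsets come from the paper.
HASH_BASE = 0x0000      # hash-function parameters
BLOCK_BASE = 0x0100     # blocks of the patterns
TEXT_BASE = 0x1000      # text to be scanned
STATUS_REG = 0x3000     # status word written by TPController
DEVICE_BYTES = 0x3004


def write_words(mem: bytearray, base: int, words: list[int]) -> None:
    """Store 32-bit little-endian words starting at a mapped offset."""
    struct.pack_into(f"<{len(words)}I", mem, base, *words)


def write_bytes(mem: bytearray, base: int, data: bytes) -> None:
    """Copy raw bytes (for example the text) to a mapped offset."""
    if base + len(data) > len(mem):
        raise ValueError("write past end of mapped region")
    mem[base:base + len(data)] = data


def read_status(mem: bytearray) -> int:
    """Read the status word that the scanner is assumed to write."""
    return struct.unpack_from("<I", mem, STATUS_REG)[0]
```

With `mem = bytearray(DEVICE_BYTES)` as the stand-in region, the driver would call `write_words` for the hash functions and pattern blocks, `write_bytes` for the text, and then read the status with `read_status`.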
The text to be scanned is transferred by DMA into TextRam in BFAST*. The problem is that a buffer allocated in user space is not guaranteed to occupy contiguous physical memory, so the driver allocates a contiguous physical memory region and copies the data from ClamAV into it. This region must not be cached: if the CPU's writes are still in the cache and have not yet reached main memory when the DMA transfer starts, DMA transfers stale data into TextRam.