
Chapter 4. Performance Results

4.3 Experiment Results

This study considers both the storage requirement and the classification performance. We focus on two-dimensional rule tables over the IP destination address and IP source address fields. The proposed scheme randomly generates two-field rules to create a synthesized rule table; as in the previous experiments, the prefix length distribution and β are taken into account. Recall that for a real-life routing table the value of β is approximately 10^-5, and the maximum overlap is measured to fall between β = 10^-4 and 10^-5. Therefore, this study reports experimental performance statistics for β = 10^-4 and 10^-5, as well as for a larger value, β = 10^-3.

Figures 12, 13 and 14 compare the memory requirements (on a log2 scale) of the bitmap intersection and bit compression schemes. Since both schemes use the same amount of memory to store the interval boundaries, the interval-boundary storage is omitted from the reported memory requirements.

The experimental results demonstrate that the proposed scheme outperforms bitmap intersection. Under β = 10^-5, a rule table of 5K rules requires only 164 Kbytes with the bit compression algorithm, compared with 12.5 Mbytes for bitmap intersection; a table of 10K rules requires 374 Kbytes, compared with 48 Mbytes. The memory storage of bitmap intersection grows quadratically, roughly quadrupling each time the number of rules doubles, while that of the bit compression algorithm is almost linear in the number of rules. The bit compression algorithm thus prevents memory explosion for large rule tables. The implementation on the IXP1200 differs from the theoretical values because the lengths of the CBV, index list and DCV are rounded up to a multiple of 32 bits for convenient memory access, wasting a certain amount of space. Therefore, the measured memory storage on the IXP1200 is higher than the theoretical value.
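The quadratic growth can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes bitmap intersection stores up to 2N interval bit vectors of N bits in each of the two dimensions (interval boundaries excluded, as above).

```python
def bitmap_intersection_bytes(n_rules, dims=2):
    """Theoretical bitmap-intersection storage, assuming up to 2*n_rules
    intervals per dimension, each holding an n_rules-bit vector."""
    bits = dims * (2 * n_rules) * n_rules
    return bits // 8

# Doubling the rules roughly quadruples the storage.
print(bitmap_intersection_bytes(5_000))   # 12_500_000 bytes ~= 12.5 Mbytes
print(bitmap_intersection_bytes(10_000))  # 50_000_000 bytes ~= 48 Mbytes
```

Under these assumptions the formula reproduces the 12.5 Mbytes and roughly 48 Mbytes figures quoted above for 5K and 10K rules.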

[Figure: memory storage (KBs, log2 scale) versus number of rules (1K-10K); series: bit compression (theoretical), bitmap intersection (theoretical), bit compression (on IXP1200), bitmap intersection (on IXP1200)]

Figure 12: Memory requirements (log2 scale) of the bit compression and bitmap intersection algorithms under β = 10^-3

[Figure: memory storage (KBs, log2 scale) versus number of rules (1K-10K); series: bit compression (theoretical), bitmap intersection (theoretical), bit compression (on IXP1200), bitmap intersection (on IXP1200)]

Figure 13: Memory requirements (log2 scale) of the bit compression and bitmap intersection algorithms under β = 10^-4

[Figure: memory storage (KBs, log2 scale) versus number of rules (1K-10K); series: bit compression (theoretical), bitmap intersection (theoretical), bit compression (on IXP1200), bitmap intersection (on IXP1200)]

Figure 14: Memory requirements (log2 scale) of the bit compression and bitmap intersection algorithms under β = 10^-5

As noted previously, the space of the index table can be further reduced by merging rule sets. Figures 15, 16 and 17 display the total memory space consumed by the rule table of the bit compression scheme with and without merging. The required space is reduced by about 25%-40% after merging the rule sets.

[Figure: memory storage (KBs) before and after merging, versus number of rules (1K-10K)]

Figure 15: Improvement in memory storage from merging rule sets under β = 10^-3

[Figure: memory storage (KBs) before and after merging, versus number of rules (1K-10K)]

Figure 16: Improvement in memory storage from merging rule sets under β = 10^-4

[Figure: memory storage (KBs) before and after merging, versus number of rules (1K-10K)]

Figure 17: Improvement in memory storage from merging rule sets under β = 10^-5

Besides the bit compression and bitmap intersection schemes, we propose a further compression scheme here, called ACBV (Aggregated and Compressed Bit Vector), which is a modification of the ABV scheme [9]. The ACBV scheme compresses the bit vectors into aggregated and compressed bit vectors, each comprising two parts: an aggregated part and a compressed part.

The aggregated part is an aggregated bit vector as in the ABV scheme. The compressed part records the sections of the bit vector that contain at least one '1' bit. The preprocessing steps of the ACBV scheme are as follows.

1. Aggregate the bit vector to form the aggregated part. Bit i of the aggregated part is set if at least one bit is set in the bit vector from bit (i×A) to bit ((i+1)×A − 1); otherwise bit i is cleared. A denotes the aggregate size.

2. Construct the compressed part. For i from 0 to ⌈N/A⌉ − 1, where N denotes the number of rules: if bit i is set in the aggregated part, select the section of the bit vector from bit (i×A) to bit ((i+1)×A − 1). Concatenate the selected sections in selection order to form the compressed part.

3. Append the compressed part to the aggregated part to form the “aggregated and compressed bit vector”.

4. To maintain the same vector length, pad '0' bits at the end of the aggregated and compressed bit vectors.
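The preprocessing steps above can be sketched as follows. This is an illustrative Python sketch, not the thesis's implementation; in particular, it assumes the padding in step 4 brings every vector to the common maximum length N + ⌈N/A⌉.

```python
def acbv_compress(bit_vector, A):
    """Build an aggregated-and-compressed bit vector (illustrative sketch).

    bit_vector: list of 0/1 values whose length is a multiple of A.
    Returns the aggregated part followed by the concatenated non-zero
    A-bit sections, zero-padded to the maximum possible length (step 4)."""
    n_sections = len(bit_vector) // A
    aggregated, compressed = [], []
    for i in range(n_sections):
        section = bit_vector[i * A:(i + 1) * A]
        if any(section):                 # step 1: section has a '1' bit
            aggregated.append(1)
            compressed.append(section)   # step 2: record this section
        else:
            aggregated.append(0)
    # Step 3: append the compressed part to the aggregated part.
    result = aggregated + [b for sec in compressed for b in sec]
    # Step 4 (assumed common length): pad with '0' bits.
    result += [0] * (n_sections + len(bit_vector) - len(result))
    return result

# The X3 example from Fig. 18: "10010000" with A = 4 yields "101001" + padding.
print(acbv_compress([1, 0, 0, 1, 0, 0, 0, 0], 4))
```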

Figure 18 illustrates the preprocessing steps of the ACBV scheme. Figure 18(a) presents the bitmap of the 2-dimensional rule table. Consider the interval X3. First, the aggregated part is built; we use an aggregate size A = 4 in Fig. 18(b). The associated bit vector "10010000" is first aggregated to "10". Subsequently, the desired sections of the bit vector are selected according to the aggregated part. Since the first bit of the aggregated part is '1', the first 4 bits of the bit vector, "1001", are selected to form the compressed part. Appending the compressed part to the aggregated part completes the aggregated and compressed bit vector "101001".

Having described the preprocessing steps, we now explain how the ACBV scheme classifies a packet. Assume an arriving packet p, shown in Fig. 18(b), falls into interval X3. The classification process is as follows. First, the aggregated and compressed bit vector associated with X3 is accessed. The aggregated part "10" indicates that the first 4 bits of the bit vector are recorded in the compressed part while the second 4 bits are not, so the second 4 bits are "0000". Therefore, by reading the compressed part we can restore the bit vector to "10010000". For dimension Y, a similar process restores its bit vector, and the conjunction of these bit vectors yields the best-matching rule.
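The restoration step can be sketched similarly. This is a hypothetical helper that assumes the layout described above: the aggregated part first, followed by the recorded A-bit sections.

```python
def acbv_decompress(acbv, n_bits, A):
    """Restore the original bit vector from an aggregated-and-compressed one.

    acbv: the aggregated-and-compressed vector (aggregated part, then the
    recorded sections, then any padding). n_bits: original vector length."""
    n_sections = n_bits // A
    aggregated = acbv[:n_sections]
    pos = n_sections                     # recorded sections start here
    restored = []
    for flag in aggregated:
        if flag:                         # section was recorded: copy it
            restored += acbv[pos:pos + A]
            pos += A
        else:                            # section was all-zero: regenerate it
            restored += [0] * A
    return restored

# Restoring X3's vector "101001" (plus padding) gives back "10010000".
print(acbv_decompress([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], 8, 4))
```

The restored per-dimension bit vectors are then ANDed, as in bitmap intersection, to find the best-matching rule.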

Furthermore, the rule table presented in Fig. 18 contains no wildcarded rules. If wildcarded rules are considered, the "Don't Care Vector" idea of the bit compression scheme can be applied.

Figure 18: An example of the ACBV scheme

Figure 19 compares the memory requirements (on a log2 scale) of the ACBV, bit compression and bitmap intersection schemes under β = 10^-5. For the ACBV scheme, we consider 4 different aggregate sizes (8, 16, 32 and 64). Across these aggregate sizes, the memory requirements of the ACBV scheme range from 638 Kbytes to 1.62 Mbytes for 5K rules and from 1.94 Mbytes to 6.33 Mbytes for 10K rules. Recall that bitmap intersection requires 12.5 Mbytes and 48 Mbytes for 5K and 10K rules, while bit compression requires 164 Kbytes and 374 Kbytes. The experimental results demonstrate that the ACBV scheme performs better than bitmap intersection but worse than bit compression.

[Figure: memory storage (KBs, log2 scale) versus number of rules (1K-10K); series: ACBV with aggregate sizes 8, 16, 32 and 64, bit compression, and bitmap intersection]

Figure 19: Memory requirements (log2 scale) of the ACBV, bit compression and bitmap intersection algorithms under β = 10^-5

Tables 4, 5 and 6 show the worst-case memory access times for different numbers of rules under β = 10^-3, 10^-4 and 10^-5 on the IXP1200. Here, we assume memory access times of 14, 20 and 40 clks for scratchpad, SRAM and SDRAM respectively.

In the bitmap intersection scheme, the rule table is expected to be stored in SRAM. However, the memory storage increases so rapidly that the required storage exceeds the size of SRAM (8MB). For example, under β = 10^-5, the storage required for a rule table of more than 4K rules exceeds 8MB, so such tables must be stored in SDRAM.

In the bit compression scheme, memory explosion is prevented. For a 2-dimensional rule table with 10K rules, the bit compression scheme still stores the bit vectors and index table in SRAM without resorting to SDRAM. Moreover, because most of the memory access cost of the bit compression scheme is spent accessing the DCVs, we exploit the memory hierarchy and store the DCVs in the smallest (4KB) but fastest scratchpad memory rather than in SRAM. Storing the DCVs in scratchpad memory decreases the memory access time of the bit compression scheme, while bitmap intersection can only use SRAM or SDRAM. Therefore, although bit compression performs more memory accesses than bitmap intersection for rule tables of the same size, its memory access performance is better.

In the ACBV scheme, we use an aggregate size of 32, for which the memory requirement stays below the size of SRAM even for 10K rules. Therefore, SRAM is employed to store the aggregated and compressed bit vectors. Moreover, like the bit compression scheme, the ACBV scheme utilizes the scratchpad memory to store the DCVs. Since an aggregated and compressed bit vector has more bits than a compressed bit vector and index list combined, the ACBV scheme requires more memory accesses than the bit compression scheme. Therefore, the memory access performance of ACBV is worse than that of bit compression, though still better than that of bitmap intersection.

Number of Rules              1K    2K    3K    4K    5K    6K    7K    8K    9K    10K
bitmap intersection (clks)  1280  2520  3760  5000  6280  7520  8800 10000 11280 12560
bit compression (clks)      1096  2044  2952  3860  4876  5784  6692  7600  8536  9444
ACBV, size=32 (clks)        1216  2284  3272  4300  5356  6344  7372  8360  9376 10364

Table 4: Worst-case memory access times under β = 10^-3 on IXP1200

Number of Rules              1K    2K    3K    4K    5K    6K    7K    8K    9K    10K
bitmap intersection (clks)  1280  2520  3760  5000  6320  7520  8800 10000 11280 12560
bit compression (clks)      1016  1924  2762  3660  4556  5464  6332  7220  8096  9004
ACBV, size=32 (clks)        1096  2044  2952  3900  4836  5784  6692  7640  8576  9524

Table 5: Worst-case memory access times under β = 10^-4 on IXP1200

Number of Rules                    1K    2K    3K    4K    5K    6K    7K    8K    9K    10K

bitmap intersection
  No. of scratchpad accesses        0     0     0     0     0     0     0     0     0     0
  No. of SRAM accesses             64   126   188     0     0     0     0     0     0     0
  No. of SDRAM accesses             0     0     0   126   158   188   220   250   282   314
  Memory access time (clks)      1280  2520  3760  5040  6320  7520  8800 10000 11280 12560

bit compression
  No. of scratchpad accesses       64   126   188   250   314   376   438   500   564   626
  No. of SRAM accesses              6     6     6     6     6     6     6     6     6     6
  No. of SDRAM accesses             0     0     0     0     0     0     0     0     0     0
  Memory access time (clks)      1016  1884  2752  3620  4516  5384  6252  7120  8016  8884

ACBV, size=32
  No. of scratchpad accesses       64   126   188   250   314   376   438   500   564   626
  No. of SRAM accesses              8    10    12    14    18    20    22    24    26    28
  No. of SDRAM accesses             0     0     0     0     0     0     0     0     0     0
  Memory access time (clks)      1056  1964  2872  3780  4756  5664  6572  7480  8416  9324

Table 6: Worst-case memory access times under β = 10^-5 on IXP1200
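The tabulated times follow directly from the access counts and the assumed latencies (14, 20 and 40 clks for scratchpad, SRAM and SDRAM). A quick check against the 1K-rule column of Table 6:

```python
# Memory access latencies assumed for the IXP1200 (clks).
LATENCY = {"scratchpad": 14, "sram": 20, "sdram": 40}

def access_time(scratchpad, sram, sdram):
    """Worst-case memory access time from per-memory access counts."""
    return (scratchpad * LATENCY["scratchpad"]
            + sram * LATENCY["sram"]
            + sdram * LATENCY["sdram"])

print(access_time(0, 64, 0))   # bitmap intersection: 1280 clks
print(access_time(64, 6, 0))   # bit compression:     1016 clks
print(access_time(64, 8, 0))   # ACBV, size=32:       1056 clks
```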

As mentioned above, the bit compression and ACBV schemes need less memory access time than bitmap intersection. Notably, however, unlike bitmap intersection, the bit compression and ACBV algorithms must decompress a compressed bit vector, or an aggregated and compressed bit vector, back into a bit vector, and this extra decompression work degrades classification performance.

However, the decompression time is actually much less than the memory access time, and the memory access time dominates classification performance. Therefore, even though the bit compression and ACBV schemes require extra processing time for decompression, they still outperform bitmap intersection. Figure 20 presents the packet transmission rates of the bitmap intersection, bit compression and ACBV schemes for rule tables of different sizes, without wildcards, under β = 10^-5 on the IXP1200. In the bitmap intersection scheme, the rule tables are stored in SDRAM; in the bit compression and ACBV schemes, the rule table is stored in SRAM only. Because reading a CBV and index list, or an aggregated and compressed bit vector, takes less memory access time than reading a full bit vector, the bit compression and ACBV schemes outperform the bitmap intersection scheme despite the extra decompression time. The transmission rate of the ACBV scheme lies between those of the bitmap intersection and bit compression schemes. Moreover, since the lengths of the CBVs and index lists remain almost fixed (determined by the maximum overlap), the transmission rate of the bit compression scheme remains nearly constant, whereas that of bitmap intersection decreases linearly with the number of rules.

[Figure: transmission rate (Mbps) versus number of rules (1K-10K); series: bitmap intersection (SDRAM), bit compression, ACBV]

Figure 20: Transmission rates of bitmap intersection, bit compression and ACBV on IXP1200
