Chapter 2. Performance Metrics and Related Works

2.2 Related Works

An extensive body of work [2-15] has proposed numerous approaches to the packet classification problem. This section briefly describes some of these approaches.

Linear search

Linear search is the simplest classification algorithm. It is efficient in terms of memory, but for a large number of rules it implies a long search time. The data structure is simple and easy to update as rules change.
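As a minimal sketch (in Python, with an illustrative rule format of per-field ranges, not prescribed by the papers cited here), linear-search classification can be expressed as:

```python
def linear_search(rules, packet):
    """Return the index of the first (highest-priority) matching rule.

    rules: list of rules, each a list of (low, high) ranges, one per field,
    sorted by decreasing priority. packet: tuple of field values.
    """
    for i, rule in enumerate(rules):
        # A rule matches if every packet field falls inside its range.
        if all(lo <= field <= hi for field, (lo, hi) in zip(packet, rule)):
            return i
    return None  # no rule matches


rules = [
    [(0, 127), (80, 80)],    # rule 0: src in 0-127, port exactly 80
    [(0, 255), (0, 65535)],  # rule 1: effectively a wildcard rule
]
print(linear_search(rules, (100, 80)))  # matches rule 0
print(linear_search(rules, (200, 25)))  # falls through to rule 1
```

The memory cost is exactly the rule table itself, but every lookup scans all N rules in the worst case, which is the long search time noted above.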

Hierarchical tries

A hierarchical trie (also called a multilevel trie or backtracking trie) is a simple extension of the one-dimensional radix trie data structure. A hierarchical trie is constructed field by field, recursively: first, the first-level trie is built according to the first field of each rule; then, second-level tries are constructed from the leaf nodes of the first-level trie, and the remaining fields are handled in the same recursive manner. Because of this recursive traversal, the scheme suffers from long search times.

Set-pruning tries

The set-pruning trie [12] is similar to the hierarchical trie, but reduces query time by replicating rules to eliminate recursive traversal. Rules are replicated so that every packet encounters its matching rules along a single path, without backtracking.

The set-pruning trie successfully reduces the search time of hierarchical tries, but unfortunately suffers from a memory explosion problem that makes it impractical when the rule table becomes large.

Grid of tries

The grid-of-tries data structure, proposed by Srinivasan et al. [2], reduces storage space by allocating each rule to only one trie node, as in a hierarchical trie, and achieves the same search time as the set-pruning trie by pre-computing and storing switch pointers that guide the search process at certain nodes. This is a good solution if rules are restricted to only two fields, but it is not easily extended to more fields.

Hi-Cuts (hierarchical intelligent cuttings)

HiCuts [5] partitions the search space in each dimension, guided by simple heuristics that exploit the structure of the rule table. A decision tree is built by carefully preprocessing the rule table. Each time a packet arrives, the decision tree is traversed to find a leaf node, which stores a small number of rules; a linear search of these rules yields the desired match. While this scheme exploits the characteristics of real rule tables, those characteristics vary, and the difficulty of finding a suitable decision tree prevents HiCuts from scaling well to large rule tables.

Hyper-Cuts

HyperCuts [7] is similar in approach to HiCuts, but uses multidimensional cuts at each step. Unlike HiCuts, in which each node of the decision tree represents a hyperplane, each node of the HyperCuts decision tree represents a k-dimensional hypercube, where k > 1. HyperCuts has excellent lookup performance and, in many cases, excellent storage efficiency, but does not fare as well with heavily wildcarded rules.

Tuple Space

The tuple space algorithm [3] partitions the rules into tuple categories based on the number of specified (non-wildcard) bits in each field, and then uses hashing among the rules within the same tuple. The algorithm has fast average search time and fast update time. Its main disadvantage is that the use of hashing makes the duration of lookups and updates non-deterministic.
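A minimal sketch of the idea (in Python, with two 8-bit prefix fields per rule; all names and the rule encoding are illustrative assumptions, not the notation of [3]):

```python
from collections import defaultdict

def make_tuple_space(rules, width=8):
    """Group rules by their tuple of prefix lengths, one hash table per tuple.

    rules: list of ((value, prefix_len), (value, prefix_len)) pairs,
    sorted by decreasing priority.
    """
    space = defaultdict(dict)  # (len1, len2) -> {prefix-bits key: rule index}
    for i, ((v1, l1), (v2, l2)) in enumerate(rules):
        key = (v1 >> (width - l1) if l1 else 0,
               v2 >> (width - l2) if l2 else 0)
        space[(l1, l2)].setdefault(key, i)  # keep the highest-priority rule
    return space

def classify(space, pkt, width=8):
    """Probe every tuple with the packet's masked bits; keep the best match."""
    f1, f2 = pkt
    best = None
    for (l1, l2), table in space.items():
        key = (f1 >> (width - l1) if l1 else 0,
               f2 >> (width - l2) if l2 else 0)
        idx = table.get(key)
        if idx is not None and (best is None or idx < best):
            best = idx
    return best


rules = [((0b10100000, 3), (0b11000000, 2)),  # rule 0: 101*/11*
         ((0, 0), (0, 0))]                    # rule 1: */* (all wildcard)
space = make_tuple_space(rules)
print(classify(space, (0b10111111, 0b11111111)))  # rule 0
print(classify(space, (0b00000000, 0b00000000)))  # only rule 1 matches
```

Each tuple lookup is one exact-match hash probe, which is why the average time is fast; the non-determinism comes from hash collisions and from the number of tuples that must be probed.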

Cross-producting

The cross-producting approach [2] builds a table of all possible field-value combinations (cross-products) and pre-computes the best-matching rule for each cross-product. A search can then be done quickly by performing separate lookups on each field, concatenating the results into a cross-product, and indexing into the cross-product table. Unfortunately, the size of the cross-product table grows enormously with the number of rules and the number of fields.
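A minimal sketch (in Python, with field values pre-quantized into interval ids; the rule encoding is an illustrative assumption):

```python
from itertools import product

def build_crossproduct(rules, field_intervals):
    """Precompute the best-matching rule for every interval combination.

    rules: per rule, a list of per-field sets of interval ids the rule
    covers, sorted by decreasing priority.
    field_intervals: per field, the list of all interval ids.
    """
    table = {}
    for combo in product(*field_intervals):          # every cross-product
        for i, rule in enumerate(rules):
            if all(c in field_set for c, field_set in zip(combo, rule)):
                table[combo] = i                     # first match = best
                break
    return table


rules = [[{0, 1}, {2}],            # rule 0 covers intervals {0,1} x {2}
         [{0, 1, 2}, {0, 1, 2}]]   # rule 1 covers everything (wildcard)
table = build_crossproduct(rules, [[0, 1, 2], [0, 1, 2]])
print(table[(1, 2)])  # rule 0 is the best match for combo (1, 2)
print(table[(2, 0)])  # only the wildcard rule 1 matches
print(len(table))     # 3 x 3 = 9 entries already, for only 2 rules
```

Even this toy table has an entry for every interval combination, which is exactly the multiplicative growth the paragraph above warns about.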

RFC (Recursive Flow Classification)

RFC [4] is one of the earliest heuristic approaches. RFC maps the S-bit packet header to a T-bit identifier, where T = log N (N is the number of rules) and T << S. RFC applies cross-producting in stages, grouping intermediate results into equivalence classes to reduce storage requirements. RFC is a fast classification algorithm; this speed, however, comes at the cost of substantial memory usage, and RFC does not support efficient updates.

TCAM (Ternary Content Addressable Memory)

TCAM is a hardware device that performs the function of a fully associative memory. Each TCAM cell can store one of three values: '0', '1' or 'X', where 'X' represents don't care and acts as a mask for wildcard bits. TCAM therefore supports exact and prefix matches, but not range matches directly: a range must first be transformed into prefixes or exact values (e.g., the range gt 1023 can be expressed with the 6 prefixes 000001*, 00001*, 0001*, 001*, 01* and 1*). A TCAM compares a packet against every rule simultaneously. Packet classification based on TCAM is suitable for small rule tables; for large rule tables, however, TCAMs require a large amount of board space, high power consumption and high cost.
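The range-to-prefix expansion above can be sketched as follows (in Python; a standard greedy split into maximal aligned blocks, not code from any of the cited papers):

```python
def range_to_prefixes(lo, hi, width):
    """Split the integer range [lo, hi] into ternary prefix strings."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo...
        size = lo & -lo if lo else 1 << width
        # ...shrunk until it does not extend past hi.
        while size > hi - lo + 1:
            size >>= 1
        bits = width - size.bit_length() + 1   # prefix length in bits
        prefixes.append(format(lo >> (width - bits), f'0{bits}b') + '*'
                        if bits else '*')
        lo += size
    return prefixes


# The "gt 1023" example for a 16-bit port field, i.e. the range [1024, 65535]:
print(range_to_prefixes(1024, 65535, 16))
# ['000001*', '00001*', '0001*', '001*', '01*', '1*']
```

This expansion is why ranges are expensive in TCAMs: one range rule can consume several TCAM entries.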

Bit-map intersection

The bit-map intersection algorithm devised by Lakshman et al. [8] uses divide-and-conquer, dividing the packet classification problem into k sub-problems and then combining the results. The scheme uses a geometric space-decomposition approach, projecting every rule onto each dimension. For N rules, a maximum of 2N+1 non-overlapping intervals are created on each dimension. Each interval is associated with an N-bit bit vector; bit j in the vector is set if the projection of the range of rule j overlaps the interval. On packet arrival, for each dimension, the interval to which the packet belongs is found.

Taking the conjunction of the corresponding bit vectors of each dimension yields a resultant bit vector, from which the highest-priority matching entry can be determined.

Since the rules in the rule table are assumed to be sorted in terms of decreasing priority, the first set bit found in the resultant bit vector is the highest priority entry.

The rule corresponding to the first set bit is the best matching rule for the arriving packet. This scheme employs bit-level parallelism to match multiple fields concurrently, and can be implemented in hardware for fast classification. However, it is difficult to apply to large rule tables, since the memory requirement grows quadratically with the number of rules. The same study describes a variation that reduces the space requirement at the expense of higher execution time.

Consider the simple example of a 2-dimensional rule table with four rules, shown in Fig. 2, which illustrates the operation of bitmap intersection. The four rules are represented as 2-dimensional rectangles. Construction of the bitmaps starts by projecting the edges of the rectangles onto the corresponding axes, X and Y. The four rectangles create seven intervals on the X axis and six intervals on the Y axis, and each interval is then associated with a bit vector. Suppose a packet, denoted by point p in Fig. 2, arrives. The first step of classification is to locate the intervals containing p on both axes: X3 and Y2 for the X and Y axes, respectively. The associated bit vectors of X3 ("1011") and Y2 ("0011") are then read. Taking the conjunction of these two bit vectors yields the resultant bit vector "0011". Rule 3 is therefore the best matching rule for the arriving packet p.
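The lookup just described can be sketched directly (in Python, with bit vectors as strings whose i th character corresponds to rule i, rules sorted by decreasing priority):

```python
def conjunction(bv_x, bv_y):
    """Bitwise AND of two equal-length bit-vector strings."""
    return ''.join('1' if a == b == '1' else '0' for a, b in zip(bv_x, bv_y))

def best_match(result):
    """First set bit = highest-priority matching rule (1-based), or None."""
    pos = result.find('1')
    return pos + 1 if pos >= 0 else None


bv_x3 = '1011'   # bit vector of interval X3 (values from the Fig. 2 example)
bv_y2 = '0011'   # bit vector of interval Y2
result = conjunction(bv_x3, bv_y2)
print(result)              # '0011'
print(best_match(result))  # rule 3
```

In hardware the conjunction is a wide AND over all N bits at once, which is the bit-level parallelism mentioned above.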

Figure 2: An example of bitmap intersection algorithm

ABV (Aggregated Bit Vector)

The Aggregated Bit Vector (ABV) algorithm [9] is an improvement on the bitmap intersection scheme. The authors make two observations: set bits are sparse in the bit vectors, and a packet matches few rules in the rule table. Building on these observations, two key ideas are introduced: aggregation of bit vectors and rule rearrangement. Aggregation reduces memory access time by adding smaller bit vectors, called aggregate bit vectors (ABVs), which partially capture the information of the full bit vectors. The length of an ABV is ⌈N/A⌉, where A denotes the aggregate size. Bit i of the ABV is set if at least one bit is set in group i of the full bit vector, i.e., from the (i×A)th bit to the ((i+1)×A−1)th bit; otherwise bit i is cleared. Aggregation reduces the search time of bitmap intersection, but introduces an unfavorable side effect, the false match: the conjunction of the ABVs returns a set bit, yet no actual match exists in the group of rules identified by the aggregate. False matches may increase memory access time; rule rearrangement can reduce their probability. Although ABV outperforms bitmap intersection in memory access time by an order of magnitude, it does not ameliorate the main problem of bitmap intersection, the quadratically growing memory space, and in fact uses more space.
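The aggregation step and the false-match effect can be sketched as follows (in Python, with illustrative bit-vector strings):

```python
from math import ceil

def aggregate(bv, A):
    """Build the ABV: one summary bit per group of A bits of bv."""
    return ''.join('1' if '1' in bv[i * A:(i + 1) * A] else '0'
                   for i in range(ceil(len(bv) / A)))


bv1 = '010000000100'       # sparse 12-bit vector, set bits in groups 0 and 2
bv2 = '100000000010'       # another sparse vector, also groups 0 and 2
print(aggregate(bv1, 4))   # '101'
print(aggregate(bv2, 4))   # '101'
# The conjunction of the two ABVs is '101', suggesting matches in groups
# 0 and 2 -- yet bv1 AND bv2 has no set bit at all: a "false match".
```

Only the groups flagged by the ABV conjunction need their full bit-vector words fetched, which is where the memory-access savings come from; false matches cause extra fetches that find nothing.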

Chapter 3. Bit Compression Algorithm

3.1 Using Bit Compression to Reduce Storage Space

As mentioned in the previous section, bitmap intersection is a hardware-oriented scheme with rapid classification speed, but suffers from the crucial drawback that its storage requirement increases quadratically with the number of rules. The space complexity of bitmap intersection is O(dN²), where d denotes the number of dimensions and N the number of rules. The ABV algorithm improves the search speed, but requires even more memory space than bitmap intersection. For a hardware solution to packet classification, memory storage is an important performance metric, and decreasing the required storage reduces cost correspondingly. The question thus arises whether any method can overcome the extreme memory requirements of a large rule table. Observing the bit vectors produced in each dimension, as mentioned in [9], the set bits ('1' bits) are very sparse, while clear bits ('0' bits) are abundant.

The authors of [9] used this property to reduce memory access time, but it can also be exploited to reduce memory storage. In the example of Fig. 4, an approximately 60% space saving is achieved by removing redundant '0' bits; the shaded parts of Fig. 4 mark the removable bits.

The challenge, therefore, is how to represent the compressed format of a bit vector. Each dimension is segmented into several sub-ranges, each called a "Compressed Region" (CR), where a CR denotes the range spanned by a series of consecutive intervals. In each CR, only an extremely small number of rules overlap the region, and the bits corresponding to the non-overlapping rules are '0' in every bit vector of the CR. If a packet falls into a CR, denoted CRm, only the overlapping rules need to be considered. Ignoring the non-overlapping rules means that their '0' bits can be removed from the bit vectors in CRm. This study calls the bit vector obtained after removing these redundant '0' bits a CBV (Compressed Bit Vector).

For example, consider the two-dimensional rule table of Fig. 3, in which dimension X is divided into four CRs: CR1, CR2, CR3 and CR4. Only R1, R3 and R4 overlap CR1. Therefore, if a packet falls into CR1, only R1, R3 and R4 have to be considered; keeping the first, third and fourth bits of the bit vectors while removing the '0' bits of the non-overlapping rules in CR1 suffices for finding the matching rules.

Recall, however, that in bitmap intersection the bit order of a bit vector corresponds to the rule order (the i th bit of a bit vector corresponds to the i th rule of the rule table). Once '0' bits are removed from a bit vector, it is no longer known which remaining bit represents which rule. To solve this problem, this study associates an "index list" with each CR, which stores the rule numbers of the remaining bits; the collection of index lists forms an "index table". In the example of Fig. 3, after removing the redundant '0' bits, the bit vectors in CR1 retain three bits. To keep track of the rule numbers of these three bits, the index list [1, 3, 4] is attached to CR1. Each CR is associated with an index list, and the index table comprising the four index lists is shown in Fig. 6.

After removing the redundant '0' bits, the CBVs and the index list are built for each CR. Because the length (number of bits) of a CBV equals the number of rules overlapping the corresponding CR, the CBV length differs between CRs; in Fig. 4, for example, the CBVs in CR1 are three bits long, while those in CR2 are five bits long. However, bitmap intersection is a hardware-oriented algorithm, and the proposed improvement is hardware-oriented as well. For convenient memory access, bit vectors should have a fixed length, so fixed-length CBVs are desired. Therefore, the length of all CBVs is set to that of the longest CBV, and shorter CBVs are padded with '0' bits at the end. A similar idea applies to the index lists, which should all have the same number of entries.

Figure 3: The bitmap in dimension X of a 2-dimensional rule table with 10 rules.

Figure 4: Space saving by removing redundant ‘0’ bits.

Furthermore, rules may contain wildcards. If rule Ri is wildcarded in dimension j, the i th bit of every bit vector in dimension j is set, forming a column of '1' bits across dimension j. For example, Fig. 5 illustrates a rule table with two rules wildcarded in dimension X, R11 and R12; the last two bits of every bit vector are set because the ranges of R11 and R12 cover all intervals of dimension X. The authors of [9] note that in the destination and source address fields, approximately half of the rules are wildcarded. Consequently, owing to wildcards, half of each bit vector in the destination (or source) field is set to '1'.

Intrinsically, many of these '1' bits are redundant: for each dimension j, regardless of the interval a packet falls in, the packet always matches the rules wildcarded in dimension j. There is thus no need to set the corresponding '1' bits in every interval to record the wildcarded rules; instead, these rules are stored just once. Additional bit vectors, here called "Don't Care Vectors" (DCVs), are used to separate the wildcarded from the non-wildcarded rules, with one DCV established per dimension. Removing the redundant '1' bits caused by wildcarded rules further reduces the storage space. A DCV resembles a bit vector, but whereas bit j of a bit vector is set if the projection of rule j overlaps the related interval, bit j of the DCV is set if rule j is wildcarded, and cleared otherwise. For example, the last two bits of each bit vector in Fig. 5 could be removed and the DCV "000000000011" added instead, indicating that the 11th and 12th rules are wildcarded and the others are not.
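The DCV idea can be sketched as follows (in Python, using the 12-rule example with R11 and R12 wildcarded; the helper names are illustrative):

```python
wildcarded = {11, 12}   # 1-based numbers of the rules wildcarded on X
N = 12                  # total number of rules

# One DCV per dimension: bit j set iff rule j is wildcarded there.
dcv = ''.join('1' if i in wildcarded else '0' for i in range(1, N + 1))
print(dcv)  # '000000000011'

def restore(stored_bv, dcv):
    """Recover the full per-interval bit vector: stored bits OR the DCV."""
    return ''.join('1' if a == '1' or b == '1' else '0'
                   for a, b in zip(stored_bv, dcv))

# A stored vector no longer carries the wildcard '1' columns; OR-ing with
# the DCV reproduces the vector bitmap intersection would have stored.
print(restore('110000000000', dcv))  # '110000000011'
```

The wildcard bits are thus stored once per dimension instead of once per interval, which is exactly the saving described above.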

Using the above ideas, this study proposes an improved version of bitmap intersection, called "bit compression". Before describing the scheme, some notation and definitions are presented.

Figure 5: The bitmap in dimension X of a 2-dimensional rule table which has two wildcarded rules R11 and R12 in dimension X.

For a k-dimensional rule table with N rules, let Ii,j denote the i th non-overlapping interval on dimension j and ORNi,j denote the number of rules overlapping interval Ii,j. Furthermore, BVi,j denotes the bit vector associated with interval Ii,j and CBVi,j the corresponding compressed bit vector. Finally, DCVj is the "Don't Care Vector" of dimension j and DCVi,j is the i th bit of DCVj.

Definition: For a k-dimensional rule table with N rules, the "maximum overlap" of dimension j, denoted MOPj, is defined as the maximum ORNi,j over all intervals of dimension j.

The preprocessing part of the bit compression algorithm is as follows. For each dimension j, 1 ≤ j ≤ k:

1. Construct DCVj. For n from 1 to N, if rule Rn is wildcarded on dimension j then DCVn,j is set; otherwise DCVn,j is cleared.

2. Calculate the value of MOPj and segment the entire range of dimension j into t CRs: CR1, CR2, …, CRt. The rules whose projections overlap CRi, 1 ≤ i ≤ t, form a rule set RSi, where the number of entries of each rule set must be smaller than or equal to MOPj (according to the "region segmentation" algorithm described subsequently).

3. For each CRi, 1 ≤ i ≤ t, construct the compressed bit vectors and the corresponding index list based on RSi, and gather the index lists into an index table. Let listi denote the index list of CRi and listx,i the x th entry of listi.

4. For each CRi, 1 ≤ i ≤ t, prepend an "index table lookup address" (ITLA), the binary representation of (i−1), to each CBV. For convenient hardware processing, the ITLA has the same number of bits in every CR.

The classification steps for a packet are as follows. For each dimension j, 1 ≤ j ≤ k:

1. Find the interval Ii,j to which the packet belongs and obtain the corresponding compressed bit vector CBVi,j and its ITLA.

2. Use the ITLA to look up the index table and obtain the corresponding index list, say listm.

3. Read DCVj into the final bit vector. Then read the index list found in step 2 entry by entry: if the x th bit of CBVi,j is '1', access listx,m and set the corresponding bit of the final bit vector.

4. Take the conjunction of the final bit vectors of all dimensions and determine the highest-priority rule applying to the packet.
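Step 3, the expansion of a CBV back into a full N-bit vector, can be sketched as follows (in Python, using the numbers of the worked example accompanying Fig. 6; function and variable names are illustrative):

```python
def final_bit_vector(cbv, index_list, dcv):
    """Expand a CBV to a full N-bit vector via its index list and the DCV."""
    bits = list(dcv)                      # start from the wildcard bits
    for x, bit in enumerate(cbv):
        if bit == '1':
            bits[index_list[x] - 1] = '1' # set the bit of rule list[x]
    return ''.join(bits)


# Interval X4 of the Fig. 6 example: CBV '11101', index list [1, 2, 5, 6, 8],
# DCV '000000000011' (R11 and R12 wildcarded on X).
print(final_bit_vector('11101', [1, 2, 5, 6, 8], '000000000011'))
# '110010010011'
```

The result is identical to the vector bitmap intersection would store for X4, so the final conjunction step is unchanged from the original scheme.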

Figure 6 illustrates the bit compression algorithm. First, the DCV "000000000011" is constructed for dimension X. As shown in Fig. 3, dimension X is divided into 4 CRs. In CR1, the corresponding rule set RS1 is {R1, R3, R4}; the CBVs in CR1 are therefore constructed by considering R1, R3 and R4 only, and the index list list1 is [1, 3, 4]. The ITLA "00" is then prepended to the CBVs of CR1. As mentioned previously, additional '0' bits pad the CBVs and the index table for the convenience of hardware implementation. The same steps are applied to CR2, CR3 and CR4.

Figure 6: An example of bit compression algorithm.

Consider an arriving packet p, shown in Fig. 6, which falls into interval X4. The ITLA "01" and CBV "11101" associated with X4 are accessed. The ITLA "01" serves as the lookup address into the index table, yielding the index list [1, 2, 5, 6, 8]; the bits of "11101" are thus known to represent R1, R2, R5, R6 and R8, respectively. The DCV "000000000011" is read, and the first, second, fifth and eighth bits are set to form the final bit vector. The final bit vector of dimension X is "110010010011", the same as the bit vector of interval X4 produced by the bitmap intersection scheme in Fig. 5. The same process forms the final bit vector of dimension Y. Taking the conjunction of the final bit vectors of dimensions X and Y yields the highest-priority matching rule.

3.2 Measurement of Maximum Overlap

Because the bit compression scheme does not change the number of bit vectors, the storage saving is determined by the length of the CBVs: a shorter CBV yields a greater space saving. However, as mentioned previously, the CBV length is bounded by the value of the maximum overlap; consequently, the space saving increases as the maximum overlap decreases. Notably, the bit compression scheme