An Efficient String Matching Engine with Multi-Character State Transitions

Figure 3.5: Implementation of AC-DFA with TCAM and SRAM

Because a TCAM outputs only an index for each matching result, the TCAM-based approach needs an additional SRAM to obtain the next state from that index. This thesis develops a hardware architecture, shown in Fig. 3.6, that integrates the functions of the TCAM and the SRAM to implement the AC-DFA. The proposed architecture consists of multiple rule units, which can be configured according to the keyword set to be processed. Each rule unit processes one transition.

Two transitions begin from the initial state: $\delta_1(0,e)=1$ and $\delta_1(0,h)=8$. To distinguish them from the multi-character transitions described later, the notation $\delta_1$ with subscript 1 denotes a 1-character transition. In Fig. 3.6, the initial state is replaced by a wildcard character ‘?’, so the two transitions of the initial state become $\delta_1(?,e)=1$ and $\delta_1(?,h)=8$. In this way, the transitions to states 1 and 8 that are derived from the failure functions, such as $\delta_1(1,e)=1$ and $\delta_1(1,h)=8$, can be omitted. Because of the wildcard character, however, multiple rules may be activated simultaneously. For example, when the current state is 6 and the input character is ‘e’, the transitions $\delta_1(6,e)=13$ and $\delta_1(?,e)=1$ are activated simultaneously.

Figure 3.6: Implementation of AC-DFA with priority multiplexer

In a typical TCAM, a conflict caused by multiple matching entries is resolved by a priority encoder that determines the final matching index. In this work, a conflict arising from multiple activated rules is resolved by a priority multiplexer that determines the final matching result. In Fig. 3.6, the priority multiplexer PMUX has m inputs, D1 through Dm. If inputs Di and Dj are both valid and i < j, then Di has higher priority than Dj, and PMUX outputs Di through MO. Thus, input D1 has the highest priority and input Dm the lowest. When none of the inputs is valid, PMUX outputs a default value, such as 0, which ensures that the next state returns to the initial state when no rule matches. In Fig. 3.6, an upper rule unit has higher priority than a lower one, so rules with a specific current state must be placed above the wildcard rules. As a result, when $\delta_1(6,e)=13$ and $\delta_1(?,e)=1$ are activated simultaneously, PMUX selects 13 as the next state.
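The rule-table view of the AC-DFA can be illustrated in software. The following is a minimal Python sketch, not the thesis's circuit. It assumes the keyword set of Fig. 3.7 is inserted in the order ‘enhappy’, ‘happy’, ‘happen’, ‘happygo’, which reproduces the figure's state numbers. Each rule unit is modelled as a hypothetical `(cur, ch, nx)` tuple, and the helpers `build_rules`, `pmux`, and `next_state` are illustrative names. Rules with a specific current state are listed before the wildcard rules, and the PMUX default of 0 covers every transition back to the initial state.

```python
from collections import deque

# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]
WILDCARD = "?"


def build_trie(keywords):
    """Goto function of the AC-trie as a list of {char: next_state} dicts."""
    goto = [{}]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    return goto


def failure_links(goto):
    """Breadth-first construction of the AC failure function."""
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return fail


def delta(goto, fail, s, ch):
    """AC-DFA transition: follow failure links until a goto edge exists."""
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)


def build_rules(goto, fail, alphabet):
    """Rule units (cur, ch, nx), listed from highest to lowest priority."""
    explicit = []
    for s in range(1, len(goto)):
        for ch in alphabet:
            t = delta(goto, fail, s, ch)
            # Transitions to state 0 come from the PMUX default value, and those
            # equal to delta_1(?, ch) come from a wildcard rule, so both are omitted.
            if t != 0 and t != goto[0].get(ch):
                explicit.append((s, ch, t))
    wildcard = [(WILDCARD, ch, t) for ch, t in goto[0].items()]
    return explicit + wildcard  # specific rules sit above the wildcard rules


def pmux(inputs, default=0):
    """Priority multiplexer: the first valid (valid, data) pair wins."""
    for valid, data in inputs:
        if valid:
            return data
    return default


def next_state(rules, cur, ch):
    # Every rule unit compares (CUR_ST, IN_CHR) in parallel; PMUX keeps the top hit.
    hits = [((rc == WILDCARD or rc == cur) and rk == ch, nx) for rc, rk, nx in rules]
    return pmux(hits)


goto = build_trie(KEYWORDS)
fail = failure_links(goto)
alphabet = sorted(set("".join(KEYWORDS)))
rules = build_rules(goto, fail, alphabet)
print(next_state(rules, 6, "e"))  # 13: delta_1(6,e)=13 outranks delta_1(?,e)=1
```

Ordering the list plays the role of stacking rule units from top to bottom. Rules such as $\delta_1(1,e)=1$ never appear because the wildcard rule reproduces them.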


3.3 AC-NFA

If the failure links are removed and multiple states are allowed to be active simultaneously, an AC-trie becomes an NFA. Fig. 3.7 illustrates the AC-NFA obtained from the original AC-trie by removing the failure links. After an AC-trie is converted to an AC-NFA, all matched transitions are performed concurrently. The parallelism inherent in hardware makes it feasible to keep track of concurrent state transitions. The transitions of an AC-NFA are exactly the goto functions of the original AC-trie. In the proposed approach, the complexity in terms of the number of transitions remains the same, whereas the failure functions are transformed into concurrent transitions that fit the hardware naturally.

Fig. 3.2(c) illustrates the matching example of the AC-NFA with the same input ‘enhappenhappygo’. In response to the characters ‘enhapp’, a transition sequence begins at state 0 and ends at state 6 without matching any keyword. In response to the characters beginning from the third character, another transition sequence traverses states 0, 8, 9, 10, 11, 13, and 14 and matches the keyword ‘happen’. The characters beginning from the seventh character initiate a transition sequence that traverses states 0 to 7 and matches the keyword ‘enhappy’. Similarly, the characters beginning from the ninth character initiate a transition sequence that traverses states 0, 8, 9, 10, 11, 12, 15, and 16 and matches the keyword ‘happygo’. Comparing Fig. 3.2(a) and (c) shows that, in every matching cycle, when a state is activated in the AC-trie, all the states reached from it by following the failure functions are activated simultaneously in the AC-NFA. For example, when state 3 is activated, state 8, to which the failure function of state 3 points, is activated simultaneously. Therefore, the failure functions are unnecessary if multiple states are allowed to be active simultaneously.
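The matching example above can be replayed with a minimal Python sketch of the AC-NFA. It assumes the same keyword insertion order as Fig. 3.7 so that state numbers agree. The names `build_trie` and `run_nfa` are hypothetical. Only goto edges are used, and state 0 is kept permanently active so that a new transition sequence can start at every character.

```python
# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]


def build_trie(keywords):
    """Goto edges plus the keyword that ends at each state (no failure links)."""
    goto, out = [{}], [None]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                out.append(None)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s] = word
    return goto, out


def run_nfa(goto, out, text):
    """Simulate the AC-NFA: every active state follows its goto edge in parallel."""
    active, trace, matches = set(), [], []
    for i, ch in enumerate(text, start=1):
        # State 0 stays active so that a new sequence can start at every character.
        active = {goto[s][ch] for s in active | {0} if ch in goto[s]}
        trace.append(active)
        matches += [(i, out[s]) for s in sorted(active) if out[s]]
    return trace, matches


goto, out = build_trie(KEYWORDS)
trace, matches = run_nfa(goto, out, "enhappenhappygo")
print(matches)
# [(8, 'happen'), (13, 'enhappy'), (13, 'happy'), (15, 'happygo')]
```

After the third character, `trace[2]` is `{3, 8}`. This is the simultaneous activation of state 3 and its failure target, state 8, described in the text.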

Figure 3.7: AC-NFA

Fig. 3.8 illustrates the implementation of the AC-NFA, where some similar portions of the circuit are omitted for clarity.

Figure 3.8: Implementation of AC-NFA

The AC-algorithm provides only one matching output in every matching cycle, whereas multiple states may be active simultaneously in an AC-NFA. Therefore, an output circuit is required to determine the matching output in the implementation of an AC-NFA. In this implementation, the final matching output OP is determined by a priority multiplexer PMUX.

The notation st(i) denotes the status of node i, where i is the state number in the AC-NFA. For example, when the text under inspection is ‘enhappy’, states 7 and 12 are activated simultaneously. State 7 represents the string ‘enhappy’ and state 12 represents the string ‘happy’; the former contains the latter. Therefore, state 7 has higher priority and is selected as the matching output.

In an AC-trie, each state represents a unique string. If a failure function links state $S_1$ to state $S_2$, then the string represented by $S_2$ is the longest proper suffix of the string represented by $S_1$ that is also represented by a state of the trie. For example, state 12 represents the string ‘happy’ and state 7 represents the string ‘enhappy’. The failure function of state 7 points to state 12, and ‘happy’ is a suffix of ‘enhappy’. In an AC-trie, the matching output is simply the active state, since only one state is active at any time. Although multiple states may be active in an AC-NFA, a single matching output must still be determined from them; in the earlier case, for instance, states 7 and 12 are active simultaneously. Every active state in an AC-NFA represents a suffix of the text read so far, and failure functions always link higher-level states to lower-level states. The highest-level active state therefore represents the longest such suffix, which is exactly the state the AC-trie would be in, so it determines the final matching output.
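The suffix property and the "highest level wins" rule can be checked with a short Python sketch. It assumes the Fig. 3.7 keyword set in the insertion order ‘enhappy’, ‘happy’, ‘happen’, ‘happygo’. The helpers `state_strings` and `select_output` are hypothetical. The failure function uses the standard breadth-first construction, and `select_output` mimics the output selection by picking the deepest active output state.

```python
from collections import deque

# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]


def build_trie(keywords):
    """Goto edges, state depth (level) and output flag of every trie state."""
    goto, depth, is_out = [{}], [0], [False]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                depth.append(depth[s] + 1)
                is_out.append(False)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        is_out[s] = True
    return goto, depth, is_out


def failure_links(goto):
    """Breadth-first construction of the AC failure function."""
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return fail


def state_strings(goto):
    """String represented by each state (parents are created before children)."""
    label = [""] * len(goto)
    for s, edges in enumerate(goto):
        for ch, t in edges.items():
            label[t] = label[s] + ch
    return label


def select_output(active, depth, is_out):
    """Highest-level active output state, or 0 if no output node is active."""
    hits = [s for s in active if is_out[s]]
    return max(hits, key=lambda s: depth[s]) if hits else 0


goto, depth, is_out = build_trie(KEYWORDS)
fail = failure_links(goto)
label = state_strings(goto)
print(label[7], "->", label[fail[7]])         # enhappy -> happy
print(select_output({7, 12}, depth, is_out))  # 7
```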


In the matching operation of the AC-algorithm, only one matching output is generated in every matching cycle. In the proposed NFA approach, the priority multiplexer PMUX in the lower right portion of Fig. 3.8 serves as an output selection circuit that determines the final matching output from multiple activated output nodes. Notably, the priority multiplexer PMUX in this figure differs from that in Fig. 3.6 in that each input group consists of a control signal and a data signal. Nevertheless, the two are essentially the same, and the different diagrams are used only for convenience of explanation.

This AC-NFA has four output nodes, so the priority multiplexer PMUX has four input groups, (E1, D1) through (E4, D4), where E1 through E4 are control signals and D1 through D4 are data signals. Control signals E1 through E4 indicate whether data inputs D1 through D4, respectively, are valid. If Ei and Ej are both true and i < j, then input Ei has higher priority than input Ej, and PMUX outputs Di through MO. When none of the inputs is valid, the output MO is not valid either; alternatively, PMUX can be made to output 0 when no matching output is available. The notation st(s) refers to the status of node s, which is true when node s is active. Since higher-level nodes have higher priority, signal st(7) has the highest priority and is connected to E1. The data sent to inputs D1 through D4 are the corresponding state numbers.

Consider the previous example. After the text ‘enhappy’ is accepted, nodes 7 and 12 are both active, so st(7) and st(12) are both true. Since st(7) has higher priority than st(12), PMUX selects data 7 from input D1 as the matching output.

Figure 3.9: Multi-stage architecture of AC-NFA

Observe that, among the states with the same depth in an NFA derived from a given AC-trie, the number of active states $n_a$ is at most one, i.e., $n_a \le 1$. In an AC-trie, states with the same depth are said to be in the same level. States in the same level represent different strings of the same length, and an active state at depth d must represent the most recent d input characters; thus, at most one state in a level can be active at a time. Therefore, an AC-NFA can alternatively be implemented in a multi-stage architecture.

Consequently, only one register is required to store the active state of each level in an AC-NFA. Fig. 3.9 illustrates a multi-stage architecture for implementing the AC-NFA. The stages are arranged in a chain, and each stage contains the transitions of the corresponding level. Fig. 3.10 illustrates the block diagram of a stage unit. A stage unit includes multiple rule units, each responsible for matching one transition. Therefore, the number of rule units in a stage must be greater than or equal to the number of transitions of the corresponding level. A rule unit stores the information of its transition and, during the matching operation, compares it with the input current state CUR_ST and the input character IN_CHR. A rule unit is triggered when its pattern matches the inputs, and it then outputs the next state NX of its transition.
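A minimal Python sketch of the multi-stage organization follows. It assumes the Fig. 3.7 keyword set in the insertion order ‘enhappy’, ‘happy’, ‘happen’, ‘happygo’. It also assumes that Python index d corresponds to Stage d+1 in Fig. 3.9 and that `None` stands for "no active state in this level". The names `build_stages`, `stage_unit`, and `run_multistage` are hypothetical. Each stage holds the rule units of one level, and each level has exactly one register.

```python
# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]


def build_trie(keywords):
    """Goto edges and depth (level) of every trie state."""
    goto, depth = [{}], [0]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    return goto, depth


def build_stages(goto, depth):
    """stages[d] holds the rule units (cur, ch, nx) of transitions out of level d."""
    stages = [[] for _ in range(max(depth))]
    for s, edges in enumerate(goto):
        for ch, t in edges.items():
            stages[depth[s]].append((s, ch, t))
    return stages


def stage_unit(rules, cur, ch):
    """Stage unit of Fig. 3.10: the matching rule unit drives NX, else None."""
    for r_cur, r_ch, r_nx in rules:
        if r_cur == cur and r_ch == ch:  # goto is deterministic: at most one hit
            return r_nx
    return None


def run_multistage(stages, text):
    """One register per level; regs[d] holds the active state of level d + 1."""
    regs = [None] * len(stages)
    trace = []
    for ch in text:
        prev = [0] + regs[:-1]  # stage d reads level d; level 0 is always state 0
        regs = [stage_unit(rules, cur, ch) for rules, cur in zip(stages, prev)]
        trace.append(regs)
    return trace


goto, depth = build_trie(KEYWORDS)
stages = build_stages(goto, depth)
trace = run_multistage(stages, "enhappenhappygo")
print(trace[12])  # after 'y': [None, None, None, None, 12, None, 7]
```

Because every level needs only one register, the per-cycle contents of `regs` equal the active set of the AC-NFA simulation without ever storing a set.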

Figure 3.10: Block diagram of a stage unit

3.4 Hybrid AC-FA

The DFA approach has the attractive property that processing an input string involves one DFA state traversal per character, which implies a deterministic number of memory accesses. That is, the memory bandwidth required to implement a DFA is predictable, and a DFA can be implemented with a lookup table. However, the number of transitions grows explosively when an AC-trie is converted to a DFA. In contrast, the NFA approach uses hardware resources efficiently. However, an NFA is difficult to implement in a predesigned architecture such as a lookup table, since each input character can trigger multiple state transitions and multiple states can be active simultaneously. Generally, an NFA is better suited to implementation in a programmable device.


Figure 3.11: Hybrid AC-FA (NFA portion and DFA portion)

According to the analysis in Table 3.1, when an AC-trie is converted to a DFA, the number of transitions increases dramatically in the lower levels because of the failure functions. Accordingly, this work proposes a hybrid finite automaton for implementing an AC-trie (hybrid AC-FA) that combines the advantages of the DFA and the NFA. Fig. 3.11 shows a hybrid AC-FA obtained by dividing the AC-trie in Fig. 3.1 into NFA and DFA portions. In the NFA portion, all failure functions are removed and only the goto functions remain. In the DFA portion, the failure functions are replaced by expanded goto functions. For convenience of discussion, the number of NFA levels is defined as the number of levels in the NFA portion, excluding level 0. For example, the hybrid AC-FA in Fig. 3.11 has three NFA levels. Notably, states 4 and 11 are in the DFA portion.

Compared with the AC-DFA illustrated in Fig. 3.3, the hybrid AC-FA eliminates the backward transitions to states 1, 3, and 8. Most of the removed transitions point backward to states in levels 0 and 1, i.e., states 0, 1, and 8. Correspondingly, the hybrid AC-FA has only two more transitions than the AC-NFA illustrated in Fig. 3.7, namely the transitions leading to states 13 and 15.
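The claim about the transition count can be reproduced with a small Python sketch. It assumes the Fig. 3.7 keyword set in the insertion order ‘enhappy’, ‘happy’, ‘happen’, ‘happygo’. It also assumes that the DFA portion keeps an expanded transition only when its target also lies in the DFA portion, because transitions back into the NFA portion are tracked by the NFA stages. The helper `hybrid_transitions` and the parameter `k` (number of NFA levels) are hypothetical names.

```python
from collections import deque

# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]


def build_trie(keywords):
    """Goto edges and depth (level) of every trie state."""
    goto, depth = [{}], [0]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    return goto, depth


def failure_links(goto):
    """Breadth-first construction of the AC failure function."""
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return fail


def delta(goto, fail, s, ch):
    """AC-DFA transition: follow failure links until a goto edge exists."""
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)


def hybrid_transitions(goto, fail, depth, k, alphabet):
    """Transitions of a hybrid AC-FA with k NFA levels (level 0 not counted)."""
    # NFA portion: plain goto edges leaving levels 0..k.
    nfa = [(s, ch, t) for s, edges in enumerate(goto) if depth[s] <= k
           for ch, t in edges.items()]
    # DFA portion: expanded transitions between states deeper than level k.
    dfa = [(s, ch, delta(goto, fail, s, ch)) for s in range(len(goto))
           if depth[s] > k for ch in alphabet]
    return nfa + [(s, ch, t) for s, ch, t in dfa if depth[t] > k]


goto, depth = build_trie(KEYWORDS)
fail = failure_links(goto)
alphabet = sorted(set("".join(KEYWORDS)))
goto_edges = {(s, ch, t) for s, edges in enumerate(goto) for ch, t in edges.items()}
hybrid = hybrid_transitions(goto, fail, depth, 3, alphabet)
print(len(goto_edges), len(hybrid), sorted(set(hybrid) - goto_edges))
# 16 18 [(6, 'e', 13), (7, 'g', 15)]
for k in range(max(depth) + 1):  # k = 0 keeps every failure-derived transition; k = 7 is the AC-NFA
    print(k, len(hybrid_transitions(goto, fail, depth, k, alphabet)))
```

The loop over `k` shows the trade-off directly. Fewer NFA levels move the design toward the AC-DFA's transition count, and more NFA levels move it toward the AC-NFA's.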

As in the multi-stage architecture of the AC-NFA, only one register is required to store the state of each level in the NFA portion. Furthermore, because at most one state can be active in the DFA portion, only one register is required to hold the state of the DFA portion. As a result, the number of stages and registers needed to hold the states in an implementation of the hybrid AC-FA can be predetermined, which enables the design of a general string-matching architecture based on the AC-algorithm. Because the number of transitions increases only slightly compared with the AC-NFA approach, the hybrid AC-FA approach is space efficient.

Fig. 3.12 shows the multi-stage architecture for implementing the hybrid AC-FA.


Figure 3.12: Implementation of hybrid AC-FA

Stages 1 to 4, which belong to the NFA portion, contain the transitions of levels 0 to 3, respectively. Stage 5, the terminal stage, belongs to the DFA portion and contains all of its transitions. Because at most one state can be active in each stage, priority multiplexer PMUX0 selects the next state of the terminal stage from the next states (NX) output by stages 4 and 5. Each stage has the same structure as the block diagram in Fig. 3.10. The next state (NX) generated by the i-th stage also represents the matching result $xop_i$ of that stage.

PMUX1 determines the final matching output from the matching results $xop_1$ to $xop_5$.
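The following Python sketch simulates the five-stage organization of the hybrid AC-FA for the example keywords and compares its output with a plain AC-DFA, cycle by cycle. It rests on several assumptions not stated in the text:

- The keyword set of Fig. 3.7 is inserted in the order ‘enhappy’, ‘happy’, ‘happen’, ‘happygo’.
- The DFA stage keeps only transitions whose target is deeper than level 3.
- PMUX0 gives the terminal stage priority over stage 4, since a valid terminal-stage result is never shallower than level 4.
- PMUX1 ranks $xop_5$ highest and $xop_1$ lowest.

All function names are hypothetical.

```python
from collections import deque

# Keyword set of Fig. 3.7; this insertion order happens to reproduce its state numbers.
KEYWORDS = ["enhappy", "happy", "happen", "happygo"]


def build_trie(keywords):
    goto, depth = [{}], [0]
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                depth.append(depth[s] + 1)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    return goto, depth


def failure_links(goto):
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return fail


def delta(goto, fail, s, ch):
    while s and ch not in goto[s]:
        s = fail[s]
    return goto[s].get(ch, 0)


def build_hybrid(goto, fail, depth, k, alphabet):
    """NFA stages 1..k+1 (goto edges out of levels 0..k) and the terminal DFA stage."""
    nfa_stages = [[] for _ in range(k + 1)]
    for s, edges in enumerate(goto):
        if depth[s] <= k:
            nfa_stages[depth[s]].extend((s, ch, t) for ch, t in edges.items())
    dfa_rules = [(s, ch, delta(goto, fail, s, ch))
                 for s in range(len(goto)) if depth[s] > k for ch in alphabet]
    dfa_rules = [r for r in dfa_rules if depth[r[2]] > k]  # DFA-portion targets only
    return nfa_stages, dfa_rules


def stage_unit(rules, cur, ch):
    """The rule unit matching (CUR_ST, IN_CHR) drives NX; None if nothing matches."""
    for r_cur, r_ch, r_nx in rules:
        if r_cur == cur and r_ch == ch:
            return r_nx
    return None


def pmux(candidates):
    """Priority multiplexer over optional values; the first valid one wins."""
    return next((c for c in candidates if c is not None), None)


def run_hybrid(nfa_stages, dfa_rules, text):
    regs = [None] * (len(nfa_stages) - 1)  # one register per NFA level 1..k
    dfa_reg = None                          # single register for the DFA portion
    outputs = []
    for ch in text:
        prev = [0] + regs                   # level 0 is always state 0
        xop = [stage_unit(rules, cur, ch) for rules, cur in zip(nfa_stages, prev)]
        xop.append(stage_unit(dfa_rules, dfa_reg, ch))  # terminal stage
        regs = xop[:-2]
        dfa_reg = pmux([xop[-1], xop[-2]])  # PMUX0: terminal stage first, then stage k+1
        out = pmux(reversed(xop))           # PMUX1: deepest matching result wins
        outputs.append(0 if out is None else out)
    return outputs


goto, depth = build_trie(KEYWORDS)
fail = failure_links(goto)
alphabet = sorted(set("".join(KEYWORDS)))
nfa_stages, dfa_rules = build_hybrid(goto, fail, depth, 3, alphabet)
print(run_hybrid(nfa_stages, dfa_rules, "enhappenhappygo"))
# [1, 2, 3, 4, 5, 6, 13, 14, 3, 4, 5, 6, 7, 15, 16]
```

The printed sequence equals the AC-DFA state sequence for the same input. The PMUX0 ordering matters: after ‘enhapp’, stage 4 offers state 11 while the terminal stage offers state 6, and keeping the deeper state 6 is what lets the later transition $\delta_1(6,e)=13$ fire.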

Figure 3.13: Priority multiplexer implemented by chained multiplexers


3.5 Priority Multiplexer

The priority multiplexer plays an important role in the proposed architectures, and its implementation is briefly described here. The implementation mainly follows the Altera literature [27]. In a priority multiplexer, the select logic implies a priority, so the conditions for selecting an input must be checked in order.

Fig. 3.13 illustrates a 4-to-1 priority multiplexer implemented by chained multiplexers that evaluate each condition, or select bit, one at a time. However, this chained structure is generally poor in terms of delay, since the critical path traverses every multiplexer in the chain. As a result, the delay of the chained structure increases linearly with the number of data inputs.
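The linear delay of the chained structure can be made concrete with a behavioural Python model. It assumes that the last multiplexer in the chain receives a constant 0 as its "no enable" input. The function `chained_pmux` is a hypothetical name. It returns the selected value together with the number of 2-to-1 multiplexers on the critical path.

```python
def chained_pmux(enables, data, default=0):
    """Priority multiplexer built as a chain of 2-to-1 multiplexers (Fig. 3.13 style).

    enables[0]/data[0] correspond to E1/D1, the highest-priority input.
    Returns (MO, mux_levels), where mux_levels is the number of 2-to-1
    multiplexers on the critical path from the lowest-priority input to MO.
    """
    mo, levels = default, 0
    # Start at the lowest-priority end so that E1 controls the last multiplexer.
    for e, d in zip(reversed(enables), reversed(data)):
        mo = d if e else mo  # one 2-to-1 multiplexer
        levels += 1
    return mo, levels


print(chained_pmux([0, 1, 1, 0], [10, 20, 30, 40]))  # (20, 4)
```

The number of multiplexer levels equals the number of inputs m, which is the O(m) delay of the chained structure.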

Figure 3.14: Priority multiplexer optimized for delay

Fig. 3.14 illustrates an alternative implementation of the priority multiplexer that optimizes delay. This logic structure is only slightly more complicated than the standard priority multiplexer scheme, but it significantly reduces the delay through the logic. In this structure, if either of the select lines E1 and E2 is high, the 2-input gate driven by E1 and E2 selects the upper half of the logic; otherwise, the lower half is selected. The enable signals E1 through E4 make the final choice of input, and if all of them are low, the output MO is zero. Signal E1 has the highest priority in the figure. The optimized structure has $\log_2 m$ levels of multiplexers for m data inputs. Therefore, the delay of the optimized priority multiplexer is reduced from $O(m)$ to $O(\log_2 m)$ for m data inputs.
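A behavioural Python sketch of the delay-optimized structure follows. It assumes that m is a power of two and that each level selects between the results of the upper and lower halves using the OR of the upper-half enables. The function `tree_pmux` is a hypothetical name, and the model does not reproduce the exact gates of Fig. 3.14. It returns whether any input was enabled, the output MO, and the number of 2-to-1 multiplexer levels.

```python
def tree_pmux(enables, data):
    """Delay-optimized priority multiplexer (Fig. 3.14 style), m a power of two.

    Returns (valid, MO, mux_levels); enables[0]/data[0] are E1/D1 (highest priority).
    """
    if len(enables) == 1:
        return enables[0], (data[0] if enables[0] else 0), 0
    half = len(enables) // 2
    up_valid, up_mo, up_levels = tree_pmux(enables[:half], data[:half])
    lo_valid, lo_mo, lo_levels = tree_pmux(enables[half:], data[half:])
    # Any enable in the upper half steers one 2-to-1 multiplexer per level.
    mo = up_mo if up_valid else lo_mo
    return up_valid or lo_valid, mo, max(up_levels, lo_levels) + 1


print(tree_pmux([0, 1, 1, 0], [10, 20, 30, 40]))  # (1, 20, 2)
```

For every enable pattern, the tree selects the same input as a chain of multiplexers, but only $\log_2 m$ multiplexers lie on the path to MO (two for m = 4, three for m = 8).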

Chapter 4
