Problem Formulation - Applying Design Space Exploration to the Hardware Optimization of Finite Impulse Response Filters

Chapter 3 Motivation

3.4 Problem Formulation

In this thesis, we address the problem of linear-phase FIR filter design based on the MCM architecture. We are given:

• the wordlength of the coefficients

• the specification of the FIR filter: ω_p, ω_s, δ_p and δ_s

Our goal is to generate a set of coefficients and minimize the total number of structural adders (SA) and multiplier block adders (MBA) for the FIR filter design under the given filter specification constraint.

Chapter 4 Our Proposed Method

In this chapter, we propose an algorithm to determine the coefficients for a specified linear-phase FIR filter design. The target of our algorithm is to minimize the number of adders when the FIR filter is implemented through MCM. In addition, our algorithm allows right-shift operations in the MCM block to further expand the design space. Our method efficiently uses the B&B search to find a set of coefficients which has a lower cost and satisfies the specification. It uses the lower bound of the MCM problem to estimate the cost and applies a heuristic bound condition in the B&B search.

4.1 Search of The Solutions

To find a set of coefficients which satisfies the specification and requires fewer adders, we use the B&B algorithm, as in the previous work [6]. That work determines the coefficients from the smallest to the largest, that is, h₀, h₁, …, h_M in (2.1), because a larger coefficient can be composed from smaller coefficients by adders and left-shifters. However, we use the right-shift operation, so we determine the coefficients in the reverse order: smaller coefficients can be derived by right-shifting larger ones. Fig. 4 shows the B&B tree of a 5-tap FIR filter. Each level of the tree determines one coefficient, each edge represents one decision for that coefficient, and each path represents one set of coefficients. For example, Path 1 is the set of coefficients h₀ = 2, h₁ = 4, and h₂ = 7. Pruning conditions reduce the search time; they are discussed in Section 4.5.4.

Fig. 4 The B&B tree of the Type I 5-tap FIR filter (Level 0: h₂; Level 1: h₁; Level 2: h₀; leaf: solution; Path 1 and the pruning lines are marked)

4.2 Boundary Computation

Assume that the given wordlength of the coefficients is WL, so each coefficient can be selected between −2^WL and 2^WL. The resulting B&B search space is very large, so the range of each coefficient must be reduced. In FIR filter design, the set of coefficients that meets a given specification is not unique, but we can compute a boundary for each coefficient according to the specification: values outside the boundary can never satisfy the filter specification. Computing the boundaries reduces both the search space and the runtime.

4.2.1 Linear Programming Formulation

To determine the boundary of the coefficient h_k, we formulate a linear programming (LP) model, given as (4.1), where β_l and β_u are two constants which specify the lower bound and the upper bound of β, respectively.

Using (4.1), we can derive the lower bound of h_k. To derive the upper bound of h_k, replace minimize by maximize in (4.1). Using this LP model, we can derive the boundary of each coefficient. We solve the LP problem with the Gurobi solver [15].
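
As an illustration only, an LP of this kind is often written in the form below. The zero-phase response H(ω), the band edges ω_p and ω_s, the ripples δ_p and δ_s, and the way the gain β scales the ripple bounds are all assumptions made for this sketch; the exact constraints of (4.1) may differ.

```latex
\begin{aligned}
\text{minimize}\quad   & h_k \\
\text{subject to}\quad & \beta(1-\delta_p) \le H(\omega) \le \beta(1+\delta_p), && \omega \in [0,\omega_p] \\
                       & -\beta\,\delta_s \le H(\omega) \le \beta\,\delta_s,    && \omega \in [\omega_s,\pi] \\
                       & \beta_l \le \beta \le \beta_u
\end{aligned}
```

Under these assumptions H(ω) is linear in the coefficients, so imposing the two frequency constraints on a dense grid of ω keeps the problem linear in h_0, …, h_M and β.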

4.2.2 The Selection of β_l and β_u

4.3 Cost Function and Zero-Crossing-Coefficient

Fig. 1(b) shows an N-tap FIR filter architecture with MCM, in which the adders are classified into SAs and MBAs. The goal of our work is to minimize the total number of adders, including both SAs and MBAs. Let the numbers of SAs and MBAs be N_SA and N_MBA, respectively. The cost function can be written as N_SA + N_MBA. The SAs sum up the outputs of the MCM block, so N_SA is determined by the number of coefficients, that is, N_SA = Tap – 1, where Tap is the number of coefficients. However, if a coefficient is zero, the output of the corresponding multiplication is zero no matter what the multiplicand is. Therefore, the corresponding adders can be removed. Moreover, the linear-phase FIR coefficients are symmetric, so fixing one coefficient to zero saves two adders. Thus, the cost function can be further written as N_MBA + Tap – 1 – 2*N_zero, where N_zero is the number of coefficients which are equal to zero. Since Tap – 1 is constant, the cost function can be simplified to N_MBA – 2*N_zero.
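
The two forms of the cost function can be written down directly. The following is a minimal sketch; the function names are chosen here for illustration and do not come from the thesis.

```python
def adder_cost(n_mba: int, taps: int, n_zero: int) -> int:
    """Total adder count N_MBA + Tap - 1 - 2*N_zero."""
    return n_mba + taps - 1 - 2 * n_zero


def search_cost(n_mba: int, n_zero: int) -> int:
    """Same cost with the constant Tap - 1 dropped; only useful for ranking."""
    return n_mba - 2 * n_zero


# Two hypothetical 5-tap designs: one without zeros, one with a zero coefficient.
print(adder_cost(3, 5, 0), adder_cost(2, 5, 1))
print(search_cost(3, 0), search_cost(2, 1))
```

Because both functions differ only by the constant Tap – 1, they rank any two designs of the same length identically.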

In order to reduce the cost, N_zero should be as large as possible. Therefore, we determine the zero-crossing-coefficients (ZCC) first, and then the B&B search determines the remaining coefficients. A ZCC is a coefficient whose boundary crosses zero.

Moreover, the sets of ZCCs which contain more zeros are searched first. For example, assume that the feasible boundaries of h₀, h₁ and h₂ include zero. At first, three ZCCs are fixed to zero, that is, {h₀, h₁, h₂}. Then, two ZCCs are fixed to zero, that is, {h₀, h₁} or {h₁, h₂} or {h₀, h₂}. Then, one ZCC is fixed to zero, that is, {h₀} or {h₁} or {h₂}. Finally, no coefficient is fixed to zero.
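
The ordering described above, larger sets of zeros first, can be sketched with the standard library. The helper name zcc_subsets is hypothetical, and the order among subsets of equal size is left arbitrary in this sketch.

```python
from itertools import combinations


def zcc_subsets(zccs):
    """Yield the subsets of ZCCs to fix to zero, largest subsets first."""
    for size in range(len(zccs), -1, -1):
        for subset in combinations(zccs, size):
            yield subset


for subset in zcc_subsets(["h0", "h1", "h2"]):
    print(subset)
```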

4.4 Algorithm Flow

Fig. 5 shows the algorithm flow. First, the algorithm computes the boundary of each coefficient according to the given specification, so that later steps only search values within the corresponding boundaries. Second, the algorithm uses the B&B search to determine the coefficients. An important characteristic of the B&B search is that finding a good solution early gives an earlier bound and reduces the runtime. Therefore, we create an iteration loop around the B&B search so that the ZCCs are fixed to zero first. In this loop, we first set N_zero to the number of ZCCs and fix N_zero ZCCs to zero. Then we use the B&B search to determine the remaining coefficients. Fixing more ZCCs to zero saves more SAs, but it may cause the B&B search to fail to find a feasible solution. If the search fails, we reduce N_zero. The loop continues until a feasible solution is obtained or all combinations of ZCCs have been tried.

The main stage of this algorithm is the B&B search. In this stage, the algorithm determines the remaining coefficients by the B&B search. This thesis proposes a B&B search strategy, and it is introduced in Section 4.5.

Fig. 5 The algorithm flow (Specification → 1. feasible boundary computation, 2. find zero-crossing-coefficients (ZCC) → N_zero = #ZCC → fix N_zero ZCCs to zero → B&B search: determine the remaining coefficients → Success? Y: output the architecture of the filter; N: N_zero = N_zero − 1 and repeat the iteration loop)
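
The iteration loop of Fig. 5 can be sketched as a small driver around the B&B search. Here bnb_search is a hypothetical callback standing in for the search of Section 4.5; it is assumed to return None when no feasible coefficient set exists.

```python
from itertools import combinations


def design_filter(zccs, bnb_search):
    """Zero as many ZCCs as possible, then let the B&B search fix the rest.

    bnb_search(fixed_zero) is a hypothetical callback: it returns a solution
    object, or None when the search finds no feasible coefficient set.
    """
    for n_zero in range(len(zccs), -1, -1):          # N_zero = #ZCC, then decrease
        for fixed_zero in combinations(zccs, n_zero):
            solution = bnb_search(frozenset(fixed_zero))
            if solution is not None:                 # "Success?" branch Y
                return n_zero, solution
    return None                                      # every combination failed
```

The driver stops at the first feasible solution, which matches the stated goal of reaching a good solution early so that later bounds are tight.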

4.5 Branch and Bound Search

After fixing the ZCCs to zero, we determine the remaining coefficients by the B&B search. In this section, we introduce the B&B search strategy that makes the exploration of the expanded design space more efficient and effective.

4.5.1 Decision Flow

Applying the B&B search, we need to make a coefficient decision at each node of the B&B tree. Fig. 6 shows the coefficient decision flow. Assume that the coefficient h_{k+1} is already determined and that the coefficient h_k is to be determined next. If k is equal to -1, the program has reached a leaf of the B&B tree, so a set of coefficients satisfying the specification is found. Then we record the result and go back to fix h_{k+1} to another candidate. If k is not equal to -1, the program executes the following steps. Step 1, determine the candidate set, denoted as C, containing values within the boundary of the coefficient h_k. Step 2, compute LB and RIPPLE for each candidate, which are used to determine the search priority. Step 3, fix h_k to a value in C, in ascending order of LB; when LBs are equal, in ascending order of RIPPLE. Step 4, check the pruning conditions; the path is pruned when a pruning condition is matched. Step 5, if no pruning condition is matched, the program goes on to the decision of h_{k-1}. Otherwise, it goes back to Step 4 to fix h_k to another candidate, until all candidates in C have been tried. When all candidates have been tried, if k is not equal to M, the program goes back to fix h_{k+1} to another candidate. If k is equal to M, the whole tree has been searched and the program is finished.
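
The decision flow can be sketched as a depth-first recursion. The three callbacks below (candidates_of, is_pruned, record) are hypothetical placeholders for candidate selection with LB and RIPPLE computation, the pruning conditions, and the solution record.

```python
def bnb(k, fixed, candidates_of, is_pruned, record):
    """Depth-first coefficient decision, from h_M down to h_0.

    candidates_of(k, fixed) -> list of (value, lb, ripple) tuples
    is_pruned(k, value, lb, ripple) -> True when a pruning condition matches
    record(assignment) -> called at a leaf (k == -1) with a complete assignment
    """
    if k == -1:
        record(dict(fixed))          # copy, because `fixed` is mutated below
        return
    # Search priority: ascending LB, ties broken by ascending RIPPLE.
    ordered = sorted(candidates_of(k, fixed), key=lambda c: (c[1], c[2]))
    for value, lb, ripple in ordered:
        if is_pruned(k, value, lb, ripple):
            continue                 # try the next candidate of h_k
        fixed[k] = value
        bnb(k - 1, fixed, candidates_of, is_pruned, record)
        del fixed[k]                 # backtrack before fixing another candidate
```

Returning from the recursive call corresponds to going back to fix h_{k+1} to another candidate; the outermost call returns once every candidate of h_M has been tried.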

4.5.2 Candidates Selection

In this section, we explain Step 2 in Fig. 6. The goal of Step 2 is to determine the candidate set of the coefficient h_k. In Step 4, h_k is fixed to each value in the candidate set in the order introduced in Section 4.5.1.

We select values within the boundary of h_k as the candidates, because values outside the boundary never satisfy the specification. However, as more and more coefficients are fixed, the actual boundary of a coefficient becomes much tighter than the initial boundary. Thus, to derive the actual boundary, we would have to recompute it by running the LP solver. In [6], a method is proposed to search the values without unnecessary LP runs, and we adopt this method.

To reduce the number of LP runs, we do not recompute the feasible boundaries of the coefficients; we compute them just once at the beginning, when no coefficient is fixed. That is, we use the LP model (4.1) to compute the initial feasible boundaries of the coefficients. Because the actual boundary of a coefficient is much tighter than the initial boundary, it is necessary to check whether a set of fixed coefficients can still satisfy the specification. This check is done with an LP model (4.3): when the LP has no feasible solution, no solution satisfying the specification is available. Applying this LP model, we can check the satisfaction and avoid unnecessary LP runs in the candidates selection.
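
For intuition, consider the special case in which every coefficient is already fixed. If the specification is assumed to take the form β(1−δ_p) ≤ H(ω) ≤ β(1+δ_p) on the pass-band and |H(ω)| ≤ β·δ_s on the stop-band, the check reduces to intersecting intervals for the gain β and needs no LP solver. The response formula, the band edges wp and ws, the frequency grid, and all names below are assumptions of this sketch, not the model (4.3) itself.

```python
import math


def zero_phase_response(h, omega):
    """H(w) = h[M] + 2 * sum_{k<M} h[k] * cos((M - k) * w) for a Type I filter.

    Assumes h = [h_0, ..., h_M] is one symmetric half with h_M the centre tap.
    """
    m = len(h) - 1
    return h[m] + 2.0 * sum(h[k] * math.cos((m - k) * omega) for k in range(m))


def gain_interval(h, wp, ws, dp, ds, beta_l, beta_u, grid=256):
    """Return the (low, high) range of feasible gains beta, or None."""
    pass_band = [zero_phase_response(h, wp * i / grid) for i in range(grid + 1)]
    stop_band = [zero_phase_response(h, ws + (math.pi - ws) * i / grid)
                 for i in range(grid + 1)]
    # Pass-band: beta*(1 - dp) <= H <= beta*(1 + dp) at every grid point.
    low = max(beta_l, max(pass_band) / (1.0 + dp))
    high = min(beta_u, min(pass_band) / (1.0 - dp))
    # Stop-band: |H| <= beta*ds at every grid point.
    low = max(low, max(abs(v) for v in stop_band) / ds)
    return (low, high) if low <= high else None


# 3-tap example with taps (1, 2, 1), i.e. h_0 = 1 and centre tap h_1 = 2.
print(gain_interval([1, 2], 0.2 * math.pi, 0.8 * math.pi, 0.1, 0.15, 1.0, 100.0))
```

When some coefficients are still free, the same constraints contain unknowns, which is why an LP is needed in the general case.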

The candidates of a coefficient are of two types. The first type consists of integer values within the coefficient boundary, as in previous works. The second type consists of non-integer values which can be derived from the previously determined coefficients by the right-shift operation. Note that the right-shift operation is applied only at the output of the MCM block, because we derive the non-integer values by right-shifting existing coefficients.

The right-shift operation may result in truncation error because of the non-integer property. Extra fractional bits are required if no truncation error is allowed. However, in FIR filter design the right-shift operation may be feasible, because the architecture of the FIR filter needs a series of adders to sum up the outputs of the MCM block, shown as the SAs in Fig. 1(b). This series of adders usually leads to truncation error anyway, because the sizes of the adders are not increased stage by stage in order to reduce hardware cost. Moreover, in fixed-point arithmetic, keeping all the less significant bits after a multiplication is not necessary, because quantization error already exists in the input signals. Thus, according to the output error requirement, a truncation procedure is often applied to reduce area, as shown in [13]. If such a procedure is applied, the truncation error implied by the right-shift operation can be tolerated or treated as part of the quantization problem of the FIR filter design.

The pseudo code of candidates selection (CS) is as follows.

CS ( ub_k, lb_k, FC, x )

Note that ub_k and lb_k are the initial upper bound and the initial lower bound of h_k, respectively, and FC is the fixed coefficient set containing the coefficients which are already fixed. The value x is one that is known to satisfy the specification, and it is obtained when the LP runs for h_{k+1}. In line 1 and line 2, the integral candidate set C₁ and the non-integral candidate set C₂ are initialized to empty. The two for loops in line 3 to line 14 add the integers which satisfy the specification to C₁. The first for loop, in line 3 to line 8, searches the integers and checks the specification in one direction, towards the initial lower bound, until the bound is reached or the specification is unsatisfied; the second for loop, in line 9 to line 14, does the same in the other direction, towards the initial upper bound. The third for loop, in line 15 to line 21, searches the non-integers within the initial boundary. If a non-integer satisfies the specification and can be derived from the previously determined coefficients by the right-shift operation, we add it to C₂. Finally, line 22 returns the union of C₁ and C₂ as the candidate set.
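
The two integer scans described above might look roughly as follows. This is a sketch of the idea only, not the CS pseudo code: satisfies is a hypothetical callback standing in for the LP check, and the function name is chosen for illustration.

```python
import math


def integer_candidates(lb, ub, x, satisfies):
    """Scan integers outward from x until the specification check fails.

    lb, ub: initial lower and upper bound of h_k
    x: a value known to satisfy the specification
    satisfies(v): hypothetical stand-in for the LP feasibility check
    """
    c1 = []
    v = math.floor(x)
    while v >= math.ceil(lb) and satisfies(v):    # towards the lower bound
        c1.append(v)
        v -= 1
    v = math.floor(x) + 1
    while v <= math.floor(ub) and satisfies(v):   # towards the upper bound
        c1.append(v)
        v += 1
    return sorted(c1)


# Toy check: only the integers 4..7 pass.
print(integer_candidates(2.3, 9.7, 5.4, lambda v: 4 <= v <= 7))
```

Stopping each scan at the first failure is safe when the feasible values of h_k form one contiguous range, which holds for the feasible set of an LP.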

The search space is already very large even if non-integers are not considered. To control the non-integral search space, we define a parameter L and restrict the number of bits after the binary point to at most L. The non-integral search space can therefore be controlled by adjusting L. Section 5.1 presents experimental results for different values of L.
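
A minimal sketch of how the non-integer candidates could be enumerated under the limit L, using exact fractions. The function name and the choice to shift each fixed coefficient repeatedly are assumptions of this example; the specification check is omitted.

```python
from fractions import Fraction


def right_shift_candidates(fixed, lower, upper, max_frac_bits):
    """Non-integer values obtained by right-shifting already fixed coefficients.

    Keeps values inside [lower, upper] that need at most max_frac_bits bits
    after the binary point.
    """
    limit = 2 ** max_frac_bits
    found = set()
    for c in fixed:
        value = Fraction(c)
        if value == 0:
            continue                      # shifting zero yields nothing new
        while True:
            value /= 2                    # one more right shift
            if value.denominator > limit:
                break                     # would need more than L fractional bits
            if value.denominator > 1 and lower <= value <= upper:
                found.add(value)
    return sorted(found)


print(right_shift_candidates([10], 0, 4, 2))
```

With L = 2, the coefficient 10 contributes 5/2 and 5/4; a third fractional bit would be needed for 5/8, so the loop stops there.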

4.5.3 LB and RIPPLE Computation

In this section, we explain Step 2 in Fig. 6. The search space of B&B is very large. To speed up the search, it is better to find a set of coefficients which satisfies the specification and has a low cost as soon as possible, so that the search can be bounded early. For this reason, we define two variables for each candidate in the candidate set and determine the search priority according to them. The first one is LB, which estimates the number of adders needed when h_k is fixed to the candidate. The second one is RIPPLE, which represents the quality of a candidate.

To reduce the time spent on cost computation, we use the lower bound of the MCM design to estimate the number of adders. In [8], a lower bound on the number of adders for an MCM design is proposed, given as (4.4), where C_i are the positive odd unique coefficients, N is the number of coefficients, and S(C_i) is the minimal number of non-zero bits of C_i. We compute the LB for each candidate in the candidate set by (4.4), and the pseudo code is as follows.

LB_Compute(CS, FC)

Note that CS is the candidate set and FC is the fixed coefficient set. In line 1 to line 8, the program computes the lower bound for each candidate in the candidate set. In line 5 and line 6, the program removes the candidates whose LB is not smaller than BEST_LB, because such candidates are very likely to result in a worse cost than the best solution found so far. Removing them from the candidate set reduces the search space. BEST_LB is the minimal lower bound among the solutions which have already been found.
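
The following sketch shows one way the LB filtering could work for integer coefficients. It assumes that the bound has the form N − 1 + ⌈log₂ min S(C_i)⌉ over the odd unique coefficients greater than 1, with S counted in canonical signed-digit form; this form is an assumption of the example and may not match (4.4) exactly. All function names are hypothetical.

```python
def csd_nonzero_digits(c):
    """Number of non-zero digits in the canonical signed-digit form of |c|."""
    c, count = abs(c), 0
    while c:
        if c & 1:
            c -= 2 - (c & 3)     # digit +1 if c % 4 == 1, digit -1 if c % 4 == 3
            count += 1
        c >>= 1
    return count


def odd_part(c):
    """Strip factors of two, since shifts cost no adders."""
    c = abs(c)
    while c and c % 2 == 0:
        c //= 2
    return c


def mcm_lower_bound(coeffs):
    """Assumed bound: N - 1 + ceil(log2(min S(C_i))) over odd unique C_i > 1."""
    odd = {odd_part(c) for c in coeffs} - {0, 1}
    if not odd:
        return 0
    s_min = min(csd_nonzero_digits(c) for c in odd)
    return len(odd) - 1 + (s_min - 1).bit_length()   # exact ceil(log2(s_min))


def lb_compute(candidates, fixed, best_lb):
    """Keep (candidate, LB) pairs whose LB is still smaller than best_lb."""
    kept = []
    for cand in candidates:
        lb = mcm_lower_bound(list(fixed) + [cand])
        if lb < best_lb:
            kept.append((cand, lb))
    return kept


print(lb_compute([7, 45], fixed=[11], best_lb=3))
```

Non-integer candidates would first have to be scaled to integers by a power of two before this bound is applied; that step is left out here.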

In the candidate selection, we solve the LP model (4.3) to check the satisfaction, and the resulting pass-band ripple δ indicates the quality of a set of coefficients: a smaller ripple means more freedom in selecting the unfixed coefficients under the specification constraint. Therefore, we compute the RIPPLE for each candidate in the candidate set from the pass-band ripple in (4.3). Because β may be different for each candidate, we need to normalize the pass-band ripple as

RIPPLE = δ / β (4.5).

The RIPPLE for each candidate can be recorded when the LP model is solved in the candidate selection, so no additional computation is needed.

4.5.4 Pruning Conditions

In this section, we introduce the pruning conditions of the B&B search. The pruning conditions are classified into two types: deterministic conditions and heuristic conditions. A deterministic condition never prevents the B&B search from finding the best solution. In contrast, a heuristic condition may degrade the quality of the result, but it reduces the search time. In our method, there is one deterministic condition, the specification pruning condition, and there are two heuristic conditions, the LB pruning condition and the ripple pruning condition.

The specification pruning condition means that the set of fixed coefficients cannot satisfy the specification even though the other coefficients have not been fixed yet. Such a path is pruned. In fact, this check is already done in the candidate selection, because we only add values which satisfy the specification to the candidate set.

The LB pruning condition means that the lower bound of the number of adders for the fixed coefficients is not smaller than the minimal lower bound among the solutions which were already found. Actually, this condition check was done in the LB computation.

The ripple pruning condition means that the RIPPLE of the candidate is larger than RIPPLE_Threshold, which is a dynamic value for each coefficient. We estimate that no solution satisfying the specification exists after fixing a coefficient to a candidate which matches the ripple condition. The pseudo code of the B&B search, which contains the ripple pruning, is as follows.

… infinity. Line 1 to line 3 are the terminal condition. Line 4 is the candidates selection, and line 5 is the LB and RIPPLE computation. Line 7 to line 13 fix the coefficient h_k to each candidate and dynamically change RIPPLE_Threshold. When determining a coefficient h_k, if a candidate of h_k fails to lead to any feasible solution, its RIPPLE is recorded as the RIPPLE_Threshold of h_k. Later in the search, when the RIPPLE of any candidate of h_k is larger than the RIPPLE_Threshold of h_k, that branch is pruned heuristically. The reason is that we estimate that there may be no solution satisfying the specification if the RIPPLE of a candidate is … heuristic pruning conditions in our method.
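
The dynamic threshold can be sketched as a small bookkeeping object. Starting each threshold at infinity is an assumption of this sketch, as is the class name; whether a threshold is reset when the search returns to a level under a different parent is left open here.

```python
import math


class RippleThreshold:
    """Per-coefficient RIPPLE threshold that tightens after failed candidates."""

    def __init__(self):
        self._threshold = {}     # k -> threshold; missing entries mean infinity

    def is_pruned(self, k, ripple):
        """Heuristic pruning: RIPPLE larger than the threshold of h_k."""
        return ripple > self._threshold.get(k, math.inf)

    def record_failure(self, k, ripple):
        """Call when a candidate of h_k led to no feasible solution."""
        # Candidates above the threshold are never explored, so a failing
        # candidate always has RIPPLE <= threshold; min() just makes that explicit.
        self._threshold[k] = min(ripple, self._threshold.get(k, math.inf))


thresholds = RippleThreshold()
thresholds.record_failure(2, 0.4)
print(thresholds.is_pruned(2, 0.5), thresholds.is_pruned(2, 0.4))
```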

4.5.5 Solution Record

When k = -1 in Fig. 6, the program has reached a leaf of the B&B tree. Therefore, we have a set of coefficients satisfying the specification, and we can compute the real number of adders for this coefficient set. As explained in Section 4.3, the cost function is N_MBA – 2*N_zero. In this step, we use Hcub [1], a graph-based MCM algorithm, to compute N_MBA for this set of coefficients. The best coefficient set is kept until the B&B tree has been thoroughly searched, and the filter architecture is then obtained. The flow of solution record is shown in Fig. 7.

Fig. 7 The flow of solution record (LB <= BEST_LB? → BEST_LB = LB, compute NMBA by Hcub → NMBA < BEST_NMBA? → BEST_NMBA = NMBA, record the set of coefficients)
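
The checks listed for Fig. 7 can be sketched as follows. The count_mba argument is a hypothetical stand-in for an MCM tool such as Hcub, and the class and method names are chosen for illustration.

```python
import math


class SolutionRecord:
    """Keeps the best coefficient set, following the checks listed for Fig. 7."""

    def __init__(self, count_mba):
        self.count_mba = count_mba    # hypothetical stand-in for an MCM tool
        self.best_lb = math.inf
        self.best_nmba = math.inf
        self.best_coeffs = None

    def offer(self, coeffs, lb):
        """Return True when coeffs becomes the new best solution."""
        if lb > self.best_lb:         # "LB <= BEST_LB ?" answered no
            return False
        self.best_lb = lb
        nmba = self.count_mba(coeffs)
        if nmba >= self.best_nmba:    # "NMBA < BEST_NMBA ?" answered no
            return False
        self.best_nmba = nmba
        self.best_coeffs = list(coeffs)
        return True
```

Checking LB first means the more expensive MCM computation only runs for leaves whose lower bound is at least as good as the best one seen so far.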

Fig. 8 The example of B&B search

4.5.6 The Example of B&B Search

In Fig. 8, we present an example showing how the B&B search works. This is a 5-tap linear-phase filter with three coefficients h₂, h₁ and h₀. Here h₁ is a ZCC, and we decide to fix it to zero in the iteration loop. Thus, in the B&B search, h₁ is locked to zero and only h₂ and h₀ are determined.

First, we consider h₂ because it is the largest coefficient. It is the first coefficient to be determined, so only integer candidates are available. According to its boundary, the three available candidates are “9”, “10” and “11”. We then compute their LB and RIPPLE; the values are shown in Fig. 8. Note that the RIPPLE is computed with h₁ fixed to zero. Then we will
