Thesis Organization - Applying Design Space Exploration to Hardware Optimization of Finite Impulse Response Filters

Chapter 1 Introduction

1.3 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 introduces the specification of the filter and the previous works. Chapter 3 explains the motivation of this work. Chapter 4 presents the proposed method. Chapter 5 shows the experimental results. Finally, Chapter 6 gives the conclusions and future work.

Chapter 2 Background

2.1 The Specification of FIR Filter

In this thesis, we consider the linear phase FIR filter. The frequency response of a Type I linear phase FIR filter with N taps is written as

The frequency response equations of Type II, III and IV linear phase FIR filters are similar to this [14].
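For reference, one standard way to write the zero-phase response of a Type I filter is sketched below. It assumes that N is odd, that M = (N − 1)/2, and that the symmetric coefficients are indexed h_0, ..., h_M with h_M as the center tap, which matches the ordering used in Section 4.1. The thesis's own equation (2.1) may use a different but equivalent form.

```latex
H(\omega) = h_M + 2\sum_{k=0}^{M-1} h_k \cos\bigl((M-k)\,\omega\bigr),
\qquad M = \frac{N-1}{2}
```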

The frequency response of a filter can be classified into four types: low-pass, high-pass, band-pass and band-stop. For convenience, we illustrate only the low-pass filter in the following. Fig. 2 shows the specification of the low-pass filter. The parameters ωp, ωs, δp and δs are the end of the pass-band, the beginning of the stop-band, the maximum allowable pass-band ripple and the maximum allowable stop-band ripple, respectively. The specification requires the frequency response to stay inside the region bounded by these parameters. Thus, it can be expressed as the following formula.

is the average pass-band gain.

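[Figure: magnitude response |H(ω)| versus ω, with the pass-band limits 1+δp and 1−δp, the stop-band limit δs, and the band edges ωp and ωs marked.]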

Fig. 2 The specification of a low-pass filter.
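The region in Fig. 2 can also be checked numerically. The following is a minimal sketch, assuming a Type I filter whose coefficients are indexed h_0, ..., h_M with h_M as the center tap, a response normalized by a pass-band gain `gain`, and a uniform frequency grid. The function names and the grid size are illustrative and are not taken from the thesis.

```python
import math

def zero_phase_response(h, w):
    """Zero-phase response of a Type I filter; h = [h_0, ..., h_M], h_M is the center tap."""
    M = len(h) - 1
    return h[M] + 2.0 * sum(h[k] * math.cos((M - k) * w) for k in range(M))

def meets_lowpass_spec(h, wp, ws, dp, ds, gain, grid=512):
    """Sampled check of the low-pass specification (a grid test, not a proof)."""
    for i in range(grid + 1):
        w = math.pi * i / grid
        H = zero_phase_response(h, w) / gain  # normalize by the assumed pass-band gain
        if w <= wp and not (1 - dp <= H <= 1 + dp):
            return False  # pass-band ripple exceeds dp
        if w >= ws and abs(H) > ds:
            return False  # stop-band ripple exceeds ds
    return True
```

For example, the 3-tap filter with h = [0.25, 0.5] has the response 0.5 + 0.5 cos(ω); it passes the check for ωp = 0.2π, ωs = 0.8π and δp = δs = 0.1, and fails it when δp is tightened to 0.05.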

2.2 Previous Works

In FIR filter design, to minimize the number of adders efficiently, we need to go back to the preceding step, that is, we must take the cost into account when determining a set of coefficients that satisfies the specification. Several previous works address this problem [4-6][9], and they are briefly described here.

In [4], linear programming is used to derive the boundary of every coefficient that can meet the specification, and the coefficients are searched within this boundary. The search method is branch and bound (B&B), which finds a better solution by first generating a look-up table containing all the possible subexpressions for a given wordlength and a given maximum number of adders per coefficient. Because it considers only the individual cost of each coefficient when generating the look-up table, it may miss better solutions.

In [5], the problem is formulated as a 0-1 integer linear program that minimizes the number of adders. The formulation comprehensively considers the subexpressions of each coefficient, but it needs a large number of variables to decide which coefficients and subexpressions are used, so it is very time-consuming.

In [9], a local search method is proposed, and a common-subexpression-based method is used to account for the sharable adders. The canonical signed digit (CSD) representation is used. Although CSD represents a coefficient with the minimum number of non-zero digits, it does not guarantee fewer adders than other representations.

In [6], the B&B search method is used within the boundary of each coefficient, and a cost estimation is proposed to minimize the number of adders. The cost estimation simply computes the number of adders required to generate a new coefficient by adding or shifting the integers in the subexpression basis set, which is dynamically expanded during the search process. The experimental results show that the total number of adders in the FIR design is smaller than in other previous works under the same filter specification.

It is apparent that the scheme of first identifying a boundary and then performing a B&B search is widely adopted in coefficient decision as shown in [4][6]. The boundary computation can reduce the search space because we just need to search within the boundary of each coefficient. The B&B search strategy can eliminate invalid searches based on the filter specification and the total adder cost. We also adopt this B&B search and the boundary computation in our algorithm.

Chapter 3 Motivation

3.1 Right-Shifter in MCM

In the previous works [4-6][9], once the wordlength (WL) is decided, the value of every coefficient must be an integer ranging from −2^WL to 2^WL − 1, since an MCM block consists of adders and left-shifters only. For example, if the given wordlength is 10 bits, the value of each coefficient must be an integer between −1024 and 1023. However, we can reverse the sign of a coefficient by replacing the corresponding structural adder with a subtractor, so the range of coefficients is actually −1024 to 1024. If we also allow the right-shifter, we can select non-integers as coefficients within a certain range. For example, assume that the input data is x and the selected coefficient is 2.5, that is, the operation 2.5 * x must be computed. This constant multiplication can be computed as ( 5 * x ) >> 1. By applying the right-shift operation, we can therefore select the non-integer 2.5 as a coefficient.
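The arithmetic of this example can be sketched as follows. The snippet assumes integer input data and models the right-shifter with the integer `>>` operator, which discards the fractional bit; the function names are illustrative only.

```python
from fractions import Fraction

def times_2_5(x):
    """Compute 2.5 * x as (5 * x) >> 1 using one adder and two shifts."""
    x5 = (x << 2) + x   # 5x = 4x + x: one left shift and one adder
    return x5 >> 1      # right shift by one bit; drops the fractional bit

def truncation_error(x):
    """Exact product minus the shifted result."""
    return Fraction(5, 2) * x - times_2_5(x)
```

For x = 8 the result is exact (20). For x = 3 the exact product is 7.5, while the shifted result is 7, so half a unit is lost to truncation.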

An example is given here. Assume the input data is x, and we select two coefficients, “3” and “20”, for the given filter specification without the right-shift operation. The architecture is shown in Fig. 3(a), and it needs two adders. With the right-shift operation, however, we can select a non-integer as a coefficient. Assume that the coefficient “3” can be replaced by “2.5” and the set of coefficients still satisfies the filter specification. In this case, an adder can be replaced by a shifter, and the modified architecture is shown in Fig. 3(b). One adder is saved, which reduces the cost.

In another similar case, assume that no integer satisfies the specification when determining the second coefficient. The previous works return “no solution” in this case. However, with the right-shift operation, we can select “2.5” if it satisfies the specification, and thus a solution is available. In this way, applying the right-shift makes it easier to find a feasible solution.

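[Figure: two MCM architectures, (a) for the coefficients 3 and 20 without a right-shifter and (b) for the coefficients 2.5 and 20 with a right-shifter.]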

Fig. 3 Example of cost reduction
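The saving in this example can be made concrete with the sketch below. It shows one possible shift-and-add realization of each coefficient set and may differ from the exact wiring drawn in Fig. 3; the function names and the returned adder counts are illustrative.

```python
def mcm_without_right_shift(x):
    """Coefficients {3, 20}: each product needs its own adder."""
    y3 = (x << 1) + x            # adder 1: 3x = 2x + x
    y20 = (x << 4) + (x << 2)    # adder 2: 20x = 16x + 4x
    return y3, y20, 2            # products and number of adders used

def mcm_with_right_shift(x):
    """Coefficients {2.5, 20}: both products are derived from a shared 5x."""
    x5 = (x << 2) + x            # adder 1: 5x = 4x + x
    y20 = x5 << 2                # 20x = 5x shifted left by 2
    y2_5 = x5 >> 1               # 2.5x = 5x shifted right by 1 (truncating)
    return y2_5, y20, 1          # products and number of adders used
```

With x = 4, the first version returns (12, 80, 2) and the second returns (10, 80, 1): the product 20x is unchanged, while one adder is replaced by a shifter.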

3.2 Heuristic Pruning Condition

After the right-shift operation is applied, the search space of coefficient sets is expanded, so more time is required to find an exact solution in it. Moreover, the search space grows exponentially with the wordlength, so in the previous works the wordlength can only be set to a value for which the run time is acceptable.

In this thesis, we introduce a heuristic pruning condition during the B&B search to reduce the run time. This heuristic pruning is based on the ripple of the frequency response, which indicates the quality of the current coefficient set. If the ripple is too large during the B&B search, the search is unlikely to find a feasible solution.

Applying the heuristic pruning may miss the best solution when searching in the design space. However, the run time is greatly reduced and thus it allows us to expand the search space. We found that in most cases searching heuristically in a larger design space is more effective than finding an exact solution in a smaller design space. Thus we apply this heuristic pruning in our algorithm.

3.3 Lower Bound Analysis

To illustrate that the right-shifter is beneficial for cost reduction, we compare the lower bounds on the number of adders for designs with and without right-shifters. The method for computing the lower bound is explained in Section 4.6, and only the results are shown here. Table I lists ten filters and their lower bounds with and without right-shifters. The ten filters are randomly generated and have relatively few taps, because computing the lower bound takes a very long time and the runtime grows with the number of taps.

Table I shows that the right-shifter is indeed beneficial for cost reduction: the lower bound decreases for eight of the ten filters and is unchanged for the other two. However, the improvement is not large because the number of taps is small.

Table I The comparison for lower bound

Filter   Tap   LB without right-shifters   LB with right-shifters
M1       24    31                          28
M2       23    28                          26
M3       22    25                          24
M4       24    28                          27
M5       24    26                          26
M6       23    31                          30
M7       23    22                          22
M8       22    26                          24
M9       24    28                          26
M10      24    26                          25
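The size of the improvement in Table I can be summarized with a few lines of arithmetic. The snippet below only re-uses the numbers listed in the table; the variable names are illustrative.

```python
# Lower bounds from Table I, in the order M1 ... M10
lb_without = [31, 28, 25, 28, 26, 31, 22, 26, 28, 26]
lb_with = [28, 26, 24, 27, 26, 30, 22, 24, 26, 25]

# Adders saved per filter when right-shifters are allowed
savings = [a - b for a, b in zip(lb_without, lb_with)]

total_saved = sum(savings)                         # total over the ten filters
relative_saving = total_saved / sum(lb_without)    # fraction of the total lower bound
```

Over the ten filters, the lower bound drops by 13 adders out of 271 in total, which is a little under 5 percent, and the largest single saving is 3 adders (M1).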

3.4 Problem Formulation

In this thesis, we address the problem of the linear phase FIR filter design based on the MCM architecture. We are given:

• the wordlength of coefficients

• the specification of the FIR filter: ωp, ωs, δp and δs

Our goal is to generate a set of coefficients and minimize the total number of structural adders (SA) and multiplier block adders (MBA) for the FIR filter design under the given filter specification constraint.

Chapter 4 Our Proposed Method

In this chapter, we propose an algorithm to determine the coefficients for a specified linear phase FIR filter design. The target of our algorithm is to minimize the number of adders when the FIR filter is implemented with MCM. In addition, our algorithm allows the use of right-shift operations in the MCM block to further expand the design space. Our method uses the B&B search to efficiently find a set of coefficients which has a lower cost and satisfies the specification. It uses the lower bound of the MCM problem to estimate the cost and applies a heuristic bound condition in the B&B search.

4.1 Search of The Solutions

To find a set of coefficients which satisfies the specification and requires fewer adders, we use the same B&B algorithm as the previous work [6]. That work determines the coefficients from the smallest coefficient to the largest coefficient, that is, h_0, h_1, … , h_M in (2.1), because a larger coefficient can be composed of smaller coefficients by adders and left-shifters. However, since we use the right-shift operation, we determine the coefficients in the reverse order, because smaller coefficients can be derived by right-shifting larger coefficients. Fig. 4 shows the B&B tree of a 5-tap FIR filter. Each coefficient is determined at its corresponding level, where each edge represents one decision for the coefficient and each path represents one set of coefficients. For example, Path 1 is a set of coefficients that contains h_0 = 2, h_1 = 4, and h_2 = 7. We also apply some pruning conditions to reduce the search time, and they are discussed in the following sections.

[Figure: B&B tree. Level 0 decides h_2, Level 1 decides h_1, Level 2 decides h_0, and each leaf is a solution; Path 1 and the pruning lines are marked.]

Fig. 4 The B&B tree of the Type I 5-tap FIR filter
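The traversal of such a tree can be sketched as a depth-first search that decides h_M first and h_0 last. The sketch below is a generic skeleton, not the thesis's implementation: the callbacks `candidates` and `is_pruned` are hypothetical placeholders for the candidate selection and the pruning conditions, and no cost bound is modeled.

```python
def branch_and_bound(candidates, is_pruned, M):
    """Depth-first search over h_M, h_(M-1), ..., h_0.

    candidates(k, fixed) -> iterable of values to try for h_k (hypothetical callback)
    is_pruned(k, fixed)  -> True if the partial assignment is cut (hypothetical callback)
    Returns every complete coefficient set [h_0, ..., h_M] that is not pruned.
    """
    solutions = []
    fixed = {}  # maps index k to the value currently chosen for h_k

    def decide(k):
        if k == -1:  # reached a leaf: all coefficients are fixed
            solutions.append([fixed[i] for i in range(M + 1)])
            return
        for value in candidates(k, fixed):
            fixed[k] = value
            if not is_pruned(k, fixed):
                decide(k - 1)  # go one level down the tree
            del fixed[k]       # backtrack before trying the next candidate

    decide(M)
    return solutions
```

With M = 2, candidate lists {7} for h_2, {4, 5} for h_1 and {2, 3} for h_0, and a toy pruning rule that cuts any partial sum above 13, the only surviving path is h_0 = 2, h_1 = 4, h_2 = 7.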

4.2 Boundary Computation

Assume that the given wordlength of coefficients is WL, so the coefficients can be selected between −2^WL and 2^WL. The resulting B&B search space is very large, so the range of each coefficient needs to be reduced. In FIR filter design, the set of coefficients is not unique for a given filter specification, but we can compute a boundary for each coefficient according to the specification, such that coefficients outside the boundary can never satisfy the filter specification. Computing the boundary reduces both the search space and the runtime.

4.2.1 Linear Programming Formulation

To determine the boundary of the coefficient h_k, we formulate a linear programming (LP) model, and the formulation is written as

where β_l and β_u are two constants which specify the lower bound and the upper bound of β, respectively.

Using (4.1), we can derive the lower bound of h_k. To derive the upper bound of h_k, replace minimize by maximize in (4.1). Using this LP model, we can derive the boundary of each coefficient. We use the Gurobi LP solver [15] to solve this LP problem.
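As an illustration of what such a boundary LP typically looks like, a hedged sketch is given below. It assumes a gain variable β bounded by two constants β_l and β_u, a response H(ω) that is linear in the coefficients, and constraints enforced on a grid of frequencies in the pass-band and stop-band. The symbols and the exact constraints are assumptions and are not a reproduction of (4.1).

```latex
\begin{aligned}
\text{minimize}\quad & h_k \\
\text{subject to}\quad
& \beta\,(1-\delta_p) \le H(\omega) \le \beta\,(1+\delta_p), && \omega \in [0, \omega_p], \\
& -\beta\,\delta_s \le H(\omega) \le \beta\,\delta_s, && \omega \in [\omega_s, \pi], \\
& \beta_l \le \beta \le \beta_u .
\end{aligned}
```

Because H(ω) is linear in the coefficients and β enters linearly, every constraint is linear, so the same model with maximize in place of minimize yields the upper end of the boundary.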

4.2.2 The Selection of β_l and β_u

4.3 Cost Function and Zero-Crossing-Coefficient

Fig. 1(b) shows an N-tap FIR filter architecture with MCM, in which the adders can be classified into SAs and MBAs. The goal of our work is to minimize the total number of adders, including both SAs and MBAs. Let the numbers of SAs and MBAs be N_SA and N_MBA, respectively. The cost function can then be written as N_SA + N_MBA. The SAs sum up the outputs of the MCM block, so N_SA is related to the number of coefficients, that is, N_SA = Tap – 1, where Tap is the number of coefficients. However, if a coefficient is zero, the output of the corresponding multiplication is zero regardless of the multiplicand, so the corresponding adders can be removed. Moreover, since the linear phase FIR coefficients are symmetric, two adders are saved when one coefficient is fixed to zero. Thus, the cost function can be further written as N_MBA + Tap – 1 – 2*N_zero, where N_zero is the number of coefficients that are equal to zero. Since Tap – 1 is constant, the cost function can be simplified to N_MBA – 2*N_zero.
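The two forms of the cost function can be written down directly. The sketch below follows the formulas in the text; the function and parameter names are illustrative.

```python
def adder_cost(n_mba, taps, n_zero):
    """Total adders: N_MBA + Tap - 1 - 2 * N_zero."""
    return n_mba + (taps - 1) - 2 * n_zero

def search_cost(n_mba, n_zero):
    """Simplified cost with the constant Tap - 1 dropped: N_MBA - 2 * N_zero."""
    return n_mba - 2 * n_zero
```

For a fixed number of taps, the two functions differ only by the constant Tap − 1, so they rank candidate solutions identically. For example, with 25 taps, 10 MBAs and 2 zero coefficients the total is 30 adders and the simplified cost is 6.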

In order to reduce the cost, N_zero should be as large as possible. Therefore, we determine the zero-crossing-coefficients (ZCC) first, and the B&B search then determines the remaining coefficients. A ZCC is a coefficient whose boundary crosses zero.

Moreover, the sets of ZCCs with more zeros are searched first. For example, assume that the feasible boundaries of h_0, h_1 and h_2 include zero. At first, three ZCCs are fixed to zero, that is, { h_0, h_1, h_2 }. Then, two ZCCs are fixed to zero, that is, { h_0, h_1 } or { h_1, h_2 } or { h_0, h_2 }. Then, one ZCC is fixed to zero, that is, { h_0 } or { h_1 } or { h_2 }. Finally, no coefficient is fixed to zero.

4.4 Algorithm Flow

Fig. 5 shows the algorithm flow. First, the algorithm computes the boundary of each coefficient according to the given specification. After this step, we only need to search for coefficients within the corresponding boundaries. Second, the algorithm uses the B&B search to determine the coefficients. An important characteristic of the B&B search is that finding a good solution as early as possible leads to earlier bounding and thus reduces the runtime. Therefore, we create an iteration loop around the B&B search so that the ZCCs are fixed to zero first. In this iteration loop, we first set N_zero to the number of ZCCs and fix N_zero ZCCs to zero. Then we use the B&B search to determine the remaining coefficients. Fixing more ZCCs to zero saves more SAs, but it may cause the B&B search to fail to find a feasible solution. If it fails, we reduce N_zero. This loop continues until all combinations of ZCCs have been tried or a feasible solution is obtained.

The main stage of this algorithm is the B&B search. In this stage, the algorithm determines the remaining coefficients by the B&B search. This thesis proposes a B&B search strategy, and it is introduced in Section 4.5.

[Figure: flowchart. Specification → 1. Feasible boundary computation, 2. Find zero-crossing-coefficients (ZCC) → N_zero = #ZCC → Fix N_zero ZCCs to zero → B&B search: determine the remaining coefficients → Success? If yes, output the architecture of the filter; if no, set N_zero = N_zero − 1 and repeat the iteration loop.]

Fig. 5 The algorithm flow
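The iteration loop of the flow can be sketched as follows. This is a simplified illustration: `bb_search` is a hypothetical callback standing in for the B&B stage, and the order in which subsets of the same size are tried is simply the order produced by `itertools.combinations`.

```python
from itertools import combinations

def design_filter(zccs, bb_search):
    """Try to fix as many ZCCs to zero as possible, largest subsets first.

    zccs      : list of zero-crossing-coefficient identifiers
    bb_search : hypothetical callback; takes the set of coefficients fixed to
                zero and returns a solution, or None if the search fails
    Returns (solution, n_zero) for the first success, or (None, 0).
    """
    for n_zero in range(len(zccs), -1, -1):          # N_zero = #ZCC, then decrease
        for zero_set in combinations(zccs, n_zero):  # every subset of that size
            solution = bb_search(frozenset(zero_set))
            if solution is not None:
                return solution, n_zero              # stop at the first feasible design
    return None, 0
```

With three ZCCs, the loop tries the full set, then the three pairs, then the single coefficients, and finally the empty set, stopping as soon as the callback reports success.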

4.5 Branch and Bound Search

After fixing the ZCCs to zero, we determine the remaining coefficients by the B&B search. In this section, we introduce the B&B search strategy that makes the exploration of the expanded design space more efficient and effective.

4.5.1 Decision Flow

Applying the B&B search method, we need to make a coefficient decision at each node of the B&B tree. Fig. 6 shows the coefficient decision flow. Assume that the coefficient h_{k+1} is already determined and the coefficient h_k is to be determined next. If k is equal to −1, the program has reached a leaf of the B&B tree, so a set of coefficients that satisfies the specification has been found. We then record the result and go back to fix h_{k+1} to another candidate. If k is not equal to −1, the program executes the following steps, numbered as in Fig. 6. Step 2 determines the candidate set, denoted as C, which contains values within the boundary of the coefficient h_k. Step 3 computes LB and RIPPLE for each candidate, which are used to determine the search priority. Step 4 fixes h_k to a value in C; candidates are tried in ascending order of LB and, when LBs are equal, in ascending order of RIPPLE. Step 5 checks the pruning conditions, and the path is pruned if any pruning condition is matched. In Step 6, if no pruning condition is matched, the program proceeds to the decision of h_{k−1}. Otherwise, it goes back to Step 4 to fix h_k to another candidate, until all candidates in C have been tried. When all candidates have been tried, if k is not equal to M, the program goes back to fix h_{k+1} to another candidate. If k is equal to M, the whole B&B tree has been searched and the program finishes.
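The search priority described above amounts to a two-key sort. The sketch below assumes that LB and RIPPLE are available as functions of a candidate; both callbacks are hypothetical stand-ins for the computations described in the thesis.

```python
def order_candidates(candidates, lower_bound, ripple):
    """Sort candidates by ascending LB, breaking ties by ascending RIPPLE.

    lower_bound(c) and ripple(c) are hypothetical callbacks that return the
    LB and RIPPLE values of candidate c.
    """
    return sorted(candidates, key=lambda c: (lower_bound(c), ripple(c)))
```

For instance, if the candidates 3, 2.5 and 4 have LB values 2, 1 and 1 and RIPPLE values 0.1, 0.3 and 0.2, the search visits 4 first, then 2.5, then 3.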

This section explains Step 2 in Fig. 6, whose goal is to determine the candidate set of the coefficient h_k. In Step 4, h_k is fixed to each value in the candidate set in the order introduced in Section 4.5.1.

We select values within the boundary of h_k as the candidates, because values outside the boundary can never satisfy the specification. However, as more and more coefficients are fixed, the actual boundary of a coefficient becomes much tighter than the initial boundary. Thus, to derive the actual boundary, we would have to recompute it by running the LP solver. In [6], a method is proposed to search the values without unnecessary LP runs, and we adopt this method as well.

To reduce the number of LP solver runs, we do not recompute the feasible boundaries of the coefficients; instead, we compute them only once at the beginning, when no coefficient is fixed. That is, we use the LP model in (4.1) to compute the initial feasible boundaries of the coefficients. Because the actual boundary of a coefficient is much tighter than the initial boundary, it is necessary to check whether a set of coefficients can still satisfy the specification. This check can be done with an LP model: if the LP has no feasible solution, then no solution satisfying the specification is available for the coefficients fixed so far. Applying this LP model, we can check the satisfaction and avoid unnecessary LP runs in the candidate selection.

The candidates of a coefficient are of two types. The first type consists of the integer values within the coefficient boundary, as in previous works. The second type consists of non-integer values that can be derived from the previously determined coefficients by the right-shift operation. Note that the right-shift operation is applied only at the output of the MCM block, because the non-integer values are derived by right-shifting existing coefficients.
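The two candidate types can be illustrated with the sketch below. It is not the thesis's CS procedure: the parameter `max_shift`, which limits how many bits a fixed coefficient may be right-shifted, is a hypothetical addition, and exact fractions are used so that non-integer candidates are represented without rounding.

```python
from fractions import Fraction
import math

def candidate_values(lb, ub, fixed, max_shift=2):
    """Collect integer candidates in [lb, ub] plus right-shifted fixed coefficients.

    fixed     : previously determined (integer) coefficients
    max_shift : hypothetical limit on the right-shift amount
    """
    # Type 1: integers inside the boundary
    integers = {Fraction(v) for v in range(math.ceil(lb), math.floor(ub) + 1)}
    # Type 2: non-integers obtained by right-shifting fixed coefficients
    shifted = set()
    for c in fixed:
        for s in range(1, max_shift + 1):
            v = Fraction(c) / (1 << s)
            if lb <= v <= ub and v.denominator != 1:
                shifted.add(v)
    return sorted(integers | shifted)
```

With the boundary [2, 3] and the fixed coefficients 5 and 20, the candidates are 2, 2.5 and 3, where 2.5 comes from right-shifting 5 by one bit.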

The right-shift operation may result in truncation error because the result is not an integer. Extra fractional bits are required if no truncation error is allowed. However, in FIR filter design the right-shift operation may still be feasible, because the architecture of the FIR filter needs a series of adders to sum up the outputs of the MCM block, shown as the SAs in Fig. 1(b). This series of adders usually introduces truncation error already, because the sizes of the adders are not increased stage by stage in order to reduce the hardware cost. Moreover, in fixed-point arithmetic, keeping all the less significant bits after a multiplication is unnecessary because quantization error already exists in the input signals. Thus, depending on the output error requirement, a truncation procedure is often applied to reduce the area, as shown in [13]. If such a procedure is applied, the truncation error introduced by the right-shift operation can be tolerated or treated as part of the quantization problem of the FIR filter design.

The pseudo code of candidates selection (CS) is as follows.

CS ( ub_k, lb_k, FC, x )

Note that ub_k and lb_k are the initial upper bound and the initial lower bound of h_k, respectively. FC is the fixed coefficient set containing the coefficients which are already
