Chapter 1 Introduction
1.3 Organization of This Thesis
The organization of this thesis is as follows:
Chapter 2: We introduce the properties that are useful for designing 1-D linear-phase FIR digital filters. We use the least-squares approach and the iterative Lagrange-multiplier approach to design filters with continuous coefficients and with finite-word-length discrete coefficients, respectively; both approaches can yield powers-of-two coefficients.
Chapter 3: We establish the transposed structure of 1-D linear-phase FIR digital filters and implement it with both multiplier and multiplierless architectures. We use the Verilog-XL simulator on a workstation to verify the timing. After the gate-level simulations are finished, the filters are implemented on an FPGA and their specifications are compared.
Chapter 4: We introduce the two-channel perfect-reconstruction quadrature mirror filter banks and describe how the signals are analyzed and synthesized. Finally, we implement them on an FPGA with a polyphase architecture and compare their specifications with those of the approach proposed in Chapter 2.
Chapter 5: We extend the least-squares approach of the 1-D system to the design of 2-D quadrantally symmetric FIR filters, and 16 different cases are listed.
Chapter 6: We present the conclusions and some directions for future work.
Chapter 2
Design of 1-D Linear-Phase FIR Digital Filters
2.1 Introduction
Digital filters play a very important role in digital signal processing. An FIR (finite impulse response), or non-recursive, system is one whose impulse response has only a finite number of nonzero samples. For such a filter, the impulse response is always absolutely summable, so FIR filters are always stable.
The operation of an FIR filter is simply a convolution sum. In the time domain, the input-output relation of the FIR filter is given by
$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \tag{2.1}$$
Here, y(n) and x(n) are, respectively, the output and input sequences, N is the total number of taps of the filter, and h(n) are the filter coefficients. Since FIR filters can be designed to provide exactly linear phase over the whole frequency range and are always BIBO stable regardless of the filter coefficients, they offer several advantages:
(1) Stability problems do not arise, since the FIR structure is always stable.
(2) The implementation of an FIR filter is easy, since only three types of components have to be designed: multipliers, adders, and registers.
(3) FIR filters are easy to design with linear phase. A linear-phase system delays all frequency components of the input signal by the same fixed time, so it does not distort the phase of the signal.
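The convolution sum of Eq. (2.1) can be sketched directly in code. The following is a minimal illustration, not part of the thesis design flow; the function name `fir_filter` and the sample values are assumptions chosen for the example, and x(n) is taken to be zero for n < 0.

```python
import numpy as np

def fir_filter(h, x):
    """Evaluate Eq. (2.1): y(n) = sum_{k=0}^{N-1} h(k) x(n-k), with x(n) = 0 for n < 0."""
    N = len(h)
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(N):
            if n - k >= 0:          # samples before the start of the input are zero
                y[n] += h[k] * x[n - k]
    return y

# Illustrative 3-tap filter and input (assumed values)
h = np.array([0.25, 0.5, 0.25])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = fir_filter(h, x)
```

Because the sum is finite, the output magnitude can never exceed the sum of |h(k)| times the largest input magnitude, which is the BIBO stability property noted above.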
2.1.1 The Structure of FIR Filters
Two structures for FIR filters, the direct form and the transposed form, are discussed in many textbooks [1]. A direct form realization of an FIR filter can be developed from Eq. (2.1). This equation shows that the structure consists of an interconnection of basic modules: adders, multipliers, and registers. The direct form structure is shown in Fig. 2.1.
Figure 2.1 Structure of FIR in direct form
In the direct form, the register width equals the input word length, but the throughput rate is poor. Because the output is the sum of all the products, the latency grows as the number of taps increases. Pipelining can alleviate this problem, but at the cost of increased area.
The second structure, the transposed form, is derived from the direct form structure and is shown in Fig. 2.2.
Figure 2.2 Structure of FIR in transposed form
In the transposed form, the output of each adder is the sum of the current product and the previous partial sum. Its latency is therefore not affected by the number of taps, and its throughput rate is higher than that of the direct form. For this structure, if the input word length is N bits and the sum of the absolute values of all coefficients requires M bits, the register width is M+N bits. Although the registers of the transposed form are wider than those of the direct form, a 1-D FIR filter does not need many registers, so this structure is well suited to our implementation.
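The behaviour of the transposed form can be sketched as a cycle-by-cycle simulation. This is a minimal model under stated assumptions: the function name `fir_transposed`, the register list `regs`, and the integer test data are illustrative and do not come from the thesis.

```python
def fir_transposed(h, x):
    """Simulate a transposed-form FIR filter one input sample per clock cycle.

    regs[i] holds the partial sum that is added to the product h[i]*x in the
    next cycle, so each output needs only one multiply and one add on its path.
    """
    N = len(h)
    regs = [0] * (N - 1)
    y = []
    for sample in x:
        prods = [hk * sample for hk in h]       # every tap sees the same input sample
        y.append(prods[0] + (regs[0] if regs else 0))
        for i in range(N - 1):                  # ascending order reads old regs[i+1]
            nxt = regs[i + 1] if i + 1 < N - 1 else 0
            regs[i] = prods[i + 1] + nxt
    return y

# Assumed example: 3 taps, 5 input samples
y_t = fir_transposed([1, 2, 3], [1, 0, 0, 2, 1])
```

The output matches the convolution sum of Eq. (2.1), while the registers store accumulated sums rather than input samples, which is why they must be wider than in the direct form.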
2.2 Design of Linear-Phase FIR Digital Filters
In this section, we discuss how to design linear-phase FIR digital filters.
In many applications, it is necessary to ensure that the designed digital filter does not distort the phase of the input signal components whose frequencies lie in the pass-band. For a causal FIR filter of length N, the transfer function H(z) is
$$H(z) = \sum_{n=0}^{N-1} h(n)\,z^{-n} \tag{2.2}$$
Here, h(n) is a causal finite-duration sequence.
An Mth-order linear-phase FIR digital filter is characterized by either a symmetric or an anti-symmetric impulse response:
Symmetric:
$$h(n) = h(M-n), \quad 0 \le n \le M \tag{2.3}$$
Anti-symmetric:
$$h(n) = -h(M-n), \quad 0 \le n \le M \tag{2.4}$$
When an FIR filter has linear phase, its impulse response exhibits one of these symmetries, and this property can be used to reduce the number of multipliers.
According to the symmetry of the impulse response and the length of the filter (even or odd), there are four cases to consider.
Case 1: Symmetric impulse response and odd length.
Case 2: Symmetric impulse response and even length.
Case 3: Anti-symmetric impulse response and odd length.
Case 4: Anti-symmetric impulse response and even length.
From the above four cases, we can easily obtain the coefficients and the magnitude response of the filters. Table 2.1 summarizes their main features and applications.
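For reference, the frequency responses of the four cases can be written in the standard textbook form below. This is a sketch under assumptions: M = N - 1 is the filter order, and the coefficient names a(k), b(k), c(k), d(k) are chosen here for illustration; the notation and equation numbering used elsewhere in this thesis may differ.

```latex
% M = N - 1 is the filter order; a, b, c, d are assumed coefficient names.
\begin{align*}
\text{Case 1:}\quad & H(e^{j\omega}) = e^{-j\omega M/2} \sum_{k=0}^{M/2} a(k)\cos(k\omega),
  & a(0) &= h(M/2),\quad a(k) = 2h(M/2-k) \\
\text{Case 2:}\quad & H(e^{j\omega}) = e^{-j\omega M/2} \sum_{k=1}^{(M+1)/2} b(k)\cos\!\big((k-\tfrac{1}{2})\omega\big),
  & b(k) &= 2h\big((M+1)/2-k\big) \\
\text{Case 3:}\quad & H(e^{j\omega}) = j\,e^{-j\omega M/2} \sum_{k=1}^{M/2} c(k)\sin(k\omega),
  & c(k) &= 2h(M/2-k) \\
\text{Case 4:}\quad & H(e^{j\omega}) = j\,e^{-j\omega M/2} \sum_{k=1}^{(M+1)/2} d(k)\sin\!\big((k-\tfrac{1}{2})\omega\big),
  & d(k) &= 2h\big((M+1)/2-k\big)
\end{align*}
```

Evaluating these sums at ω = 0 and ω = π shows directly which cases are forced to zero at those frequencies: every sine term vanishes at ω = 0, and the half-integer cosine terms vanish at ω = π.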
The symmetry and anti-symmetry properties of a linear-phase FIR filter in Eq. (2.3) and Eq. (2.4) can be exploited to reduce the total number of multipliers to almost half of that in the transposed form implementation of the transfer function. The structure of the linear-phase FIR filter in transposed form is shown in Fig. 2.3.
Figure 2.3 Structure of linear-phase FIR in transposed form.
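The saving in multipliers can be illustrated with a small sketch for a symmetric, odd-length filter. It is written here as a folded sum for clarity rather than as the transposed structure of Fig. 2.3; the function name `folded_fir` and the test data are assumptions.

```python
def folded_fir(h, x):
    """Filter x with a symmetric odd-length h using M/2 + 1 multiplications per output.

    Because h(k) = h(M-k), the two input samples that share a coefficient are
    added first and then multiplied once.
    """
    M = len(h) - 1
    assert M % 2 == 0 and all(h[k] == h[M - k] for k in range(M + 1))
    half = M // 2

    def sample(i):                      # input is zero outside its support
        return x[i] if 0 <= i < len(x) else 0

    y = []
    for n in range(len(x)):
        acc = h[half] * sample(n - half)            # centre tap, used once
        for k in range(half):
            acc += h[k] * (sample(n - k) + sample(n - M + k))
        y.append(acc)
    return y

# Assumed example: the impulse response of a 5-tap symmetric filter
y_f = folded_fir([1, 2, 3, 2, 1], [1, 0, 0, 0, 0, 0])
```

For the 5-tap example, each output uses 3 multiplications instead of 5; for a 37-tap filter the count drops from 37 to 19.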
Table 2.1 Main features and applications of the four cases.
Case        | 1 | 2 | 3 | 4
Length      | Odd | Even | Odd | Even
Symmetry    | Symmetric | Symmetric | Anti-symmetric | Anti-symmetric
H(0)        | Arbitrary | Arbitrary | 0 | 0
H(π)        | Arbitrary | 0 | 0 | Arbitrary
Application | Low-pass, band-pass, high-pass, multirate systems | Low-pass, band-pass, multirate systems | Band-pass, Hilbert transformer | High-pass, band-pass, differentiator
2.2.1 Least-Squares Approach to the Design of Case 1 Low-Pass Filters
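We are first given the ideal magnitude response D(ω) and the magnitude response Ĥ(ω) of the filter to be designed. For a Case 1 low-pass filter with pass-band [0, ωp] and a specified stop-band, Ĥ(ω) is written in product form in terms of the coefficient vector A, where t denotes the transpose operation. Substituting Eq. (2.14) into Eq. (2.13), we obtain the error function as a quadratic form in A involving the matrix Q.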
We also assume that the desired response and the weighting function are given. With these conditions, the elements of P and Q can easily be calculated.
From the elements of P and Q, we can obtain the coefficient vector A of the filter by setting the derivative of the error function with respect to A to zero.
After A is obtained, the continuous filter coefficients h(n) can be calculated by Eq. (2.6), Eq. (2.8), Eq. (2.10), and Eq. (2.12).
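The least-squares design can be sketched numerically. This sketch assumes the usual formulation: the error is the weighted integral of [D(ω) - Ĥ(ω)]² over the pass-band and stop-band, with Ĥ(ω) = Σ a(n) cos(nω), which expands to the quadratic form e = s + PᵗA + AᵗQA of Eq. (2.31) and is minimized by A = -½ Q⁻¹P. The function names, the trapezoidal integration grid, the uniform weighting W(ω) = 1, and the mapping a(0) = h(M), a(n) = 2h(M - n) are assumptions, not the implementation used in this thesis. The band edges are those of Example 2.1.

```python
import numpy as np

def trapezoid_weights(w):
    """Trapezoidal-rule weights for a uniformly spaced frequency grid."""
    dw = w[1] - w[0]
    wts = np.full(len(w), dw)
    wts[0] = wts[-1] = dw / 2
    return wts

def ls_case1_lowpass(N, fp, fs, grid=2000):
    """Least-squares Case 1 low-pass design; fp, fs are band edges in cycles/sample."""
    M = (N - 1) // 2
    n = np.arange(M + 1)
    wp = np.linspace(0.0, 2 * np.pi * fp, grid)        # pass-band, D = 1
    ws = np.linspace(2 * np.pi * fs, np.pi, grid)      # stop-band, D = 0
    Cp, Cs = np.cos(np.outer(wp, n)), np.cos(np.outer(ws, n))
    qp, qs = trapezoid_weights(wp), trapezoid_weights(ws)
    Q = Cp.T @ (qp[:, None] * Cp) + Cs.T @ (qs[:, None] * Cs)
    P = -2.0 * Cp.T @ qp                               # stop-band contributes zero
    A = -0.5 * np.linalg.solve(Q, P)                   # gradient P + 2QA = 0
    h = np.empty(N)
    h[M] = A[0]                                        # centre tap
    h[M + 1:] = A[1:] / 2                              # h(M + n) = a(n) / 2
    h[:M] = A[:0:-1] / 2                               # mirror image
    return h, A

h_ls, A_ls = ls_case1_lowpass(37, 0.10625, 0.14375)
```

The transition band between the two edges is left out of the integral, so it acts as a "don't care" region.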
2.2.2 Lagrange-Multiplier Approach to the Design of Case 1 Low-Pass Filters with Constrained Conditions
In fast real-time systems, the design of filters with discrete coefficients is an important problem. Over the past decades, many papers have proposed methods for designing discrete-coefficient FIR filters [2]-[10]. The Lagrange-multiplier approach has been widely applied to the design of maximally flat real-coefficient FIR digital filters and differentiators. In some applications, certain constraints should be incorporated in the design procedure while the desired response D(ω) is approximated. For designing maximally flat filters, magnitude or derivative constraints F are imposed at a discrete set of points ω0, ω1, …, ωJ in the pass-band so that the filter attains the desired pass-band performance, as expressed in Eq. (2.27), where M denotes the order of the constraints at a particular frequency ωj. Eq. (2.27) can be rewritten as Eq. (2.28).
Hence, the desired filter is obtained by combining the integrated squared-error function with the constraint in Eq. (2.28).
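Main condition:
$$e = s + P^t A + A^t Q A \tag{2.31}$$
Constraint:
$$B^t A = G \tag{2.32}$$
Then, we adopt the Lagrange-multiplier approach and combine the two equations by introducing the Lagrange-multiplier vector λ:
$$\lambda = [\lambda_{00} \cdots \lambda_{M_0 0} \;\; \lambda_{01} \cdots \lambda_{M_1 1} \;\; \cdots \;\; \lambda_{0J} \cdots \lambda_{M_J J}]^t \tag{2.33}$$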
After that, we obtain the Lagrange function in the following matrix form:
$$\Lambda(A, \lambda) = s + P^t A + A^t Q A - \lambda^t (B^t A - G) \tag{2.34}$$
The necessary and sufficient conditions for optimality are
$$\nabla_A \Lambda = 0 \tag{2.35}$$
and
$$\nabla_\lambda \Lambda = 0 \tag{2.36}$$
which lead to
$$P + 2QA - B\lambda = 0 \tag{2.37}$$
and
$$B^t A - G = 0 \tag{2.38}$$
Premultiplying Eq. (2.37) by $B^t Q^{-1}$ and using Eq. (2.38), we get
$$B^t Q^{-1} P + 2G = B^t Q^{-1} B \lambda \tag{2.39}$$
so
$$\lambda = (B^t Q^{-1} B)^{-1} (2G + B^t Q^{-1} P) \tag{2.40}$$
Substituting Eq. (2.40) into Eq. (2.37), the filter coefficients are given by
$$A = Q^{-1} B (B^t Q^{-1} B)^{-1} G + \frac{1}{2} Q^{-1} \left[ B (B^t Q^{-1} B)^{-1} B^t Q^{-1} - I \right] P \tag{2.41}$$
where I is an $\frac{N+1}{2} \times \frac{N+1}{2}$ identity matrix.
By applying this method, we can also design maximally flat filters, FIR digital filters with real or arbitrary complex coefficients [11], and lower- and higher-order differentiators from 1-D to 2-D [12]-[14].
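The closed-form solution of Eq. (2.41) can be checked numerically. The sketch below uses a small randomly generated positive-definite Q and arbitrary P, B, and G as stand-ins; these values and the function name `constrained_ls` are assumptions for illustration and are not filter design data.

```python
import numpy as np

def constrained_ls(Q, P, B, G):
    """Minimize s + P^t A + A^t Q A subject to B^t A = G, using Eq. (2.41)."""
    Qi = np.linalg.inv(Q)
    K = np.linalg.inv(B.T @ Qi @ B)
    I = np.eye(Q.shape[0])
    return Qi @ B @ K @ G + 0.5 * Qi @ (B @ K @ B.T @ Qi - I) @ P

rng = np.random.default_rng(0)
R = rng.standard_normal((5, 5))
Q = R @ R.T + 5 * np.eye(5)            # symmetric positive definite (assumed test data)
P = rng.standard_normal(5)
B = np.zeros((5, 2))
B[1, 0] = 1.0                          # constrain coefficient 1
B[3, 1] = 1.0                          # constrain coefficient 3
G = np.array([0.5, -0.25])

A = constrained_ls(Q, P, B, G)
# Multiplier vector from Eq. (2.40)
lam = np.linalg.solve(B.T @ np.linalg.inv(Q) @ B, 2 * G + B.T @ np.linalg.inv(Q) @ P)
```

With unit columns in B, the constraint simply pins individual coefficients to given values, which is how the constraint is used when coefficients are quantized.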
2.3 Design of Finite-Precision Coefficient Linear-Phase FIR Digital Filters
In this section, the iterative Lagrange-multiplier approach is introduced for designing finite-precision coefficient FIR digital filters [9]. The method combines the Lagrange-multiplier approach with a tree search algorithm. For each branch of the tree, the Lagrange-multiplier approach is used to optimize the un-quantized coefficients in the least-squares sense while the quantized coefficients are held fixed. Hence, we call this method the iterative Lagrange-multiplier approach.
2.3.1 Design of Discrete Coefficient Case 1 Low-Pass Filters
In Section 2.2.1, we obtained the continuous-coefficient filter by Eq. (2.26). Next, the task of the discrete optimization algorithm is to find the un-quantized coefficients when some of the coefficients are fixed at discrete values. For example, suppose that the coefficients a(3), a(6), and a(5) are restricted respectively to the discrete values ad(3), ad(6), and ad(5).
After that, we apply the Lagrange-multiplier approach iteratively to find all the discrete coefficients. The procedure for designing finite-precision coefficient filters using the iterative Lagrange-multiplier approach is as follows [9].
Step 1: According to the specifications D(ω), W(ω), and N, obtain the optimized continuous coefficients by Eq. (2.26).
Step 2: Select the coefficient with the maximum absolute value and fix it at L discrete values. Taking a(r) as an example, we fix it at L discrete values in the vicinity of a(r), denoted adi(r), i = 1, …, L.
Step 3: Establish the constraint matrices with the first quantized coefficient fixed:
$$B_i^t A = G_i, \quad i = 1, \ldots, L \tag{2.43}$$
Here, i is the index of the optimization problem, Bi, i = 1, …, L, are column vectors whose elements are all zero except the rth, which is unity, and Gi, i = 1, …, L, are one-element matrices with element adi(r), i = 1, …, L, respectively.
Step 4: Find the L sets of continuous solutions by Eq. (2.41).
Step 5: For each of the L sets of continuous solutions, select the un-quantized coefficient with the maximum absolute value and fix it at L different discrete values, for example, adij(r), i = 1, …, L, j = 1, …, L.
Step 6: Establish the constraint matrices with the quantized coefficients fixed for each of the L² optimization problems:
$$B_{ij}^t A = G_{ij}, \quad i = 1, \ldots, L, \quad j = 1, \ldots, L \tag{2.44}$$
where
$$B_{ij} = [\, B_i \;\; Z_j \,], \quad i = 1, \ldots, L, \quad j = 1, \ldots, L \tag{2.45}$$
and
$$G_{ij} = \begin{bmatrix} G_i \\ Y_j \end{bmatrix}, \quad i = 1, \ldots, L, \quad j = 1, \ldots, L \tag{2.46}$$
Here, i is the index of the previous quantization step and j is the index of the latest step. Zj are column vectors whose elements are all zero except for a single unit element, whose position is the same as that of the coefficient quantized in Step 5, and Yj are one-element matrices with element adij(k).
Step 7: Find the L² sets of continuous solutions by Eq. (2.41), and then select the L sets that give the smallest error values, calculated by Eq. (2.13).
Step 8: Set
$$B_l = B_{ij}, \quad l = 1, \ldots, L \tag{2.47}$$
and
$$G_l = G_{ij}, \quad l = 1, \ldots, L \tag{2.48}$$
where the L sets of Bij and Gij are those chosen in Step 7. If un-quantized coefficients remain, return to Step 5.
Step 9: From the L sets of discrete solutions obtained in Step 7, select the set of discrete coefficients that gives the smallest error value as the desired solution.
The tree search of the above iterative Lagrange-multiplier approach is illustrated in Fig. 2.4, where the white circles represent the continuous coefficients and the black circles represent the discrete values. Each discrete value can be expressed as the sum or the difference of two powers-of-two.
Figure 2.4 The tree search structure with L=2.
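The steps above can be sketched as a small beam search over a generic quadratic error e = s + PᵗA + AᵗQA. This is a simplified illustration and not the implementation of [9]: it quantizes the elements of A directly, uses the L grid values nearest to the continuous value as candidates, and all function names are assumptions.

```python
import numpy as np

def spt_grid(max_shift=10):
    """Values s1*2^-p1 + s2*2^-p2 with s in {-1, 0, 1} and p in 0..max_shift."""
    pows = [2.0 ** -p for p in range(max_shift + 1)]
    terms = [0.0] + pows + [-v for v in pows]
    return np.unique([a + b for a in terms for b in terms])

def solve_fixed(Q, P, idx, vals):
    """Eq. (2.41) with unit-vector columns in B, i.e. A[idx] pinned to vals."""
    n = len(P)
    Qi = np.linalg.inv(Q)
    if not idx:
        return -0.5 * Qi @ P                      # unconstrained minimum
    B, G = np.eye(n)[:, idx], np.array(vals)
    K = np.linalg.inv(B.T @ Qi @ B)
    return Qi @ B @ K @ G + 0.5 * Qi @ (B @ K @ B.T @ Qi - np.eye(n)) @ P

def quad_error(Q, P, s, A):
    return s + P @ A + A @ Q @ A

def iterative_lagrange(Q, P, s=0.0, L=2, max_shift=10):
    grid, n = spt_grid(max_shift), len(P)
    beams = [([], [])]                            # (fixed indices, fixed values)
    for _ in range(n):
        cands = []
        for idx, vals in beams:
            A = solve_fixed(Q, P, idx, vals)
            free = [k for k in range(n) if k not in idx]
            r = max(free, key=lambda k: abs(A[k]))        # largest un-quantized coefficient
            for v in grid[np.argsort(np.abs(grid - A[r]))[:L]]:
                new_idx, new_vals = idx + [r], vals + [float(v)]
                A_new = solve_fixed(Q, P, new_idx, new_vals)
                cands.append((quad_error(Q, P, s, A_new), new_idx, new_vals))
        cands.sort(key=lambda c: c[0])
        beams = [(c[1], c[2]) for c in cands[:L]]         # keep the L best branches
    _, idx, vals = cands[0]
    A = np.zeros(n)
    A[idx] = vals
    return A, quad_error(Q, P, s, A)

rng = np.random.default_rng(1)
R = rng.standard_normal((6, 6))
Q, P = R @ R.T + 6 * np.eye(6), rng.standard_normal(6)   # assumed test data
A_d, e_d = iterative_lagrange(Q, P)
```

Keeping only L branches per level limits the number of constrained solves to about L² per quantized coefficient, instead of growing exponentially with the depth of the tree.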
Example 2.1:
Design an N=37 discrete-coefficient low-pass filter with pass-band [0, 0.10625] and stop-band [0.14375, 0.5] in normalized frequency, with L=2. Each coefficient is to be expressed as a sum or difference of two powers-of-two, and the smallest allowable power-of-two magnitude is 2^-10. With the uniform weighting function W(ω)=1, the magnitude responses of the discrete- and continuous-coefficient filters are shown in Fig. 2.5, and the continuous and discrete coefficients of this example are compared in Table 2.2. As Table 2.2 shows, although the error of the discrete coefficients obtained by the iterative Lagrange-multiplier approach is larger than that of the continuous coefficients, the discrete coefficients considerably reduce the area and raise the operating frequency of the hardware. These circuits are implemented in the next chapter.
Figure 2.5 Magnitude response in dB of an N=37, L=2 low-pass filter.
Table 2.2 Comparison of the continuous and discrete coefficients.
Coefficient | Continuous | Discrete | Powers-of-two
h(0)  | 0.00615695737455 | 0.00585937500000 | 2^-8 + 2^-9
h(1)  | 0.00574821648387 | 0.00390625000000 | 2^-8
h(2)  | 0.00066011213269 | -0.00195312500000 | -2^-9
h(3)  | -0.00712692802249 | -0.00878906250000 | -2^-7 - 2^-10
h(4)  | -0.01275313193206 | -0.01171875000000 | 2^-8 - 2^-6
h(5)  | -0.01116240337706 | -0.00878906250000 | -2^-7 - 2^-10
h(6)  | -0.00082083028822 | 0.00292968750000 | 2^-9 + 2^-10
h(7)  | 0.01394999706636 | 0.01660156250000 | 2^-6 + 2^-10
h(8)  | 0.02410366832672 | 0.02343750000000 | 2^-6 + 2^-7
h(9)  | 0.02066430251568 | 0.01757812500000 | 2^-6 + 2^-9
h(10) | 0.00095033192963 | -0.00292968750000 | 2^-10 - 2^-8
h(11) | -0.02747293043294 | -0.03027343750000 | -2^-5 + 2^-10
h(12) | -0.04816111073242 | -0.04687500000000 | -2^-4 + 2^-6
h(13) | -0.04282753163175 | -0.03906250000000 | -2^-5 - 2^-7
h(14) | -0.00103436627664 | 0.00390625000000 | 2^-8
h(15) | 0.07251764232100 | 0.07812500000000 | 2^-4 + 2^-6
h(16) | 0.15748077928576 | 0.15625000000000 | 2^-3 + 2^-5
h(17) | 0.22523639080392 | 0.21875000000000 | 2^-2 - 2^-5
h(18) | 0.25106348067610 | 0.24609375000000 | 2^-2 - 2^-8
Σ|h(n)| | 1.60871874254361 | 1.60156250000000 |
Error | 2.71670580633e-004 | 4.8989252682e-004 |
Chapter 3
Implementation of 1-D Linear-Phase FIR Digital Filters
3.1 Introduction
In Chapter 2, we described how to design continuous- and discrete-coefficient FIR digital filters by using the proposed approaches. In this chapter, we introduce the VLSI design procedures needed to implement the architectures of the FIR system. The detailed dataflow of this system is shown in Fig. 3.1. In the design procedure, timing requires particular attention. Because the arrangement of the clocks determines whether correct and stable values are captured, we verify the functions with the Cadence Verilog-XL simulator on a workstation.
In the next section, we introduce the circuit architectures, which use signed powers-of-two (SPT) multipliers and carry save adder (CSA) circuits to raise the operating frequency and to reduce the area of the hardware implementation. The gate-level simulation results are shown at the end.
Figure 3.1 Design flow of the hardware implementation
3.2 Circuit Architectures
For Example 2.1, the circuit architecture with N=37 in transposed form is shown in Fig. 3.2. As Fig. 3.2 shows, the critical path of this structure is a single multiplier, an adder, and a tap delay, which can be written as
$$T = T_{mul} + T_{add} \tag{3.1}$$
Such a short critical path allows the system to operate at a low supply voltage, and because pipelining is inherent in the structure, it is also suitable for high-speed operation. The transposed architecture has two drawbacks that need to be handled. First, the input of this structure is heavily loaded: as the number of taps increases, the input signal bus gets longer and the load capacitance becomes larger. Fortunately, this effect can be reduced by adding data buffers. Second, the delay elements in this structure are larger, since they hold the accumulated sum instead of the input signal. We choose the CSA architecture to address this, as introduced in the next section.
Figure 3.2 Structure of linear-phase FIR in transposed form with N=37.
3.2.1 Adder Architectures
We first consider the types of adder used in the implementation of the FIR filter. The adder shown in Fig. 3.3 is simply a chain of n full adders, in which the carry-out of each full adder is connected to the carry-in of the next one. The carry bit ripples from the least significant full adder up to the most significant one; hence, it is called the carry ripple adder.
Figure 3.3 Structure of the carry ripple adder.
According to the structure in Fig. 3.3, the delay of this carry ripple adder is
$$T_{cpa} = (n-1)\,T_{carry} + T_{sum} \tag{3.2}$$
where Tcarry is the delay from c_in to c_out and Tsum is the delay from c_in to sum. The area is
$$A_{cpa} = n \times A_{fa} \tag{3.3}$$
where Afa is the area of one full-adder cell.
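The carry ripple adder and its delay estimate can be modelled at the bit level as follows. This is a behavioural sketch only; the function names and the delay values used in the example are assumptions, not measured figures.

```python
def full_adder(a, b, c_in):
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

def ripple_carry_add(x, y, n):
    """Add two n-bit unsigned numbers with a chain of n full adders."""
    carry, total = 0, 0
    for i in range(n):                       # the carry ripples from LSB to MSB
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry

def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay of Eq. (3.2)."""
    return (n - 1) * t_carry + t_sum

result, carry_out = ripple_carry_add(11, 6, 4)   # 4-bit example: 11 + 6 = 17
delay_16 = ripple_delay(16, 1.0, 1.5)            # assumed unit delays
```

In the 4-bit example the result wraps to 1 with a carry-out of 1, and the delay estimate grows linearly with the word length n.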
The other adder is simply a set of one-bit full adders: an n-bit CSA consists of n full adders (FAs) without any carry chaining, as shown in Fig. 3.4. Since the three inputs of a full adder are interchangeable, a stage of full adders can be used as a 3-input, 2-output device that adds three numbers and produces a sum output and a carry output; this structure is therefore called a carry save adder (CSA).
Figure 3.4 Structure of the n bits carry save adder.
Note that the carry signals are saved in the carry vector instead of being propagated to the MSBs, so the adder has a delay of only one FA. Moreover, the arithmetic output of the CSA is now the sum of two vectors, the carry vector and the sum vector. In comparison, a carry ripple adder (CRA) reduces two n-bit vectors to one sum vector but requires n FA delays, which is why it is slower than the CSA.
To avoid the long critical-path delay in the adders, the adder in each tap of the FIR filter is replaced by a CSA, as shown in Fig. 3.5. Furthermore, a so-called vector merge adder (VMA) is used after the last carry save addition to perform the carry propagation and obtain the final arithmetic value.
Figure 3.5 Structure of linear-phase FIR in transposed form with CSA
In Fig. 3.5, both a sum bit and a carry bit of the CSA are produced at each bit position of the word. This implies that two signals, the sum and the carry, propagate through the datapath, so the carry propagation problem inside the adder is avoided. The trade-off is that two delay elements are required in each tap: one for the sum datapath and the other for the carry datapath. Although the filter area is enlarged, a higher throughput rate can be achieved.
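The carry-save principle can be illustrated with bitwise operations on non-negative integers. In this sketch the final `s + c` plays the role of the vector merge adder; the function names and the sample products are assumptions.

```python
def carry_save_add(a, b, c):
    """Reduce three numbers to a sum vector and a carry vector (no carry chain)."""
    sum_vec = a ^ b ^ c                                   # per-bit sum
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1        # per-bit carry, weighted by 2
    return sum_vec, carry_vec

def accumulate_carry_save(products):
    """Accumulate products in carry-save form, then merge once at the end."""
    s, c = 0, 0
    for p in products:
        s, c = carry_save_add(s, c, p)    # one full-adder delay per tap
    return s + c                          # vector merge adder: the only carry propagation

sum_vec, carry_vec = carry_save_add(5, 3, 6)
total = accumulate_carry_save([7, 12, 3, 20])
```

The pair (sum vector, carry vector) is exactly what the two delay elements per tap have to store.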
3.2.2 SPT Multiplier Architectures
A fractional FIR filter coefficient can be expressed as a sum of signed powers-of-two (SPT) terms. In a radix-2 signed-digit number, each digit can take a value in {−1, 0, 1}, and the general form can be written as
$$h_{spt}(n) = \sum_{k=1}^{L} s_k\, 2^{-p_k} \tag{3.4}$$
where $s_k \in \{-1, 0, 1\}$, $p_k \in \{0, 1, \ldots, M-1\}$, M is the number of ternary digits, and L is the number of nonzero digits. From Eq. (3.4), a coefficient can be represented by a few nonzero signed digits, so the multiplications in the FIR filter can be replaced by shift-and-add operations.
In order to reduce the hardware complexity and achieve a high system throughput rate, a multiplierless FIR filter design is chosen. By using the SPT representation for constant coefficients, the SPT multiplier can be implemented by combining bus shifts with an appropriate number of adders or subtractors, as shown in Fig. 3.6. For example, the value 0.3125 can be written as 0.0101 in binary, which has two nonzero digits. In this way, the multipliers can be implemented by using two adders, and the cost can be significantly reduced in ASIC designs.
Figure 3.6 Structure of the SPT multiplier.
By using the SPT representation for filter coefficients, the SPT multiplier can be implemented simply by combining bus shifts with a minimal number of 2's complement adders, since adders and subtractors have substantially the same hardware structure. If shift-and-add operations are used in place of the multipliers, the structure is called a multiplierless filter. Such filters lower the power dissipation and shorten the critical-path delay without any loss of throughput rate.
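The shift-and-add multiplication can be sketched on integer samples. Each SPT term is written as a (sign, shift) pair corresponding to (s_k, p_k) in Eq. (3.4); the function names and the sample value 1024 are assumptions, and the right shift truncates the low-order bits as a plain hardware bus shift would.

```python
def spt_value(terms):
    """Coefficient value of Eq. (3.4) from a list of (s_k, p_k) pairs."""
    return sum(s * 2.0 ** -p for s, p in terms)

def spt_multiply(x, terms):
    """Multiply integer sample x by an SPT coefficient using shifts and adds only."""
    acc = 0
    for s, p in terms:
        acc += s * (x >> p)          # bus shift by p bits, then add or subtract
    return acc

coeff_a = [(1, 2), (1, 4)]           # 0.3125 = 2^-2 + 2^-4
coeff_b = [(1, 2), (-1, 8)]          # 0.24609375 = 2^-2 - 2^-8, h(18) in Table 2.2
prod_a = spt_multiply(1024, coeff_a)
prod_b = spt_multiply(1024, coeff_b)
```

No general multiplier is needed: a coefficient with two nonzero digits costs two fixed shifts, which are only wiring in hardware, and the addition or subtraction that combines them.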
SPT multipliers can be implemented efficiently by using dedicated shift registers and adders. Sub-expression elimination is applied to a set of SPT multipliers with the same shift part that operate on a common adder. We list the distribution of all SPT coefficients in Table