Complexity-effective architecture design - A 10-ms Group-Delay Filter Bank Approximating the ANSI S1.11 1/3-Octave Specification for Digital Hearing Aids

FIR digital filters are well known for desirable properties such as stability and linear phase response. Their main drawback is the large number of arithmetic operations needed in implementation, especially for filters with a narrow transition bandwidth. To cope with the computational complexity of sharp narrowband FIR filters, the interpolated finite-impulse-response (IFIR) filter technique was introduced [24].

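[Figure: magnitude (dB) versus frequency (Hz) of H(z), G(z), G(z^L), I(z), and the cascade G(z^L) I(z), with band edges fs1, fp1, fp2, fs2; the transition width (fp1 - fs1) of H(z) corresponds to L(fp1 - fs1) in G(z), and the images of G(z^L) repeat with period 2π/L]

Figure 4-1 IFIR implementation of H(z) (frequency domain)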

Suppose the band-pass filter H(z) has the specification [δp, δs, fs1, fs2, fp1, fp2] described in Figure 3-4. The basic IFIR structure is composed of an image suppression filter I(z) and a model filter G(z), with L as the interpolation factor. The IFIR technique implements H(z) as a cascade of two FIR sections, I(z) and G(z^L), as described in Figure 4-1. G(z^L) is derived from G(z): its impulse response is formed by inserting L-1 zeros between consecutive samples of the impulse response of G(z). Figure 4-2 illustrates the relationship between G(z), G(z^L), I(z) and H(z) in the time domain for an interpolation factor of 2.

From the viewpoint of hardware implementation, G(z^L) is formed by replacing each single storage element of G(z) with L storage elements. I(z), in turn, is an image suppressor. In the frequency domain, the response of G(z^L) is periodic with period 2π/L, so copies of the desired pass-band, called image terms, appear at higher frequencies. The task of I(z) is to suppress these unwanted image terms. In the time domain, cascading I(z) with G(z^L) means that I(z) tries to “fill in” the expected values of the impulse response of G(z^L) in place of the inserted zeros, which would otherwise generate high-frequency images, as described in Figure 4-2.
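The time-domain picture can be sketched in a few lines of Python. This is only an illustration: the tap values of the model filter and the three-tap linear interpolator used as I(z) are hypothetical and are not taken from the thesis design.

```python
def zero_stuff(taps, L):
    """Impulse response of G(z^L): L-1 zeros between consecutive taps of G(z)."""
    stretched = [0.0] * ((len(taps) - 1) * L + 1)
    stretched[::L] = taps
    return stretched


def convolve(a, b):
    """Full linear convolution of two tap lists (cascade of two FIR sections)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


g = [1.0, 3.0, 5.0]          # hypothetical model filter G(z)
i_taps = [0.5, 1.0, 0.5]     # hypothetical I(z): a linear interpolator for L = 2

g_stretched = zero_stuff(g, 2)       # taps of G(z^2), zeros at odd indices
h = convolve(g_stretched, i_taps)    # taps of H(z) = G(z^2) I(z)
print(g_stretched)
print(h)
```

In this toy case the cascade replaces each inserted zero with the average of its two neighbours, which is the “fill in” behaviour described above; a practical I(z) would be designed to meet the stop-band specification rather than chosen by hand.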

[Figure: amplitude versus sample index n of the impulse responses of G(z), G(z²), I(z), and the cascade G(z^L) I(z) that forms H(z)]

Figure 4-2 IFIR implementation of H(z) (time domain)

Observe that as the interpolation factor L increases, the computational complexity (in terms of multiplications per sample) of G(z) decreases while the complexity of I(z) increases. There therefore exists an optimal interpolation factor L that minimizes the complexity of H(z). We can evaluate this value through the following analytical derivation. Suppose again that the band-pass filter H(z) has the specification [δp, δs, fs1, fs2, fp1, fp2] described in Figure 3-4.

Relying on the order-estimation formula of Kaiser [21], we can estimate the order of the optimal equiripple FIR filter designed with the Parks-McClellan algorithm. The number of multiplications per sample of H(z) can then be expressed as in equation (4-2), where the pass-band ripple of each section is estimated to be roughly half of the desired ripple specification. Finally, we can evaluate the complexity for all possible integer values of L to obtain the optimal interpolation factor for each filter.
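As a rough illustration of this search, the sketch below uses the widely quoted form of Kaiser's estimate, N ≈ (-20 log10 √(δp δs) - 13) / (14.6 Δf), where Δf is the transition width divided by the sampling rate. Everything else is an assumption made for illustration and is not taken from the thesis: the low-pass simplification (I(z) passes up to f_pass and must reject the first image of G(z^L), which begins at f_samp/L - f_stop), the (N+1)/2 multiplier count of a symmetric linear-phase filter, the function names, and the numeric specification.

```python
import math


def kaiser_order(delta_p, delta_s, delta_f):
    """Kaiser's estimate of the equiripple FIR order, rounded up.

    delta_f is the transition width divided by the sampling rate.
    """
    attenuation_db = -20.0 * math.log10(math.sqrt(delta_p * delta_s))
    return math.ceil((attenuation_db - 13.0) / (14.6 * delta_f))


def multipliers(order):
    """Multiplications per sample of a symmetric FIR filter with order+1 taps."""
    return (order + 2) // 2


def ifir_cost(delta_p, delta_s, f_pass, f_stop, f_samp, L):
    """Estimated multiplications per sample of G(z^L) I(z); None if L is infeasible."""
    half_ripple = delta_p / 2.0                       # ripple budget split between G and I
    width_g = L * (f_stop - f_pass) / f_samp          # G(z) transition is L times wider
    width_i = (f_samp / L - f_stop - f_pass) / f_samp  # from f_pass up to the first image
    if width_i <= 0 or L * f_stop >= f_samp / 2:
        return None
    cost_g = multipliers(kaiser_order(half_ripple, delta_s, width_g))
    cost_i = multipliers(kaiser_order(half_ripple, delta_s, width_i))
    return cost_g + cost_i


def best_factor(delta_p, delta_s, f_pass, f_stop, f_samp, candidates):
    """Evaluate every candidate L and return (best L, {L: cost})."""
    costs = {}
    for L in candidates:
        cost = ifir_cost(delta_p, delta_s, f_pass, f_stop, f_samp, L)
        if cost is not None:
            costs[L] = cost
    return min(costs, key=costs.get), costs


# Hypothetical specification (not the thesis values).
best, costs = best_factor(0.01, 0.001, 300.0, 400.0, 24000.0, range(2, 17))
print("best L:", best, "cost:", costs[best])
```

The sweep reproduces the qualitative behaviour described in the text: the cost of G(z) falls as L grows, the cost of I(z) rises, and the minimum of their sum lies at an intermediate L.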

By carefully selecting the interpolation factor L and choosing the best method to implement each band’s filter, we obtain an optimum IFIR filter design with minimum hardware complexity. The price paid for these reductions is only a slight increase in the number of delay elements compared with direct implementation.

For example, we use the IFIR technique described above to design the filter of the lowest frequency band. Let the band-pass filter specification be as follows.



A conventional linear-phase Parks-McClellan FIR filter design requires an order of up to 368. It needs 189 multiplications per input sample and 368 storage elements to buffer the input samples. With the IFIR implementation, it needs only 32 multiplications per input sample, while the number of storage elements increases slightly to 388.

Figure 4-3 shows the computational complexity (in terms of multiplications per sample) for different values of L. As the interpolation factor L increases, the complexity of G(z) drops sharply at first, while the complexity of I(z) increases gradually. The optimal IFIR structure is obtained at an interpolation factor of 10, where the filter has the minimum hardware complexity. The detailed values for each interpolation factor are listed in Table 4-1.

[Figure: number of multiplications per sample versus interpolation factor L for G(z), I(z), and H(z); the optimal factor is 10]

Figure 4-3 Explore the optimal L for IFIR filter implementation

Table 4-1 Computational complexity (multiplications per sample) with different L

L                  2    3    4    5    6    7    8
# mult. of G(z)   90   59   44   36   30   25   22
# mult. of I(z)    1    4    5    7    8   10   11
# mult. of H(z)   91   63   49   43   38   35   33
order of H(z)    362  362  362  374  376  370  374

L                  9   10   11   12   13   14   15
# mult. of G(z)   20   18   16   15   14   13   12
# mult. of I(z)   13   14   16   20   24   27   30
# mult. of H(z)   33   32   32   35   38   40   42
order of H(z)    386  388  384  396  412  418  420
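The totals in Table 4-1 can be recomputed from the G(z) and I(z) rows with a short script. The numbers are copied from the table; the variable names are only illustrative.

```python
factors = list(range(2, 16))  # interpolation factors L = 2 .. 15
mult_g = [90, 59, 44, 36, 30, 25, 22, 20, 18, 16, 15, 14, 13, 12]
mult_i = [1, 4, 5, 7, 8, 10, 11, 13, 14, 16, 20, 24, 27, 30]

# Cost of H(z) is the sum of the two cascaded sections.
mult_h = [g + i for g, i in zip(mult_g, mult_i)]

best_cost = min(mult_h)
best_factors = [L for L, m in zip(factors, mult_h) if m == best_cost]
print(best_cost, best_factors)
```

By these table values, L = 10 and L = 11 tie at 32 multiplications per sample; the text selects L = 10.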

In addition to the IFIR implementation method, we also exploit multirate processing. Multirate means multiple data rates, and it offers many advantages, such as reduced computational complexity for a given task, reduced transmission rate, and reduced storage requirements. Broadly speaking, if the filter is band-limited and its stop-band edge frequency is lower than π/M, we can down-sample the filter output by a factor of M to reduce the data rate. M is called the decimation factor. Once the data rate is reduced, the computational complexity (multiplications per sample) is reduced as well, because the filter needs to process only one out of every M input samples. By the theory of multirate systems [24], a synthesis bank with an up-sampler and an interpolation filter is then necessary. The task of the interpolation filter is to suppress the image terms at higher frequencies after the signal is up-sampled. The price to pay is therefore the cost of the interpolation filter in the synthesis bank, which contributes extra computational complexity. This is a trade-off between the analysis bank and the synthesis bank.

When the decimation factor M increases, the data rate is lower and more computation is saved in the analysis bank. At the same time, the interpolation filter needs a narrower transition bandwidth, so the computational complexity of the synthesis bank grows.

Consider the architecture shown in Figure 4-4 (a), where the cascade of IA(z) and G(z^L) is the IFIR implementation described before. By the noble identity of multirate system theory, we can derive the architecture in Figure 4-4 (b). Because the data rate is down-sampled by a factor M, the filter G(z) needs to process only one sample for every M input samples. Moreover, G(z) only needs to buffer one sample from IA(z) for every M samples. As a result, not only the computational complexity but also the number of storage elements is reduced.
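The noble identity used here states that filtering with G(z^M) and then keeping every M-th sample gives the same output as keeping every M-th sample first and then filtering with G(z). A minimal numerical check, with hypothetical taps and input and with illustrative function names, might look like this:

```python
def zero_stuff(taps, M):
    """Taps of G(z^M): M-1 zeros between consecutive taps of G(z)."""
    stretched = [0.0] * ((len(taps) - 1) * M + 1)
    stretched[::M] = taps
    return stretched


def fir(taps, x):
    """Causal FIR filtering with zero initial state; output has len(x) samples."""
    return [
        sum(c * x[n - k] for k, c in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def decimate(x, M):
    """Keep every M-th sample, starting with the first."""
    return x[::M]


def filter_then_decimate(g, x, M):
    # G(z^M) runs at the high input rate, then the output is down-sampled.
    return decimate(fir(zero_stuff(g, M), x), M)


def decimate_then_filter(g, x, M):
    # Equivalent form: G(z) runs once per M input samples.
    return fir(g, decimate(x, M))


g = [1, 2, 3]                  # hypothetical taps of G(z)
x = list(range(1, 13))         # hypothetical input samples
print(filter_then_decimate(g, x, 4))
print(decimate_then_filter(g, x, 4))
```

Both forms produce the same samples, but the second evaluates the taps of G(z) only once per M input samples and buffers only the retained samples, which is the source of the savings described above.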

Figure 4-4 Illustrations of multirate IFIR architecture and noble identity

Using a method similar to that of equation (4-2), we can derive the total number of multiplications per sample of the system in Figure 4-4 (b) as follows.

When we increase the interpolation factor L, we have a lower computational complexity for G and a larger one for IA (N_IA). Likewise, when we increase the down-sample factor M, we have a lower computational complexity for G and a larger one for IS (N_IS).

For example, we use the IFIR and multirate techniques to implement the filter with the specification shown in equation (4-3). To make full use of the down-sampling, we set the down-sample factor M equal to the interpolation factor L. The results are shown in Figure 4-5.

[Figure: number of multiplications per sample versus interpolation and down-sample factor for G(z), I(z), and H(z); the optimal factor is 4]

Figure 4-5 Illustrations of multirate IFIR architecture and noble identity

As the interpolation and down-sample factor increases, the computational complexity of G(z) drops sharply at first, while the complexity of I(z) increases gradually. Note that IA(z) and IS(z) are the same because the interpolation and decimation factors are equal. The optimal IFIR and multirate structure is obtained at a factor of 4, where the filter has the minimum hardware complexity. The complexity comparison with the direct implementation is shown in Table 4-2. The number of multiplications per sample is reduced by 88%, and the number of storage elements used in the delay line is reduced by 73%.

Table 4-2 Complexity comparison with direct implementation

                      direct FIR   + IFIR   + multirate
# multiplications            189       32            23
# storage elements           368      384           100
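The two savings quoted for the combined IFIR and multirate structure can be checked directly against the first and last columns of Table 4-2:

```latex
\[
\frac{189 - 23}{189} \approx 0.878 \approx 88\%,
\qquad
\frac{368 - 100}{368} \approx 0.728 \approx 73\%
\]
```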

Now we use the IFIR and multirate implementation technique to realize the 18 filters with the specifications given in Section 3.2. The optimal interpolation and down-sample factor of each band is reported in Figure 4-6.

Figure 4-6 Optimal factor of each band

Figure 4-7 Complexity-effective architecture of filter bank

The final complexity-effective architecture of the analysis bank and the synthesis bank is depicted in Figure 4-7. Note that the factor is the same from band 1 to band 12, so the image suppressor can be shared to further reduce the computational complexity. In addition, band 12 poses the strictest constraints on the image suppressor among the twelve filters, so we only need to consider the constraint of band 12 when designing IA2 and IS2.

The complexity comparison of the three 1/3-octave filter bank implementations is reported in Table 4-3. Compared to the direct implementation in [20], the proposed filter bank architecture achieves a 93% reduction in multiplications per sample and a 74% reduction in storage elements. Compared to the iterative-architecture implementation in [20], the proposed filter bank needs 60% more multiplications per sample but 85% fewer storage elements. Because the delay of the iterative-architecture 1/3-octave filter bank is up to 78 ms, it needs many storage elements to synchronize the delay between bands.

Table 4-3 Complexity comparison of three 1/3-octave filter bank implementations

                               # multiplications per sample    # storage elements
                               analysis  synthesis  total      analysis  synthesis  total
Direct implementation [20]         3144          0   3144          1308        641   1949
Iterative architecture [20]         120         20    140           246       3060   3306
Proposed                            208         16    224           192        310    502
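The percentages quoted in the text follow from the "total" columns of Table 4-3, as the short check below shows. The dictionary layout and the helper name are only illustrative.

```python
# (total multiplications per sample, total storage elements) from Table 4-3
totals = {
    "direct": (3144, 1949),
    "iterative": (140, 3306),
    "proposed": (224, 502),
}


def percent_change(new, old):
    """Signed change of new relative to old, in percent (negative = reduction)."""
    return 100.0 * (new - old) / old


mult_p, store_p = totals["proposed"]
for name in ("direct", "iterative"):
    mult_ref, store_ref = totals[name]
    print(
        name,
        round(percent_change(mult_p, mult_ref)),
        round(percent_change(store_p, store_ref)),
    )
```

Negative values are reductions relative to the reference design; the only positive value is the 60% increase in multiplications relative to the iterative architecture.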
