
In recent years, power consumption and power-related issues have become a first-order concern for most designs, and designers try to minimize power consumption wherever possible. The goal of a battery-operated portable (wearable) platform is to extend battery service life while satisfying the performance requirement.

Since hearing aids are portable devices, we optimize our design to minimize its power consumption and make it more competitive. The following optimizations are used to minimize the power consumption of our filter bank.

• Polyphase implementation

An important advancement in multirate signal processing is the invention of the polyphase representation [17]. The polyphase representation greatly simplifies theoretical results and also leads to computationally efficient implementations of decimation/interpolation filters. For a general multirate decimation filter, the z-transform is given in equation (4-1).

The two polyphase components form the polyphase decomposition of the original D(z), so we can write D(z) as the sum of the even and odd terms, as shown in equation (4-4).
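For reference, the standard two-phase (type 1) polyphase decomposition from the multirate literature [17] can be written as below. The tap notation d[n] and the filter length N are assumptions introduced here for illustration; the subscripts follow the labels Deven(z) and Dodd(z) used in Figure 4-5.

```latex
D(z) = \sum_{n=0}^{N-1} d[n]\, z^{-n}
     = D_{even}(z^{2}) + z^{-1} D_{odd}(z^{2}),
\qquad
D_{even}(z) = \sum_{k} d[2k]\, z^{-k}, \quad
D_{odd}(z)  = \sum_{k} d[2k+1]\, z^{-k}
```

The even sub-filter collects the even-indexed taps and the odd sub-filter collects the odd-indexed taps, which is why each of them can run at the lower, down-sampled rate.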

Figure 4-5 shows the polyphase decimation filter design flow for our multirate filter bank. First, we decompose the original multirate decimation filter into an even part and an odd part. Then we define two polyphase components: one combines the down-sampler by two with the even part of the decimation filter, and the other combines the down-sampler by two with the odd part of the decimation filter. Finally, we add the outputs of the two polyphase components.

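[Figure 4-5: block diagrams with input X and output Y, showing the decimation filter D(z) with a down-sampler by 2 and its polyphase form built from Deven(z), Dodd(z), a Z^-1 delay, and down-samplers by 2]

Figure 4-5 Polyphase implementation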

By using the FIR polyphase design, the complexity of the decimation filter is halved. This reduces the complexity of the overall multirate system by about twelve percent.
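The equivalence behind this saving can be checked with a short sketch. The code below is an illustration only, not the thesis implementation: the function names and the test filter are hypothetical, and plain Python lists stand in for the hardware datapath. It compares filtering at the input rate followed by down-sampling by two against the polyphase form, where both sub-filters run at the lower rate.

```python
def convolve(x, h):
    """Full linear convolution of two sequences (plain Python lists)."""
    y = [0.0] * max(len(x) + len(h) - 1, 0)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def decimate_direct(x, d):
    """Filter at the input rate, then keep every second output sample."""
    return convolve(x, d)[0::2]


def decimate_polyphase(x, d):
    """Down-sample first, then filter with the even and odd sub-filters."""
    d_even, d_odd = d[0::2], d[1::2]
    x_even = x[0::2]                      # x[2m]
    x_odd = ([0.0] + list(x))[0::2]       # x[2m-1]: one-sample delay, then down-sample
    n_out = (len(x) + len(d)) // 2        # same length as the direct form
    y = [0.0] * n_out
    for branch in (convolve(x_even, d_even), convolve(x_odd, d_odd)):
        for m, value in enumerate(branch[:n_out]):
            y[m] += value
    return y


# Hypothetical test data: small integers keep the comparison exact.
x_demo = [float(v) for v in range(1, 12)]
d_demo = [1.0, -2.0, 3.0, 4.0, -1.0]
print(decimate_direct(x_demo, d_demo) == decimate_polyphase(x_demo, d_demo))
```

In the direct form every output sample that is kept costs two full filter evaluations (one of which is discarded), whereas the polyphase form spends len(d) multiplications per kept output, which is the halving stated above.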

The polyphase implementation can also be used in the design of the interpolation filter of our multirate filter bank. If we implement the synthesis bank or the whole auditory compensation system, the power consumption can be reduced further by using this optimization.

• Clock gating

Clock gating is a method often used in low-power designs. It provides a way to selectively stop the clock and thus force the circuit to make no transition whenever the computation to be carried out in the next clock cycle is redundant. In other words, the clock signal is disabled according to the idle conditions of the logic network. For reactive circuits, the number of clock cycles in which the design is idle in some wait state is usually large. Figure 4-6 (a) shows a general flip-flop and (b) shows a gated-clock flip-flop. The only difference between the two is the signal at the clock pin. In the gated-clock flip-flop, an enable signal determines whether the flip-flop needs to transition. Traditionally, the enable signal of the gated clock was derived by hand, which made this optimization time consuming. Nowadays, synthesis tools such as Design Compiler (DC) can build the gated-clock design for us, so the optimization takes much less effort.

[Figure 4-6: two D flip-flops with pins D, Q, and clock input; in (b) the clock input is combined with an enable signal]

Figure 4-6 General flip-flop (a) without clock gating (b) with clock gating

We apply the gated-clock mechanism to the MAC module, since this module performs the main function of the system. We do not apply it to the other modules because the power dissipation of the control module is low, and optimizing these modules may not be worthwhile.
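The effect of gating can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the MAC circuit: the function name is hypothetical, and the ungated register is assumed to hold its value through a feedback path while the free-running clock still reaches its clock pin every cycle.

```python
def simulate_register(data, enable, gated):
    """Return (q_trace, clock_edges_seen_by_the_flip_flop).

    gated=False: the clock pin toggles every cycle; the enable only decides
                 whether new data is loaded (assumed feedback-mux hold).
    gated=True:  the clock edge reaches the flip-flop only when enable is 1.
    """
    q = 0
    trace = []
    edges = 0
    for d, en in zip(data, enable):
        if gated:
            if en:
                edges += 1   # gated clock: edge delivered only when needed
                q = d
        else:
            edges += 1       # free-running clock: edge delivered every cycle
            if en:
                q = d
        trace.append(q)
    return trace, edges


# Hypothetical stimulus: the register is idle in four of six cycles.
data_demo = [5, 6, 7, 8, 9, 10]
enable_demo = [1, 0, 0, 1, 0, 0]
print(simulate_register(data_demo, enable_demo, gated=False))
print(simulate_register(data_demo, enable_demo, gated=True))
```

Both variants produce the same output sequence; the gated one simply sees fewer clock edges, which is where the clock-network and flip-flop switching power is saved.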

• Operand isolation

The idea of operand isolation is to identify redundant operations and to use special isolation circuitry to prevent switching activity from propagating into a module whenever it is about to perform a redundant operation [21]. The transition activity of the internal nodes of the module is therefore reduced significantly, which lowers the power consumption. In other words, we should identify the redundant computation components in our datapath and isolate them with specific circuitry.

Figure 4-7 Operand isolation design (a) original circuit (b) with operand isolation

Take Figure 4-7 [21] as an example. Figure 4-7 (a) shows a normal design with two adders, a0 and a1. For certain configurations of the multiplexor select signals S0, S1, and S2 and the register load enable signals G0 and G1, the output of a0 is not used to compute the values to be stored in registers r0 and r1. For instance, when the m1 multiplexor select signal is one, the adder a1 and the register r1 do not need the output of a0. However, a0 continues to compute a new output whenever there is switching activity at its inputs A and B, and therefore consumes power by executing redundant computations. For long periods in which the output is not used, this power overhead can be substantial.

Suppose that there is an activation signal ASa0 whose logic value indicates whether a0 performs a computation that is not redundant. We can use ASa0 to control blocking logic, e.g. transparent latches that “freeze” the inputs of a0, effectively preventing the propagation of switching activity into the module. The module will therefore only perform non-redundant computations. The lower transition probability at the internal nodes of the module then results in lower power consumption. Figure 4-7 (b) shows the same circuit where the inputs of the two adders have been isolated using latches. Assume that ASa0 evaluates to logic zero whenever a0 is performing a redundant computation. Inputs A0 and B0 then maintain their previous values and do not transition when the operation to be performed by a0 is redundant.
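The latch behavior described above can be sketched as a toggle count at the adder inputs. This is a simplified model with hypothetical names: the 16-bit width, the stimulus, and the use of input bit toggles as a stand-in for switching power are assumptions made for illustration.

```python
def bit_toggles(prev, cur, width=16):
    """Number of bits that change between two width-bit words."""
    mask = (1 << width) - 1
    return bin((prev ^ cur) & mask).count("1")


def adder_input_toggles(a_seq, b_seq, active_seq, isolate):
    """Count bit toggles seen at the adder inputs.

    active_seq plays the role of the activation signal: 1 means the adder
    result is needed, 0 means the computation would be redundant.
    With isolate=True the latches hold the previous operands while active is 0.
    """
    held_a = held_b = 0
    toggles = 0
    for a, b, active in zip(a_seq, b_seq, active_seq):
        if isolate and not active:
            continue  # latches are opaque: the adder inputs do not change
        toggles += bit_toggles(held_a, a) + bit_toggles(held_b, b)
        held_a, held_b = a, b
    return toggles


# Hypothetical stimulus: operand A flips all bits each cycle, B stays at zero.
a_demo = [0x0000, 0xFFFF, 0x0000, 0xFFFF]
b_demo = [0x0000, 0x0000, 0x0000, 0x0000]
active_demo = [1, 0, 0, 1]
print(adder_input_toggles(a_demo, b_demo, active_demo, isolate=False))
print(adder_input_toggles(a_demo, b_demo, active_demo, isolate=True))
```

With this stimulus the isolated adder sees its inputs change only in the cycles where the result is actually used.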

Operand isolation requires extra circuitry, which may add some overhead to the system. In our design, the power consumption is dominated by the memory and the MAC engine. The inputs of the MAC engine come from the memory, so we isolate the memory output. For the data memory, we use a memory compiler to generate a 256×16 register file. We use a register file instead of SRAM because the register file consumes less power. The register file has a chip enable pin called cen. This signal is well suited to operand isolation, since we only need to add some combinational logic to control cen. For the coefficient read-only memory (ROM), we add an extra multiplexor in front of the address input that selects the previous address, so that the address and the output of the ROM stay the same when the ROM operation is redundant, as shown in Figure 4-8.

[Figure 4-8: coefficient ROM with address input addr, enable signal en, and output coef.]

Figure 4-8 Operand isolation in coefficient memory

• Multi-vdd design

Using different voltages in different parts of a chip may reduce the global energy consumption of a design at a small cost in terms of algorithmic and/or architectural modifications. The traditional dynamic power dissipation equation is shown in equation (4-5).

$P = \alpha \times c \times V^{2} \times f$, (4-5)

where α is the switching activity, f is the operating frequency, c is the capacitance, and V is the supply voltage. The supply voltage affects dynamic power through a square term, so lowering the supply voltage is an efficient way to reduce power dissipation. The key observation in minimizing the supply is that the minimum energy consumption in a circuit is achieved when all circuit paths are timing-critical (there is no positive slack in the circuit). A common voltage scaling technique is thus to operate all the gates on non-critical timing paths of the circuit at a reduced supply voltage. Gates/modules that are part of the critical paths are powered at the maximum allowed voltage, thus avoiding any delay increase; the power consumed by the modules that are not on the critical paths, on the other hand, is minimized because of the reduced supply voltage.
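As a worked illustration of the square term in equation (4-5), consider the same circuit at two supply voltages. The sketch assumes that α, c, and f stay unchanged when the supply changes; V_1 and V_2 are notation introduced here.

```latex
\frac{P(V_2)}{P(V_1)}
  = \frac{\alpha\, c\, V_2^{2}\, f}{\alpha\, c\, V_1^{2}\, f}
  = \left(\frac{V_2}{V_1}\right)^{2},
\qquad \text{e.g. } \left(\frac{0.6}{1.2}\right)^{2} = 0.25
```

Under that assumption, halving the supply voltage cuts the dynamic power to one quarter.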

Using different supply voltages in the same circuit requires level shifters at the boundaries of the modules (a level converter is needed between the output of a gate powered by a low VDD and the input of a gate powered by a high VDD, i.e., for a step-up change). Level converters are not needed for a step-down change in voltage. The overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flip-flops.

In our design, no path is timing-critical because of the loose sampling-rate specification. Our clock period specification is 164 ns and the critical path of the system is only 35 ns (MAC unit), so we can use this margin to reduce the power consumption. According to the memory compiler, the lowest supply voltage for memory in the TSMC 0.13 μm process is 1.08 V for the worst case and 1.2 V for the normal case. We therefore define two power domains in our design: the memory module uses power domain 1 and the other modules use power domain 2, as shown in Figure 4-9.

The memory supply VDD1 is set to 1.2 V, and we gradually reduce VDD2 in 0.1 V steps to find the minimum working supply voltage for our low-power design.
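The timing margin that makes this voltage reduction possible can be written out explicitly. The sketch uses the 164 ns clock period and the 35 ns critical path quoted above, together with the 6.09 MHz clock rate reported for the hardware implementation; the symbols T_clk, T_critical, and T_slack are notation introduced here.

```latex
T_{clk} = \frac{1}{6.09\ \text{MHz}} \approx 164.2\ \text{ns},
\qquad
T_{slack} = T_{clk} - T_{critical} \approx 164\ \text{ns} - 35\ \text{ns} = 129\ \text{ns}
```

This is the slack of about 130 ns referred to in the results section.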

[Figure 4-9: block diagram with Memory, MAC, system control, memory control, and level shifters, split into power domain 1 (VDD1) and power domain 2 (VDD2)]

Figure 4-9 Filter bank with two power domains

4.3 Results

This section describes the implementation results of our design. We use post-layout gate-level simulation to estimate the power consumption under the different optimizations. Moreover, we use Nanosim simulation to obtain the power consumption at different supply voltages.

■ Effectiveness of low-power optimizations

We use a register file to store the inputs of six octaves and a ROM to store four filter coefficients. The register file reduces pre-charge power compared with SRAM. To reduce exploration time, the optimization results are based on post-layout gate-level simulation. All simulation patterns are random signals lasting 0.125 second.

The post-layout gate-level results of all optimizations are shown in Figure 4-10.

Figure 4-10 Effectiveness of low-power optimizations

In our optimizations, we do not consider the serial I/O interface, because it is only there to avoid a pad-limited design; we report only the filter bank results. From Figure 4-10, the filter bank consumes 144 μW. Analyzing the modules, we find that the MAC and the memory dominate the power of the overall system (about 90%). All our optimizations except the multi-vdd design focus on these two modules. We apply the three optimizations described earlier in Chapter 4: polyphase implementation, clock gating, and operand isolation. Clock gating reduces the power consumption of the MAC module by 27%, since we apply it only to the MAC module, and the polyphase implementation reduces the power of the MAC and memory modules by about 7%. The polyphase implementation halves the complexity of the decimation filter and thereby reduces the power. Operand isolation reduces the switching activity of the register file and the coefficient ROM and thus reduces the power dissipation of the memory system by 57%. With the three optimizations above, our filter bank consumes 90 μW, a saving of about 37% compared with our original design.
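The overall saving follows directly from the two filter bank totals quoted above:

```latex
\frac{144\ \mu\text{W} - 90\ \mu\text{W}}{144\ \mu\text{W}} = \frac{54}{144} = 0.375
```

This is in line with the reported saving of about 37%.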

Furthermore, we consider the multi-vdd optimization and separate our design into two power domains, as shown in Figure 4-9. The general TSMC 0.13 μm CMOS technology design flow only models the 1.2 V, 1.32 V, and 1.08 V supply voltages for simulation. In our design, we want to find the minimum working supply voltage in order to minimize power consumption. We have a good chance of reducing the supply voltage of everything except the memory, since the timing slack is about 130 ns. We therefore run the simulations with Nanosim. For our multi-vdd architecture, we keep VDD1 = 1.2 V, gradually reduce VDD2 in 0.1 V steps, and check functional correctness at each step. The simulation results at different voltages are shown in Figure 4-11.

[Figure 4-11: power (μW, axis 0 to 250) versus VDD2 (V, from 1.2 down to 0.6); series: estimated by Nanosim, predicted from the 1.2 V Nanosim result]

Figure 4-11 Power consumption of filter bank with different voltage (except memory)

In simulation, the average power of our design at 1.2 V is 192 μW. When the supply voltage is 0.6 V, the function is still correct and the power consumption is 53 μW. We also predict the power consumption according to equation (4-5); the predicted power is 48 μW at a 0.6 V supply.
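The prediction can be reproduced with a few lines of code. This is a sketch that assumes only the quadratic voltage dependence of equation (4-5), with switching activity, capacitance, and frequency unchanged; the function and variable names are hypothetical. It starts from the 192 μW simulated at 1.2 V and sweeps VDD2 in 0.1 V steps.

```python
def predict_power_uw(p_ref_uw, v_ref, v_new):
    """Scale a reference dynamic power quadratically with supply voltage."""
    return p_ref_uw * (v_new / v_ref) ** 2


P_REF_UW = 192.0   # simulated average power at 1.2 V, from the text
V_REF = 1.2

# VDD2 sweep from 1.2 V down to 0.6 V in 0.1 V steps
sweep = [round(1.2 - 0.1 * i, 1) for i in range(7)]

for v in sweep:
    p = predict_power_uw(P_REF_UW, V_REF, v)
    print(f"VDD2 = {v:.1f} V -> predicted {p:.1f} uW")
```

At 0.6 V the quadratic model gives one quarter of 192 μW, which matches the 48 μW prediction; the simulated value of 53 μW is somewhat higher than this simple model.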

■ Hardware implementation

We have implemented the filter bank chip. The chip is fabricated in TSMC 0.13 μm CMOS technology with a cell-based design flow. In the test chip, we use high-threshold-voltage (high-Vt) cells to reduce the leakage power. The final layout is shown in Figure 4-12.

[Figure 4-12: chip photo with regions labeled control & serial I/O, MAC, and memory (SRAM)]

Figure 4-12 Chip micro-photo

Furthermore, we add a serial I/O interface to our design. The serial I/O interface reduces the pin count and the silicon area. The synthesis tool is Design Compiler and the clock rate is set to 6.09 MHz. We use the TSMC metro library with high-threshold-voltage cells. Table 4-4 shows the gate count and the percentage of each module in our design. When we apply the multi-vdd design, the memory can be implemented as a register file and consumes only 24 μW according to the PrimePower simulation. In this case, the total power is 77.33 μW.

Table 4-5 compares our design with other hearing applications.

Our design has better performance (higher stopband attenuation and compliance with the ANSI S1.11 specification) and lower power consumption than the other hearing applications. We then normalize the power consumption in order to compare the power consumption per input per band. The normalized power is derived by equation (4-6).

[Equation (4-6): Pnormalized expressed in terms of Power, VDD, and the number of bands]

We scale the power linearly with process and quadratically with voltage [22].

Furthermore, the normalized power gives the power consumption per sample per band. The normalized power consumption of our chip is 46% to 80% lower than that of other filter bank applications.

Table 4-4 Gate count of the test chip

Table 4-5 Comparison of filter bank for hearing aids

5 CONCLUSIONS

This thesis first examines the design issues of filter banks in hearing aids. The ANSI S1.11 1/3-octave filter bank is preferred for hearing aid applications because it matches human hearing characteristics well. However, most existing hearing aids implement proprietary filter banks that do not comply with the ANSI S1.11 1/3-octave specification because of its high computational complexity. This thesis presents a low-power FIR-based design of the ANSI S1.11 1/3-octave filter bank. A multirate ANSI S1.11 filter bank system and a systematic FIR filter coefficient design flow are proposed to find the optimal filter coefficients with minimized filter orders and thus reduce the computational complexity. The design needs only about 4% of the adder and multiplier complexity of the straightforward FIR design. Furthermore, the power consumption of the proposed filter bank is minimized through algorithmic and architectural optimizations. For the algorithm design, we propose a multirate filter bank architecture. For the architecture optimizations, we apply polyphase FIR design, clock gating, and operand isolation. Together, these three techniques reduce the power consumption by about 37% compared with the non-optimized design, as simulated by PrimePower. In addition, the multi-vdd design can further reduce the power consumption. The 18-band filter bank has been implemented for digital hearing aids with a 16-bit data wordlength and a 24 kHz sample rate. From the simulation, the normalized power of our design is 46% to 80% lower than that of other designs.

Our future work is to further reduce the power consumption with additional optimizations. For example, we can reduce the multiplier power by minimizing the Hamming distances of the multiplier. Figure 5-1 shows the power consumption at different Hamming distances for four specific constant inputs.

[Figure 5-1: power (μW, axis 0 to 40) versus Hamming distance (H.D., 1 to 8) for B = 16'h0000, 16'hffff, 16'h5555, and 16'haaaa]

Figure 5-1 Power consumption with different H.D. for constant inputs

From the figure above, we observe that a larger Hamming distance leads to higher power consumption. We can therefore negate coefficients by using equation (5-1) to minimize the Hamming distance and thus the power consumption.

$Accu = Accu + Input \times Coeff = Accu - Input \times (-Coeff)$ (5-1)

The equation shows that we can encode the coefficients in two different ways. We should define a power model of the multiplier so that we can decide whether a coefficient should be negated.
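The two encodings in equation (5-1) can be sketched in code. This is a hypothetical illustration, not the proposed method: it uses the Hamming distance between consecutive 16-bit two's-complement coefficient words as a stand-in for the multiplier power model that still has to be defined, and a simple greedy rule to decide which coefficients to negate. All function names are assumptions.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def hamming(a, b):
    """Number of differing bits between two WIDTH-bit two's-complement words."""
    return bin((a ^ b) & MASK).count("1")


def choose_encodings(coeffs):
    """Greedy choice: store either c or -c, whichever word is closer in
    Hamming distance to the previously stored word."""
    stored, negated = [], []
    prev = 0
    for c in coeffs:
        use_neg = hamming(prev, -c) < hamming(prev, c)
        word = -c if use_neg else c
        stored.append(word)
        negated.append(use_neg)
        prev = word
    return stored, negated


def mac(inputs, stored, negated):
    """Accumulate with equation (5-1): add Input*Coeff, or subtract
    Input*(-Coeff) when the negated coefficient is stored."""
    accu = 0
    for x, w, neg in zip(inputs, stored, negated):
        accu = accu - x * w if neg else accu + x * w
    return accu


def total_hamming(words):
    """Sum of Hamming distances between consecutive words, starting from 0."""
    prev, total = 0, 0
    for w in words:
        total += hamming(prev, w)
        prev = w
    return total


# Hypothetical coefficients and inputs
coeffs_demo = [3, -2, 5, -7]
inputs_demo = [1, 2, 3, 4]
stored_demo, negated_demo = choose_encodings(coeffs_demo)
print(stored_demo, negated_demo)
print(total_hamming(coeffs_demo), total_hamming(stored_demo))
print(mac(inputs_demo, stored_demo, negated_demo))
```

The accumulated result is the same for both encodings, while the sequence of stored words toggles fewer bits; whether that translates into lower multiplier power depends on the power model mentioned above.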

The proposed coefficient negation method should be applied to the filter bank in the future; it is expected to consume less power than the design without this optimization.

In addition, we will implement a configurable filter bank for hearing aids that can meet the ANSI octave-band and fractional-octave-band specifications. The band number and the octave number should be adjustable to manage power in different situations.

REFERENCES

[1] M. A. Hersh, M. A. Johnson et al., Assistive Technology for the Hearing-impaired, Deaf and Deaf-blind. London, U.K.: Springer-Verlag, 2003.

[2] [Online]. Available: http://www.nal.gov.au/

[3] R. Brennan and T. Schneider, “A flexible filter bank structure for extensive signal manipulations in digital hearing aids,” in Proc. ISCAS, 1998

[4] H. Li, G. A. Jullien, V. S. Dimitrov, M. Ahmadi, and W. Miller, “A 2-digit multidimensional logarithmic number system filter bank for a digital hearing aid architecture,” in Proc. IEEE Int. Symp. Circuit and System, vol. 2, 2002

[5] T. Lunner and J. Hellgren, “A digital filterbank hearing aid – design, implementation and evaluation,” in Proc. ICASSP, 1991

[6] L. S. Nielsen and J. Sparso, “Designing asynchronous circuits for low power: an IFIR filter bank for a digital hearing aid,” in Proc. IEEE, vol. 87, no. 2, 1999

[7] Y. Lian and Y. Wei, “A computationally efficient nonuniform FIR digital filter bank for hearing aids,” IEEE Trans. Circuits Syst., vol. 52, no. 12, 2005

[8] K. S. Chong, B. H. Gwee and J. S. Chang, “A 16-channel low-power nonuniform spaced filter bank core for digital hearing aid,” IEEE Trans. Circuits Syst., vol. 53, no. 9, 2006

[9] A. Lozano and A. Carlosena, “DSP-based implementation of an ANSI S1.11 acoustic analyzer,” IEEE Trans. Instrumentation and Measurement, vol. 52, no. 4, 2003

[10] S. B. Davis, “Octave and fractional octave band digital filtering based on the proposed ANSI standard,” in Proc. ICASSP, 1986

[11] Specification for Octave-band and Fractional-octave-band Analog and Digital Filters, ANSI S1.11-2004, Feb. 2004, Standards Secretariat Acoustical Society of America

[12] M. Burkey, Overcoming Hearing Aid Fears: The Road to Better Hearing, Rutgers University Press, 2008

[13] H. Dillon, Hearing Aids, Thieme Medical Publisher, 2001

[14] S. Kochkin, “MarkeTrak VII: Hearing loss population tops 31 million people,” The Hearing Review, July 2005

[15] T. H. Venema, Compression For Clinicians, Thomson Delmar Learning, 2006

[16] P. Y. Lin, Feasibility Study of the Implementation of Hearing Aid Signal Processing Algorithms on the TI TMS320C6713 DSK, Master thesis, Institute of Biomedical Engineering, National Yang Ming University, 2004

[17] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, 1993

[18] T. W. Parks and J. H. McClellan, “Chebyshev approximation for nonrecursive digital filters with linear phase,” IEEE Trans. Circuit Theory, vol. 19, pp. 189-194, 1972

[19] M. Vishwanath, “The recursive pyramid algorithm for the discrete wavelet transform,” IEEE Trans. Signal Processing, vol. 42, no. 3, March 1994

[20] C. Chakrabarti and M. Vishwanath, “Efficient realizations of the discrete and continuous wavelet transforms: From single chip implementations to mappings on SIMD array computers,” IEEE Trans. Signal Processing, vol. 43, pp. 759-771, 1995.

[21] M. Munch, B. Wurth, R. Mehra, J. Sproch, and N. Wehn, “Automating RT-level operand isolation to minimize power consumption in datapaths,” DATE-00: IEEE Design Automation and Test, pp. 624-631, 2000

[22] T.C. Chen and R.B. Sheen, “A power-efficient wide-range phase-locked loop,” IEEE Journal of Solid-State Circuits, vol. 37, no. 1, pp. 51-62, 2002

APPENDIX - ANSI S1.11 COMPLIANCE TESTING

We built an ANSI compliance testing tool in MATLAB. The tool can test the items defined in the ANSI S1.11 standard.

The ANSI S1.11 standard has nine verification recommendations.

Of the nine tests, the final four are optional. The anti-aliasing filter is a low-pass filter used before a signal sampler to restrict the bandwidth of a signal so that it approximately satisfies the sampling theorem. Since our design contains only the digital part of the filter bank for hearing aids, we do not design the anti-aliasing filter. The FLAT frequency response in the ANSI S1.11 standard is a special item that the designer can claim whether
