586 IEEE Transactions on Consumer Electronics, Vol. 43, No. 3, AUGUST 1997
THE DESIGN OF A HYBRID FILTER BANK FOR THE PSYCHOACOUSTIC MODEL IN ISO/MPEG PHASES 1, 2 AUDIO ENCODER
Chi-Min Liu, Member, IEEE, and Wen-Chieh Lee
Department and Institute of Computer Science and Information Engineering National Chiao Tung University, Hsinchu, 30050, Taiwan
E-Mail: cmliu@csie.nctu.edu.tw
Abstract- The ISO/MPEG phases 1 and 2 audio compression standards are finding a wide range of applications. In the MPEG encoding process, the psychoacoustic model exploits audio irrelevancy, which is the key to achieving a high compression ratio without losing audio quality. However, the Fourier transform (FT) used by the two psychoacoustic models suggested in the standard draft has high computational complexity, and hence leads to high hardware and software cost for real-time applications. This paper presents a new design, named the hybrid filter bank, to replace the FT. The hybrid filter bank can be integrated with the psychoacoustic models and has a much lower complexity than the FT. This paper also shows that the hybrid filter bank is better suited to stereo coding and hence can provide better quality for intensity stereo coding, which is the key technology for MPEG 1 to achieve near-transparent quality at bit rates lower than 96x2 kbits for two stereo channels.
I. Introduction
Like most perceptual audio coders [1]-[3], the MPEG audio encoder can be viewed as four parts: the time-frequency mapper, the psychoacoustic model, quantization, and frame packing, as shown in Fig. 1. The psychoacoustic model exploits audio irrelevancy, which is usually defined in the frequency domain. The time-frequency mapper maps the time-domain signals into a frequency representation to reduce data redundancy and to ease integration with the psychoacoustic model. The quantizer quantizes the audio signals from the time-frequency mapper based on the information from the psychoacoustic model. The frame packing packs the quantized signals together with synchronization information, such as the sampling frequency, for identification by MPEG decoders.
Fig. 1 The structure of the FFT-based MPEG encoder
In the encoding process of MPEG, the 1024-point Fourier transform (FT) is used by the psychoacoustic models to analyze the frequency components in the 1152 samples of one frame. If the conventional real-data fast FT (FFT) [4] is adopted to implement the FT, the complexity is on the order of 4*256*log(512) multiplications. Such complexity leads to high implementation cost for real-time applications.
This paper presents a new design, named the hybrid filter bank, to replace the FT. The hybrid filter bank can be integrated with the psychoacoustic models and has a much lower complexity than the FT. This paper also shows that the hybrid filter bank is better suited to stereo coding and hence can provide better quality for intensity stereo coding, which is the key technology for MPEG 1 to achieve near-transparent quality at bit rates lower than 96x2 kbits for two stereo channels.
The rest of this paper is organized as follows. Section II illustrates the design of the hybrid filter bank. The hybrid filter bank has problems with the phase shift and the aliasing components arising from the decimation in the 1st level filter bank; Section III provides methods to solve these two problems. Section IV considers complexity and the integration of the hybrid filter bank with the psychoacoustic models in MPEG. Section V evaluates the design through spectrum analysis, subjective measures, and objective measures to show the feasibility of the hybrid filter bank. Section VI gives a brief conclusion.
+This work was supported in part by Acer Laboratories Inc. under contract C85098.
II. Filter Response in Hybrid Filter Banks
The motivation for the hybrid filter bank comes from the two frequency analyzers in the time-frequency mapper and the psychoacoustic model. MPEG has adopted a 32-band polyphase filter bank, which provides a frequency resolution of $\pi/32$ with a sidelobe attenuation of 96 dB, while the FT with a Hann window provides a resolution of $\pi/512$ with an attenuation of 32 dB. The approach of the hybrid filter bank is to cascade another filter bank, named the second (2nd) level filter bank, onto the output of the original polyphase filter bank, named the first (1st) level filter bank, to achieve a high frequency resolution. The block diagram of the hybrid filter bank is shown in Fig. 2.
Fig. 2 Structure of the MPEG encoder based on the hybrid filter banks
Fig. 3 shows the detailed structure of the hybrid filter bank. The structure adopts a 16-band filter bank based on the time domain aliasing cancellation (TDAC) filter bank [6] for each band of the 1st level filter bank to achieve a frequency resolution as high as that of the FT. The input-output relation of the TDAC filter bank is

$$X_i(k) = \sum_{n=0}^{N-1} h(n)\, x_i(n) \cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right), \quad k = 0, \ldots, \frac{N}{2}-1 \qquad (1)$$

where $x_i(n)$ is the nth output of band i from the 1st level polyphase filter bank, $X_i(k)$ is the corresponding output of the 2nd level filter bank, and $h(n)$ is the window function that determines the band selectivity of the 2nd level filter bank. To achieve a frequency resolution of $\pi/512$, the same as the FT, the value of N is set to 32. Also, to have a frequency selectivity the same as the FT, we select the window function

$$h(n) = \sin\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)\right), \quad n = 0, \ldots, N-1 \qquad (2)$$

which has a sidelobe attenuation of 24 dB as shown in Fig. 4. The function has the property

$$h^2(n) + h^2\left(n+\frac{N}{2}\right) = 1, \quad n = 0, \ldots, \frac{N}{2}-1 \qquad (3)$$

which is a necessary condition for perfect reconstruction filter banks [5]. Substituting (2) into (1) yields

$$X_i(k) = \sum_{n=0}^{N-1} \sin\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)\right) x_i(n) \cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right), \quad k = 0, \ldots, \frac{N}{2}-1 \qquad (4)$$
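The 2nd level analysis can be illustrated with a minimal Python sketch. It assumes the TDAC relation $X_i(k) = \sum_n h(n) x_i(n) \cos(\frac{\pi}{2N}(2n+1+\frac{N}{2})(2k+1))$ with the sine window $h(n) = \sin(\frac{\pi}{N}(n+\frac{1}{2}))$ and N = 32. The function names and the test input are illustrative assumptions, and the direct O(N^2) summation is used only for clarity; it is not the fast algorithm of [10].

```python
import math

def sine_window(N):
    """Sine window: h(n) = sin(pi/N * (n + 1/2)), n = 0..N-1."""
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def tdac_forward(x, h):
    """Direct evaluation of the TDAC analysis sum for one block of N samples.

    Returns N/2 real coefficients X(k), k = 0..N/2-1.
    """
    N = len(h)
    assert len(x) == N
    out = []
    for k in range(N // 2):
        acc = 0.0
        for n in range(N):
            angle = math.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k + 1)
            acc += h[n] * x[n] * math.cos(angle)
        out.append(acc)
    return out

N = 32
h = sine_window(N)

# Window property h^2(n) + h^2(n + N/2) = 1 for n = 0..N/2-1.
pr_ok = all(abs(h[n] ** 2 + h[n + N // 2] ** 2 - 1.0) < 1e-12
            for n in range(N // 2))

# Hypothetical test input: the cosine basis function of band k0.
k0 = 5
x = [math.cos(math.pi / (2 * N) * (2 * n + 1 + N / 2) * (2 * k0 + 1))
     for n in range(N)]
X = tdac_forward(x, h)
peak = max(range(N // 2), key=lambda k: abs(X[k]))
print(pr_ok, len(X), peak)
```

Each block of 32 subband samples yields 16 real coefficients, so the 32 subbands together give 32 x 16 = 512 spectral lines, which matches the $\pi/512$ resolution quoted for the FT.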
Fig. 3 Detailed structure of the hybrid filter bank
Fig. 4 Power spectrum of the 2nd level filter bank (horizontal axis: normalized frequency, 0 to 511)
III. Phase Shifter & Alias Reduction
As mentioned in [7], [8], the hybrid filter bank has problems with the phase shift and the aliasing components arising from the 1st level filter bank. We follow a concept similar to that in [7], [8] to design a phase shifter and an alias reduction butterfly to solve these two problems.
A. Design of the phase shifter
Due to the decimation operation in the 1st level filter bank, the 1st level filter bank has a phase shift of $\pi$ in the odd-indexed subbands. The phase shift reverses the spectrum of the subband. If further spectral analysis is needed to achieve higher frequency resolution, this shift should be corrected. The phase shift can be corrected by multiplying the subband signal in the odd-indexed subbands by $(-1)^n$; that is,

$$X_i(k) = \begin{cases} \displaystyle\sum_{n=0}^{N-1} \sin\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)\right) x_i(n) \cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right) & \text{for even } i \\ \displaystyle\sum_{n=0}^{N-1} (-1)^n \sin\left(\frac{\pi}{N}\left(n+\frac{1}{2}\right)\right) x_i(n) \cos\left(\frac{\pi}{2N}\left(2n+1+\frac{N}{2}\right)(2k+1)\right) & \text{for odd } i \end{cases} \quad \text{for } k = 0 \text{ to } \frac{N}{2}-1 \qquad (5)$$

where odd/even stands for the odd/even indexed subbands of the 1st level filter bank. The phase shifter can be combined into the window function to avoid extra computation.
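A small Python sketch may help show why the correction costs nothing at run time. It assumes the sine window $h(n) = \sin(\frac{\pi}{N}(n+\frac{1}{2}))$ with N = 32; the helper names and the test frequency are illustrative. The factor $(-1)^n$ is folded into a precomputed window for odd-indexed subbands, and the last lines check numerically that multiplying by $(-1)^n$ mirrors a frequency w to $\pi - w$, which is the spectral reversal described above.

```python
import math

def sine_window(N):
    """Sine window: h(n) = sin(pi/N * (n + 1/2))."""
    return [math.sin(math.pi / N * (n + 0.5)) for n in range(N)]

def shifted_window(N, subband_index):
    """Window with the (-1)^n phase shifter folded in for odd subbands."""
    h = sine_window(N)
    if subband_index % 2 == 1:
        return [(-1) ** n * h[n] for n in range(N)]
    return h

N = 32
h_even = shifted_window(N, 2)   # unchanged sine window
h_odd = shifted_window(N, 3)    # sign alternates with n

# Multiplying by (-1)^n maps cos(w n) to cos((pi - w) n): the spectrum is mirrored.
w = 0.3  # arbitrary test frequency in radians per sample
mirror_error = max(abs((-1) ** n * math.cos(w * n) - math.cos((math.pi - w) * n))
                   for n in range(N))
print(mirror_error)
```

Since the two windows can be tabulated once, the per-sample cost of the odd and even subbands is the same.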
B. Design of the aliasing reduction butterfly
It is well known that decimation leads to aliasing, and there are decimations in the hybrid filter bank. Aliasing is a many-to-one merging between the input frequencies and the output frequency of a filter bank, and hence makes it difficult to distinguish the "many" frequency components from the "one" frequency component. The merged frequencies and the corresponding merging weights are determined by the filter bandwidth and the magnitude response of the filters in the filter bank. For the filter bank designed in the last section, since the sidelobe attenuation is around 24 dB, the aliasing term of a frequency in a filter band can be reasonably approximated by the frequency components from the nearest neighboring band. For the hybrid filter bank design in Fig. 3, aliasing arises from both the 1st level and the 2nd level filter banks. The aliasing terms in the 1st level filter bank merge frequencies as far apart as $\pi/32$, while those in the 2nd level filter bank merge frequencies $\pi/512$ apart. Since the psychoacoustic models in MPEG need a frequency resolution of $\pi/512$, the aliasing terms from the 1st level filter bank should be suitably corrected to increase the frequency resolution.
Fig. 5 Alias in neighboring subbands (horizontal axis: normalized frequency)
Fig. 5 shows the frequency responses of two neighboring filters in the 1st level filter bank before decimation. The lattice lines in Fig. 5 show the resolution boundaries of the 2nd level filter bands. The cross lines in Fig. 5 show the bands merged by the decimation in the 1st level filter bank.
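The many-to-one merging caused by decimation can be checked numerically. The sketch below is only an illustration under simplifying assumptions: it decimates two plain cosines by M = 32 without any band-pass filtering, and the frequency w and the signal length are arbitrary. In the real filter bank the analysis filters attenuate, but do not fully remove, the component that folds onto the wanted one.

```python
import math

M = 32  # decimation factor of the 1st level filter bank

def decimate(x, m):
    """Keep every m-th sample."""
    return x[::m]

w = 0.05                 # arbitrary frequency in radians per sample
length = M * 20
x1 = [math.cos(w * n) for n in range(length)]
x2 = [math.cos((w + 2 * math.pi / M) * n) for n in range(length)]

# The two inputs are clearly different before decimation ...
diff_before = max(abs(a - b) for a, b in zip(x1, x2))

# ... but become indistinguishable afterwards, because
# cos((w + 2*pi/M) * M * n) = cos(w * M * n + 2*pi*n) = cos(w * M * n).
y1, y2 = decimate(x1, M), decimate(x2, M)
diff_after = max(abs(a - b) for a, b in zip(y1, y2))
print(diff_before, diff_after)
```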
Edler [7] designed the butterfly structure in Fig. 6 to reduce the aliasing errors in hybrid filter banks. The hybrid structure in Fig. 3 includes the butterfly structure to compensate for the aliasing terms. The butterfly operation is

$$u_i = d_m\,(r_i - c_m r_j) \quad \text{with } i = 16k - 1 - m$$
$$u_j = d_m\,(r_j + c_m r_i) \quad \text{with } j = 16k + m \qquad (6)$$
$$\text{with } d_m = 1/\sqrt{1 + c_m^2}, \quad -N/2 \le m \le -1$$

where $r_i$ and $r_j$ are the band signals from the 2nd level filter bank indicated by the cross lines in Fig. 5, and $u_i$ and $u_j$ are the resulting signals after the correction. The factors $c_m$ and $d_m$ are the two weighting factors designed to compensate for the aliasing errors in each band. The values of these two factors vary with the bands, labeled m as indicated in Fig. 5. In the following we show the method to obtain the values of these weighting factors.
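The butterfly can be sketched in a few lines of Python. The sketch assumes the cross-coupled form $u_i = d_m (r_i - c_m r_j)$, $u_j = d_m (r_j + c_m r_i)$ with $d_m = 1/\sqrt{1 + c_m^2}$ and the index pairing $i = 16k - 1 - m$, $j = 16k + m$; the function names and the numeric inputs are hypothetical and are not values from Table 1.

```python
import math

def alias_butterfly(r_i, r_j, c_m):
    """One alias reduction butterfly on a pair of 2nd level band signals."""
    d_m = 1.0 / math.sqrt(1.0 + c_m * c_m)
    u_i = d_m * (r_i - c_m * r_j)
    u_j = d_m * (r_j + c_m * r_i)
    return u_i, u_j

def butterfly_indices(k, m):
    """Band indices paired across the boundary of 1st level subbands k-1 and k."""
    return 16 * k - 1 - m, 16 * k + m

# Hypothetical inputs chosen only to exercise the code.
u_i, u_j = alias_butterfly(3.0, 4.0, 0.5)
energy_in = 3.0 ** 2 + 4.0 ** 2
energy_out = u_i ** 2 + u_j ** 2
print(butterfly_indices(1, -1), energy_in, energy_out)
```

With this normalization the butterfly is a plane rotation, so it preserves the combined energy of the two bands; with $c_m = 0$ it leaves the pair unchanged.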
C. Design of the weighting factors in the butterfly
For the bands other than those labeled m = -1 and 1, the weighting factors are calculated as the ratio between the filter response energy in the aliasing band and that in the signal band:

$$c_m = \frac{\text{Energy of alias band}}{\text{Energy of signal band}} = \frac{\int_{w \in \text{aliasing band}} |H(w)|^2\, dw}{\int_{w \in \text{signal band}} |H(w)|^2\, dw} \qquad (7)$$

where H(w) is the frequency response of one filter in the 1st level filter bank.
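The energy ratio can be approximated numerically once a frequency response is available. The following Python sketch is a rough illustration, assuming the weighting factor is the plain ratio of the integrals of $|H(w)|^2$ over the aliasing band and the signal band. The response `toy_H` and the band edges are hypothetical stand-ins; the actual MPEG prototype filter is not reproduced here.

```python
import math

def band_energy(H, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of |H(w)|^2 over [lo, hi]."""
    dw = (hi - lo) / steps
    return sum(abs(H(lo + (s + 0.5) * dw)) ** 2 for s in range(steps)) * dw

def weighting_factor(H, signal_band, alias_band):
    """Ratio of alias-band energy to signal-band energy."""
    return band_energy(H, *alias_band) / band_energy(H, *signal_band)

def toy_H(w):
    """Hypothetical response: unity in the passband [0, pi/32], 0.1 (-20 dB) outside."""
    return 1.0 if w <= math.pi / 32 else 0.1

c = weighting_factor(toy_H,
                     signal_band=(0.0, math.pi / 32),
                     alias_band=(math.pi / 32, math.pi / 16))
print(c)
```

For two bands of equal width, a response that is 20 dB lower in the aliasing band gives a ratio of about 0.01, so a steeper filter leads to a smaller correction.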
Fig. 6 Structure of the alias reduction butterfly
Table 1 Weighting factors of the alias reduction butterfly (only the entries 0.00429, 0.99824, 0.00049, and 1.00000 survive in the source)

However, the compensation should be modified for the bands labeled m = -1 and 1. As described above, there is also aliasing from the 2nd level filter bank. For example, the band labeled m = 2 has aliasing terms from the bands labeled m = 1 and m = 3. However, the aliasing terms for m = -1 and m = 1 come only from the bands m = -2 and m = 2, respectively. To take this special effect into account in the butterfly, the weighting factors for m = -1, 1 are calculated as

$$c_1 \text{ or } c_{-1} = \frac{\text{Energy of alias band}}{\text{Energy of signal band}}\,(1-\gamma) \qquad (8)$$

where $\gamma$ is the ratio between the filter response energy of the signal and the aliasing terms in the 2nd level filter bank. Table 1 summarizes the values of the weighting factors.

Fig. 7 Hybrid filter bank resolution vs. critical band (1st level subbands, 2nd level subbands, and critical bands, starting from 0 Hz)

Table 2 Complexity of the hybrid structure compared with the FFT
| Algorithm of frequency mapping in psychoacoustic model | # of multiplications per 1152 samples | # of additions per 1152 samples |
| 1024 pt. FFT (real FFT) + Hann window | 4*256*log(512)+512 = 9728 | 2*256*log(512)+2*512*log(512) = 9216 |
| 32 (32 pt. TDAC filter bank + window) | 32*16*log(32)+32*32 = 3584 | 32*32*log(32) = 5120 |
| 32 (32 pt. TDAC filter bank + window + alias cancellation) | 3584+32*6*2*2 = 4352 | 5120+32*6*2 = 5504 |
| 12 (TDAC + window + alias cancellation) + critical bands | 12/32*(4352) = 1632 | 12/32*(5504) = 2064 |

Fig. 8 (caption not recoverable; surviving labels: scaling factor, left, intensity, left audio in)
Table 2 shows the complexity of the hybrid structure compared with the FFT. The 1024-point real-data FFT requires 256*log(512) complex multiplications and 512*log(512) complex additions, plus 512 multiplications for the Hann window, while the 32 2nd level TDAC filter banks with the 6 aliasing cancellation butterfly structures require only on the order of 32*(16*log(32)+32+6*2*2) multiplications when the fast algorithm of the TDAC filter bank [10] is applied. Exploiting the perceptual resolution reduces the complexity further, as indicated in row 4 of Table 2.
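The multiplication counts quoted above, and the addition counts of the hybrid structure in Table 2, can be reproduced with a few lines of Python. The sketch assumes that log denotes the base-2 logarithm, which is consistent with the printed totals; the variable names are illustrative.

```python
def log2_int(n):
    """Exact base-2 logarithm of a power of two."""
    return n.bit_length() - 1

# Multiplications per 1152 samples.
fft_mults = 4 * 256 * log2_int(512) + 512           # real FFT plus Hann window
tdac_mults = 32 * 16 * log2_int(32) + 32 * 32       # 32 TDAC banks plus window
hybrid_mults = tdac_mults + 32 * 6 * 2 * 2          # plus 6 butterflies per band
reduced_mults = 12 * hybrid_mults // 32             # only 12 of the 32 subbands

# Additions per 1152 samples for the hybrid structure.
tdac_adds = 32 * 32 * log2_int(32)
hybrid_adds = tdac_adds + 32 * 6 * 2
reduced_adds = 12 * hybrid_adds // 32

print(fft_mults, tdac_mults, hybrid_mults, reduced_mults)
print(tdac_adds, hybrid_adds, reduced_adds)
print(round(hybrid_mults / fft_mults, 3), round(reduced_mults / fft_mults, 3))
```

Under these counts the full hybrid structure needs roughly 45% of the multiplications of the FFT-based analysis, and the 12-band variant roughly 17%.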
B. Cooperating with the intensity mode
The other advantage of substituting the hybrid structure for the FT in the psychoacoustic models of MPEG lies in stereo encoding. As mentioned in our previous paper [9], intensity stereo coding is the key technology for layer 2 in MPEG 1 to achieve near-transparent quality at a bit rate as low as 96x2 kbits for the two stereo channels. However, the original FT analysis has problems in maintaining a frequency analysis consistent with the stereo signals. When the high-frequency parts of the two stereo channels are combined into one channel in intensity stereo coding, or in the scheme mentioned in [9] as shown in Fig. 8, the original FT analysis result is not representative of the frequency analysis of the combined channels.
One way to overcome this inconsistency is to recalculate the FT analysis and the psychoacoustic model for the two channels based on the combined channels. This recalculation leads to a heavy computing load. On the other hand, when these stereo coding schemes are applied, the hybrid structure can easily be tuned to give a consistent analysis. Through the hybrid structure, the frequency analysis and the corresponding psychoacoustic model need to be modified only on the part of the frequency range covered by the combined channels. The hybrid filter bank cooperating with the intensity stereo coding scheme is shown in Fig. 9.
C. Tonality measure
The determination of the tonality of a spectral line or a band is important in the psychoacoustic model for calculating the sensitivity of human hearing to the lines or bands. Psychoacoustic model 2 in the MPEG draft measures tonality through a simple prediction calculated in polar coordinates in the complex plane [2]. This tonality detection was originally designed for the complex numbers output by the Fourier transform. Since the output of the hybrid filter bank presented in this paper is real data, the detection mechanism should be suitably modified. The predicted magnitude for a spectral line is denoted as $\hat{r}(t,f)$, which is calculated from the two preceding magnitudes $r(t-1,f)$ and $r(t-2,f)$:

$$\hat{r}(t,f) = r(t-1,f) + \left(r(t-1,f) - r(t-2,f)\right) \qquad (9)$$

where t and f represent the index of time and frequency, respectively. The tonality factor $c(t,f)$ used in psychoacoustic model 2 can now be obtained as

$$c(t,f) = \frac{\sqrt{\left(r(t,f) - \hat{r}(t,f)\right)^2}}{r(t,f) + \mathrm{abs}\left(\hat{r}(t,f)\right)} \qquad (10)$$

For tone signals, the prediction turns out to be very good, and $c(t,f)$ will have a value near zero. On the other hand, for very unpredictable signals such as noise, $c(t,f)$ will have a value near 1.
V. Quality Measure
The effects of the hybrid filter bank and the corresponding modifications can be illustrated by comparing the spectrum from the FT with that from the hybrid filter bank. Fig. 10 shows the spectrum analysis of a signal with five components at 400 Hz, 800 Hz, 1600 Hz, 3200 Hz, and 6400 Hz through the FT (dotted line), the hybrid filter bank without alias reduction (dashed line, shifted up by 100 dB), and the hybrid filter bank with alias reduction (solid line, shifted up by 200 dB). The location of each frequency component in the hybrid filter bank is almost the same as in the FT, and the alias reduction effectively reduces the aliasing terms of the hybrid filter bank.
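To relate the five test tones to the 512 spectral lines, the following Python sketch maps each frequency to a line index and to the 1st level subband that contains it. The sampling rate of 44.1 kHz is an assumption made only for this illustration, since the text does not state the rate used in the experiment; the helper names are likewise illustrative.

```python
FS = 44100        # assumed sampling rate in Hz (not stated in the text)
LINES = 512       # 32 subbands x 16 lines, same count as the 1024-point FT

def line_index(freq_hz, fs=FS):
    """Index of the spectral line containing freq_hz (line width fs / 1024)."""
    return freq_hz * 2 * LINES // fs

def subband_index(line):
    """1st level subband that contains a given spectral line."""
    return line // 16

tones = [400, 800, 1600, 3200, 6400]
lines = [line_index(f) for f in tones]
subbands = [subband_index(l) for l in lines]
print(lines)
print(subbands)
```

Under this assumption the five tones fall into different 1st level subbands, so the test signal exercises several subband boundaries where the alias reduction butterfly acts.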
Fig. 10 Signal with frequencies located at 400 Hz, 800 Hz, 1600 Hz, 3200 Hz, and 6400 Hz analyzed by the 1024 pt. FT (dotted line), the hybrid filter bank (dashed line), and the hybrid filter bank with alias reduction butterfly (solid line); horizontal axis: frequency (bin)

Several audio segments have been used to measure the signal-to-masking ratio [9] from the FT and the various hybrid filter banks. Two of the results are shown in Fig. 11 and Fig. 12, where the FT is denoted by the solid line, the hybrid filter bank with alias reduction by the dotted line, and the hybrid filter bank with only 12 bands in the 2nd level by the dashed line. The results show that the low-complexity hybrid filter bank can provide a result similar to the FT.

Fig. 11 Average signal-to-masking ratio of each subband for female vocal sound (horizontal axis: subband)
Fig. 12 Average signal-to-masking ratio of each subband for classical symphony orchestra (horizontal axis: subband)
Also, informal listening tests show that the differences between the audio segments coded with the FT-based psychoacoustic model and those coded with the hybrid filter bank are almost imperceptible.
VI. Concluding Remarks
This paper has presented a new design, named the hybrid filter bank, to replace the FT adopted in the psychoacoustic model suggested in the draft on MPEG phases 1 and 2 audio coding. This paper has given the means to solve the phase shift and aliasing problems in the hybrid structure. The hybrid filter bank can be well integrated with the psychoacoustic model and has a much lower complexity than the FT. We have also shown that the hybrid filter bank can cooperate with the intensity stereo coding scheme to obtain higher audio quality. Due to the flexibility of the hybrid filter bank, a psychoacoustic model consistent with the intensity stereo coding channel can be obtained with little additional computation. The hybrid filter bank has been tested through spectrum analysis, subjective measures, and objective measures to show its feasibility.
References
[1] J. D. Johnston, "Transform coding of audio signals using perceptual noise criteria," IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314-323, Feb. 1988.
[2] K. Brandenburg, J. D. Johnston, "Second level perceptual audio coding: the hybrid coder," The 88th Convention of AES, March 13-16, 1990.
[3] R. N. J. Veldhuis, "Bit rates in audio source coding," IEEE Journal on Selected Areas in Communications, vol. 10, no. 1, pp. 86-96, Jan. 1992.
[4] E. O. Brigham, "The fast Fourier transform and its application," Prentice Hall Inc., 1988.
[5] P. P. Vaidyanathan, "Multirate digital filters," Prentice Hall Inc., 1993.
[6] J. Princen, A. Johnson, A. Bradley, "Subband/transform coding using filter bank designs based on time domain aliasing cancellation," Proc. of the ICASSP 1987, pp. 2161-2164.
[7] B. Edler, "Aliasing reduction in sub-bands of cascaded filter banks with decimation," Electronics Letters, vol. 28, no. 12, pp. 1104-1106, Jun. 1992.
[8] K. Brandenburg, E. Eberlein, J. Herre, B. Edler, "Comparison of filterbanks for high quality audio coding," IEEE International Symposium on Circuits and Systems, vol. 3, pp. 1336-1339, 1992.
[9] C. M. Liu and J. C. Liu, "A new intensity stereo coding scheme for MPEG audio encoder- layer I and II," IEEE Trans. on Consumer Electronics, vol. 42, pp. 535-539, Aug. 1996.
[10] T. Sporer, K. Brandenburg, B. Edler, "The use of multirate filter banks for coding of high quality digital audio," The 6th European Signal Processing Conference, vol. 1, pp. 211-214, Jun. 1992.
Chi-Min Liu received the B.S. degree in electrical engineering from Tatung Institute of Technology, Taiwan, R.O.C., in 1985, and the M.S. and Ph.D. degrees in electronics from National Chiao Tung University, Hsinchu, Taiwan, in 1987 and 1991, respectively. He is currently an Associate Professor in the Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan. His research interests include video/audio compression, speech recognition, radar processing, and application-specific VLSI architecture design.
Wen-Chieh Lee received the B.S. degree from the Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 1995. He is currently a Ph.D. student in the Department of Computer Science and Information Engineering, National Chiao Tung University, Hsinchu, Taiwan. His research interest is in the area of audio compression.