Chapter 2 Basic Component
2.2 QMF Banks in HE-AAC Encoder
2.2.2 Real QMF Banks
The real analysis and synthesis filter banks include both the positive and the negative frequency parts, as shown in Figure 13.
The framework of real QMF banks also includes four stages: the analysis filter bank $H_k(z)$, decimators, expanders and the synthesis filter bank $F_k(z)$, like complex QMF banks. But the real analysis QMF bank $H_k(z)$ and the real synthesis QMF bank $F_k(z)$ are derived from the prototype filter by cosine modulation as (7)-(11).
Taking 4-channel real QMF banks as an example, the original signal in Figure 8 is separated into four subbands by the real 4-channel analysis QMF bank shown in Figure 13. Figure 14 shows the four individual subbands.
After the decimation and the expansion, the original component of negative frequency is overlapped by the two image components produced by the original component of positive frequency shown in Figure 16, and the original component of positive frequency is overlapped by the two image components produced by the original component of negative frequency.
Therefore, after the synthesis filter $F_1(z)$ shown in Figure 17, the synthesized subband $\hat{X}_1(z)$ includes not only the original components but also the overlapping aliasing terms introduced from the four adjacent image bands.
By summing up the four synthesized subbands, the aliasing terms cancel each other and the original signal can be reconstructed without aliasing terms.
The aliasing term N1L of Figure 18 can be cancelled out by the aliasing term N2R of Figure 19, and the aliasing term P1R of Figure 18 can be cancelled out by the aliasing term P2L of Figure 19, as long as $a_k^{*} b_k = -a_{k-1}^{*} b_{k-1}$ for (7) and (8).
Figure 13: Real analysis QMF bank
Figure 14: The original signal and the four real analysis filters
Figure 15: The four subbands analyzed by real analysis QMF bank
Figure 16: The subband $Y_1(z)$ after the decimation and the expansion
Figure 17: The subband $\hat{X}_1(z)$ synthesized by the synthesis filter $F_1(z)$
Figure 18: The subband $\hat{X}_1(z)$ synthesized by the synthesis filter $F_1(z)$
Figure 19: The subband $\hat{X}_2(z)$ synthesized by the synthesis filter $F_1(z)$
Therefore, real QMF banks, like complex QMF banks, yield a reconstructed signal without aliasing terms.
However, the SBR encoder calculates the SBR parameters in the QMF domain, and the individual subbands in the real QMF domain still contain aliasing terms. Therefore, the SBR parameters under real QMF banks are not the same as those under complex QMF banks. In Chapter 3, this critical problem is resolved by the mechanisms proposed for the low power HE-AAC encoder.
Chapter 3
Low Power HE-AAC Encoder
In sections 3.1 to 3.3, three architectures are proposed to implement the low power HE-AAC encoder. In addition to specifying the low power HE-AAC encoding approaches, the performance is evaluated in terms of complexity and quality.
In section 3.1, the basic low power HE-AAC encoder is introduced without any additional mechanism to avoid artifacts. In section 3.2, the Tonality Adjustment based low power HE-AAC encoder is proposed; it improves the encoding quality by compensating the tonality measurement error while keeping the complexity as low as that of the basic low power HE-AAC encoder. In section 3.3, the Complexification based low power HE-AAC encoder is proposed to calculate the accurate tonality rather than just the compensated value of 3.2.
3.1 Basic Low Power HE-AAC Encoder
The complex and real-value QMF banks have been illustrated in Chapter 2.
Now the real-value QMF banks are included in HE-AAC encoder.
In 3.1.1, the implementation of the basic low power HE-AAC encoder is specified based upon real-value QMF banks. In 3.1.2, the complexity is evaluated to show that the low power HE-AAC encoder is faster than the conventional one. In 3.1.3, the artifacts in the spectrum of the basic low power HE-AAC encoder are presented; they are due to the subbands' tonality error caused by aliasing terms.
3.1.1 Implementation
Figure 20: Prototype Filter $P_0(e^{j\omega})$ for QMF Banks in HE-AAC Encoder (magnitude response in dB versus normalized frequency, ×π rad/sample)
$P_0(e^{j\omega})$ is a linear-phase FIR filter of length N=640, shown in Figure 20. In the conventional HE-AAC encoder, the exponentially modulated analysis and synthesis QMF banks are implemented from the prototype filter $P_0(e^{j\omega})$ as (12) and (13).
In the low power HE-AAC encoder, the cosine modulated analysis and synthesis QMF banks are implemented from the prototype filter $P_0(e^{j\omega})$ as (14) and (15).
$$f_k(n) = p_0(n)\cos\left\{\frac{\pi}{2M}\,(k+0.5)(2n-N+M)\right\} \qquad (15)$$
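The relation between the two modulation schemes can be checked numerically. The following is a minimal sketch, not the encoder's implementation: the prototype `p0` is a hypothetical Hann-windowed sinc standing in for the tabulated 640-tap prototype, and the phase offset `offset` inside the modulation term is an assumed parameter. It shows that the cosine modulated filter is the real part of the exponentially modulated one, and that only the real filter has a two-sided spectrum.

```python
import numpy as np

M = 64    # number of QMF channels
N = 640   # prototype filter length, as stated in the text

# Hypothetical stand-in prototype: Hann-windowed sinc low-pass with cutoff
# near pi/(2M). The actual encoder uses a tabulated prototype instead.
n = np.arange(N)
p0 = np.sinc((n - (N - 1) / 2) / (2 * M)) * np.hanning(N)

def modulation_phase(k, offset):
    """Phase pi/(2M) * (k + 0.5) * (2n + offset); `offset` is an assumption."""
    return np.pi / (2 * M) * (k + 0.5) * (2 * n + offset)

def complex_filter(k, offset=1):
    """Exponentially modulated k-th filter (complex-valued)."""
    return p0 * np.exp(1j * modulation_phase(k, offset))

def real_filter(k, offset=1):
    """Cosine modulated k-th filter (real-valued)."""
    return p0 * np.cos(modulation_phase(k, offset))

def negative_to_positive_energy(h, nfft=8192):
    """Ratio of spectral energy at negative to positive frequencies."""
    H = np.abs(np.fft.fft(h, nfft)) ** 2
    return H[nfft // 2 + 1:].sum() / H[1:nfft // 2].sum()

k = 5
h_c = complex_filter(k)
h_r = real_filter(k)

same_real_part = np.allclose(h_c.real, h_r)        # cosine filter = Re{exponential filter}
ratio_complex = negative_to_positive_energy(h_c)   # close to 0: one-sided spectrum
ratio_real = negative_to_positive_energy(h_r)      # close to 1: symmetric spectrum
print(same_real_part, ratio_complex, ratio_real)
```

Because the real filter keeps a mirrored negative-frequency band, its decimated subband can contain overlapping components, which is the origin of the aliasing terms discussed in Chapter 2.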
Figure 21: 64-band Complex Analysis Filter Bank in HE-AAC Encoder
Figure 22: 64-band Real Analysis Filter Bank in HE-AAC Encoder
Then the real-value QMF banks are integrated into the HE-AAC encoder to replace the complex-value QMF banks, as shown in Figure 23.
Figure 23: Block diagram of Tonality Adjustment based Low Power HE-AAC encoder.
Figure 24: Conventional HE-AAC Encoding
Figure 25: Basic low power HE-AAC encoding without Energy Adjustment
Figure 24 and Figure 25 illustrate that there is an energy difference of about 3 dB in the HF bands encoded by the conventional and the basic low power HE-AAC encoder respectively. A 3 dB difference is a factor of 2 in energy, that is, a factor of $\sqrt{2}$ in amplitude. Therefore, the subbands transmitted to the SBR encoder have to be calibrated to the correct energy by $\sqrt{2}$, that is
$$V_k'(z) = \sqrt{2}\,V_k(z) \qquad (16)$$
As illustrated in Figure 26 and Figure 27, the calibrated envelope based on real-value QMF banks is very close to the one based on complex-value QMF banks.
Figure 26: HE-AAC encoding
Figure 27: Low Power HE-AAC encoding with Energy Adjustment
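The 3 dB gap and its calibration can be illustrated numerically. This is a minimal sketch under two assumptions that are not taken from the encoder itself: the real-valued subband is modelled as the real part of a synthetic complex subband, and the calibration is an amplitude scaling by the square root of 2 (a factor of 2 in energy).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical complex-valued subband: complex Gaussian noise plus a complex tone.
num = 100_000
t = np.arange(num)
v_complex = (rng.standard_normal(num) + 1j * rng.standard_normal(num)) / np.sqrt(2)
v_complex = v_complex + np.exp(1j * 0.3 * np.pi * t)

# Assumption: the real-valued bank delivers the real part of the complex subband.
v_real = v_complex.real

def energy_db(x):
    """Mean energy of a sequence in dB."""
    return 10 * np.log10(np.mean(np.abs(x) ** 2))

gap_before = energy_db(v_complex) - energy_db(v_real)        # about 3 dB
v_calibrated = np.sqrt(2) * v_real                           # amplitude scaling by sqrt(2)
gap_after = energy_db(v_complex) - energy_db(v_calibrated)   # about 0 dB
print(f"gap before: {gap_before:.2f} dB, gap after: {gap_after:.2f} dB")
```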
3.1.2 Complexity
The conventional HE-AAC encoder is the HE-AAC encoder with complex-value QMF banks; its total complexity is taken as 100 percent. The complexities of the QMF banks, SBR encoder and AAC encoder are 4, 70 and 26 percent respectively, according to the computational amount in Table 1.
Because the encoding complexity of the QMF banks and the SBR encoder is halved, the total complexity of the low power encoder is 2 + 35 + 26 = 63 percent of the conventional one, as shown in Table 1.
Complexity Type     QMF Banks   SBR Encoder   AAC Encoder   Total Complexity
Conventional        0.04        0.70          0.26          100%
Basic Low Power     0.02        0.35          0.26          63%
Table 1: Complexity of conventional and basic low power HE-AAC encoder
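The totals in Table 1 follow from simple arithmetic, reproduced in the short sketch below. The dictionary keys are illustrative names only; the shares are the values of Table 1.

```python
# Relative complexity shares from Table 1 (conventional encoder = 1.00).
conventional = {"qmf": 0.04, "sbr": 0.70, "aac": 0.26}

# Basic low power encoder: QMF banks and SBR encoder cost half, AAC is unchanged.
low_power = {
    "qmf": conventional["qmf"] / 2,
    "sbr": conventional["sbr"] / 2,
    "aac": conventional["aac"],
}

total_conventional = sum(conventional.values())
total_low_power = sum(low_power.values())
print(f"conventional: {total_conventional:.0%}, basic low power: {total_low_power:.0%}")
```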
3.1.3 Artifacts
The two typical artifacts, noise overflow and tonal spike, are discussed in [6]; both are due to improper tonality measurement. Human hearing is very sensitive to such artifacts. When the tonal energy is overestimated and the noise floor energy is underestimated, a tonal spike results.
In Figure 29, the HF bands reconstructed with real QMF banks show the noise overflow artifact at 13600 Hz and 16300 Hz, because the HF subbands mixed with aliasing terms lead to improper tonality measurement in the SBR encoding process.
In 3.2, the tonality error of subbands is analyzed qualitatively and quantitatively in order to compensate the tonality error in the advanced low power HE-AAC encoder.
Figure 28: Spectrum of original signal
Figure 29: Spectrum of conventional and basic low power HE-AAC encoder
3.2 Tonality Adjustment based Low Power HE-AAC encoder
In 3.2.1, the tonality measurement error is demonstrated by qualitative and quantitative analysis. In 3.2.2, the Tonality Adjustment based low power HE-AAC encoder is proposed; it uses the result of 3.2.1 to compensate the tonality measurement error in order to improve the encoding quality. Finally, the performance is evaluated based upon the complexity and the quality.
3.2.1 Tonality Error of Subbands in Low Power HE-AAC encoder
3.2.1.1 Qualitative Analysis of Tonality Error
This section explains why the tonality of a subband changes after substituting real QMF banks for complex QMF banks. To begin, some symbols are defined for the following demonstration.
$T_C[k,i]$ means the tonality of subband k and frame i calculated by the tonality estimator in the conventional HE-AAC encoder. The tonality error under complex-value QMF banks can be defined as (17).
$T_R[k,i]$ means the tonality of subband k and frame i calculated by the tonality estimator in the low power HE-AAC encoder. The tonality error under real-value QMF banks can be defined as (18).
The noise energy term in these definitions refers to the k-th subband in the i-th frame.
The tonality error can be defined as (19) in order to know the degree of tonality
The following examples illustrate that the tonality error becomes more serious as $T_R$ decreases. They show why $T_R$ is lower than $T_C$ in a noise-like signal and why $T_R$ is similar to $T_C$ in a tonal signal.
The white noise signal shown in Figure 30 is decomposed by k-th analysis filter designed in section 3.1 to obtain the subband k in Figure 31 (a). Then the subband k is down-sampled in Figure 31 (b).
Because the tonality estimator calculates the tonality of the down-sampled signal, the difference between the down-sampled subband spectrum under complex-value QMF banks in Figure 31 (b.1) and under real-value QMF banks in Figure 31 (b.2) can be analyzed to explain why $T_R$ is lower than $T_C$ in a noise-like signal.
In conventional HE-AAC encoder using complex-value QMF banks, there is no negative frequency signal produced by complex-value analysis filter in the spectrum.
After down-sampling by M (M=64), the down-sampled subband in Figure 31 (b.1) is located only in positive or negative frequency of spectrum.
In the low power HE-AAC encoder using real-value QMF banks, the positive and negative frequency components of the subband in Figure 31 (a.1) originally do not overlap with each other. However, the two components overlap at 0, -π and π after down-sampling by M (M=64), because the bandwidth of the prototype filter is more than $\pi/M$.
$T_R$ is lower than $T_C$ in noise-like signal
In Figure 31 (b.1) and (b.2), obviously the subband under real-value QMF banks is recognized as a pure white noise signal by the tonality estimator. On the
Therefore, the tonality of the subband under complex-value QMF banks is higher than that of the subband processed under real-value QMF banks.
Figure 30: White noise signal (magnitude response in dB versus normalized frequency, ×π rad/sample)
Figure 31: White noise signal processed by complex-value (left) / real-value (right) QMF banks. (a) Decomposed subband k and k-th analysis filter. (b) Down-sampled subband k. Panels (a.1), (a.2), (b.1) and (b.2) plot magnitude (dB) against normalized frequency (×π rad/sample).
$T_R$ is similar to $T_C$ in tonal signal
According to Figure 33 (b.1) and (b.2), the signals at -0.5π and +0.5π under real-value QMF banks are clearly recognized as tones. Under complex-value QMF banks, only the signal at -0.5π is recognized as a tone. But the noise energy under real-value QMF banks is also twice that under complex-value QMF banks. Therefore, the tonality of the subband under complex-value QMF banks is similar to that of the subband processed under real-value QMF banks.
Figure 32: Tonal signal (magnitude response in dB)
Figure 33: Single-tone signal processed by complex-value (left) / real-value (right) QMF banks. (a) Decomposed subband k and k-th analysis filter. (b) Down-sampled subband k. Panels (a.1), (a.2), (b.1) and (b.2) plot magnitude (dB) against normalized frequency (×π rad/sample).
Therefore, the examples illustrate that $T_R$ is lower than $T_C$ in a noise-like signal and that $T_R$ is similar to $T_C$ in a tonal signal. Furthermore, the phenomenon can be shown by quantitative analysis.
3.2.1.2 Quantitative Analysis of Tonality Error
By quantitative analysis, the tonality error between the complex-value and real-value QMF bank encoders can be derived using the MLD algorithm [6] in the tonality estimator module of the SBR encoder. There is a trend that the tonality error becomes increasingly serious as the tonality under real-value QMF banks decreases.
Improper tonality measurement results in serious artifacts in the reconstructed HF subbands. In the SBR encoder, the tonality estimator applies the linear prediction based MLD algorithm to calculate the tonality. In order to measure the actual tonality precisely, the poles of the linear prediction filter must match the number of tones contained in the subband, and thus the prediction order should equal the number of tones.
In the case of missed detection, the underestimated tonal energy and the overestimated noise floor energy result in noise overflow in the HF spectrum.
Conversely, the overestimated tonal energy and the underestimated noise floor energy result in a tonal spike in the HF spectrum.
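The dependence of a linear prediction based tonality measure on the prediction order can be illustrated with a small experiment. The sketch below is not the MLD algorithm of [6]; it assumes an illustrative definition of tonality as the fraction of signal energy explained by a least-squares linear predictor, and it uses real-valued test signals, for which one tone requires two poles. The function name `lp_tonality` is hypothetical.

```python
import numpy as np

def lp_tonality(x, order=2):
    """Fraction of signal energy explained by an order-`order` linear predictor.

    Illustrative definition (an assumption of this sketch): values near 1 are
    tone-like, values near 0 are noise-like.
    """
    x = np.asarray(x, dtype=float)
    # Column i holds x[n - 1 - i] for every predicted sample x[n], n >= order.
    past = np.column_stack(
        [x[order - 1 - i: len(x) - 1 - i] for i in range(order)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    residual = target - past @ coeffs
    return 1.0 - np.sum(residual ** 2) / np.sum(target ** 2)

rng = np.random.default_rng(1)
n = np.arange(2048)
tone = np.cos(0.3 * np.pi * n)
noise = rng.standard_normal(2048)
two_tones = np.cos(0.3 * np.pi * n) + np.cos(0.7 * np.pi * n)

t_tone = lp_tonality(tone, order=2)          # order matches one real tone
t_noise = lp_tonality(noise, order=2)        # noise is barely predictable
t_two_low = lp_tonality(two_tones, order=2)  # order too low: tonal energy missed
t_two_ok = lp_tonality(two_tones, order=4)   # order matches two real tones
print(t_tone, t_noise, t_two_low, t_two_ok)
```

With the order too low, the second experiment reports most of the tonal energy as prediction residual, which corresponds to the missed-detection case that leads to noise overflow.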
Then, the average tonality error $T_E(i,j)$ for the twelve critical signals defined by MPEG can be derived as in Table 2. For example, (0,0.09) in es01 means the tonality error $T_E(0,0.09) = \mathrm{avg}(T_{R-C}[k,t]) = -0.37$ in the case between
In general, the average tonality error $T_E(i,j)$ becomes increasingly serious as $T_R$ decreases, as shown in Figure 34, in agreement with the qualitative analysis in 3.2.1.1.
(i,j)         es01   es02   es03   sc01   sc02   sc03   si01   si02   si03   sm01   sm02   sm03
(0.90,0.99)   0.05   0.04   0.11   0.05   0.00   0.07   0.05   0.01   0.06   0.06   0.04   0.04
(0.80,0.89)   0.01   0.04   0.02   0.00   0.00   0.02  -0.02   0.01  -0.02  -0.01  -0.01   0.01
(0.70,0.79)  -0.05  -0.01  -0.02  -0.07  -0.04  -0.04  -0.09   0.03  -0.10  -0.09  -0.09  -0.06
(0.60,0.69)  -0.07  -0.02  -0.07  -0.10  -0.09  -0.08  -0.12  -0.01  -0.13  -0.13  -0.12  -0.10
(0.50,0.59)  -0.10  -0.07  -0.06  -0.13  -0.13  -0.12  -0.15  -0.05  -0.17  -0.16  -0.15  -0.14
(0.40,0.49)  -0.13  -0.09  -0.11  -0.16  -0.17  -0.16  -0.19  -0.09  -0.22  -0.19  -0.17  -0.17
(0.30,0.39)  -0.17  -0.14  -0.15  -0.20  -0.21  -0.20  -0.23  -0.16  -0.28  -0.24  -0.18  -0.22
(0.20,0.29)  -0.23  -0.20  -0.20  -0.23  -0.25  -0.25  -0.27  -0.21  -0.31  -0.29  -0.21  -0.26
(0.10,0.19)  -0.29  -0.26  -0.26  -0.28  -0.30  -0.30  -0.32  -0.27  -0.34  -0.36  -0.27  -0.31
(i,j)          Average    Standard Deviation
(0.90,0.99)     0.0489    0.0421
(0.80,0.89)     0.0004    0.0531
(0.70,0.79)    -0.0566    0.0732
(0.60,0.69)    -0.0942    0.0950
(0.50,0.59)    -0.1254    0.1210
(0.40,0.49)    -0.1622    0.1411
(0.30,0.39)    -0.2062    0.1549
(0.20,0.29)    -0.2475    0.1690
(0.10,0.19)    -0.3006    0.1748
(0.00,0.09)    -0.3798    0.1720
Table 3: The average tonality error $T_E(i,j)$ derived from Table 2 and the standard deviation
Figure 34 (horizontal axis: Real Tonality, in bins from 0.90~0.99 down to 0~0.09)
In 3.2.2, the influence of the aliasing effect on the low power HE-AAC encoding quality is introduced, and the tonality adjustment based approach is proposed to eliminate the aliasing effect in the low power HE-AAC encoder.
3.2.2 Implementation
The tonality adjustment based HE-AAC encoder compensates the tonality $T_R(k,t)$ of each of the 64 subbands according to the compensation function $CF(T_R(k,t))$, which is designed from the quantitative analysis result in 3.2.1.2. The compensated tonality is obtained by applying $CF(T_R(k,t))$ to $T_R(k,t)$.
The tonality adjustment based HE-AAC encoder can be designed as Figure 35.
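One possible form of such a compensation is sketched below. It is only an illustration: the exact compensation function of the thesis is not reproduced here, and the sketch assumes that the tonality error is the average of the real tonality minus the complex tonality, so that subtracting the average error of the matching bin in Table 3 approximates the complex tonality. The names `AVERAGE_ERROR` and `compensate_tonality` are hypothetical.

```python
# Average tonality error per real-tonality bin, copied from Table 3.
# Index b covers real tonality values from b/10 to b/10 + 0.09.
AVERAGE_ERROR = [-0.3798, -0.3006, -0.2475, -0.2062, -0.1622,
                 -0.1254, -0.0942, -0.0566, 0.0004, 0.0489]

def compensate_tonality(t_real):
    """Hypothetical compensation: remove the average error of the matching bin.

    Assumes 0 <= t_real <= 1 and error = real tonality - complex tonality.
    """
    bin_index = min(int(t_real * 10), 9)
    compensated = t_real - AVERAGE_ERROR[bin_index]
    return min(max(compensated, 0.0), 1.0)   # keep the result inside [0, 1]

for t_real in (0.05, 0.45, 0.95):
    print(t_real, round(compensate_tonality(t_real), 4))
```

A table lookup of this kind adds only one subtraction per subband, which is consistent with the statement that tonality adjustment introduces no noticeable extra complexity.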
Figure 35: Block diagram of Tonality Adjustment based Low Power HE-AAC encoder.
Compared with the spectrum of the basic low power HE-AAC encoder in Figure 36, the tone at 16300 Hz in Figure 37 is compensated perfectly, and the tone at 13600 Hz in Figure 37 is compensated better than the tone in Figure 36.
Figure 36: The spectrum of conventional and basic low power HE-AAC encoder
Figure 37: The spectrum of conventional and tonality adjustment based low power HE-AAC encoder
3.2.3 Complexity
Because the encoding complexity of the QMF banks and the SBR encoder is halved and the tonality adjustment procedure adds no complexity, the total complexity of the Tonality Adjustment based low power HE-AAC encoder is 63 percent of the conventional one, as shown in Table 4.
Complexity Type                 QMF Banks   SBR Encoder   AAC Encoder   Total Complexity
Conventional                    0.04        0.70          0.26          100%
Tonality Adjustment Low Power   0.02        0.35          0.26          63%
Table 4: Complexity of conventional and low power HE-AAC encoder
3.2.4 Artifacts
Figure 37 clearly shows that the tonal spike artifact at 16300 Hz is eliminated by the tonality compensation. But the noise overflow artifact at 13600 Hz still cannot be eliminated completely. In 3.3, the Complexification and Realification based low power HE-AAC encoder is proposed to avoid the aliasing effect on the subbands' tonality more efficiently.
3.3 Complexification and Realification based Low Power HE-AAC encoder
3.3.1 Complexification based Low Power HE-AAC encoder
As mentioned above, both real and complex QMF banks have the aliasing-free property. In contrast to complex-value QMF banks, real-value QMF banks have to eliminate the aliasing terms by mutual cancellation, owing to the presence of negative frequency components. Therefore, Complexification is proposed to avoid the quality degradation in HF subbands by eliminating the aliasing terms within subbands.
3.3.1.1 Complexification
Figure 38 illustrates the architecture of complexification based low power HE-AAC encoder integrated with the tonality adjustment introduced in 3.2. Figure 39 shows the procedure of tonality adjustment and complexification.
Figure 38: HE-AAC encoder with complex QMF
Figure 39: Flow chart of the procedure for tone detection and aliasing elimination
According to the statistics on tonality error in Table 3, complexification should be enabled when the subband has a high tonality error $T_E[k,i]$, in order to avoid the aliasing effect by eliminating the aliasing terms within the subband. A subband that does not have a high tonality error instead has its tonality adjusted according to the approach in 3.2.
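The per-subband decision described above can be sketched as follows. This is a hypothetical illustration rather than the procedure of Figure 39: the threshold `error_threshold`, the cap `max_complexified` and the function name `select_modes` are assumptions, and the expected error of each subband is looked up from the averages of Table 3.

```python
# Average tonality error per real-tonality bin, copied from Table 3
# (index b covers real tonality values from b/10 to b/10 + 0.09).
AVERAGE_ERROR = [-0.3798, -0.3006, -0.2475, -0.2062, -0.1622,
                 -0.1254, -0.0942, -0.0566, 0.0004, 0.0489]

def select_modes(real_tonalities, error_threshold=0.2, max_complexified=26):
    """Return 'complexify' or 'adjust' for each subband of one frame.

    Both `error_threshold` and `max_complexified` are hypothetical settings.
    """
    expected = [abs(AVERAGE_ERROR[min(int(t * 10), 9)]) for t in real_tonalities]
    # Subbands whose expected error exceeds the threshold, worst first.
    candidates = sorted(
        (k for k, e in enumerate(expected) if e > error_threshold),
        key=lambda k: expected[k],
        reverse=True,
    )
    chosen = set(candidates[:max_complexified])
    return ["complexify" if k in chosen else "adjust"
            for k in range(len(real_tonalities))]

modes = select_modes([0.05, 0.95, 0.25, 0.55])
print(modes)
```

Capping the number of complexified subbands per frame keeps the worst-case cost bounded, at the price of falling back to tonality adjustment for the remaining subbands.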
The impulse responses $h_k(n)$ and $f_k(n)$ of the exponential/cosine modulated analysis/synthesis QMF banks are introduced in 3.1. Equations (12)-(15) show that the difference between the real QMF and the complex QMF lies in the imaginary part, as in (23) and (24).
Complexification activates the computation of the imaginary part to remove the aliasing in high tonality error subbands.
Furthermore, the real-based correlation data for the Levinson-Durbin algorithm [7] calculated by the tonality estimator can be reused in the complex-based correlation data for complex-based tonality estimation. Hence, the tonality estimator does not incur extra complexity, because the related data can be reused to estimate the complex tonality $T_C[k,i]$.
Figure 41 illustrates the quality improvement of the subband and demonstrates that the tonality measurement error can be avoided by complexification mechanism better than Tonality Adjustment based low power HE-AAC encoder.
Figure 40: The spectrum of conventional and Tonality Adjustment based low power HE-AAC encoder
Figure 41: The spectrum of conventional and Complexification based low power HE-AAC encoder: Type I
3.3.1.2 Complexity
Table 5 shows that the percentage of complexificated subbands cannot be more than 41.5%; beyond that point the total complexity exceeds that of the conventional encoder. With 64 subbands, this means that no more than 26 subbands can be complexificated in each frame.
The major reason for the large total complexity of the Complexification based encoder is that the procedure to complexificate subbands has no efficient fast algorithm. Therefore, the Realification based low power HE-AAC encoder is proposed in section 3.3.2. It has the same encoding quality as the Complexification based one, but much lower encoding complexity.
Complexity Type    QMF Banks      SBR Encoder    AAC Encoder   Total Complexity
Conventional       0.04           0.70           0.26          100%
Complexification   0.027+0.53x    0.35+0.35x     0.26          (63.7+88x)%
Table 5: Complexity of conventional and low power HE-AAC encoder
x: Percentage of complexificated subbands
x        Total Complexity
0%       63.7%
10%      72.5%
20%      81.3%
30%      90.1%
40%      98.9%
41.5%    100%
Table 6: Complexity of Complexification based low power HE-AAC encoder in terms of the percentage of complexificated subbands
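The entries of Table 6 can be recomputed from the per-module expressions of Table 5, as in the short sketch below. The function name `complexification_total` is illustrative; the coefficients are those of Table 5.

```python
def complexification_total(x):
    """Total relative complexity when a fraction x of subbands is complexificated."""
    qmf = 0.027 + 0.53 * x
    sbr = 0.35 + 0.35 * x
    aac = 0.26
    return qmf + sbr + aac

for x in (0.0, 0.10, 0.20, 0.30, 0.40, 0.415):
    print(f"x = {x:.1%}: total = {complexification_total(x):.1%}")

# Fraction at which the total reaches the conventional encoder (1.00),
# and the corresponding number of subbands out of 64.
break_even = (1.00 - 0.637) / 0.88
max_subbands = int(break_even * 64)
print(f"break-even near x = {break_even:.1%}, at most {max_subbands} subbands")
```

The break-even fraction computed this way is slightly above 41%, close to the 41.5% quoted in the text, and it corresponds to 26 of the 64 subbands.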
3.3.2 Realification based Low Power HE-AAC encoder
3.3.2.1 Realification
The Realification based HE-AAC encoder decomposes the PCM samples by complex-value QMF banks only, and it reconstructs the LF signal from the real part of the subbands through the real-value synthesis QMF bank.
According to the tonality $T_R[k,i]$ calculated from the real part of the subbands, the tonality error can be derived from Table 3. The encoder then calculates the imaginary tonality from the imaginary part of a subband when that subband has a high tonality error.
Therefore, the procedure of calculating SBR parameters in the Realification based SBR encoder, shown in Figure 42, is the same as that in the Complexification based SBR encoder, and the encoding quality is exactly the same as that of the Complexification based one.
Figure 42: Realification based low power HE-AAC encoder
Figure 43: Flow chart of the procedure of calculating SBR parameters
3.3.2.2 Complexity
Table 7 shows that the complexity of the Complexification based encoder exceeds that of the Realification based one as soon as the percentage of complexificated subbands is more than 2.5%, while the encoding quality of the Realification based HE-AAC encoder is as good as that of the Complexification based one. Therefore, the Realification based low power HE-AAC encoder is better than the Complexification based one in terms of complexity, and it preserves the encoding quality just as the Complexification based encoder does.
Complexity Type    QMF Banks      SBR Encoder    AAC Encoder   Total Complexity
Complexification   0.027+0.53x    0.35+0.35x     0.26          (63.7+88x)%
Realification      0.04           0.35+0.35x     0.26          (65+35x)%
Table 7: Complexity of conventional and low power HE-AAC encoder
x: Percentage of complexificated subbands
x     Total Complexity in Complexification based encoder   Total Complexity in Realification based encoder
0%    63.7%                                                 65%
Table 8: Complexity of Complexification and Realification based low power HE-AAC encoder in terms of the percentage of complexificated subbands
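The crossover point between the two encoders follows directly from the total complexity expressions of Table 7. The sketch below recomputes it; the function names are illustrative only.

```python
def complexification_total(x):
    """(63.7 + 88x)% expressed as a fraction; x is the complexificated share."""
    return 0.027 + 0.53 * x + 0.35 + 0.35 * x + 0.26

def realification_total(x):
    """(65 + 35x)% expressed as a fraction; x is the complexificated share."""
    return 0.04 + 0.35 + 0.35 * x + 0.26

# Solve 0.637 + 0.88 x = 0.65 + 0.35 x for x.
crossover = (0.65 - 0.637) / (0.88 - 0.35)
print(f"crossover near x = {crossover:.2%}")

for x in (0.0, 0.025, 0.10, 0.30):
    print(f"x = {x:.1%}: complexification = {complexification_total(x):.1%}, "
          f"realification = {realification_total(x):.1%}")
```

The crossover lies just below 2.5%, in line with the text: for any larger share of complexificated subbands the Realification based encoder is cheaper.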
Chapter 4
Quality Assessment
4.1 Experiment Environment
Computer Status
  Platform: Personal Computer
  Operating System: Windows XP
  CPU: AMD Turion™ 64 X2 Mobile Technology TL-56 1.8 GHz
  Memory: 2 GB DDR2
Headphone
  Amplifier: Zen Class A Headphone Amplifier
  Headphone: AKG K-501
Objective Quality Measurement Tool
For objective quality evaluation, the thesis mainly adopts the PEAQ (perceptual evaluation of audio quality) system [8], which is the system recommended by ITU-R Task Group 10/4. The system includes a subtle perceptual model to measure the difference between two tracks. The objective difference grade (ODG) is the output variable of the objective measurement method. The ODG values range from 0 to −4, where 0 corresponds to an imperceptible impairment and −4 to an impairment judged as very annoying. An improvement of about 0.1 is usually perceptually audible. The PEAQ has been widely used to measure the compression