HF Adjustment for Tone-demand Grid

Chapter 2 Backgrounds

2.2 High Frequency Adjustment in SBR Decoder

2.2.2 HF Adjustment for Tone-demand Grid

If a high resolution grid needs an additional tone, it is referred to as a tone-demand grid and adopts the second mode. The main differences between the two modes are the definitions of the gain factor and the compensation amount, but both are controlled by the control parameter. The gain factor for the tone-demand grid is defined as

For the compensated tone, the energy amount is defined as

Besides the compensated tone, which is added to the middle subband of the k-th high resolution grid, the remaining subbands are compensated with random noise at the energy level defined in (3). Figure 6 illustrates the reconstructed high bands in the tone-demand grid.


Figure 6: HF adjustment process of a tone-demand high resolution grid with three subbands b₀, b₁, b₂.

Chapter 3 Control Parameter Extraction

This chapter analyzes the accuracy of control parameter extraction and proposes a search method that chooses an optimal control parameter shared by several high resolution grids.

3.1 Problem Definition

According to the definition of HF adjustment in Section 2.2, the scaling and compensation processes are both controlled by the control parameter. The perceptual quality of SBR audio depends mainly on the reconstructed tone-noise content. However, the magnitudes of both tonal and noise compensation are controlled by the same control parameter in an individual noise grid, whose frequency resolution is four times coarser than that of a high resolution grid. In other words, sharing the control parameter can be considered a trade-off between tonal and noise compensation, as illustrated in Figure 7. In addition, the number and location of the added tones are constrained by the SBR syntax. Hence, it is difficult to recover the exact T/N content.

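[Figure 7 panels: the horizontal axis is the value of Q, from 0 to ∞; the arrows indicate that the noise compensation Cⁿ and the tonal compensation Cᵗ change in opposite directions as Q varies.]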

Figure 7: Trade-off between tonal and noise compensation.

Related work such as [2] has shown that the method suggested in the standard [4] provides only a rough approximation of the original TNR. The optimal control parameter for a single subband has also been derived there, but the sharing problem between noise-demand and tone-demand grids is neglected. For these two reasons, instead of maintaining the TNR of the original signal, the proposed method determines the control parameter that minimizes the distortion of the dB differences between the tone and noise floor components.

In the next subsection, the dB difference between the tonal component and the noise floor of the reconstructed high resolution grid is formulated for a chosen control parameter, and an optimal parameter is then determined by minimizing the distortion function. For convenience, the notation for the energies of the noise floor and the tone component in the three phases is defined as follows.

Generation Phase

E_k^n : Energy of the noise floor in the k-th high resolution grid without any adjustment.

E_k^t : Energy of the dominant tone component in the k-th high resolution grid without any adjustment.

Scaling Phase

E′_k^n : Energy of the noise floor in the k-th high resolution grid after gain scaling.

E′_k^t : Energy of the dominant tone component in the k-th high resolution grid after gain scaling.

Compensation Phase

E′′_k^n : Energy of the noise floor in the k-th high resolution grid after compensation.

E′′_k^t : Energy of the dominant tone component in the k-th high resolution grid after compensation.

3.2 Reconstructed T/N dB Difference

To measure the resultant dB difference of tonal component and noise floor in HF reconstruction, the associated changes in the three phases are simulated in this section.

3.2.1 T/N dB Difference for Noise-demand Grid

In the generation phase, E_kⁿ and E_k^t are estimated from the replicated low bands by the linear prediction method described in Chapter 4. After the scaling phase for the noise-demand grid, the energies of the noise floor and the tone component are modified by the gain factor as in (6) and (7).

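$$E'^{\,n}_k = \left(G_k^{ND}\right)^2 E_k^{n} \qquad (6)$$

$$E'^{\,t}_k = \left(G_k^{ND}\right)^2 E_k^{t} \qquad (7)$$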

After the compensation phase, the T/N energies become

In (9), because the added noise energy is much smaller than the tonal component energy E′_k^t, its effect on the tonal energy can be ignored.

Finally, from (8) and (9), the resultant difference ∆^ND_k between the tonal component and the noise floor of the reconstructed high resolution grid in the dB domain is derived in (10) for a given control parameter Q_k.

$$\Delta_k^{ND}(Q_k) = 10\log_{10} E''^{\,t}_k - 10\log_{10} E''^{\,n}_k \qquad (10)$$

The above T/N energy analysis through the three phases is summarized in Table 1.

Table 1: Summary of T/N energy analysis for noise-demand grid.

Phase | Generation | Scaling | Compensation
Tonal energy | E_k^t | (G_k^ND)² · E_k^t | E′′_k^t ≈ E′_k^t, see (9)
Noise floor energy | E_k^n | (G_k^ND)² · E_k^n | E′′_k^n, see (8)
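The noise-demand analysis can be checked numerically. The Python sketch below walks one grid through the three phases. It does not reproduce the thesis's own definitions from Section 2.2; instead it assumes, hypothetically, SBR-style formulas: (G_k^ND)² = E_k^o / (E_k (1 + Q_k)) for the gain, C_k^n = E_k^o Q_k / (1 + Q_k) for the compensated noise energy, and E_k = E_k^t + E_k^n for the replicated energy. The function name nd_db_difference is illustrative.

```python
import math


def nd_db_difference(e_t: float, e_n: float, e_o: float, q: float) -> float:
    """Sketch of the three-phase T/N simulation for a noise-demand grid.

    The gain and compensation formulas are assumptions modeled on the SBR
    standard, not the thesis's own equations from Section 2.2.
    """
    e_cur = e_t + e_n                  # assumed replicated energy E_k
    g2 = e_o / (e_cur * (1.0 + q))     # assumed (G_k^ND)^2
    c_n = e_o * q / (1.0 + q)          # assumed noise compensation C_k^n

    # Scaling phase, as in (6) and (7): both components are scaled by G^2.
    t_scaled = g2 * e_t
    n_scaled = g2 * e_n

    # Compensation phase: noise is added to the noise floor, as in (8);
    # its effect on the tonal energy is neglected, as in (9).
    t_final = t_scaled
    n_final = n_scaled + c_n

    # Resultant T/N dB difference, as in (10).
    return 10.0 * math.log10(t_final) - 10.0 * math.log10(n_final)


# Scaling the original-band energy e_o by 10 leaves the result unchanged.
print(nd_db_difference(1.0, 0.1, 5.0, 0.5))
print(nd_db_difference(1.0, 0.1, 50.0, 0.5))
```

Under these assumptions E_k^o cancels out of the result, and raising q lowers the dB difference, which is the tonal/noise trade-off of Figure 7 seen from the noise-demand side.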

3.2.2 T/N dB Difference for Tone-demand Grid

Similarly, the resultant T/N difference for the tone-demand grid can be estimated by simulating the three phases. After the gain scaling phase for the tone-demand grid, the noise floor energy is modified by the gain factor as in (12).

$$E'^{\,n}_k = \left(G_k^{TD}\right)^2 E_k^{n} \qquad (12)$$

The tonal component obtained from the replicated low bands is ignored, under the assumption that the compensated tone dominates. After the compensation phase, the tonal component becomes

$$E''^{\,t}_k = C_k^{t}. \qquad (13)$$

As mentioned in Section 2.2, the subbands without an additional tone are compensated with random noise. Hence, the noise floor becomes

From (13) and (14), the resultant difference ∆^TD_k between tonal component and noise floor of the reconstructed high resolution grid in dB domain is derived as (15) under the given control parameter Q_k.

Substituting (4), (5), and (12) to (14) into (15) yields


The above T/N energy analysis through the three phases is summarized in Table 2.

Table 2: Summary of T/N energy analysis for tone-demand grid.

Phase | Generation | Scaling | Compensation
Tonal energy | neglected | neglected | E′′_k^t = C_k^t
Noise floor energy | E_k^n | (G_k^TD)² · E_k^n | E′′_k^n, see (14)
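A matching sketch for the tone-demand grid follows, again under hypothetical SBR-style formulas rather than the thesis's (4), (5), and (14): (G_k^TD)² = E_k^o Q_k / (E_k (1 + Q_k)), C_k^t = E_k^o / (1 + Q_k), a compensation noise level of E_k^o Q_k / (1 + Q_k), and a final noise floor taken as the scaled noise floor plus that level. The function name td_db_difference is illustrative.

```python
import math


def td_db_difference(e_t: float, e_n: float, e_o: float, q: float) -> float:
    """Sketch of the three-phase T/N simulation for a tone-demand grid.

    All formulas are assumptions modeled on the SBR standard; the thesis's
    equations (4), (5) and (14) may differ.
    """
    e_cur = e_t + e_n                    # assumed replicated energy E_k
    g2 = e_o * q / (e_cur * (1.0 + q))   # assumed (G_k^TD)^2
    c_t = e_o / (1.0 + q)                # assumed compensated tone energy C_k^t
    c_n = e_o * q / (1.0 + q)            # assumed compensation noise level

    # Scaling phase: the replicated tone is neglected, and only the noise
    # floor is scaled, as in (12).
    n_scaled = g2 * e_n

    # Compensation phase: the added tone is the tonal component, as in (13);
    # n_final is a hypothetical stand-in for (14).
    t_final = c_t
    n_final = n_scaled + c_n

    return 10.0 * math.log10(t_final) - 10.0 * math.log10(n_final)


print(td_db_difference(1.0, 0.1, 5.0, 0.5))
print(td_db_difference(1.0, 0.1, 5.0, 2.0))
```

Here too the dB difference does not depend on E_k^o, and it decreases as q grows.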

3.3 Optimal Solution

The previous section formulates the reconstructed T/N dB difference in a high resolution grid for a given control parameter Q_k. Conversely, if the T/N dB difference of the original signal in a high resolution grid is known, the locally optimal control parameter Q_k can be derived from (17) or (18). The relation between the control parameter and the T/N dB difference is summarized in Tables 3 and 4.

Table 3: Relation between the given control parameter and the resultant T/N dB difference.

Given Q_k, the resultant T/N dB difference:

Noise-demand: ∆^ND_k(Q_k), see (11)

Tone-demand: ∆^TD_k(Q_k), see (16)

Table 4: Relation between the given T/N dB difference and the locally optimal control parameter.

Given ∆_k, the locally optimal control parameter:

Noise-demand: Q_k, see (17)

Tone-demand: Q_k, see (18)

In order to determine the optimal control parameter, the total distortion of the resultant dB differences in an individual noise grid is formulated as (19) under a common control parameter Q.

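$$D(Q) = \sum_{k \in \mathrm{ND}} \left( \Delta_k^{ND}(Q) - \Delta_k \right)^2 + \sum_{k \in \mathrm{TD}} \left( \Delta_k^{TD}(Q) - \Delta_k \right)^2 \qquad (19)$$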

where ∆_k is the actual dB difference measured from the original HF subbands in the k-th high resolution grid. Hence, the optimal control parameter should be chosen to minimize the distortion function:

$$Q^{*} = \arg\min_{Q} D(Q). \qquad (20)$$

According to the SBR syntax, Q can take 31 candidate values, so the optimal value can be found by evaluating all 31 candidates. Substituting (11) and (16) into (19) yields


where R is defined as (23), and |NG| is the number of all the high resolution grids in the noise grid.


Hence the optimal control parameter Q^* can be approximated by


Searching only the neighborhood of Q~ gives a faster way to find the optimal control parameter Q^* without evaluating all 31 candidates.
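The exhaustive search in (20) and the faster neighborhood search can be sketched in Python as follows. Several details are assumptions: the 31 candidates are taken from the SBR noise floor dequantization Q = 2^(6 − v), v = 0, ..., 30; the distortion is a squared dB error; the per-grid models ∆^ND_k(Q) and ∆^TD_k(Q) are passed in as callables because (11) and (16) are not reproduced here; and the approximation Q~ is supplied by the caller. All function names and the toy model are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Assumption: SBR-style noise floor dequantization, Q = 2^(6 - v), v = 0..30.
NOISE_FLOOR_OFFSET = 6
Q_CANDIDATES: List[float] = [2.0 ** (NOISE_FLOOR_OFFSET - v) for v in range(31)]

# One entry per high resolution grid: a model of the reconstructed dB
# difference as a function of Q (role of (11) or (16)) and the measured Delta_k.
GridTerm = Tuple[Callable[[float], float], float]


def distortion(q: float, grids: Sequence[GridTerm]) -> float:
    """Total squared dB error D(Q) over the grids sharing one noise grid."""
    return sum((predict(q) - target) ** 2 for predict, target in grids)


def exhaustive_search(grids: Sequence[GridTerm]) -> float:
    """Evaluate all 31 candidates and return the minimizer of D(Q)."""
    return min(Q_CANDIDATES, key=lambda q: distortion(q, grids))


def neighborhood_search(q_tilde: float, grids: Sequence[GridTerm], radius: int = 2) -> float:
    """Evaluate only the candidates around the approximation q_tilde."""
    # Candidates are uniformly spaced in log2, so locate the nearest one there.
    target = math.log2(q_tilde)
    center = min(range(len(Q_CANDIDATES)),
                 key=lambda i: abs(math.log2(Q_CANDIDATES[i]) - target))
    lo = max(0, center - radius)
    hi = min(len(Q_CANDIDATES), center + radius + 1)
    return min(Q_CANDIDATES[lo:hi], key=lambda q: distortion(q, grids))


def toy_model(q: float) -> float:
    """Hypothetical grid model: the dB difference falls 10 dB per decade of Q."""
    return -10.0 * math.log10(q)


grids = [(toy_model, 3.0), (toy_model, 9.0)]
print(exhaustive_search(grids), neighborhood_search(0.3, grids))
```

With radius=2 the neighborhood search evaluates at most 5 candidates instead of 31; a larger radius trades speed for robustness when Q~ is a poor approximation.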

Chapter 4 Tonality Measurement

As shown in Chapter 3, the extraction of the optimal control parameter depends strongly on accurate estimation of the tonal component and the noise floor in the subbands. An improper measurement results in an unsuitable parameter and may consequently cause artifacts in the reconstructed HF spectrum. In this chapter, an efficient and accurate TNR measurement method based on linear prediction is proposed to improve on the standard second-order predictor. Furthermore, since the TNR is affected by the inverse filtering process, an efficient method for measuring the TNR after inverse filtering is also proposed.

4.1 Scheme in 3GPP

The standard [4][5][6] suggests a simple second-order linear predictor to measure the TNR of subband signals. The prediction coefficients are calculated from the covariance matrix. However, to measure the actual TNR precisely, the number of poles of the linear prediction filter must match the number of tones contained in the subband, so the prediction order should equal the number of tones. If a subband contains more than two tones, the second-order predictor cannot capture all predictable components, which causes a missed detection. Conversely, for a subband containing fewer than two tones, the second-order predictor may capture some noise components as predictable components, which causes a false alarm. In a missed detection, the tonal energy is underestimated and the noise floor energy is overestimated, which causes noise overflow in HF reconstruction. In a false alarm, the tonal energy is overestimated and the noise floor energy is underestimated, which may cause a tonal spike. These typical artifacts of the HE-AAC codec are discussed in Chapter 5.

4.2 Modified Levinson-Durbin Algorithm

The proposed method is based on the Levinson-Durbin algorithm, which builds a lattice filter incrementally until the required prediction order is reached. Because tonal components are captured as predictable components, the adaptive prediction order is reached when the prediction error no longer decreases significantly as the order increases. The modified Levinson-Durbin (MLD) algorithm is summarized as follows, and its flow chart is shown in Figure 8.

MLD algorithm. Notation used in the algorithm:

(a) φ(i,j): autocorrelation of the subband signal x[n], n = 0 to N.

(b) p: the adaptive prediction order.

(d) a_i^p: the i-th prediction coefficient at order p.

The algorithm performs the following steps for every subband:

Step 1: Initial conditions.

Step 2: Increase order: p ← p + 1.

Step 3: Compute coefficients and prediction error.

Step 4: Check terminal condition: if the decrease of the prediction error exceeds the threshold Ψ, return to Step 2; otherwise, stop.

[Flow chart blocks: Set Initial Condition, Increase Prediction Order, Compute.]

Figure 8: Flow chart of the MLD algorithm
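The MLD steps can be sketched as a complex-valued Levinson-Durbin recursion (complex because SBR subband samples are complex QMF outputs) that stops early once a higher order no longer pays off. The exact terminal condition of Step 4 is not recoverable here, so the rule used below, stopping when the relative decrease of the prediction error is at most psi and keeping the previous order, is an assumption, as are the names mld, psi, and max_order.

```python
import cmath
from typing import List, Sequence, Tuple


def mld(r: Sequence[complex], psi: float, max_order: int) -> Tuple[int, List[complex], float]:
    """Levinson-Durbin recursion with an adaptive stopping order (sketch).

    r[m] = sum_n x[n] * conj(x[n - m]) for m = 0..max_order.
    Returns (order p, coefficients [a_1^p, ..., a_p^p], prediction error e^p).
    """
    err = r[0].real          # Step 1: at order 0 the error is the signal energy
    a: List[complex] = []    # a[i - 1] holds a_i at the current order
    for p in range(1, max_order + 1):       # Step 2: increase order
        if err <= 0.0:
            break                           # already perfectly predicted
        # Step 3: reflection coefficient, order-p coefficients and error.
        acc = r[p] + sum(a[i - 1] * r[p - i] for i in range(1, p))
        k = -acc / err
        new_a = [a[i - 1] + k * a[p - i - 1].conjugate() for i in range(1, p)] + [k]
        new_err = err * (1.0 - abs(k) ** 2)
        # Step 4 (assumed form): stop if the error barely decreased,
        # keeping order p - 1 because the extra pole captured no tone.
        if (err - new_err) / err <= psi:
            break
        a, err = new_a, new_err
    return len(a), a, err


# One complex tone (power 1.0) in white noise (power 0.01).
w, tone_power, noise_power = 0.7, 1.0, 0.01
r = [tone_power * cmath.exp(1j * w * m) for m in range(5)]
r[0] += noise_power
order, coeffs, err = mld(r, psi=0.3, max_order=4)
tone_energy = (r[0].real - err) / order   # average tone energy, as in (28)
print(order, err, tone_energy)
```

For a single tone in weak white noise the recursion stops at order 1 and (28) gives a tone energy close to the true tone power, while pure white noise stops at order 0.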

4.3 Tonality Measurement of Inverse Filtered Signal

Inverse filtering, formulated in (1), is an HF whitening process in the decoder. It attenuates the tonal components in the replicated low bands to a degree controlled by the chirp factor. Since inverse filtering changes the TNR, the tonality should be re-estimated to obtain the actual TNR; otherwise, a TNR that does not match the inverse filtered signal results in improper compensation. The following method measures the TNR of the inverse filtered signal efficiently, without applying the inverse filter to the time-domain signal.

As shown in the previous section, given the autocorrelation matrix of a subband signal, the MLD algorithm measures the TNR and selects the adaptive order automatically.

By computing the autocorrelation of the inverse filtered signal, its TNR can be obtained by applying the MLD algorithm again. The autocorrelation of the inverse filtered signal in the b-th subband is defined as


Substituting (1) into (26) results in


where the p-th subband in the LF is mapped to the b-th subband in the HF. Since (27) gives the autocorrelation of the inverse filtered signal, its tonality can be measured by applying the MLD algorithm again with φ_b(i,j) as the input.
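The idea behind (26) and (27) can be sketched in Python, assuming (1) has the standard SBR form y[n] = x[n] + bw·α0·x[n−1] + bw²·α1·x[n−2], where x is the LF subband signal, bw the chirp factor, and α0, α1 the LF prediction coefficients. With the covariance method, φ_y(i,j) = Σ_{m,l} c_m c_l* φ_x(i+m, j+l) with c = (1, bw·α0, bw²·α1), so the inverse filtered autocorrelation follows from φ_x alone. The function names and sample values are hypothetical.

```python
import math
from typing import Callable, List

Cov = Callable[[int, int], complex]


def covariance(x: List[complex], i: int, j: int, n0: int, n1: int) -> complex:
    """Covariance-method estimate phi(i, j) = sum_{n=n0}^{n1} x[n-i] * conj(x[n-j])."""
    return sum(x[n - i] * x[n - j].conjugate() for n in range(n0, n1 + 1))


def inverse_filtered_covariance(phi_x: Cov, c1: complex, c2: complex) -> Cov:
    """Covariance of y[n] = x[n] + c1*x[n-1] + c2*x[n-2], computed from phi_x only."""
    c = (1.0 + 0.0j, c1, c2)  # c_0 = 1 for the direct path

    def phi_y(i: int, j: int) -> complex:
        # phi_y(i, j) = sum_{m,l} c_m * conj(c_l) * phi_x(i + m, j + l)
        return sum(c[m] * c[l].conjugate() * phi_x(i + m, j + l)
                   for m in range(3) for l in range(3))

    return phi_y


# Hypothetical values: chirp factor bw and LF predictor coefficients alpha0, alpha1.
bw, alpha0, alpha1 = 0.6, complex(-0.5, 0.2), complex(0.1, -0.05)
c1, c2 = bw * alpha0, bw ** 2 * alpha1

# A synthetic complex subband signal and its directly filtered version (for checking).
x = [complex(math.cos(0.3 * n) + 0.1 * math.sin(1.7 * n), math.sin(0.3 * n)) for n in range(40)]
y = [x[n] + c1 * x[n - 1] + c2 * x[n - 2] if n >= 2 else x[n] for n in range(40)]

n0, n1 = 6, 39  # late enough start so every index used below is valid and filtered


def phi_x(i: int, j: int) -> complex:
    return covariance(x, i, j, n0, n1)


phi_y = inverse_filtered_covariance(phi_x, c1, c2)
print(abs(phi_y(1, 2) - covariance(y, 1, 2, n0, n1)))
```

The identity is exact for the covariance method because the same summation window is used for x and y; with a windowed autocorrelation it would hold only approximately near the window edges.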

4.4 T/N dB Difference Measurement

According to the result of the MLD algorithm, the tonal component can be measured from the predictable energy, because tones are stationary. For the b-th subband in the scope of MLD, the tone component is defined as

$$e_b^{t} = \frac{\phi(0,0) - e^{p}}{p}, \qquad (28)$$

where φ(0,0) − e^p is the predictable energy and p is the number of tones in the subband. In other words, (28) estimates the average energy of the tones in the subband.

Furthermore, the tone component of a high resolution grid is the maximum of the tone components of its subbands:

$$E_k^{t} = \max\left\{ e_b^{t} \mid b \in k \right\}. \qquad (29)$$

On the other hand, the prediction error can be regarded as the noise content because of its random nature. Hence, the noise floor of the b-th subband in the scope of MLD is obtained from

$$e_b^{n} = \frac{e^{p}}{N - p}, \qquad (30)$$

where N is the number of subband samples in the scope of MLD. The noise floor of a high resolution grid is then the average noise floor of its subbands, as in (31).

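$$E_k^{n} = \frac{1}{|k|} \sum_{b \in k} e_b^{n}, \qquad (31)$$

where |k| denotes the number of subbands in the k-th high resolution grid.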

Finally, the T/N dB difference used in Chapter 3 is the difference, in dB, between the dominant tone and the average noise floor within a high resolution grid:

$$\Delta_k = f_{dB}\left(E_k^{t}\right) - f_{dB}\left(E_k^{n}\right), \quad f_{dB}(x) = 10\log_{10} x. \qquad (32)$$

Chapter 5 Artifacts

This chapter discusses two typical artifacts of the HE-AAC codec: noise overflow and tonal spike. Their causes are identified, and a remedial method at the encoder end is proposed to prevent noise overflow in HF reconstruction.

5.1 Noise Overflow

As mentioned in Chapter 4, a missed detection in tonality measurement means that the energy of the tonal component is underestimated and the energy of the noise component is overestimated. This inaccurate TNR leads to too little tonal compensation or too much noise compensation, both of which cause noise overflow in HF reconstruction, perceived as a fizzy sound.

The noise overflow phenomenon is illustrated in Figure 9.

(a) A spectrum with the noise overflow phenomenon. (b) Spectrogram comparison for the noise overflow phenomenon.

Figure 9: Noise overflow phenomenon

5.2 Tonal Spike

In a false alarm of the tonality measurement, the energy of the tonal component is overestimated and the energy of the noise component is underestimated.

This inaccurate TNR leads to too much tonal compensation or too little noise compensation, both of which cause tonal spikes in HF reconstruction, perceived as a metallic sound. The tonal spike phenomenon is illustrated in Figure 10.

(a) A spectrum with tonal spike phenomenon. (b) Spectrogram comparison for tonal spike phenomenon.

Figure 10: Tonal spike phenomenon

5.3 Noise Floor Correction

Noise floor correction is an alternative encoder-side approach. It reduces the noise overflow phenomenon once the control parameter and the compensation selection have been determined. Through the three-phase simulation in Chapter 3, the noise floor energy N_k^r after HF adjustment can be obtained from (8) or (14), as shown in Table 5. Since the control parameter Q is fixed, the noise floor energy in HF reconstruction can be regarded as a function of E_k^o, and noise overflow can be detected by comparing it with the original noise floor. Moreover, (11) and (16) in Table 3 show that changing E_k^o, the averaged energy of the original high bands in the k-th high resolution grid, does not affect the reconstructed T/N dB difference. Hence, the noise floor can be corrected by modifying E_k^o. The following evaluation is based on energy grids.

Table 5: Noise floor energy in HF reconstruction.

Grid type | Noise floor energy in HF reconstruction
Noise-demand | N_k^r as a function of E_k^o, see (33)
Tone-demand | N_k^r as a function of E_k^o, see (34)

To correct the noise floor energy in HF reconstruction, the averaged energy of the original high bands E_k^o is scaled by a revision factor R, that is, E_k^o is replaced with E_k^o⋅R in (33) and (34). The resultant noise floor energy should then equal the original noise floor energy N^o according to


The revision factor R can be obtained by substituting (33) and (34) into (35) as follows:

If R~ ≅ 1, the reconstructed noise floor does not differ much from the original noise floor. If R~ << 1, noise overflow is indicated, and the revision factor is applied.
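The decision described above can be sketched as follows, assuming the reconstructed noise floor N_k^r is proportional to E_k^o (as it is whenever both the gain and the compensation scale with E_k^o), so that (35) gives R~ = N^o / N_k^r directly. The threshold standing in for "R~ << 1" and the function names are hypothetical.

```python
def revision_factor(noise_floor_orig: float, noise_floor_recon: float) -> float:
    """Solve (35) for R, assuming N_k^r scales linearly with E_k^o.

    Under that assumption N^r(E^o * R) = R * N^r(E^o), so R = N^o / N^r.
    """
    return noise_floor_orig / noise_floor_recon


def corrected_energy(e_orig: float, noise_floor_orig: float,
                     noise_floor_recon: float, threshold: float = 0.5) -> float:
    """Return the E_k^o to transmit, revised only when noise overflow is detected.

    threshold is a hypothetical cut-off for "R << 1".
    """
    r = revision_factor(noise_floor_orig, noise_floor_recon)
    return e_orig * r if r < threshold else e_orig


print(corrected_energy(8.0, 1.0, 4.0))   # reconstructed floor 4x too high: revised
print(corrected_energy(8.0, 1.0, 1.1))   # close to the original: left unchanged
```

Because the T/N dB difference does not depend on E_k^o, scaling it by R lowers the noise floor without disturbing the tone/noise balance chosen in Chapter 3.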

Chapter 6 Experimental Results

In this chapter, a large number of experiments are conducted to verify the proposed approaches on the MPEG test tracks and a music database collected in our lab. The experiments include both objective and subjective quality measurements.

6.1 Experiment Environment

Computer Status:

Platform: Personal Computer

Operating System: Windows XP

CPU: Intel Pentium 4 2.4 GHz

Memory: 256 MB DDR400 × 2

Mother Board: ASUS P4P800

Sound Card: ADI AD1985 AC'97

Headphone: ALESSANDRO MUSIC SERIES PRO

Objective Quality Measurement Tool:

For objective quality evaluation, this thesis mainly adopts PEAQ (Perceptual Evaluation of Audio Quality) [16], the method recommended by ITU-R Task Group 10/4. PEAQ includes a detailed perceptual model that measures the difference between two tracks. Its output variable is the objective difference grade (ODG), which ranges from 0 to −4, where 0 corresponds to an imperceptible impairment and −4 to an impairment judged as very annoying. An ODG improvement that reaches 0.1 is usually audible.

PEAQ is widely used to evaluate compression techniques because it detects perceptual differences that the human auditory system can notice.

Subjective Quality Measurement Tool:

For subjective quality evaluation, this thesis mainly adopts MUSHRA [17], which allows blind comparison of multiple audio files. The MUlti Stimulus test with Hidden Reference and Anchor (MUSHRA) is designed to give a reliable and repeatable measure of the audio quality of intermediate-quality signals. Its advantage is that it provides an absolute measure of the audio quality of a codec that can be compared directly with the reference. MUSHRA follows the test method and impairment scale recommended by ITU-R BS.1116 [18].

6.2 Objective Quality Measurement in MPEG Test Tracks

The twelve test tracks recommended by MPEG are shown in Table 6. These tracks include critical material covering percussion, string and wind instruments, and the human voice. In this section, the quality improvement of the proposed methods at different bit rates is verified on these MPEG test tracks, with NCTU-HEAAC [19] as the platform.

Table 6: The twelve tracks recommended by MPEG.

No. | Track | Signal description | Mode | Time (sec) | Remark
1 | es01 | Vocal (Suzan Vega) | stereo | 10 | (c)
2 | es02 | German speech | stereo | 8 | (c)
3 | es03 | English speech | stereo | 7 | (c)
4 | sc01 | Trumpet solo and orchestra | stereo | 10 | (b) (d)
5 | sc02 | Orchestral piece | stereo | 12 | (d)
6 | sc03 | Contemporary pop music | stereo | 11 | (d)
7 | si01 | Harpsichord | stereo | 7 | (b)
8 | si02 | Castanets | stereo | 7 | (a)
9 | si03 | Pitch pipe | stereo | 27 | (b)
10 | sm01 | Bagpipes | stereo | 11 | (b)
11 | sm02 | Glockenspiel | stereo | 10 | (a) (b)
12 | sm03 | Plucked strings | stereo | 13 | (a) (b)

Remarks:

(a) Transients: pre-echo sensitive, smearing of noise in temporal domain.

(b) Tonal/Harmonic structure: noise sensitive, roughness.

(d) Complex sound: stresses the device under test.

Table 7: ODG for proposed methods on MPEG test tracks at bit rate 80 kbps.

Codec NCTU-HEAAC

Bit Rate 80 kbps

Tracks M0 M1 M2 M3

es01 -0.74 -0.74 -0.73 -0.73

es02 -0.65 -0.66 -0.65 -0.65

es03 -0.74 -0.74 -0.75 -0.75

sc01 -1.05 -1.05 -1.01 -1.01

sc02 -1.28 -1.28 -1.25 -1.25

sc03 -1.21 -1.21 -1.18 -1.19

si01 -1.67 -1.67 -1.65 -1.65

si02 -1.01 -1.01 -1.03 -1.03

si03 -1.69 -1.69 -1.64 -1.64

sm01 -1.63 -1.64 -1.6 -1.6

sm02 -1.71 -1.62 -1.6 -1.6

sm03 -1.41 -1.42 -1.34 -1.35

Max -0.65 -0.66 -0.65 -0.65

Min -1.71 -1.69 -1.65 -1.65

Average -1.2325 -1.2275 -1.2025 -1.20417

M0: HF reconstruction without compensation

M1: M0 + noise floor correction

M2: HF reconstruction with tone/noise compensation

M3: M2 + noise floor correction
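The Max, Min, and Average rows of Table 7 can be reproduced from the per-track values. The short Python check below transcribes the table and recomputes the three summary rows per method; since ODG is negative, Max is the best track and Min the worst. The names ODG_80K and summarize are illustrative.

```python
# ODG values transcribed from Table 7 (80 kbps); columns are M0, M1, M2, M3.
ODG_80K = {
    "es01": (-0.74, -0.74, -0.73, -0.73),
    "es02": (-0.65, -0.66, -0.65, -0.65),
    "es03": (-0.74, -0.74, -0.75, -0.75),
    "sc01": (-1.05, -1.05, -1.01, -1.01),
    "sc02": (-1.28, -1.28, -1.25, -1.25),
    "sc03": (-1.21, -1.21, -1.18, -1.19),
    "si01": (-1.67, -1.67, -1.65, -1.65),
    "si02": (-1.01, -1.01, -1.03, -1.03),
    "si03": (-1.69, -1.69, -1.64, -1.64),
    "sm01": (-1.63, -1.64, -1.60, -1.60),
    "sm02": (-1.71, -1.62, -1.60, -1.60),
    "sm03": (-1.41, -1.42, -1.34, -1.35),
}


def summarize(table):
    """Return (Max, Min, Average) for each method column."""
    columns = zip(*table.values())
    return [(max(col), min(col), sum(col) / len(col)) for col in columns]


for name, (best, worst, avg) in zip(("M0", "M1", "M2", "M3"), summarize(ODG_80K)):
    print(f"{name}: max {best:.2f}, min {worst:.2f}, average {avg:.5f}")
```

The recomputed averages match the table and show an average gain of 0.03 ODG for M2 over M0 at 80 kbps.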

[Bar chart: ODG at bit rate 80 kbps (vertical axis, -1.8 to 0) of M0, M1, M2, and M3 for each MPEG test track, es01 to sm03.]

Figure 11: ODG comparison of MPEG test tracks at bit rate 80kbps.

Table 8: ODG for proposed methods on MPEG test tracks at bit rate 64 kbps.

Codec NCTU-HEAAC

Bit Rate 64 kbps

Tracks M0 M1 M2 M3

es01 -1.02 -1.03 -1 -1.01

es02 -0.89 -0.89 -0.87 -0.87

es03 -1.01 -1.03 -1.03 -1.03

sc01 -1.7 -1.7 -1.66 -1.66

sc02 -1.77 -1.78 -1.74 -1.74

sc03 -1.67 -1.68 -1.63 -1.63

si01 -2 -2 -2 -1.99

si02 -1.5 -1.5 -1.57 -1.57

si03 -2.16 -2.16 -2.09 -2.09

sm01 -2.18 -2.19 -2.17 -2.17

sm02 -2.45 -2.42 -2.51 -2.51

sm03 -1.8 -1.81 -1.73 -1.72

Max -0.89 -0.89 -0.87 -0.87

Min -2.45 -2.42 -2.51 -2.51

Average -1.67917 -1.6825 -1.66667 -1.66583

M0: HF reconstruction without compensation

M1: M0 + noise floor correction

M2: HF reconstruction with tone/noise compensation

M3: M2 + noise floor correction

[Bar chart: ODG at bit rate 64 kbps (vertical axis, -3 to 0) of M0, M1, M2, and M3 for each MPEG test track, es01 to sm03.]

Figure 12: ODG comparison of MPEG test tracks at bit rate 64kbps.

Table 9: ODG for proposed methods on MPEG test tracks at bit rate 48 kbps

Codec NCTU-HEAAC

Bit Rate 48 kbps

Tracks M0 M1 M2 M3

es01 -1.58 -1.59 -1.56 -1.61

es02 -1.42 -1.4 -1.39 -1.42

es03 -1.71 -1.71 -1.71 -1.7

sc01 -2.46 -2.46 -2.4 -2.4

sc02 -2.64 -2.63 -2.59 -2.62

sc03 -2.41 -2.41 -2.33 -2.33

si01 -2.73 -2.73 -2.71 -2.74

si02 -2.4 -2.38 -2.48 -2.46

si03 -3.21 -3.21 -3.17 -3.17

sm01 -3.24 -3.24 -3.2 -3.2

sm02 -3.4 -3.41 -3.37 -3.38

sm03 -2.36 -2.37 -2.26 -2.26

Max -1.42 -1.4 -1.39 -1.42

Min -3.4 -3.41 -3.37 -3.38

Average -2.46333 -2.46167 -2.43083 -2.44083

M0: HF reconstruction without compensation

M1: M0 + noise floor correction

M2: HF reconstruction with tone/noise compensation

M3: M2 + noise floor correction

[Bar chart: ODG at bit rate 48 kbps (vertical axis, -4 to 0) of M0, M1, M2, and M3 for each MPEG test track, es01 to sm03.]

Figure 13: ODG comparison of MPEG test tracks at bit rate 48kbps.

As the results show, proper tone/noise compensation improves the quality of HF reconstruction in SBR audio at the tested bit rates, except for es03, si02, and sm02. A subjective test is conducted later to check whether the quality of these tracks is degraded or improved.

From the document: Design of the High Frequency Adjustment Module in MPEG-4 HE-AAC (pp. 16-0)
