Chapter 2 Backgrounds
2.2 High Frequency Adjustment in SBR Decoder
2.2.2 HF Adjustment for Tone-demand Grid
If a high resolution grid requires an additional tone, it is referred to as a tone-demand grid and adopts the second mode. The two modes differ mainly in the definitions of the gain factor and the compensation amount, but both are controlled by the control parameter. The gain factor for the tone-demand grid is defined as
k
For the compensated tone, the energy amount is defined as
k
The compensated tone is added to the middle subband of the kth high resolution grid, while the remaining subbands are compensated with random noise at the energy level defined in (3). Figure 6 illustrates the reconstructed high bands in the tone-demand grid.
Figure 6: HF adjustment process of a tone-demand high resolution grid with three subbands b0, b1, b2.
Chapter 3
Control Parameter Extraction
This chapter analyzes the accuracy of control parameter extraction and proposes a search method that chooses an optimal control parameter to be shared by several high resolution grids.
3.1 Problem Definition
According to the definition of HF adjustment in Section 2.2, both the scaling and the compensation processes are controlled by the control parameter. The perceptual quality of SBR audio depends mainly on the reconstructed tone-noise content. However, the magnitudes of both the tonal and the noise compensation are controlled by the same control parameter within an individual noise grid, whose frequency resolution is four times coarser than that of a high resolution grid. In other words, sharing the control parameter amounts to a trade-off between tonal and noise compensation, as illustrated in Figure 7. In addition, the number and the locations of the added tones are constrained by the SBR syntax. Hence, it is difficult to recover the exact T/N content.
Figure 7: Trade-off between tonal and noise compensation (axis: value of Q, from 0 to ∞; Cn and Ct change in opposite directions along it).
Related work such as [2] has demonstrated that the method suggested in the standard [4] provides only a rough approximation of the original TNR. An optimal solution of the control parameter for a single subband has also been proposed, but the sharing problem between noise-demand and tone-demand grids is neglected. Given these two considerations, instead of maintaining the TNR of the original signal, the proposed method determines an optimal parameter that minimizes the distortion of the dB-domain difference between the tone and the noise floor components.
In the next subsection, the dB difference between the tonal component and the noise floor of the reconstructed high resolution grid is formulated for a chosen control parameter, and an optimal parameter is then decided by minimizing the distortion function. For convenience, the notation for the energies of the noise floor and the tone component in the three phases is defined as follows.
Generation Phase
$E_k^n$: Energy of the noise floor in the kth high resolution grid without any adjustment.
$E_k^t$: Energy of the dominant tone component in the kth high resolution grid without any adjustment.
Scaling Phase
$E_k^{\prime n}$: Energy of the noise floor in the kth high resolution grid after gain scaling.
$E_k^{\prime t}$: Energy of the dominant tone component in the kth high resolution grid after gain scaling.
Compensation Phase
$E_k^{\prime\prime n}$: Energy of the noise floor in the kth high resolution grid after compensation.
$E_k^{\prime\prime t}$: Energy of the dominant tone component in the kth high resolution grid after compensation.
3.2 Reconstructed T/N dB Difference
To measure the resultant dB difference of tonal component and noise floor in HF reconstruction, the associated changes in the three phases are simulated in this section.
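As a rough illustration of the three-phase bookkeeping, the sketch below traces one noise-demand grid through generation, scaling and compensation. The gain and noise-level rules are assumptions borrowed from the HF adjustment of the standard SBR decoder (squared gain = e_orig / (e_cur (1 + Q)), added noise energy = e_orig Q / (1 + Q)); they stand in for the definitions of Section 2.2 and are not taken from this chapter. The names `e_cur` (energy of the replicated band before adjustment) and `e_orig` (transmitted envelope energy) are hypothetical.

```python
import math

def f_db(energy):
    """Convert an energy to dB."""
    return 10.0 * math.log10(energy)

def noise_demand_phases(e_t, e_n, e_cur, e_orig, q):
    """Return ((scaled tone, scaled noise), (compensated tone, compensated noise)).

    Assumed rules (not from this chapter):
      squared gain       = e_orig / (e_cur * (1 + q))
      added noise energy = e_orig * q / (1 + q)
    """
    gain_sq = e_orig / (e_cur * (1.0 + q))
    scaled_t = gain_sq * e_t             # scaling phase, tone
    scaled_n = gain_sq * e_n             # scaling phase, noise floor
    added_noise = e_orig * q / (1.0 + q)
    comp_t = scaled_t                    # added noise is ignored on the tone
    comp_n = scaled_n + added_noise      # noise floor raised by the compensation
    return (scaled_t, scaled_n), (comp_t, comp_n)

def noise_demand_db_difference(e_t, e_n, e_cur, e_orig, q):
    """T/N dB difference of the reconstructed grid for a given control parameter q."""
    _, (comp_t, comp_n) = noise_demand_phases(e_t, e_n, e_cur, e_orig, q)
    return f_db(comp_t) - f_db(comp_n)
```

Under these assumptions the dB difference shrinks as Q grows, because a larger Q adds more noise while leaving the tone untouched, and it does not depend on `e_orig`.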
3.2.1 T/N dB Difference for Noise-demand Grid
In the generation phase, $E_k^n$ and $E_k^t$ are estimated from the replicated low bands by the linear prediction method explained in Chapter 4. After the scaling phase for the noise-demand grid, the energies of the noise floor and the tone component are modified by the gain scaling factor as in (6) and (7).
$E_k^{\prime n} = (G_k^{ND})^2 E_k^n$ (6)
$E_k^{\prime t} = (G_k^{ND})^2 E_k^t$ (7)
Through the compensation phase, the T/N energies become
n
In (9), because the added noise energy is relatively small compared with the tonal component energy $E_k^{\prime t}$, the effect of the noise compensation on the tonal component can be ignored.
Finally, from (8) and (9), the resultant dB-domain difference $\Delta_k^{ND}$ between the tonal component and the noise floor of the reconstructed high resolution grid is derived as (10) under the given control parameter $Q_k$.
$\Delta_k^{ND}(Q_k) = f_{dB}(E_k^{\prime\prime t}) - f_{dB}(E_k^{\prime\prime n})$ (10)
The above T/N energy analysis through the three phases is summarized in Table 1.
Table 1: Summary of T/N energy analysis for noise-demand grid.
Phase               Generation  Scaling                 Compensation
Tonal energy        $E_k^t$     $(G_k^{ND})^2 E_k^t$    as in (9)
Noise floor energy  $E_k^n$     $(G_k^{ND})^2 E_k^n$    as in (8)
3.2.2 T/N dB Difference for Tone-demand Grid
Similarly, by simulating the three phases, the resultant T/N difference for the tone-demand grid can be estimated. After the gain scaling phase for the tone-demand grid, the energy of the noise floor is modified by the gain factor as in (12).
$E_k^{\prime n} = (G_k^{TD})^2 E_k^n$ (12)
The tonal component obtained from the replicated low bands is ignored under the assumption that the compensated tone dominates. After the compensation phase, the tonal component becomes
$E_k^{\prime\prime t} = C_k^t$. (13)
As mentioned in section 2.2, the subbands without additional tone will be compensated with random noise. Hence, the noise floor becomes
n
From (13) and (14), the resultant dB-domain difference $\Delta_k^{TD}$ between the tonal component and the noise floor of the reconstructed high resolution grid is derived as (15) under the given control parameter $Q_k$.
Substituting (4), (5) and (12)~(14) into (15) yields
[Equation (16)]
The above T/N energy analysis through the three phases is summarized in Table 2.
Table 2: Summary of T/N energy analysis for tone-demand grid.
Phase         Generation  Scaling    Compensation
Tonal energy  neglected   neglected  $E_k^{\prime\prime t} = C_k^t$
3.3 Optimal Solution
The reconstructed T/N dB difference in a high resolution grid was formulated in the previous section for a given control parameter $Q_k$. Conversely, if the T/N dB difference of the original signal in a high resolution grid is given, the locally optimal control parameter $Q_k$ can be derived from (17) or (18). The relations between the control parameter and the T/N dB difference are summarized in Table 3 and Table 4.
Table 3: Relation between the given control parameter and the resultant T/N dB difference.
Given $Q_k$, the resultant T/N dB difference:
Noise-demand  $\Delta_k^{ND}(Q_k)$, as in (11)
Tone-demand   $\Delta_k^{TD}(Q_k)$, as in (16)
Table 4: Relation between the given T/N dB difference and the local optimal control parameter.
Given $\Delta_k$, the local optimal control parameter:
Noise-demand  $Q_k$, as in (17)
Tone-demand   $Q_k$, as in (18)
In order to determine the optimal control parameter, the total distortion of the resultant dB differences in an individual noise grid is formulated as (19) under a common control parameter Q.
( ) ∑ ( ( ) ) ∑ ( ( ) )
where ∆k is the actual dB difference measured from the original HF subbands in the kth high resolution grid. Hence, the optimal control parameter should be chosen to minimize the distortion function
$Q^* = \arg\min_Q \{ D(Q) \}$. (20)
According to the syntax of SBR, there are 31 candidates for the value of Q, and thus the optimal value can be found from the 31 possibilities. Substituting (11) and (16) into (19) results in
( )
where R is defined as (23), and |NG| is the number of all the high resolution grids in the noise grid.
( )
Hence the optimal control parameter Q* can be approximated by
[ ( ) ]
Searching only the neighborhood of $\tilde{Q}$ gives a faster way to find the optimal control parameter $Q^*$ than examining all 31 candidates.
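The two search strategies can be sketched as follows. This is a minimal illustration, not the exact procedure of this chapter: the distortion is assumed to be a sum of squared dB errors, the candidate set (31 values spaced by a factor of two) is hypothetical, and `toy_model` is a hypothetical stand-in for the resultant T/N dB difference of one grid. The initial guess `q_guess` is supplied by the caller.

```python
import math

def db(x):
    """Convert a positive ratio or energy to dB."""
    return 10.0 * math.log10(x)

def total_distortion(q, grids, model):
    """Distortion of a shared q: sum of squared dB errors (squared error is an assumption)."""
    return sum((model(grid, q) - grid["delta_db"]) ** 2 for grid in grids)

def full_search(candidates, grids, model):
    """Evaluate every candidate and keep the one with the smallest distortion."""
    return min(candidates, key=lambda q: total_distortion(q, grids, model))

def neighborhood_search(candidates, q_guess, radius, grids, model):
    """Evaluate only the candidates within `radius` steps of the one nearest to q_guess."""
    center = min(range(len(candidates)),
                 key=lambda i: abs(db(candidates[i]) - db(q_guess)))
    low = max(0, center - radius)
    high = min(len(candidates), center + radius + 1)
    return full_search(candidates[low:high], grids, model)

def toy_model(grid, q):
    """Hypothetical stand-in for the resultant T/N dB difference of one grid."""
    return db(grid["e_t"] / (grid["e_n"] + q * grid["e_cur"]))

# Hypothetical candidate set: 31 values spaced by a factor of two.
CANDIDATES = [2.0 ** (6 - i) for i in range(31)]
```

With a radius of two, the neighborhood search evaluates at most five candidates instead of 31, at the cost of relying on the quality of the initial guess.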
Chapter 4
Tonality Measurement
From Chapter 3, the extraction of the optimal control parameter depends significantly on accurate estimation of the tonal component and the noise floor in the subbands. Improper measurement results in an unsuitable parameter and may consequently cause artifacts in the reconstructed HF spectrum. In this chapter, an efficient and accurate TNR measurement method based on linear prediction is proposed to improve on the standard second-order predictor. Furthermore, since the TNR is affected by the inverse filtering process, an efficient method for measuring the TNR of the inverse filtered signal is also proposed.
4.1 Scheme in 3GPP
The standard [4][5][6] suggests a simple second-order linear predictor to measure the TNR of subband signals. The prediction coefficients are calculated from the covariance matrix. However, to measure the actual TNR precisely, the poles of the linear prediction filter must match the tones contained in the subband, and thus the prediction order should equal the number of tones. If a subband contains more than two tones, the second-order predictor cannot capture all predictable components, which causes a missed detection. On the other hand, for a subband containing fewer than two tones, some noise components may be captured as predictable components by the second-order predictor, which raises a false alarm. In the case of a missed detection, the tonal energy is underestimated and the noise floor energy is overestimated, which causes noise overflow in HF reconstruction. Conversely, in the case of a false alarm, the tonal energy is overestimated and the noise floor energy is underestimated, which may cause a tonal spike. These typical artifacts in the HE-AAC codec are discussed in Chapter 5.
4.2 Modified Levinson-Durbin Algorithm
The proposed method is based on the Levinson-Durbin algorithm, which constructs a lattice filter incrementally until the required prediction order is reached. Because the tone components are captured as predictable components, the adaptive prediction order is found when the prediction error no longer decreases much as the prediction order increases. The modified Levinson-Durbin (MLD) algorithm is summarized as follows, and its flow chart is shown in Figure 8.
MLD Algorithm
Notation used in the algorithm:
(a) φ(i, j): autocorrelation of subband signal x[n], n = 0~N.
(b) p: the adaptive prediction order.
(c) pmax: the maximum prediction order.
(d) $a_i^p$: the ith prediction coefficient at order p.
The algorithm performs the following steps for every subband:
Step 1: Initial conditions.
=1
Step 2: Increase order.
p ← p + 1.
Step 3: Compute coefficients and prediction error.
I. ⎥
Step 4: Check terminal condition.
if − >Ψ
Figure 8: Flow chart of the MLD algorithm (boxes include Set Initial Condition and Increase Prediction Order).
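The order-adaptive idea can be sketched with the textbook Levinson-Durbin recursion, stopped as soon as the prediction error no longer drops by more than a threshold. This is a minimal sketch rather than the MLD algorithm itself: it assumes a Toeplitz autocorrelation sequence `r[0..p_max]` instead of φ(i, j), and the stopping test `error - new_error > psi` is an assumed reading of Step 4. The helper that builds an ideal autocorrelation is only there to exercise the recursion.

```python
import cmath

def adaptive_levinson_durbin(r, p_max, psi):
    """Return (order, coeffs, error) with x[n] ~ sum_i coeffs[i-1] * x[n-i].

    r     : autocorrelation r[k] = E{x[n] conj(x[n-k])}, k = 0..p_max
    psi   : minimum error decrease required to accept one more order (assumed rule)
    """
    error = complex(r[0]).real
    floor = 1e-12 * error                # guard against dividing by a vanished error
    coeffs = []
    order = 0
    while order < p_max and error > floor:
        acc = r[order + 1] - sum(coeffs[i] * r[order - i] for i in range(order))
        k = acc / error                  # reflection coefficient of the next stage
        new_error = error * (1.0 - abs(k) ** 2)
        if error - new_error <= psi:     # error no longer decreases much: stop
            break
        coeffs = [coeffs[i] - k * coeffs[order - 1 - i].conjugate()
                  for i in range(order)] + [k]
        error = new_error
        order += 1
    return order, coeffs, error

def tone_in_noise_autocorrelation(freqs, noise_power, lags):
    """Ideal autocorrelation of unit-power complex tones in white noise."""
    return [sum(cmath.exp(1j * w * k) for w in freqs)
            + (noise_power if k == 0 else 0.0)
            for k in range(lags)]
```

For complex subband signals each tone contributes one pole, so one tone in weak noise is expected to stop the recursion at order 1 and two tones at order 2, while white noise alone leaves the order at 0.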
4.3 Tonality Measurement of Inverse Filtered Signal
Inverse filtering is an HF whitening process in the decoder, formulated in (1). The energy of the tonal component in the replicated low bands is attenuated to a degree controlled by the chirp factor. Since the TNR changes after inverse filtering, the tonality should be re-estimated to obtain the actual TNR. Otherwise, the mismatch between the measured TNR and the inverse filtered signal results in improper compensation. The following method provides an efficient way to measure the TNR of the inverse filtered signal without performing the inverse filtering on the time-domain signal.
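The principle can be sketched as follows: if the inverse filter is a short FIR filter, the covariance of its output is a weighted sum of covariances of its input, so no filtered signal needs to be generated. The sketch assumes the three-tap form y[n] = x[n] + bw·α0·x[n−1] + bw²·α1·x[n−2] used by the standard SBR decoder as the filter of (1); the function names and the summation range are illustrative.

```python
def covariance(x, i, j, start, stop):
    """phi(i, j) = sum over start <= n < stop of x[n - i] * conj(x[n - j])."""
    return sum(x[n - i] * x[n - j].conjugate() for n in range(start, stop))

def inverse_filter_coeffs(alpha0, alpha1, bw):
    """Assumed taps of y[n] = x[n] + bw*alpha0*x[n-1] + bw^2*alpha1*x[n-2]."""
    return [1.0 + 0j, bw * alpha0, bw * bw * alpha1]

def filtered_covariance(phi, coeffs, i, j):
    """Covariance of y[n] = sum_m coeffs[m] * x[n - m], using only phi of x.

    phi is a callable phi(i, j) returning the covariance of the unfiltered signal.
    """
    return sum(cm * cl.conjugate() * phi(i + m, j + l)
               for m, cm in enumerate(coeffs)
               for l, cl in enumerate(coeffs))
```

The result of `filtered_covariance` can then be fed to the order-adaptive predictor in place of the unfiltered covariance. Note that lags up to two beyond the predictor's own range are needed from the unfiltered signal.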
From the previous section, given the autocorrelation matrix of a subband signal, the MLD algorithm can measure the TNR and search for the adaptive order automatically.
By calculating the autocorrelation of the inverse filtered signal, the inverse filtered TNR can be obtained by applying the MLD algorithm again. The autocorrelation of the inverse filtered signal in the bth subband is defined as
1 *
Substituting (1) into (26) results in
( ) [ ] [ ] [ ] [ ]
where the pth subband in LF is mapped to the bth subband in HF. Since the autocorrelation of the inverse filtered signal is obtained from (27), the tonality of the inverse filtered signal can be measured by applying the MLD algorithm again, with $\phi_b(i,j)$ as its input.
4.4 T/N dB Difference Measurement
According to the result of the MLD algorithm, the tonal component can be measured from the predictable energy owing to its stationary property. For the bth subband in the scope of MLD, the tone component is defined as
$e_b^t = \frac{\phi(0,0) - e_p}{p}$, (28)
where $\phi(0,0) - e_p$ is the predictable energy and p is the number of tones in the subband. In other words, (28) estimates the average energy of the tones in the subband. Furthermore, the tone component of a high resolution grid is taken as the maximum of the tone components of its subbands.
$E_k^t = \max\{ e_b^t \mid b \in k \}$. (29)
On the other hand, the prediction error can be regarded as the noise content owing to its random property. Hence, the noise floor of the bth subband in the scope of MLD can be obtained from
$e_b^n = \frac{e_p}{N - p}$, (30)
where N is the number of subband samples in the scope of MLD. The noise floor of a high resolution grid is then derived as the average noise floor among its subbands in (31).
$E_k^n = \frac{1}{|k|} \sum_{b \in k} e_b^n$. (31)
Finally, the T/N dB difference used in Chapter 3 is the difference in dB between the dominant tone and the average noise floor within a high resolution grid.
$\Delta_k = f_{dB}(E_k^t) - f_{dB}(E_k^n)$. (32)
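A minimal sketch of the measurement chain (28) to (32), assuming that φ(0,0), e_p and p of each subband are already available from the MLD stage, that every subband has p ≥ 1, and that (30) reads e_p / (N − p). Function and argument names are illustrative.

```python
import math

def f_db(energy):
    """Convert an energy to dB."""
    return 10.0 * math.log10(energy)

def subband_tone_energy(phi00, e_p, p):
    """(28): predictable energy averaged over the p tones (p must be at least 1)."""
    return (phi00 - e_p) / p

def subband_noise_floor(e_p, n_samples, p):
    """(30): noise floor derived from the prediction error e_p."""
    return e_p / (n_samples - p)

def grid_tn_db_difference(subbands, n_samples):
    """T/N dB difference of one grid.

    subbands: list of (phi00, e_p, p) tuples, one per subband of the grid.
    """
    tones = [subband_tone_energy(phi00, e_p, p) for phi00, e_p, p in subbands]
    noises = [subband_noise_floor(e_p, n_samples, p) for _, e_p, p in subbands]
    e_t = max(tones)                     # (29): dominant tone of the grid
    e_n = sum(noises) / len(noises)      # (31): average noise floor of the grid
    return f_db(e_t) - f_db(e_n)         # (32)
```

Taking the maximum for the tone and the mean for the noise floor makes the measure follow the most prominent tone of the grid rather than its average tonality.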
Chapter 5 Artifacts
This chapter discusses two typical artifacts in the HE-AAC codec, noise overflow and tonal spike. The causes of these artifacts are pointed out, and a remedial method at the encoder end is proposed to prevent the noise overflow phenomenon in HF reconstruction.
5.1 Noise Overflow
As mentioned in Chapter 4, a missed detection in the tonality measurement means that the energy of the tonal component is underestimated and the energy of the noise component is overestimated. In this case, the inaccurate TNR measurement causes too little tonal compensation or too much noise compensation, both of which lead to the noise overflow phenomenon in HF reconstruction and produce a perceptually fizzy sound.
The noise overflow phenomenon is illustrated in Figure 9.
Figure 9: Noise overflow phenomenon. (a) A spectrum with the noise overflow phenomenon. (b) Spectrogram comparison for the noise overflow phenomenon.
5.2 Tonal Spike
In the case of a false alarm in the tonality measurement, the energy of the tonal component is overestimated and the energy of the noise component is underestimated.
The inaccurate TNR measurement then causes too much tonal compensation or too little noise compensation, both of which lead to tonal spikes in HF reconstruction and produce a perceptually metallic sound. The tonal spike phenomenon is illustrated in Figure 10.
Figure 10: Tonal spike phenomenon. (a) A spectrum with the tonal spike phenomenon. (b) Spectrogram comparison for the tonal spike phenomenon.
5.3 Noise Floor Correction
Noise floor correction is an alternative approach at the encoder end. It provides a remedy that reduces the noise overflow phenomenon under the already determined control parameter and compensation selection. Through the three-phase simulation in Chapter 3, the noise floor energy $N_k^r$ after HF adjustment can be obtained from (8) or (14), as shown in Table 5. Since the control parameter Q is determined, the noise floor energy in HF reconstruction can be considered a function of $E_k^o$. Noise overflow can be detected by comparing it with the original noise floor. Also, (11) and (16) in Table 3 show that changing the averaged energy of the original high bands in the kth high resolution grid ($E_k^o$) does not affect the reconstructed T/N dB difference. Hence, the noise floor can be corrected by modifying $E_k^o$. The following evaluation is based on energy grids.
Table 5: Noise floor energy in HF reconstruction.
Noise floor energy in HF reconstruction
Noise-demand  $N_k^r$, as in (33)
Tone-demand   $N_k^r$, as in (34)
To correct the noise floor energy in HF reconstruction, the averaged energy of the original high bands $E_k^o$ is modified by a revision factor R; that is, $E_k^o$ is replaced with $E_k^o \cdot R$ in (33) and (34). The resultant noise floor energy should equal the original noise floor energy $N^o$ according to
( ) ( )
The revision factor R can be obtained by substituting (33) and (34) into (35) as follows.
If $\tilde{R} \cong 1$, the reconstructed noise floor does not differ much from the original noise floor. On the other hand, $\tilde{R} \ll 1$ indicates the noise overflow phenomenon, and the revision factor is therefore applied.
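A minimal sketch of the correction logic, assuming (as the independence of the T/N dB difference from $E_k^o$ suggests) that the reconstructed noise floor scales linearly with $E_k^o$, so that the revision factor reduces to the ratio of the original to the reconstructed noise floor. The threshold that decides when the factor counts as much smaller than 1 is a hypothetical tuning value, not taken from the text.

```python
def revision_factor(noise_orig, noise_recon):
    """Ratio that brings the reconstructed noise floor back to the original one."""
    return noise_orig / noise_recon

def corrected_envelope_energy(e_orig, noise_orig, noise_recon, overflow_threshold=0.5):
    """Scale the envelope energy only when noise overflow is detected.

    overflow_threshold is a hypothetical value standing in for "much smaller than 1".
    """
    r = revision_factor(noise_orig, noise_recon)
    if r < overflow_threshold:           # reconstructed noise floor far too high
        return e_orig * r
    return e_orig                        # close to 1: leave the energy unchanged
```

For example, a reconstructed noise floor four times the original gives a factor of 0.25, so the envelope energy is reduced to a quarter; a 5% excess leaves it unchanged.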
Chapter 6
Experimental Results
In this chapter, a large number of experiments are conducted to verify the proposed approaches on the MPEG test tracks and the music database collected in our lab. The experiments include both objective and subjective quality measurements.
6.1 Experiment Environment
Computer Status:
Platform          Personal Computer
Operating System  Windows XP
CPU               Intel Pentium 4 2.4GHz
Memory            256MB DDR400 * 2
Motherboard       ASUS P4P800
Sound Card        ADI AD1985 AC'97
Headphone         ALESSANDRO MUSIC SERIES PRO
Objective Quality Measurement Tool:
For objective quality evaluation, the thesis mainly adopts the PEAQ (perceptual evaluation of audio quality) system [16], which is recommended by ITU-R Task Group 10/4. The system includes a subtle perceptual model to measure the difference between two tracks. The objective difference grade (ODG) is the output variable of the objective measurement method. ODG values range from 0 to −4, where 0 corresponds to an imperceptible impairment and −4 to an impairment judged as very annoying. An ODG improvement reaching 0.1 is usually perceptually audible.
PEAQ has been widely used to evaluate compression techniques because it can detect the perceptual differences that the human hearing system is sensitive to.
Subjective Quality Measurement Tool:
For subjective quality evaluation, the thesis mainly adopts the MUSHRA system [17], which allows the blind comparison of multiple audio files. The multi-stimulus test with hidden reference and anchors is designed to give a reliable and repeatable measure of the audio quality of intermediate-quality signals. MUSHRA has the advantage that it provides an absolute measure of the audio quality of a codec, which can be compared directly with the reference. MUSHRA follows the test method and impairment scale recommended by ITU-R BS.1116 [18].
6.2 Objective Quality Measurement in MPEG Test Tracks
The twelve test tracks recommended by MPEG are shown in Table 6. These tracks cover critical material balanced among percussion, string and wind instruments, and human vocals. In this section, the quality enhancement of the proposed methods at different bit rates is verified on these MPEG test tracks, with NCTU-HEAAC [19] adopted as the platform.
Table 6: The twelve tracks recommended by MPEG.
Track  Signal  Description                  Mode    Time (sec)  Remark
1      es01    Vocal (Suzanne Vega)         stereo  10          (c)
2      es02    German speech                stereo  8           (c)
3      es03    English speech               stereo  7           (c)
4      sc01    Trumpet solo and orchestra   stereo  10          (b) (d)
5      sc02    Orchestral piece             stereo  12          (d)
6      sc03    Contemporary pop music       stereo  11          (d)
7      si01    Harpsichord                  stereo  7           (b)
8      si02    Castanets                    stereo  7           (a)
9      si03    Pitch pipe                   stereo  27          (b)
10     sm01    Bagpipes                     stereo  11          (b)
11     sm02    Glockenspiel                 stereo  10          (a) (b)
12     sm03    Plucked strings              stereo  13          (a) (b)
Remarks:
(a) Transients: pre-echo sensitive, smearing of noise in temporal domain.
(b) Tonal/Harmonic structure: noise sensitive, roughness.
(c) Natural vocal (critical combination of tonal parts and attacks): distortion sensitive, smearing of attacks.
(d) Complex sound: stresses the device under test.
Table 7: ODG for proposed methods on MPEG test tracks at bit rate 80 kbps.
Codec NCTU-HEAAC
Bit Rate 80 kbps
Tracks M0 M1 M2 M3
es01 -0.74 -0.74 -0.73 -0.73
es02 -0.65 -0.66 -0.65 -0.65
es03 -0.74 -0.74 -0.75 -0.75
sc01 -1.05 -1.05 -1.01 -1.01
sc02 -1.28 -1.28 -1.25 -1.25
sc03 -1.21 -1.21 -1.18 -1.19
si01 -1.67 -1.67 -1.65 -1.65
si02 -1.01 -1.01 -1.03 -1.03
si03 -1.69 -1.69 -1.64 -1.64
sm01 -1.63 -1.64 -1.6 -1.6
sm02 -1.71 -1.62 -1.6 -1.6
sm03 -1.41 -1.42 -1.34 -1.35
Max -0.65 -0.66 -0.65 -0.65
Min -1.71 -1.69 -1.65 -1.65
Average -1.2325 -1.2275 -1.2025 -1.20417
M0: HF reconstruction without compensation
M1: M0 + noise floor correction
M2: HF reconstruction with tone/noise compensation
M3: M2 + noise floor correction
Figure 11: ODG comparison of MPEG test tracks at bit rate 80kbps.
Table 8: ODG for proposed methods on MPEG test tracks at bit rate 64 kbps.
Codec NCTU-HEAAC
Bit Rate 64 kbps
Tracks M0 M1 M2 M3
es01 -1.02 -1.03 -1 -1.01
es02 -0.89 -0.89 -0.87 -0.87
es03 -1.01 -1.03 -1.03 -1.03
sc01 -1.7 -1.7 -1.66 -1.66
sc02 -1.77 -1.78 -1.74 -1.74
sc03 -1.67 -1.68 -1.63 -1.63
si01 -2 -2 -2 -1.99
si02 -1.5 -1.5 -1.57 -1.57
si03 -2.16 -2.16 -2.09 -2.09
sm01 -2.18 -2.19 -2.17 -2.17
sm02 -2.45 -2.42 -2.51 -2.51
sm03 -1.8 -1.81 -1.73 -1.72
Max -0.89 -0.89 -0.87 -0.87
Min -2.45 -2.42 -2.51 -2.51
Average -1.67917 -1.6825 -1.66667 -1.66583
M0: HF reconstruction without compensation
M1: M0 + noise floor correction
M2: HF reconstruction with tone/noise compensation
M3: M2 + noise floor correction
Figure 12: ODG comparison of MPEG test tracks at bit rate 64kbps.
Table 9: ODG for proposed methods on MPEG test tracks at bit rate 48 kbps
Codec NCTU-HEAAC
Bit Rate 48 kbps
Tracks M0 M1 M2 M3
es01 -1.58 -1.59 -1.56 -1.61
es02 -1.42 -1.4 -1.39 -1.42
es03 -1.71 -1.71 -1.71 -1.7
sc01 -2.46 -2.46 -2.4 -2.4
sc02 -2.64 -2.63 -2.59 -2.62
sc03 -2.41 -2.41 -2.33 -2.33
si01 -2.73 -2.73 -2.71 -2.74
si02 -2.4 -2.38 -2.48 -2.46
si03 -3.21 -3.21 -3.17 -3.17
sm01 -3.24 -3.24 -3.2 -3.2
sm02 -3.4 -3.41 -3.37 -3.38
sm03 -2.36 -2.37 -2.26 -2.26
Max -1.42 -1.4 -1.39 -1.42
Min -3.4 -3.41 -3.37 -3.38
Average -2.46333 -2.46167 -2.43083 -2.44083
M0: HF reconstruction without compensation
M1: M0 + noise floor correction
M2: HF reconstruction with tone/noise compensation
M3: M2 + noise floor correction
Figure 13: ODG comparison of MPEG test tracks at bit rate 48kbps.
As the results show, the quality of HF reconstruction in SBR audio is improved by proper tone/noise compensation at the different bit rates, except for es03, si02 and sm02. A subjective test will be conducted later to check whether the quality is degraded or improved.
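The per-track comparison behind this statement can be reproduced directly from the tables. The sketch below copies the M0 and M2 columns of Table 8 (64 kbps), lists the tracks whose ODG drops when tone/noise compensation is enabled, and recomputes the column averages; the helper names are illustrative.

```python
# track: (M0, M2) at 64 kbps, copied from Table 8
ODG_64K = {
    "es01": (-1.02, -1.00), "es02": (-0.89, -0.87), "es03": (-1.01, -1.03),
    "sc01": (-1.70, -1.66), "sc02": (-1.77, -1.74), "sc03": (-1.67, -1.63),
    "si01": (-2.00, -2.00), "si02": (-1.50, -1.57), "si03": (-2.16, -2.09),
    "sm01": (-2.18, -2.17), "sm02": (-2.45, -2.51), "sm03": (-1.80, -1.73),
}

def degraded_tracks(table):
    """Tracks whose ODG with compensation (M2) is lower than without (M0)."""
    return [track for track, (m0, m2) in table.items() if m2 < m0]

def column_mean(table, column):
    """Average ODG of one column (0 for M0, 1 for M2)."""
    values = [row[column] for row in table.values()]
    return sum(values) / len(values)

print(degraded_tracks(ODG_64K))
print(round(column_mean(ODG_64K, 0), 5), round(column_mean(ODG_64K, 1), 5))
```

The same check can be repeated for the 80 kbps and 48 kbps tables by swapping in their M0 and M2 columns.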