The Efficient Temporal - Design of a High-Efficiency TNS in MPEG-4 AAC

The detection mechanism has two problems. First, as illustrated in the last section, the coding gain cannot reflect the injection of the three artifacts described above. Second, a switch mechanism based on the coding gain directly incurs computing overhead from the TNS filtering. This chapter presents a detection mechanism based on the perceptual entropy and proposes methods to handle the three artifacts. The approach offers benefits in both quality and complexity.

4.1 The Perceptual Entropy Switch Method

To resolve the disadvantages mentioned above, an efficient switch criterion based on the perceptual entropy (PE) is proposed in [17].

The PE is defined as

$$PE = -\sum_{b} BW_b \cdot \log_{10}\!\left(\frac{Masking_b}{E_b + 1}\right) \tag{37}$$

where $b$ is the index of the threshold calculation partition, $BW_b$ is the number of frequency lines in partition $b$, $E_b$ is the sum of the energy in partition $b$, and $Masking_b$ is the masking threshold in partition $b$. The masking threshold $Masking_b$ is defined as

$$Masking_b = \max\bigl(qthr_b,\ \min(nb_b,\ nb\_l_b \cdot rpelev)\bigr) \tag{38}$$

where $qthr_b$ is the threshold in quiet, $nb_b$ is the masking threshold of partition $b$, $nb\_l_b$ is the threshold of partition $b$ for the last block, and $rpelev$ is set to 1 for short blocks and 2 for long blocks.

From (37) and (38), when the (N-1)th signal resembles a quiet sound and the Nth signal is an attack signal, $Masking_b$ of the Nth signal takes the small value $nb\_l_b \cdot rpelev$ rather than $nb_b$. For example, in Figure 20 the 1st frame is a quiet sound and the 2nd frame is an attack signal. In the calculation of the PE of the 2nd frame, $nb\_l_b$ is much smaller than $nb_b$, so the corresponding PE is high, which indicates that the Nth input signal is an attack signal.

However, the PE alone detects only the signals that lead to the pre-echo phenomenon. To ease the post-echo phenomenon, the per-partition value $PE_b$ is useful. When two consecutive frames have different contents, such as an attack signal followed by a quiet sound, the masking value of each band should differ. Therefore, when the previous $PE_b$ is much larger than the current $PE_b$, the post-echo phenomenon will occur. So, except for the low-frequency band, if the ratio of the previous $PE_b$ to the current $PE_b$ exceeds a threshold in any band, the TNS module should be applied to the signal.

Moreover, the PE value of each frame has already been computed in the psychoacoustic model. To avoid running the Levinson-Durbin recursion for every frame, an attack flag derived from the PE and $PE_b$ information in the psychoacoustic model is sent to the TNS module. Figure 19 illustrates the new flowchart of the TNS. Compared with Figure 8, the decision block "is the coding gain larger than the threshold?" is replaced with the block "is the attack flag true?". The Levinson-Durbin recursion is computed only if the flag is true, so the computational complexity is reduced considerably. With this approach, however, if two attack signals appear in two consecutive frames, the PE value of the second attack signal is not high enough, and the efficient switch method treats that signal as a non-attack signal.
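The switch criterion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of [17]: it assumes the per-partition entropy has the form $PE_b = -BW_b \log_{10}(Masking_b / (E_b + 1))$ used in the AAC psychoacoustic model, and the function names, the two thresholds, the `first_band` parameter, and the clamp of $PE_b$ at zero are all hypothetical choices made for the example.

```python
import math


def masking_threshold(qthr, nb, nb_l, rpelev):
    """Per-partition masking: max(qthr_b, min(nb_b, nb_l_b * rpelev))."""
    return [max(q, min(n, nl * rpelev)) for q, n, nl in zip(qthr, nb, nb_l)]


def partition_pe(bw, energy, masking):
    """Per-partition PE_b (assumed form), clamped at zero (assumption)."""
    return [
        max(0.0, -w * math.log10(m / (e + 1.0)))
        for w, e, m in zip(bw, energy, masking)
    ]


def attack_flag(pe_b, prev_pe_b, pe_threshold, ratio_threshold, first_band=1):
    """Return True if the frame should be handed to the TNS module.

    pe_threshold, ratio_threshold and first_band are hypothetical tuning
    parameters; the text does not give their values.
    """
    # Pre-echo case: the total PE of the frame is high.
    if sum(pe_b) > pe_threshold:
        return True
    # Post-echo case: outside the low-frequency band, the previous PE_b
    # is much larger than the current PE_b in at least one partition.
    eps = 1e-9  # guards against division by zero
    return any(
        prev / max(cur, eps) > ratio_threshold
        for prev, cur in zip(prev_pe_b[first_band:], pe_b[first_band:])
    )
```

In this sketch the previous-block term `nb_l * rpelev` is what raises the PE of an attack that follows a quiet frame, while the ratio test covers the opposite transition.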

Figure 19: The TNS flowchart with the PE method.

Figure 20: An attack signal appears in two frames.

Because of the overlapping property of the MDCT windows, an attack signal appears in two consecutive blocks, as shown in Figure 20. The TNS module should be applied to both the 2nd and 3rd frames. Thus, to ensure that TNS is active for both consecutive blocks, the TNS module is applied to the current frame whenever the previous frame is detected as an attack signal.
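The one-frame carry-over can be illustrated as follows. This is only a sketch; the function name `tns_candidates` is hypothetical, and it assumes that the carry-over uses the detected flag of the previous frame rather than the extended one, so that a single attack does not keep TNS active indefinitely.

```python
def tns_candidates(attack_flags):
    """Mark a frame for TNS if it, or the frame before it, was detected
    as an attack."""
    marked = []
    previous_detected = False
    for detected in attack_flags:
        marked.append(detected or previous_detected)
        previous_detected = detected  # carry the raw detection forward
    return marked
```

For a sequence in which only the 2nd frame is detected as an attack, the 2nd and 3rd frames are both marked, matching the situation in Figure 20.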

4.2 Easing the Pre-aliasing and Post-aliasing Artifacts

Chapter 3 discussed the cause of the pre-aliasing and post-aliasing artifacts in detail. The higher the prediction order, the more apparent the aliasing artifact, which degrades the performance of the TNS module. From Figure 13 (c), a solution is apparent: if the values at the tail of the window are zero, then after the multiplication the post-aliasing at the 14th and 15th points in Figure 13 (d) disappears. Similarly, if the values at the front of the window are zero, the pre-aliasing artifact is eased. The LONG_START and LONG_STOP windows defined in AAC meet these requirements. Building on the PE switch method above, an improved method to ease the artifacts is therefore proposed.

The most important step is to identify the position of the attack signal detected by the PE method, because TNS can then choose a window suited to that position for better coding. Algo 3 is designed to detect the position. It divides a long window block into eight zones and calculates the energy of each zone. Starting from zone 2, the energy ratio of each zone, defined as the energy of the current zone divided by the energy of the previous zone, is compared with a threshold; the first zone whose ratio exceeds the threshold is taken as the position of the attack signal. In Figure 21, the energy ratio of zone 7 in the 2nd frame exceeds the threshold, so the attack position is zone 7.

After the position of the attack signal is detected, the next step is to determine a suitable window to ease the aliasing artifact. If the position lies between zone 5 and zone 8, the window of the current frame is set to the LONG_START window and the next frame uses the LONG_STOP window. In Figure 21, because the attack position is zone 7, the window of the 2nd frame is set to the LONG_START window and the next frame uses the LONG_STOP window. Otherwise, if the attack signal is located between zone 1 and zone 4 of the frame, the frame should use the LONG_START window and the previous frame the LONG_STOP window, which has the disadvantage of requiring an additional frame of delay.

Therefore, for efficiency, the ONLY_LONG window is retained when the attack position is between zone 1 and zone 4. Finally, whether TNS is active depends on the attack flag, the window type, and the attack position, which gives three conditions. The first condition is that the window type is LONG_START and the attack position is zone 5 or 6; TNS is then active. If the attack position is zone 7 or 8, however, the current window does not contain the attack signal, and in the next window the attack position will be zone 3 or 4. The second condition is therefore that the window type is LONG_STOP and the attack position is zone 3 or 4; TNS is also active in this case. The third condition applies TNS to a signal whose attack position is between zone 1 and zone 4. To reduce the pre-aliasing and post-aliasing, this condition should use a lower prediction order to shape the time-domain noise.

Step 1. If the attack flag is false, leave the algorithm.

Step 2. Divide the frame into 8 zones.

Step 3. Calculate the energy of each zone.

Step 4. Return the first position i such that energy[i]/energy[i-1] > TNS_SWITCH_RATIO, if one exists.

Algo 3: Detect Position algorithm.
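The position detection steps can be sketched as below. This is a minimal illustration rather than the reference encoder code: the function name, the `switch_ratio` argument (standing in for TNS_SWITCH_RATIO, whose value the text does not give), and the small `eps` guard against a zero-energy zone are assumptions.

```python
def detect_attack_position(frame, attack_flag, switch_ratio, num_zones=8):
    """Return the 1-based zone of the attack, or None if none is found."""
    # Step 1: nothing to do when the psychoacoustic model reports no attack.
    if not attack_flag:
        return None
    # Step 2: divide the frame into equally sized zones.
    zone_len = len(frame) // num_zones
    # Step 3: energy of each zone (sum of squared samples).
    energy = [
        sum(x * x for x in frame[z * zone_len:(z + 1) * zone_len])
        for z in range(num_zones)
    ]
    # Step 4: first zone whose energy jumps relative to the previous zone.
    eps = 1e-12  # assumption: avoids division by zero for silent zones
    for i in range(1, num_zones):
        if energy[i] / max(energy[i - 1], eps) > switch_ratio:
            return i + 1  # zones are numbered from 1 in the text
    return None
```

The search starts at the second zone because the first zone has no predecessor to compare against.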

Step 1. If the attack position belongs to the right half of the frame (i = 5, 6, 7, 8) and the block type of the previous frame is ONLY_LONG, the block type is set to LONG_START.

Step 2. Else, if the block type of the previous frame is LONG_START, the block type is set to LONG_STOP.

Step 3. Else, the block type is set to ONLY_LONG.

Algo 4: Window switch algorithm.
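The window switch rule translates directly into a small state function. The sketch below assumes the attack position is passed as a 1-based zone number, or `None` when no attack was found; the function name and the string constants are illustrative only.

```python
ONLY_LONG = "ONLY_LONG"
LONG_START = "LONG_START"
LONG_STOP = "LONG_STOP"


def select_block_type(attack_position, prev_block_type):
    """Choose the block type of the current frame."""
    # Step 1: attack in the right half and the previous frame was ONLY_LONG.
    in_right_half = attack_position is not None and attack_position >= 5
    if in_right_half and prev_block_type == ONLY_LONG:
        return LONG_START
    # Step 2: a LONG_START frame is always followed by LONG_STOP.
    if prev_block_type == LONG_START:
        return LONG_STOP
    # Step 3: everything else keeps the ordinary long window.
    return ONLY_LONG
```

Because the decision depends only on the previous block type and the current attack position, no look-ahead and therefore no extra frame delay is needed.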

Condition 1. The block type is LONG_START and the attack position is 5 or 6.

Condition 2. The block type is LONG_STOP and the attack position is 3 or 4.

Condition 3. (i) The block type is ONLY_LONG, (ii) the attack flag is true, and (iii) the attack position is 1 to 4.

Step 1. If one of the above three conditions is satisfied, the TNS module is active.

Step 2. If Condition 3 is satisfied, a lower prediction order should be used.

Algo 5: TNS applied algorithm.
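The three conditions can be combined into one decision function, sketched below. The text does not state the actual prediction orders, so `full_order` and `reduced_order` are hypothetical parameters, as is the function name; the sketch also assumes that the attack position is expressed relative to the current window.

```python
def tns_decision(block_type, attack_flag, attack_position,
                 full_order, reduced_order):
    """Return (tns_active, prediction_order) for the current frame."""
    cond1 = block_type == "LONG_START" and attack_position in (5, 6)
    cond2 = block_type == "LONG_STOP" and attack_position in (3, 4)
    cond3 = (
        block_type == "ONLY_LONG"
        and attack_flag
        and attack_position in (1, 2, 3, 4)
    )
    # Condition 3 keeps the long window, so a lower order is used to
    # limit pre-aliasing and post-aliasing.
    if cond3:
        return True, reduced_order
    if cond1 or cond2:
        return True, full_order
    return False, 0
```

For example, a LONG_START frame with the attack in zone 7 leaves TNS inactive, because that part of the window is already attenuated, and the following LONG_STOP frame then sees the attack in zone 3.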

Figure 21: The attack signal is located at zone 7 in the 2nd frame.
