Frequency Energy Method - Related Works - Design of Window Switch Method in AAC

Chapter 2 Backgrounds

2.3 Related Works

2.3.3 Frequency Energy Method

To detect signal changes in both the time domain and the frequency domain, [7] proposed a transient detection algorithm that operates in the spectral domain and captures both time-domain and frequency-domain transients. The energy $e_n(b)$ of each band is calculated, and the band energy function of window n, $f_n(b)$, is then defined as,

[Equation lost in extraction.]

For each band b, a transient measurement G(b) is defined as,

[Equation lost in extraction.]

When the transient measurement G(b) is greater than a threshold T, the transient band flag d(b) is set to 1. The total number of transient bands F is calculated as

$$F = \sum_{b} d(b)$$
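The flag-and-count step can be sketched in a few lines. This is only an illustration: the function name `count_transient_bands` is hypothetical, and the transient measurements G(b) are assumed to be computed elsewhere and passed in as a list.

```python
def count_transient_bands(g, threshold):
    """Return the per-band transient flags d(b) and their total F.

    g         -- list of transient measurements G(b), one per band (assumed given)
    threshold -- the threshold T from the text
    """
    # d(b) = 1 when G(b) exceeds T, otherwise 0
    flags = [1 if value > threshold else 0 for value in g]
    # F is the number of bands flagged as transient
    return flags, sum(flags)
```

For example, `count_transient_bands([0.5, 2.0, 3.5, 1.0], 1.5)` flags the second and third bands and counts two transient bands.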

By detecting the change in each band, this method can detect transients in both the time domain and the frequency domain. However, transforming the signal from the time domain into the frequency domain for each short window is more complex than the other two methods. Moreover, the characteristics of each frequency band differ, so the transient measurement thresholds for the individual bands are hard to decide.

Chapter 3 Design of Window Switch Method in AAC

Chapter 2 introduced the pre-echo phenomenon and the window switch mechanisms in AAC, and discussed the related work on window decision. This chapter proposes an integrated design of window switch in AAC. The first and most important issue is the window decision. The second is a method that omits the short-window psychoacoustic model to reduce complexity. The third is the design of a grouping method that leads to an appropriate bit allocation and good quality. The last is the joint design with the TNS and M/S coding modules in AAC.

3.1 Window Decision

The design of window decision is the most important part of window switch. The short window has higher time resolution, and the long window has higher frequency resolution. A transient signal needs short windows to control the pre-echo effect, whereas a stationary signal needs a long window to resolve the lines in the signal spectrum and extract the redundancy. If a transient signal is coded with a long window, the pre-echo phenomenon occurs. If a stationary signal is coded with short windows, the low frequency resolution makes the encoded signal imprecise in the frequency domain.

This section proposes a window decision based on three kinds of information: the global energy ratio, the zero-crossing ratio and the tonal attack. Window decision determines the window type of the next frame. Once the next window type is decided, the current window type is switched by comparing it with the next and prior window types. Therefore, the last subsection also discusses the window type switch method.

3.1.1 Global Energy Ratio

Transient signals usually occur when the time-domain energy changes rapidly. Therefore, the energy ratio is important information for detecting transient signals.

Traditionally, the energy ratio detection method [4] only considers the energy ratio between two sliding short windows. Generally, the pre-echo effect is generated by the signal with the global maximum energy, but the energy ratio between two sliding windows misses a gradually increasing signal. Figure 6 is an example from a speech signal. Figure 6 (a) shows a transient signal that increases gradually. Figure 6 (b) shows the traditional energy ratio, whose maximum is about 2.1. If the transient threshold is set at 2, the margin is so small that misjudgment happens easily. Figure 6 (c) illustrates the variation of the global energy ratio. The global energy ratio method provides a clearly noticeable ratio and overcomes this problem of the traditional energy ratio method.

Figure 6: (a) Transient signal segment, (b) energy ratio of two sliding short windows, (c) values of global energy ratio.

In common with the traditional energy ratio method, we calculate an energy En(i) for each short window. Then the maximum energy Max_En and the minimum energy Min_En among the short-window energies En(i) are found. The global energy ratio is defined as,

[Equation lost in extraction.]

When Global_En_Ratio is greater than a threshold Te, the signal is regarded as a transient signal. This method is as easy to implement as the traditional energy ratio method. However, it is more general, and it can also prevent the post-echo problem.
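A minimal sketch of the global energy ratio test follows. Since the defining equation is not available here, the code assumes that Global_En_Ratio is Max_En divided by Min_En, that the short-window energy is the sum of squared samples, and that the windows are non-overlapping blocks of 256 samples. The function names and the small `floor` guard against division by zero are hypothetical.

```python
def short_window_energies(samples, window_len=256):
    """Energy En(i) of each non-overlapping short window (sum of squares)."""
    energies = []
    for start in range(0, len(samples) - window_len + 1, window_len):
        window = samples[start:start + window_len]
        energies.append(sum(x * x for x in window))
    return energies


def global_energy_ratio(energies, floor=1e-12):
    """Assumed definition: Max_En / Min_En over the set of short windows."""
    # The floor keeps the ratio finite when a window is completely silent.
    return max(energies) / max(min(energies), floor)


def is_energy_transient(samples, te, window_len=256):
    """True when Global_En_Ratio exceeds the threshold Te."""
    energies = short_window_energies(samples, window_len)
    return global_energy_ratio(energies) > te
```

Because the ratio compares the extreme windows of the whole set rather than two neighbours, a signal that rises gradually over several windows still produces a large value.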

3.1.2 Zero-Crossing Ratio

Like the traditional energy ratio method, the global energy ratio cannot detect a signal whose segments change rapidly in spectral content. Zero-crossing rates, however, can represent the main frequency content of a signal. Figure 7 shows a transient signal whose global energy ratio is stable but whose spectral content changes rapidly. The zero-crossing ratio can detect this kind of transient signal.

Figure 7: A transient signal with rapid changes in spectral content.

The zero-crossing rate of each window is defined as,

[Equation lost in extraction; only the constant 256 survives.]

Then the maximum zero-crossing rate Max_Ze and the minimum zero-crossing rate Min_Ze among the short-window zero-crossing rates are found. The zero-crossing ratio is defined as,

[Equation lost in extraction.]

When Ze_Ratio is greater than a threshold Tz, the signal is regarded as a transient signal. This method has lower complexity than the method introduced in subsection 2.3.3, and it can detect the transients in violin and speech signals.
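The zero-crossing ratio can be sketched in the same way. The definitions used here are assumptions, because the original equations are not available: the zero-crossing rate is taken as the number of sign changes inside a 256-sample window, and Ze_Ratio as Max_Ze divided by Min_Ze, with Min_Ze clamped to at least one crossing. All function names are hypothetical.

```python
def zero_crossing_rate(window):
    """Number of sign changes between consecutive samples in one window."""
    crossings = 0
    for prev, cur in zip(window, window[1:]):
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings


def zero_crossing_ratio(samples, window_len=256):
    """Assumed definition: Max_Ze / Min_Ze over non-overlapping short windows."""
    rates = [
        zero_crossing_rate(samples[start:start + window_len])
        for start in range(0, len(samples) - window_len + 1, window_len)
    ]
    # Clamp the minimum to 1 so a window without crossings does not divide by zero.
    return max(rates) / max(min(rates), 1)
```

A signal that keeps its energy but jumps from a low to a high dominant frequency gives a ratio well above 1, which is the case the global energy ratio misses.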

3.1.3 Tonal Attack

The short window has lower frequency resolution than the long window. Figure 8 (a) is an example of a pure tone signal, which the global energy ratio regards as a transient signal. As Figure 8 (c) shows, transforming the tonal signal with a shorter transform increases the side-band energy. We define a tonal attack as the case in which the long-window psychoacoustic model finds a tonal band in the signal. In other words, if any band has tonality greater than a threshold T, the encoder does not use short windows in this frame, in order to keep the frequency resolution.

Figure 8: (a) A pure tone signal, (b) the frequency transformed by 2048-sample transform, (c) the frequency transformed by 256-sample transform.

The window decision method combines the above three kinds of information.

Figure 9 illustrates the window decision execution which uses global energy ratio and zero-crossing ratio to detect transient signal and then avoid the erroneous detection of

Figure 9: Window Decision Flowchart.
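The decision flow can be sketched as follows. This is an interpretation of the text rather than a transcription of Figure 9: it assumes that either ratio exceeding its threshold marks a transient, and that the tonal attack check then vetoes the short window. The function name, the default `tonal_threshold` value and the per-band tonality input are hypothetical; the defaults for `te` and `tz` are the calibrated values reported in Section 4.1.

```python
def decide_next_window(global_en_ratio, ze_ratio, band_tonality,
                       te=6.0, tz=4.5, tonal_threshold=0.9):
    """Return "S" (short windows) or "L" (long window) for the next frame.

    band_tonality -- tonality of each band from the long-window
                     psychoacoustic model (assumed to be available)
    """
    # Assumption: a transient is flagged when either ratio exceeds its threshold.
    transient = global_en_ratio > te or ze_ratio > tz
    # Tonal attack: any band whose tonality exceeds the threshold T.
    tonal_attack = any(t > tonal_threshold for t in band_tonality)
    if transient and not tonal_attack:
        return "S"
    return "L"
```

With this ordering, a pure tone that triggers the energy test still keeps the long window, which matches the purpose of the tonal attack check described in subsection 3.1.3.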

3.1.4 Window Type Switch Method

A start window must be used to bridge the long and short window types, but window decision only chooses between the long and the short window type. Therefore, window decision decides the window type of the next frame in advance, and the switch of the current window type considers both the prior and the next window types. If the next frame differs from the prior one, the current frame must switch to the start or stop window type.

Figure 10: Window type switch analysis table and algorithm.

Figure 10 analyzes all possible situations of the window type switch. The long window, short windows, start window and stop window are denoted by L, S, L_S and S_L, respectively. Removing the impossible situations yields a simple switching algorithm.
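One plausible reading of the switching rule is sketched below. The table of Figure 10 is not reproduced here, so the case analysis is an assumption derived from the description above and from the usual AAC constraint that a start window must be followed by short windows. The function name `current_window_type` is hypothetical.

```python
def current_window_type(prior, next_decision):
    """Pick the current window type from the prior type and the next decision.

    prior         -- "L", "S", "L_S" or "S_L" (type already used for the prior frame)
    next_decision -- "L" or "S" (output of window decision for the next frame)
    """
    if prior == "L_S":
        # A start window has already committed this frame to short windows.
        return "S"
    if prior in ("L", "S_L"):
        # Coming from a long-shaped frame: bridge with a start window if needed.
        return "L_S" if next_decision == "S" else "L"
    if prior == "S":
        # Coming from short windows: stay short or bridge back with a stop window.
        return "S" if next_decision == "S" else "S_L"
    raise ValueError("unknown window type: %r" % (prior,))
```

Feeding the result back as the prior type of the following frame turns a single short decision into the sequence L, L_S, S, S_L, L.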

3.2 Psychoacoustic Model of Short Windows

According to [13], one major disadvantage of the adaptive window switching technique is that it adds complexity to the coder, since different window sizes require different interpretations and normalizations of the psychoacoustic model. In AAC, if the window sequence consists of eight short windows, the coder must execute the short-window psychoacoustic model eight times. The long-window psychoacoustic model has higher frequency resolution and more precise masking threshold information than the short-window psychoacoustic model. Thus, this section presents a method that omits the short-window psychoacoustic model and replaces it with the long-window psychoacoustic model.

The psychoacoustic model calculates the minimum masking threshold, which is needed to determine the just-noticeable noise level of each band in the filterbank [14]. The signal-to-masking ratio (SMR) is used in bit allocation to determine the actual quantizer level in each subband of the block. Because the short-window psychoacoustic model is omitted, the SMR of each short-window band must be estimated from the long-window bands that map to it.

Figure 11: Band SMR mapping example from long window to short window in AAC when the sample rate is 44.1kHz.

Figure 11 shows an example that maps the 49 bands of the long window to the 14 bands of the short window. If a frame uses the short window type, it takes its SMRs from the long window: each short-window band takes the maximum SMR of its mapped long-window bands. The maximum SMR is selected to reduce the risk of quantization error.
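The maximum-SMR mapping can be sketched as follows. The mapping rule used here is an assumption: a long-window band is treated as mapped to a short-window band whenever their frequency ranges overlap. The function name and the band-edge inputs are hypothetical, and the real band tables of Figure 11 are not reproduced.

```python
def map_smr_long_to_short(long_smr, long_edges, short_edges):
    """Estimate short-window band SMRs from long-window band SMRs.

    long_smr    -- SMR of each long-window band
    long_edges  -- band edges in Hz, len(long_smr) + 1 ascending values
    short_edges -- band edges in Hz of the short-window bands
    """
    short_smr = []
    for lo, hi in zip(short_edges, short_edges[1:]):
        # Collect the SMRs of all long-window bands overlapping [lo, hi).
        mapped = [
            smr
            for smr, l_lo, l_hi in zip(long_smr, long_edges, long_edges[1:])
            if l_lo < hi and l_hi > lo
        ]
        # Take the maximum SMR, the conservative choice described in the text.
        short_smr.append(max(mapped))
    return short_smr
```

For instance, four long bands with SMRs 1, 5, 2 and 7 that map pairwise onto two short bands yield the short-band SMRs 5 and 7.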

3.3 Window Grouping of Short Windows

Subsection 2.2.2 introduced the grouping and interleaving methods. Short windows handle transient signals well by confining the spread of quantization noise within the short windows. However, when the AAC coder uses short windows, the total number of scale factor bands is twice as large as when it uses one long window. With the grouping method, the short windows in the same group share the same scale factors, which reduces the total number of scale factor bands. Each added group adds a set of scale factors. The more scale factor bands exist, the more bits the side information needs, which leaves inadequate bits and thus produces quantization errors. Each removed group increases the number of short windows that share the same scale factors, and the sum of errors between the shared scale factors and the estimated scale factors of each short window also increases. For these reasons, this section proposes a grouping method that improves the quality and the computing complexity.

It is intuitive to design the grouping method using the estimated scale factors of the eight short windows. Furthermore, if the scale factors can be estimated earlier in the encoder, the grouping method can be combined more flexibly with other codec modules (e.g. M/S coding). The following subsections introduce an efficient scale factor estimation method and then present the grouping method based on the estimated scale factors.

3.3.1 Scale Factor Estimation

[15] and [16] present a noise estimation method. The expectation of the quantization error $e_i$ of the non-uniform quantizer is

[Equation lost in extraction.]

where g is the global gain, which is independent of the scale factor band q, and $c_q$ is the scaling factor of scale factor band q.

Scale factor estimation for bit allocation is based on the bandwidth-proportional noise-shaping criterion. From [17], the noise level of each scale factor band should be proportional to the effective bandwidth B(q):

[Equation lost in extraction.]

where $\sigma_N^2(q)$ and $\sigma_M^2(q)$ are the noise energy and the masking energy associated with scale factor band q.

With (12) relating the scale factor to the noise power, it is straightforward to combine (12) and (14). Let [...] and define [...]. The expectation of the quantization error for bit allocation is

[Equation lost in extraction.]

The difference between the global gain and the scale factor can be evaluated by

[Equation lost in extraction.]

From (17), the global gain can be evaluated from

$$g = \max_{q} \{\, g - c_q \,\} \qquad (18)$$

and the scale factors for all sub-bands are obtained.

3.3.2 Grouping Method

Since the short windows in the same group share scale factors in all scale factor bands of the group, the differences between the shared scale factors ($\mathit{sharedsf}_{g,b}$) and the estimated scale factors ($\mathit{sf}_{b,w}$) of the short windows in the same group should be bounded. In addition, the influence of each difference is proportional to the bandwidth ($\mathit{bandwidth}_b$). The scale factor error of group g can therefore be estimated by the following equation.

$$E_g = \sum_{b} \sum_{w \in g} \left( \mathit{sf}_{b,w} - \mathit{sharedsf}_{g,b} \right) \cdot \mathit{bandwidth}_b \qquad (19)$$

The grouping criterion minimizes the number of groups while keeping the scale factor error $E_g$ of each group below a threshold M. Figure 12 shows the algorithm designed from this criterion. First, scale factor estimation is executed. The grouping method then starts with the first short window. Because the short windows in one group must be contiguous, each short window first tries to join the group of the preceding short window. If the scale factor error of the enlarged group is smaller than the threshold M, the short window joins the group; otherwise, a new group is created for it.

Figure 12: The flowchart of grouping method of short windows.
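The greedy procedure described above can be sketched as follows. Two details are assumptions made only for this illustration: the shared scale factor of a band is taken as the mean of the estimated scale factors in the group, and the error accumulates absolute differences so that deviations of opposite sign cannot cancel. The function names are hypothetical.

```python
def group_error(sf, windows, bandwidth):
    """Bandwidth-weighted scale factor error E_g of one candidate group.

    sf        -- sf[w][b], estimated scale factor of window w in band b
    windows   -- indices of the short windows in the group
    bandwidth -- bandwidth[b], width of scale factor band b
    """
    error = 0.0
    for b, width in enumerate(bandwidth):
        values = [sf[w][b] for w in windows]
        shared = sum(values) / len(values)  # assumption: shared value is the mean
        error += width * sum(abs(v - shared) for v in values)
    return error


def group_short_windows(sf, bandwidth, m):
    """Greedily merge contiguous short windows while E_g stays below M."""
    groups = [[0]]
    for w in range(1, len(sf)):
        candidate = groups[-1] + [w]
        if group_error(sf, candidate, bandwidth) < m:
            groups[-1] = candidate  # joining succeeds
        else:
            groups.append([w])      # start a new group with this window
    return groups
```

Each window is tested only against the group of its predecessor, so the short windows are visited once and every group stays contiguous.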

3.4 Joint Design with Other AAC Modules

After the filterbank, the design of each encoding module must consider two modes, the long window mode and the short window mode. In AAC, if the short window type is used, the scale factor bands are determined after the grouping method, so long and short windows can use the same bit allocation policy. Between the filterbank and bit allocation in Figure 2 there are four modules: TNS, intensity coding, prediction and M/S coding. NCTU_AAC [18], however, has only the TNS and M/S coding modules. This section explains the relationship between window switch and these two coding modules.

3.4.1 TNS and Window Switch

TNS is also a technique for preventing the pre-echo phenomenon. Consequently, TNS also faces the problem of deciding whether the signal is transient. In [19], TNS is applied according to the perceptual entropy and the location of attacks, whereas the window decision in Section 3.1 uses three kinds of information. Because the two decision methods differ, TNS and window switch can work together and achieve better quality.

Figure 13: Window type switch when TNS is applied and attempts to ease aliasing.

Figure 14: The modified window type switch algorithm.

Chang's method [19] proposes a window switch among the start, stop and long windows to ease aliasing. Figure 13 illustrates that if the current window type is the long window, it is switched to the start window when TNS is applied. At the next frame (n+1), a new situation must then be considered: the prior window type is start, the current window type is long, and the next window type is also long. Figure 14 modifies the window type switch of subsection 3.1.4 to handle this new situation caused by TNS.

3.4.2 M/S Coding and Window Switch

In the stereo coding of AAC, the M/S mechanism is applicable only when the two stereo channels use the same window type and the same grouping. This subsection proposes window coupling and group coupling methods to achieve good coding efficiency under this constraint.

Window Coupling

When one channel uses the short window type and the other uses the long window type, we first check the similarity of the two channels. If they are similar, both channels must use the same window type, either long or short. The perceptual entropy (PE) helps with both the similarity judgment and the window decision. Figure 15 shows the flowchart of window coupling: the difference between the PEs of the two channels is compared with a threshold T1 to judge the similarity, and another PE threshold T2 then decides the window type.

Figure 15: Flowchart of window coupling method.
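A rough sketch of window coupling is given below. The flowchart of Figure 15 is not reproduced here, so several details are assumptions: the channels count as similar when the absolute PE difference is below T1, the larger of the two PEs is compared with T2, and a PE above T2 selects short windows. The function name `couple_windows` is hypothetical.

```python
def couple_windows(type_left, type_right, pe_left, pe_right, t1, t2):
    """Return the (left, right) window types after window coupling.

    type_left, type_right -- "L" or "S" as decided per channel
    pe_left, pe_right     -- perceptual entropy of each channel
    """
    if type_left == type_right:
        return type_left, type_right      # nothing to couple
    if abs(pe_left - pe_right) >= t1:
        return type_left, type_right      # channels differ too much, keep both
    # Assumption: a high PE indicates a transient, so it selects short windows.
    shared = "S" if max(pe_left, pe_right) > t2 else "L"
    return shared, shared
```

When the two channels end up with different window types, M/S coding cannot be applied to that frame, which is the constraint stated at the start of this subsection.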

Group Coupling

As in the grouping method of Section 3.3, we calculate the sum of the scale factor errors of both channels and group the two channels simultaneously. In the left portion of Figure 16, the grouping method is applied to the two channels individually. The purpose of group coupling is to keep the same grouping in both channels, as illustrated in Figure 16.

[Figure 16 labels. Grouping individually: $E_{g,L} < M$ for channel L and $E_{g,R} < M$ for channel R. Grouping simultaneously: $E_g = E_{g,L} + E_{g,R} < 2M$.]

Figure 16: Example of grouping individually and simultaneously.

The criterion of the coupled grouping method minimizes the number of groups, and the total scale factor error $E_g = E_{g,L} + E_{g,R}$ of each group should be smaller than 2M.
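Group coupling can be sketched as the same greedy pass applied to both channels at once, with the summed error tested against 2M. As an assumption for this illustration only, the shared scale factor of a band is the mean over the group and the error uses absolute differences. The function name `couple_groups` is hypothetical.

```python
def couple_groups(sf_left, sf_right, bandwidth, m):
    """Group the short windows of both channels with one shared grouping.

    sf_left, sf_right -- sf[w][b], estimated scale factors per channel
    bandwidth         -- bandwidth[b], width of scale factor band b
    m                 -- single-channel threshold M (the sum is tested against 2M)
    """
    def error(sf, windows):
        total = 0.0
        for b, width in enumerate(bandwidth):
            values = [sf[w][b] for w in windows]
            shared = sum(values) / len(values)  # assumption: mean as shared value
            total += width * sum(abs(v - shared) for v in values)
        return total

    groups = [[0]]
    for w in range(1, len(sf_left)):
        candidate = groups[-1] + [w]
        # E_g = E_{g,L} + E_{g,R} must stay below 2M for the window to join.
        if error(sf_left, candidate) + error(sf_right, candidate) < 2 * m:
            groups[-1] = candidate
        else:
            groups.append([w])
    return groups
```

A change in only one channel is enough to split the group in both, which keeps the grouping identical across channels as M/S coding requires.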

Figure 17: Flowchart of two coupling methods.

Figure 17 explains the relationship with M/S coding. When M/S is switched on, the energies of the two channels are modified and the scale factor of each scale factor band is re-estimated. When M/S is not applied, grouping can be applied to the two stereo channels individually. The new NCTU_AAC flowcharts are illustrated in Figure 18 and Figure 19.

Figure 18: NCTU_AAC block diagram without two coupling methods.

Figure 19: NCTU_AAC block diagram with two coupling methods.

Chapter 4 Experiments

This chapter focuses on quality measurement on the NCTU_AAC platform. There are five primary experimental aspects: the window decision method, the grouping method, the two coupling methods, a test on 327 tracks, and a comparison of objective quality with other AAC encoders.

Table 1: MPEG testing track set.

Track  Signal  Signal Description     Mode    Time (sec)  Remark
1      es01    vocal (Suzanne Vega)   Stereo  10          (c)
...
11     sm02    Glockenspiel           Stereo  10          (a)
12     sm03    Plucked strings        Stereo  13

Remark:

(a) Transients: pre-echo sensitive, smearing of noise in temporal domain.

(b) Tonal/Harmonic structure: noise sensitive, roughness.

(c) [...] distortion sensitive, smearing of attacks.

(d) Complex sound: stresses the Device Under Test.

(e) High bandwidth: stresses the Device Under Test, loss of high frequencies, program-modulated high frequency noise.

(f) Low volume testing.

All experiments in this thesis are based on the psychoacoustic models and the new M/S coding module proposed in [20] and [21], respectively.

4.1 Experiments of Window Decision

As mentioned in Section 3.1, a new window decision is proposed. It uses three kinds of information: the energy ratio, the zero-crossing ratio and the tonal attack. The energy threshold and the zero-crossing threshold are calibrated first. Figure 20 shows the energy threshold calibration with the zero-crossing threshold fixed at 5.0. With the energy threshold obtained, Figure 21 shows the calibration of the zero-crossing threshold. The energy threshold is then calibrated again (Figure 22). Finally, the energy threshold is 6.0 and the zero-crossing threshold is 4.5.

[Figure 20 plot. Title: BitRate = 128 kbps, SampleRate = 44.1 kHz, based on zero-crossing threshold Tz = 5.0. Data labels at the five tested energy thresholds: best ODG -0.65, -0.65, -0.65, -0.68, -0.68; average ODG -0.950, -0.941, -0.938, -0.939, -0.944; worst ODG -1.44, -1.44, -1.44, -1.44, -1.44.]

Figure 20: ODG for different energy thresholds based on the zero-crossing threshold Tz = 5.0. The horizontal line is the average ODG over all tested tracks in Table 1. The best ODG and the worst ODG among the tested tracks are marked with the triangle and the dash around the horizontal line.

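[Figure 21 plot. Title: BitRate = 128 kbps, SampleRate = 44.1 kHz, based on energy threshold Te = 6.0. Data labels at the five tested zero-crossing thresholds: best ODG -0.64, -0.63, -0.63, -0.65, -0.65; average ODG -0.938, -0.935, -0.935, -0.938, -0.943; worst ODG -1.44, -1.44, -1.44, -1.44, -1.44.]

Figure 21: ODG for different zero-crossing thresholds based on the energy threshold Te = 6.0.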

[Figure 22 plot. Title: BitRate = 128 kbps, SampleRate = 44.1 kHz, based on zero-crossing ratio [...]]

Figure 22: ODG for different Energy Threshold based on the Zero-Crossing Threshold Tz=4.5.

After these thresholds are calibrated, the new window decision method is complete. Figure 23 and Table 2 compare the different window decision methods and show that the new decision method is better than the other two, the long-window-only method and the PE decision method. The speech tracks (e.g. es01, es02 and es03) and the attack tracks (e.g. si02) improve markedly.

[Figure 23 plot. Title: Bit Rate = 128 kbps, Sample Rate = 44.1 kHz. Legend: NCTU_AAC with only long window; NCTU_AAC with PE decision method; NCTU_AAC with new decision method.]

Figure 23: Objective test using P4 on the three decision methods: "NCTU_AAC with only long window", "NCTU_AAC with PE decision method" and "NCTU_AAC with new decision method".

Table 2: Detail ODG values of objective test on the three decision methods.

Psychoacoustic Model  P1        P1        P1        P4        P4        P4
Use Short             No        Yes       Yes       No        Yes       Yes
M/S                   L/R       L/R       L/R       L/R       L/R       L/R
Coupling              No        No        No        No        No        No
es01                  -1.78     -1.02     -0.9      -1.57     -0.87     -0.77
es02                  -2.16     -2.03     -0.93     -2.03     -1.9      -0.89
es03                  -2.52     -1.6      -0.88     -2.21     -1.37     -0.86
sc01                  -0.81     -0.84     -0.88     -0.75     -0.87     -0.77
sc02                  -1.06     -1.06     -1.06     -1.11     -1.15     -1.11
sc03                  -0.86     -0.77     -0.78     -0.7      -0.64     -0.63
si01                  -1.36     -1.05     -1.04     -1.16     -1.02     -0.94
si02                  -3.45     -2.1      -0.91     -3.24     -1.93     -0.81
si03                  -2.42     -2.43     -2.42     -1.29     -1.79     -1.27
sm01                  -1.49     -1.49     -1.48     -0.9      -1.14     -0.9
sm02                  -1.79     -1.97     -1.7      -1.54     -1.47     -1.44
sm03                  -1.37     -0.72     -0.74     -1.37     -0.81     -0.83
Average               -1.75583  -1.42333  -1.14333  -1.48917  -1.24667  -0.935

Bit Rate: 128 kbps (CBR)
Sample Rate: 44100 Hz
P1: ISO Standard Psychoacoustic Model 2
P4: AM/GM Psychoacoustic Model

4.2 Experiments on Grouping Threshold

This section focuses on calibrating the grouping threshold M. The purpose of the
