M/S Coding and Window Switch - Joint Design with Other AAC Modules

Chapter 3 Design of Window Switch Method in AAC

3.4 Joint Design with Other AAC Modules

3.4.2 M/S Coding and Window Switch

In AAC stereo coding, the M/S mechanism is applicable only when the two stereo channels use the same window type and the same grouping. This subsection proposes window coupling and group coupling methods that maintain good coding efficiency under this constraint.

Window Coupling

When one channel uses short windows and the other uses a long window, we first check the similarity of the two channels. If they are similar, we must decide whether both channels use the long or the short window type. The perceptual entropy (PE) assists both the similarity check and the window decision. Figure 15 illustrates the flowchart of window coupling: the difference between the PEs of the two channels is compared with a threshold T1 to judge similarity, and a second PE threshold, T2, decides the window type.
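The decision described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the NCTU_AAC implementation: the function name `couple_windows`, the use of the absolute PE difference against T1, comparing the larger of the two PEs against T2 (short windows when it exceeds T2), and leaving dissimilar channels with their own window types (so that M/S is not used for that frame) are all assumptions, since the flowchart of Figure 15 is not reproduced here.

```python
LONG, SHORT = "long", "short"


def couple_windows(win_l, win_r, pe_l, pe_r, t1, t2):
    """Hypothetical window coupling; returns (win_l, win_r, coupled).

    Assumed rules: the channels are "similar" when |pe_l - pe_r| < t1,
    and a similar pair switches to short windows when the larger PE
    exceeds t2; otherwise both channels use the long window.
    """
    if win_l == win_r:
        # Window types already agree; no coupling decision is needed.
        return win_l, win_r, True
    if abs(pe_l - pe_r) >= t1:
        # Dissimilar channels keep their own window types (assumed: M/S off).
        return win_l, win_r, False
    # Similar channels: re-decide one common window type with threshold t2.
    common = SHORT if max(pe_l, pe_r) > t2 else LONG
    return common, common, True


if __name__ == "__main__":
    # T1=200 and T2=2600 are settings that appear in the calibration of Section 4.3.
    print(couple_windows(LONG, SHORT, 2700.0, 2650.0, t1=200.0, t2=2600.0))
    print(couple_windows(LONG, SHORT, 1000.0, 1100.0, t1=200.0, t2=2600.0))
```

In both example calls the PE difference is below T1, so the pair is coupled; the first pair is forced to short windows and the second to long windows.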

Figure 15: Flowchart of window coupling method.

Group Coupling

Based on the grouping method discussed in Section 3.3, we calculate the sum of the scale factor errors of both channels and group the two channels simultaneously. The left portion of Figure 16 shows the grouping method applied individually to each channel. The group coupling method keeps the grouping identical in both channels, as illustrated in the right portion of Figure 16.

Individual grouping: $E_{g,L} < M$ (Channel L), $E_{g,R} < M$ (Channel R)
Simultaneous grouping: $E_g = E_{g,L} + E_{g,R} < 2M$

Figure 16: Example of grouping individually and simultaneously.

The grouping criterion minimizes the number of groups while keeping the total scale factor error of each group in both channels, $E_{g,L} + E_{g,R}$, below $2M$.
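Because the grouping method of Section 3.3 is not reproduced in this section, the following is only a minimal sketch of the criterion above. It assumes, hypothetically, that a group shares one scale factor per band (taken as the group minimum), that a group's scale factor error is the summed deviation from that shared value, and that windows are grouped greedily from left to right. Passing one channel gives individual grouping with limit M; passing both channels gives group coupling with limit 2M, so both channels receive the same grouping.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # scale factors indexed [window][band]


def group_error(sf: Frame, start: int, end: int) -> int:
    """Hypothetical error of windows start..end-1 sharing one scale factor per band."""
    total = 0
    for band in range(len(sf[0])):
        shared = min(sf[w][band] for w in range(start, end))
        total += sum(sf[w][band] - shared for w in range(start, end))
    return total


def group_windows(channels: Sequence[Frame], m: int) -> List[int]:
    """Greedy grouping of short windows; returns the group lengths.

    One channel: individual grouping with error limit M.
    Two channels: group coupling with summed error limit 2M.
    """
    limit = m * len(channels)
    n_windows = len(channels[0])
    lengths: List[int] = []
    start = 0
    while start < n_windows:
        end = start + 1
        # Extend the group while the (summed) error stays below the limit.
        while end < n_windows and sum(
            group_error(ch, start, end + 1) for ch in channels
        ) < limit:
            end += 1
        lengths.append(end - start)
        start = end
    return lengths


if __name__ == "__main__":
    left = [[10], [10], [10], [20], [20], [20], [20], [20]]
    right = [[10], [10], [10], [10], [10], [30], [30], [30]]
    print(group_windows([left], 5), group_windows([right], 5))
    print(group_windows([left, right], 5))
```

With individual grouping the toy channels disagree ([3, 5] versus [5, 3]), which would rule out M/S; the coupled call returns a single grouping, [3, 2, 3], shared by both channels.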

Figure 17: Flowchart of two coupling methods.

Figure 17 shows how the two coupling methods relate to M/S coding. When M/S is switched on, the energies of the two channels are modified and the scale factor of each scale factor band is re-estimated. When M/S is not applied, grouping can be performed individually in the two stereo channels. The new NCTU_AAC flowchart is illustrated in Figure 18 (without the two coupling methods) and Figure 19 (with them).
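To illustrate why switching M/S on changes the channel energies, the sketch below applies the standard AAC M/S relation (the decoder reconstructs l = m + s and r = m - s, so an encoder can form m = (l + r)/2 and s = (l - r)/2) and recomputes the band energies. The function names, the band-offset layout, and the `ms_allowed` check, which encodes the constraint stated at the start of this subsection, are illustrative assumptions; the scale factor re-estimation itself is not shown.

```python
from typing import List, Sequence, Tuple


def ms_allowed(win_l: str, win_r: str,
               groups_l: Sequence[int], groups_r: Sequence[int]) -> bool:
    """M/S needs the same window type and the same grouping in both channels."""
    return win_l == win_r and list(groups_l) == list(groups_r)


def ms_band_energies(
    left: Sequence[float], right: Sequence[float], band_offsets: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Form mid/side spectra and return their per-band energies."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    bands = list(zip(band_offsets, band_offsets[1:]))
    e_mid = [sum(x * x for x in mid[lo:hi]) for lo, hi in bands]
    e_side = [sum(x * x for x in side[lo:hi]) for lo, hi in bands]
    return e_mid, e_side


if __name__ == "__main__":
    left = [1.0, 1.0, 2.0, 2.0]
    right = [1.0, 1.0, 0.0, 0.0]
    # Hypothetical frame: both channels short windows with grouping [3, 2, 3].
    if ms_allowed("short", "short", [3, 2, 3], [3, 2, 3]):
        print(ms_band_energies(left, right, [0, 2, 4]))
```

In the first band the two channels are identical, so all energy moves to the mid channel and the side energy is zero; this change in per-band energy is why the scale factors must be re-estimated.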

Figure 18: NCTU_AAC block diagram without two coupling methods.

Figure 19: NCTU_AAC block diagram with two coupling methods.

Chapter 4 Experiments

This chapter focuses on quality measurement on the NCTU_AAC platform.

The experiments cover five aspects: the window decision method, the grouping method, the two coupling methods, a test on 327 tracks, and a comparison of objective quality with other AAC encoders.

Table 1: MPEG test track set.

Track  Signal  Description           Mode    Time (sec)  Remark
1      es01    Vocal (Suzanne Vega)  Stereo  10          (c)
…
11     sm02    Glockenspiel          Stereo  10          (a)
12     sm03    Plucked strings       Stereo  13

Remarks:
(a) Transients: pre-echo sensitive, smearing of noise in the temporal domain.
(b) Tonal/harmonic structure: noise sensitive, roughness.
(c) … distortion sensitive, smearing of attacks.
(d) Complex sound: stresses the Device Under Test.
(e) High bandwidth: stresses the Device Under Test; loss of high frequencies, program-modulated high-frequency noise.
(f) Low-volume testing.

All experiments in this thesis are based on the different psychoacoustic models and the new M/S coding module proposed in [20] and [21], respectively.

4.1 Experiments of Window Decision

As described in Section 3.1, a new window decision method is proposed. It combines three kinds of information: energy ratio, zero-crossing ratio, and tonal attack. This section first calibrates the energy threshold and the zero-crossing threshold. Figure 20 calibrates the energy threshold with the zero-crossing threshold fixed at 5.0. With the resulting energy threshold, Figure 21 calibrates the zero-crossing threshold. The energy threshold is then calibrated again (Figure 22). The final values are an energy threshold of 6.0 and a zero-crossing threshold of 4.5.
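The calibration order above (energy threshold, then zero-crossing threshold, then energy threshold again) is an alternating one-dimensional search. The sketch below assumes a hypothetical callable `avg_odg(te, tz)` that would encode the test tracks of Table 1 and return their average ODG; the toy objective and candidate grids in the usage example are invented for illustration and are not the measured data.

```python
from typing import Callable, Sequence, Tuple

Objective = Callable[[float, float], float]  # (Te, Tz) -> average ODG


def calibrate(
    avg_odg: Objective,
    te_grid: Sequence[float],
    tz_grid: Sequence[float],
    tz_init: float,
) -> Tuple[float, float]:
    """Alternate 1-D sweeps, keeping the setting with the highest average ODG."""
    # Step 1 (cf. Figure 20): sweep Te with Tz fixed at its initial value.
    te = max(te_grid, key=lambda t: avg_odg(t, tz_init))
    # Step 2 (cf. Figure 21): sweep Tz with the chosen Te.
    tz = max(tz_grid, key=lambda t: avg_odg(te, t))
    # Step 3 (cf. Figure 22): re-sweep Te with the new Tz.
    te = max(te_grid, key=lambda t: avg_odg(t, tz))
    return te, tz


if __name__ == "__main__":
    # Toy objective peaking at (6.0, 4.5); a real one would run the encoder
    # on every test track and average the measured ODG values.
    def toy(te: float, tz: float) -> float:
        return -((te - 6.0) ** 2) - (tz - 4.5) ** 2

    print(calibrate(toy, [4.0, 5.0, 6.0, 7.0, 8.0], [4.0, 4.5, 5.0, 5.5], tz_init=5.0))
```

Sweeping one threshold at a time keeps the number of encoder runs proportional to the grid sizes rather than their product, at the cost of possibly missing interactions between the two thresholds; the extra re-sweep of Te partly compensates for that.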

(Plot: ODG versus energy threshold; BitRate=128 kbps, SampleRate=44.1 kHz, zero-crossing threshold Tz=5.0.)

Figure 20: ODG for different energy thresholds based on the zero-crossing threshold Tz=5.0. The horizontal line is the average ODG over all tested tracks in Table 1. The best and the worst ODG among the tested tracks are marked with a triangle and a short horizontal bar around the horizontal line.

(Plot: ODG versus zero-crossing threshold; BitRate=128 kbps, SampleRate=44.1 kHz, energy threshold Te=6.0.)

Figure 21: ODG for different zero-crossing thresholds based on the energy threshold Te=6.0.

(Plot: ODG versus energy threshold; BitRate=128 kbps, SampleRate=44.1 kHz, zero-crossing threshold Tz=4.5.)

Figure 22: ODG for different energy thresholds based on the zero-crossing threshold Tz=4.5.

Calibrating these thresholds completes the new window decision method.

Figure 23 and Table 2 compare the three window decision methods and show that the new decision method outperforms the other two: using only long windows and the PE decision method. Speech tracks (es01, es02, and es03) and attack-rich tracks (e.g., si02) improve the most; with P4, for example, si02 improves from -3.24 (long windows only) and -1.93 (PE decision) to -0.81.

(Plot: ODG; Bit Rate=128 kbps, Sample Rate=44.1 kHz.)

Figure 23: Objective test using P4 on the three decision methods: “NCTU-AAC with only Long Window”, “NCTU-AAC with PE decision method”, and “NCTU-AAC with new decision method”.

Table 2: Detailed ODG values of the objective test on the three decision methods.

Psychoacoustic model  P1         P1         P1         P4         P4         P4
Short window          No         Yes        Yes        No         Yes        Yes
Window decision       Long-only  PE         New        Long-only  PE         New
M/S                   L/R        L/R        L/R        L/R        L/R        L/R
Coupling              No         No         No         No         No         No
es01                  -1.78      -1.02      -0.9       -1.57      -0.87      -0.77
es02                  -2.16      -2.03      -0.93      -2.03      -1.9       -0.89
es03                  -2.52      -1.6       -0.88      -2.21      -1.37      -0.86
sc01                  -0.81      -0.84      -0.88      -0.75      -0.87      -0.77
sc02                  -1.06      -1.06      -1.06      -1.11      -1.15      -1.11
sc03                  -0.86      -0.77      -0.78      -0.7       -0.64      -0.63
si01                  -1.36      -1.05      -1.04      -1.16      -1.02      -0.94
si02                  -3.45      -2.1       -0.91      -3.24      -1.93      -0.81
si03                  -2.42      -2.43      -2.42      -1.29      -1.79      -1.27
sm01                  -1.49      -1.49      -1.48      -0.9       -1.14      -0.9
sm02                  -1.79      -1.97      -1.7       -1.54      -1.47      -1.44
sm03                  -1.37      -0.72      -0.74      -1.37      -0.81      -0.83
Average               -1.75583   -1.42333   -1.14333   -1.48917   -1.24667   -0.935

Bit rate: 128 kbps (CBR); sample rate: 44100 Hz.
P1: ISO Standard Psychoacoustic Model 2; P4: AM/GM Psychoacoustic Model.
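The averages in Table 2 and the tracks singled out in the text can be checked directly from the table. The sketch below copies only the P4 columns (long windows only, PE decision, new decision); the dictionary layout and helper names are illustrative, not part of NCTU_AAC.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float, float]  # (long window only, PE decision, new decision)

# P4 columns of Table 2.
P4_ODG: Dict[str, Row] = {
    "es01": (-1.57, -0.87, -0.77),
    "es02": (-2.03, -1.90, -0.89),
    "es03": (-2.21, -1.37, -0.86),
    "sc01": (-0.75, -0.87, -0.77),
    "sc02": (-1.11, -1.15, -1.11),
    "sc03": (-0.70, -0.64, -0.63),
    "si01": (-1.16, -1.02, -0.94),
    "si02": (-3.24, -1.93, -0.81),
    "si03": (-1.29, -1.79, -1.27),
    "sm01": (-0.90, -1.14, -0.90),
    "sm02": (-1.54, -1.47, -1.44),
    "sm03": (-1.37, -0.81, -0.83),
}


def column_average(table: Dict[str, Row], col: int) -> float:
    """Mean ODG of one column over all tracks."""
    return sum(row[col] for row in table.values()) / len(table)


def top_gains(table: Dict[str, Row], base_col: int, new_col: int, k: int) -> List[str]:
    """Tracks with the k largest ODG gains of new_col over base_col."""
    gains = {name: row[new_col] - row[base_col] for name, row in table.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]


if __name__ == "__main__":
    for col, label in enumerate(("long only", "PE decision", "new decision")):
        print(f"{label}: {column_average(P4_ODG, col):.5f}")
    print(top_gains(P4_ODG, base_col=0, new_col=2, k=4))
```

Measured against the long-window-only column, the four largest P4 gains are si02, es03, es02, and es01, which are the speech and attack tracks singled out in the text.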

4.2 Experiments on Grouping Threshold

This section calibrates the grouping threshold M, which bounds the scale factor error within one group. A large M reduces the number of groups but increases the scale factor error. Conversely, a small M increases both the number of groups and the side information.
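The trade-off can be illustrated with a toy example. The sketch assumes, hypothetically, a greedy left-to-right grouping of short windows in which each group shares one scale factor per band (the group minimum) and the group's scale factor error is the summed deviation from that shared value. The scale factors are invented, so only the trend, not the numbers, relates to the measurements below.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # scale factors indexed [window][band]


def group_error(sf: Frame, start: int, end: int) -> int:
    """Hypothetical error of windows start..end-1 sharing one scale factor per band."""
    total = 0
    for band in range(len(sf[0])):
        shared = min(sf[w][band] for w in range(start, end))
        total += sum(sf[w][band] - shared for w in range(start, end))
    return total


def greedy_groups(sf: Frame, m: int) -> List[Tuple[int, int]]:
    """Group consecutive windows while the group error stays below M."""
    groups: List[Tuple[int, int]] = []
    start = 0
    while start < len(sf):
        end = start + 1
        while end < len(sf) and group_error(sf, start, end + 1) < m:
            end += 1
        groups.append((start, end))
        start = end
    return groups


def sweep(sf: Frame, thresholds: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (M, number of groups, total scale factor error) for each M."""
    rows = []
    for m in thresholds:
        groups = greedy_groups(sf, m)
        rows.append((m, len(groups), sum(group_error(sf, s, e) for s, e in groups)))
    return rows


if __name__ == "__main__":
    toy_frame = [[0], [300], [300], [900], [900], [900], [1200], [1200]]
    for m, n_groups, err in sweep(toy_frame, [256, 512, 768, 1024, 1536]):
        print(f"M={m}: {n_groups} groups, total error {err}")
```

As M grows, the toy frame goes from 4 groups with little error to 2 groups with a larger total error: fewer groups mean less side information, but coarser shared scale factors.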

ODG versus grouping threshold M (BitRate=128 kbps, SampleRate=44.1 kHz, 12-track test):

M      256     512     768     1024    1536
Max    -0.64   -0.63   -0.63   -0.63   -0.63
Avg    -0.963  -0.945  -0.935  -0.936  -0.942
Min    -1.44   -1.44   -1.44   -1.44   -1.45

Figure 24: ODG for different grouping thresholds based on the new window decision method.

Figure 24 summarizes the effect of the grouping threshold: a threshold of about M=768 gives the best average ODG (-0.935).

4.3 Experiments on Coupling Method

Coupling keeps the window types and grouping consistent in both stereo channels, so the coupling methods play an important role in the joint design of window switching and M/S coding. The window coupling method checks the similarity of the two channels with a PE threshold T1. If the two channels are similar, a second PE threshold, T2, re-decides the window type of both channels. The first part of this section calibrates T1 and T2 (Figure 25, Figure 26, and Figure 27). As with the thresholds in Section 4.1, T1 and T2 are calibrated alternately, one at a time. After calibration, the quality improvement is measured and reported in Figure 28 and Table 3.

ODG versus PE threshold T2 (BitRate=128 kbps, SampleRate=44.1 kHz, T1=100):

T2     2000    2400    2600    2800    3000
Max    -0.19   -0.19   -0.19   -0.19   -0.2
Avg    -0.704  -0.704  -0.703  -0.704  -0.707
Min    -1.26   -1.26   -1.26   -1.27   -1.29

Figure 25: ODG for different PE thresholds T2 based on T1=100.

(Plot: ODG versus PE threshold T1; BitRate=128 kbps, SampleRate=44.1 kHz, T2=2600.)

Figure 26: ODG for different PE thresholds T1 based on T2=2600.

(Plot: ODG versus PE threshold T2; BitRate=128 kbps, SampleRate=44.1 kHz, T1=200.)

Figure 27: ODG for different PE thresholds T2 based on T1=200.

(Plot: ODG; Bit Rate=128 kbps, Sample Rate=44.1 kHz, with short window and M/S.)

Figure 28: Objective test using P4 on the two methods: “NCTU_AAC without Coupling Method” and “NCTU_AAC with Coupling Method”.

Table 3: Detailed ODG values of the objective test with and without the coupling methods.

P1: ISO Standard Psychoacoustic Model 2; P4: AM/GM Psychoacoustic Model.

Figure 28 and Table 3 present the experimental results of the coupling methods. Table 3 shows that the coupling methods improve quality with both the P1 and the P4 psychoacoustic models.

4.4 327 Tracks Test

Measuring window switch quality requires a large number of test bitstreams. The PSPLab audio database [22] contains 327 tracks in 16 sets.

Table 4 briefly describes each bitstream set.

Table 4: Description of each bitstream set.

Bitstream set          Tracks  Remark
ff123                  103     Killer bitstream collection from ff123.
Gpsycho                24      LAME quality test bitstreams.
HA64KTest              39      64 kbps multi-format test bitstreams from the HA forum.
HA128KTestV2           12      128 kbps multi-format test bitstreams from the HA forum.
horrible_song          16      Collection of killer songs among all bitstreams in PSPLab.
ingets1                5       Bitstream collection from the Ogg Vorbis pre-1.0 listening test.
Mono                   3       Mono test bitstreams.
MPEG                   12      MPEG test bitstream set at 48 kHz.
MPEG44100              12      MPEG test bitstream set at 44100 Hz.
Phong                  8       Test bitstream collection from Phong.
PSPLab                 37      Bitstreams from the early days of PSPLab; some are good killer samples.
Sjeng                  3       Small bitstream collection by sjeng.
SQAM                   16      Sound quality assessment material recordings for subjective tests.
TestingSong14          14      Test bitstream collection from rshong.
TonalSignals           15      Artificial bitstreams containing sine waves, etc.
VORBIS_TESTS_Samples   8
Total                  327

(Plot: average ODG per bitstream set; BitRate=128 kbps.)

Figure 29: For the 16 bitstream sets, objective test on the three methods: “NCTU-AAC 1.0 without short window”, “NCTU-AAC 1.0 with PE Window Decision”, and “NCTU-AAC 1.0 with New Window Decision”.

(Plot: distribution of improved tracks; NCTU_AAC with PE Window Decision versus NCTU_AAC with New Window Decision.)

Figure 30: Distribution of the improved tracks.

(Plot: distribution of degraded tracks; NCTU_AAC with PE Window Decision versus NCTU_AAC with New Window Decision.)

Figure 31: Distribution of the degraded tracks.

Figure 29 shows the three experiments on the 16 bitstream sets; each bar is the average ODG of one set. In general, the new window decision method gives better quality than the PE decision method. At the track level, Figure 30 shows the distribution of improved tracks among the 327 tracks for the two methods: window switching based on PE and window switching based on the new decision method. The x-axis gives six improvement ranges and the y-axis gives the number of improved tracks, where the improvement is the ODG difference between encoding with and without short windows. The new window decision method improves more tracks, and by larger amounts, than the PE-based method.

Similarly, Figure 31 shows the distribution of degraded tracks for the two methods. The new window decision method again outperforms the PE decision in this distribution. Most degraded tracks lose less than 0.05 in ODG, and only 7 tracks degrade by more than 0.1.
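The binning behind Figures 30 and 31 can be sketched as follows: for each track, the ODG with short windows minus the ODG without them decides whether the track counts as improved or degraded, and the size of the change selects one of six ranges. The range edges below are hypothetical, since the actual ranges are not given in the text, and the function name is illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Hypothetical lower edges of six ranges of |ODG difference|.
EDGES = [0.0, 0.05, 0.1, 0.2, 0.5, 1.0]


def change_histograms(
    odg_short: Sequence[float],
    odg_long_only: Sequence[float],
    edges: Sequence[float] = EDGES,
) -> Tuple[List[int], List[int]]:
    """Count improved and degraded tracks per range of |ODG difference|."""
    improved = [0] * len(edges)
    degraded = [0] * len(edges)
    for with_short, without_short in zip(odg_short, odg_long_only):
        diff = with_short - without_short  # > 0: improved, < 0: degraded
        if diff == 0:
            continue  # unchanged tracks belong to neither distribution
        idx = bisect_right(edges, abs(diff)) - 1  # range whose lower edge <= |diff|
        (improved if diff > 0 else degraded)[idx] += 1
    return improved, degraded


if __name__ == "__main__":
    with_short = [-0.5, -0.87, -1.0, -0.2, 0.0]
    long_only = [-1.2, -0.9, -0.92, -0.2, -1.5]
    print(change_histograms(with_short, long_only))
```

In the example, three tracks improve (by about 0.7, 0.03, and 1.5), one degrades by about 0.08, and one is unchanged.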

These degradations occur because low-frequency tones cannot be detected precisely, so short windows may be chosen for such signals, and the poor frequency resolution of short windows degrades the encoded quality.

4.5 Experiments of Quality Comparison

This section compares NCTU_AAC 1.0 with two commercial AAC encoders, QuickTime [23] and Nero [24]. The experiments focus on quality; for a complexity comparison, refer to [20].

(Plot: ODG per track; BitRate=128 kbps, SampleRate=44.1 kHz.)

Figure 32: Objective test on the three encoders: “Nero 6.3”, “QuickTime 6.3”, and “NCTU-AAC 1.0”.

Figure 32 shows the three experiments on the 12 MPEG tracks; each bar is the ODG of one track. NCTU_AAC 1.0 scores higher than Nero 6.3 on all 12 tracks and higher than QuickTime 6.3 on 7 tracks. On average, NCTU_AAC 1.0 scores slightly higher than QuickTime 6.3 (-0.505 versus -0.514) and clearly higher than Nero 6.3 (-0.98).

Table 5 lists the detailed results of the three encoders.

Table 5: Detailed ODG results of the quality comparison of the three encoders.

Track     Nero 6.3   QuickTime 6.3   NCTU_AAC 1.0
es01      -0.6       -0.32           -0.27
es02      -0.45      -0.11           -0.15
es03      -0.51      0.02            -0.23
sc01      -0.88      -0.22           -0.45
sc02      -1.38      -0.84           -0.66
sc03      -0.84      -0.64           -0.4
si01      -1.32      -0.71           -0.62
si02      -0.82      -0.72           -0.54
si03      -1.59      -0.78           -0.98
sm01      -1.36      -0.75           -0.61
sm02      -0.72      -0.37           -0.53
sm03      -1.29      -0.73           -0.62
Average   -0.98      -0.51417        -0.505

Bit rate: 128 kbps; sample rate: 44100 Hz.
NCTU_AAC uses the AM/GM psychoacoustic model, window switching, TNS, M/S, and the bit reservoir.
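The per-track counts and averages quoted for Figure 32 can be reproduced from Table 5. The dictionary layout and helper names below are illustrative; the ODG values are copied from the table.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float]  # (Nero 6.3, QuickTime 6.3, NCTU_AAC 1.0)
NERO, QUICKTIME, NCTU = 0, 1, 2

# ODG values from Table 5.
ODG: Dict[str, Row] = {
    "es01": (-0.60, -0.32, -0.27),
    "es02": (-0.45, -0.11, -0.15),
    "es03": (-0.51, 0.02, -0.23),
    "sc01": (-0.88, -0.22, -0.45),
    "sc02": (-1.38, -0.84, -0.66),
    "sc03": (-0.84, -0.64, -0.40),
    "si01": (-1.32, -0.71, -0.62),
    "si02": (-0.82, -0.72, -0.54),
    "si03": (-1.59, -0.78, -0.98),
    "sm01": (-1.36, -0.75, -0.61),
    "sm02": (-0.72, -0.37, -0.53),
    "sm03": (-1.29, -0.73, -0.62),
}


def wins(table: Dict[str, Row], ours: int, other: int) -> int:
    """Number of tracks on which encoder `ours` has a strictly higher ODG."""
    return sum(1 for row in table.values() if row[ours] > row[other])


def average(table: Dict[str, Row], col: int) -> float:
    """Mean ODG of one encoder over all tracks."""
    return sum(row[col] for row in table.values()) / len(table)


if __name__ == "__main__":
    print("NCTU_AAC better than Nero:", wins(ODG, NCTU, NERO))
    print("NCTU_AAC better than QuickTime:", wins(ODG, NCTU, QUICKTIME))
    for col, name in ((NERO, "Nero"), (QUICKTIME, "QuickTime"), (NCTU, "NCTU_AAC")):
        print(f"{name}: {average(ODG, col):.5f}")
```

The average margin over QuickTime 6.3 is under 0.01 ODG, which is why the comparison with QuickTime is best read alongside the per-track count.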

Chapter 5 Conclusion

This thesis has proposed a new window switch method that improves coding quality.

First, a window decision based on energy ratio, zero-crossing ratio, and tonal attack has been proposed. Second, the short-window psychoacoustic model is replaced by the long-window psychoacoustic model, whose masking is aligned by the short-window energy. Third, a grouping method has been proposed. Fourth, the window type switch algorithm has been modified to work with TNS. Finally, for M/S coding, coupling methods for groups and window types have been proposed.

References

[1] ISO/IEC, “Coding of moving pictures and audio – IS 13818-7 (MPEG-2 advanced audio coding, AAC),” Doc. ISO/IEC JTC1/SC29/WG11 N1650, Apr. 1997.

[2] ISO/IEC, “Information technology – coding of audiovisual objects,” ISO/IEC 14496 (Part 3, Audio), 1999.

[3] P. Masri and A. Bateman, “Improved modeling of attack transients in music analysis-resynthesis,” ICMC, 1996.

[4] J. Kliewer and A. Mertins, “Audio subband coding with improved representation of transient signal segments,” EUSIPCO-98, Sept. 1998.

[5] R. Vafin, R. Heusdens, S. Par and B. Kleijn, “Improved modeling of audio signals by modifying transient locations,” IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2001.

[6] Z. Hou, W. Dou and Z. Dong, “New window-switching criterion of audio compression,” IEEE Fourth Workshop on Multimedia Signal Processing, 3-5 Oct. 2001.

[7] S. B. Venkata, K. M. Ashich, V. M. Vijayachandran and M. K. Vinay, “Transient detection for transform domain coders,” AES 116th Convention, Germany, 8-11 May 2004.

[8] E. Zwicker and H. Fastl, “Psychoacoustics: facts and models,” Springer-Verlag, Berlin Heidelberg, 1990.

[9] K. Brandenburg and J. Johnston, “Second generation perceptual audio coding: the hybrid coder,” AES 88th Convention, Montreux, 13-16 Mar. 1990.

[10] Y. Mahieux and J. P. Petit, “High-quality audio transform coding at 64 kbps,” IEEE Transactions on Communications, Vol. 42, pp. 3010-3019, Nov. 1994.

[11] M. Bosi, K. Brandenburg, S. Quackenbush, L. Fielder, K. Akagiri, H. Fuchs, M. Dietz, J. Herre, G. Davidson and Y. Oikawa, “ISO/IEC MPEG-2 advanced audio coding,” Journal of the AES, Vol. 45, No. 10, pp. 789-814, Oct. 1997.

[12] J. D. Johnston, “Transform coding of audio signals using perceptual noise criteria,” IEEE Journal on Selected Areas in Communications, Vol. 6, No. 2, Feb. 1988.

[13] J. Herre and J. D. Johnston, “Enhancing the performance of perceptual audio coders by using temporal noise shaping (TNS),” AES 101st Convention, Los Angeles, 8-11 Nov. 1996.

[14] K. Brandenburg and G. Stoll, “The ISO/MPEG-audio codec: a generic standard for coding of high quality digital audio,” AES 92nd Convention, Vienna, 24-27 Mar. 1992.

[15] C. M. Liu, W. J. Lee and R. S. Hong, “A new criterion and associated bit allocation method for current audio coding standards,” Proceedings of the 5th International Conference on Digital Audio Effects (DAFX), 2002.

[16] Chu-Ting Chien, “Bit allocation for MPEG-4 advanced audio coding,” CSIE Master Thesis of NCTU, 2003.

[17] A. Dueñas, R. Pérez, B. Rivas, E. Alexandre and A. Pena, “A robust and efficient implementation of MPEG-2/4 AAC natural audio coders,” AES 112th Convention, Munich, 10-13 May 2002.

[18] NCTU_AAC, website http://psplab.csie.nctu.edu.tw/projects/index.pl/nctu-aac.html.

[19] Tzu-Wen Chang, “Efficient temporal noise shaping for MPEG 4 advanced audio coding,” CSIE Master Thesis of NCTU, 2004.

[20] Ting Chiou, “Efficient psychoacoustic model for MPEG-4 audio coding based on filterbank,” CSIE Master Thesis of NCTU, 2004.

[21] Yo-Hua Hsiao, “M/S coding enhancement in MP3 and AAC,” CSIE Master Thesis of NCTU, 2004.

[22] PSPLab audio database, website http://psplab.csie.nctu.edu.tw/projects/index.pl/testbitstreams.html.

[23] Apple, QuickTime, website http://www.apple.com/quicktime/.

[24] Nero, website http://www.nero.com/.

In the document: Design of Window Switch Method in AAC (AAC中訊窗轉換方法之設計), pp. 30-0