
Chapter 4 Efficient Psychoacoustic Model

4.3 Experiments

The experiments are divided into two parts, following the foregoing sections: efficiency and quality.

First, the efficiency tests measure the computation time of the psychoacoustic model and the total encoding time under different psychoacoustic models. Second, quality testing, always a critical issue in audio research, verifies the quality improvement. In addition to subjective measures, this thesis runs experiments on three hundred critical tracks and checks for possible quality risks using the Objective Difference Grade (ODG) defined in Recommendation ITU-R BS.1387 [27]. The ODG ranges from 0 to -4, where 0 corresponds to an imperceptible degradation and -4 to a degradation judged very annoying. ODG values are negative because the quality of the Signal Under Test (SUT) is assumed to be worse than that of the Reference Signal (RS). The new efficient model has also been tested extensively with various coding combinations, such as M/S coding, TNS coding, and different bit rates. In the following results, P4 denotes the proposed efficient psychoacoustic model, and P1 denotes the conventional psychoacoustic model II described in Chapter 2.
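To make the ODG scale concrete, the following Python sketch maps an ODG value to the nearest of the five impairment grades that BS.1387 attaches to the integer points of the scale, and expresses an ODG gain between two models as a simple difference. The helper names (`describe_odg`, `odg_gain`) are hypothetical, and snapping a value to the nearest grade is only an illustrative convention assumed here, not something the Recommendation prescribes.

```python
# Anchor descriptions of the ODG scale (ITU-R BS.1387), best to worst.
ODG_GRADES = {
    0: "imperceptible",
    -1: "perceptible, but not annoying",
    -2: "slightly annoying",
    -3: "annoying",
    -4: "very annoying",
}


def describe_odg(odg: float) -> str:
    """Return the description of the grade nearest to an ODG value.

    Assumption for illustration: values are snapped to the nearest integer
    grade. Ties go to the better grade because min() keeps the first key in
    dict order; values outside [-4, 0] map to the nearest end grade.
    """
    grade = min(ODG_GRADES, key=lambda g: abs(g - odg))
    return ODG_GRADES[grade]


def odg_gain(odg_new: float, odg_ref: float) -> float:
    """Positive when the new model's ODG is closer to 0 (less degradation)."""
    return odg_new - odg_ref


print(describe_odg(-2.13))
print(f"{odg_gain(-2.13, -2.43):.2f}")
```

For example, an ODG of -2.13 lies nearest the "slightly annoying" grade, and comparing it with -2.43 yields a gain of 0.30, the kind of difference used below to compare P4 with P1.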

First, we use the general-purpose performance profiler Intel VTune 7.0 to measure the computation time of the psychoacoustic model.

Table 2: The psychoacoustic computational time in NCTU-AAC.

Model   Run 1    Run 2    Run 3    Run 4    Run 5    Average   Speedup (%)
P1      30.240   29.660   29.750   29.960   27.750   29.472
P4       8.570    8.940    8.000    7.310    7.590    8.082    72.58

Table 2 lists five runs of each psychoacoustic model, each integrated with the other encoding components. On average, the proposed model reduces the psychoacoustic computation time by 72.58% relative to P1 (from 29.472 to 8.082). In conclusion, the proposed model dramatically improves the efficiency of the psychoacoustic computation.
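The Speedup column of Table 2 is the fraction of P1's average time that P4 saves, that is, (29.472 - 8.082) / 29.472. The short Python sketch below reproduces it from the five runs; `time_reduction_percent` is a hypothetical helper name. It also prints the equivalent time ratio, which shows that the same numbers mean P4 runs the psychoacoustic model roughly 3.65 times as fast as P1.

```python
from statistics import mean

# Psychoacoustic computation times from Table 2 (five runs per model).
p1_runs = [30.240, 29.660, 29.750, 29.960, 27.750]
p4_runs = [8.570, 8.940, 8.000, 7.310, 7.590]


def time_reduction_percent(reference: float, proposed: float) -> float:
    """Percentage of the reference time saved by the proposed model."""
    return (reference - proposed) / reference * 100.0


p1_avg = mean(p1_runs)
p4_avg = mean(p4_runs)
reduction = time_reduction_percent(p1_avg, p4_avg)
ratio = p1_avg / p4_avg  # how many times faster P4 is than P1

print(f"{p1_avg:.3f} {p4_avg:.3f} {reduction:.2f}% {ratio:.2f}x")
# prints: 29.472 8.082 72.58% 3.65x
```

Reporting the saved fraction keeps the figure bounded by 100%, whereas the ratio grows without bound; both describe the same measurement.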

Table 3 shows the total encoding time measured in NCTU-AAC.

Table 3: Encoding time for NCTU-AAC.

File      Length   P1 (s)   P4 (s)   Speedup for P4 (%)
es01      02:51    26       19       26.92
es02      02:17    19       14       26.32
es03      04:03    36       27       25.00
sc01      02:55    22       18       18.18
sc02      03:23    28       23       17.86
sc03      03:04    27       23       14.81
si01      04:47    39       36        7.69
si02      03:05    30       26       13.33
si03      05:34    49       45        8.16
sm01      04:27    38       35        7.89
sm02      02:01    18       16       11.11
sm03      04:11    38       34       10.53
Average            30.83    26.33    14.59
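The Average row of Table 3 can be checked with a few lines of Python. A sketch, using only the table's values: the 14.59% matches the reduction of the average (equivalently, total) encoding time, not the mean of the per-file percentages, which comes out larger (about 15.65%) because it weights every file equally regardless of its encoding time. The helper name `reduction` is hypothetical.

```python
from statistics import mean

# Encoding times in seconds from Table 3: file -> (P1, P4).
table3 = {
    "es01": (26, 19), "es02": (19, 14), "es03": (36, 27),
    "sc01": (22, 18), "sc02": (28, 23), "sc03": (27, 23),
    "si01": (39, 36), "si02": (30, 26), "si03": (49, 45),
    "sm01": (38, 35), "sm02": (18, 16), "sm03": (38, 34),
}


def reduction(p1: float, p4: float) -> float:
    """Percentage of P1's encoding time saved by P4."""
    return (p1 - p4) / p1 * 100.0


per_file = {name: reduction(p1, p4) for name, (p1, p4) in table3.items()}
avg_p1 = mean(p1 for p1, _ in table3.values())
avg_p4 = mean(p4 for _, p4 in table3.values())

print(f"{per_file['es01']:.2f}")           # 26.92, as in the table
print(f"{reduction(avg_p1, avg_p4):.2f}")  # 14.59: reduction of the average time
print(f"{mean(per_file.values()):.2f}")    # 15.65: mean of per-file percentages
```

The same check applied to the encoder with M/S coding, window switching, TNS coding, and the bit reservoir enabled gives its 14.56% in the same way.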

The proposed model reduces the total encoding time by 14.59% compared with the encoder based on the P1 model. Moreover, Table 4 summarizes the encoding time of each model when the encoder also incorporates M/S coding, window switching, TNS coding, and the bit reservoir.

Table 4: The encoding time of the encoder incorporating M/S coding, window switching, TNS coding, and bit reservoir.

File      Length   P1 (s)   P4 (s)   Speedup for P4 (%)
es01      02:51    23       17       26.09
es02      02:17    14       10       28.57
es03      04:03    27       19       29.63
sc01      02:55    23       19       17.39
sc02      03:23    29       24       17.24
sc03      03:04    28       25       10.71
si01      04:47    42       38        9.52
si02      03:05    29       25       13.79
si03      05:34    54       50        7.41
sm01      04:27    42       37       11.90
sm02      02:01    18       16       11.11
sm03      04:11    42       37       11.90
Average            30.92    26.42    14.56

With these coding tools enabled, the proposed psychoacoustic model still reduces the encoding time by 14.56% relative to P1.

Figure 24 shows that NCTU-AAC with P4 reduces the encoding time by 20% compared with QuickTime 6.3 [28] and by 18.37% compared with Nero 6 [29].

Figure 24: Illustration of the encoding time in different coders (chart "Encoding Time in different coders": encoding time in seconds for QuickTime (QT), Nero, and NCTU-AAC in P4).

Second, for the quality tests we use the most commonly used set of test tracks for audio coding, the MPEG44100 bitstream set, which is the 44100 Hz version of the MPEG test bitstream set.

Table 5: The 12 MPEG test songs at 44100 Hz.

Track   Time   File   Description                  Signal category
1       10     es01   Vocal (Suzanne Vega)         Speech signals
2        8     es02   German speech                Speech signals
3        7     es03   English speech               Speech signals
4       10     sc01   Trumpet solo and orchestra   Complex sound mixtures
5       12     sc02   Orchestral piece             Complex sound mixtures
6       11     sc03   Contemporary pop music       Complex sound mixtures
7        7     si01   Harpsichord                  Single instruments
8        7     si02   Castanets                    Single instruments
9       27     si03   Pitch pipe                   Single instruments
10      11     sm01   Bagpipes                     Simple sound mixtures
11      10     sm02   Glockenspiel                 Simple sound mixtures
12      13     sm03   Plucked strings              Simple sound mixtures

First, Figure 25 shows the quality test results at a bit rate of 128 kbps.

Figure 25: ODG at 128 kbps (ODG of P1 and P4 for each track, es01 to sm03, on a scale from -4 to 0).

P4 achieves better quality than the conventional psychoacoustic model for speech signals, single instruments, and simple sound mixtures. For complex sound mixtures, however, its ODG is equal to that of P1. Second, Figure 26 shows the results at 112 kbps.

Figure 26: ODG at 112 kbps (ODG of P1 and P4 for each track, es01 to sm03, on a scale from -4 to 0).

As at 128 kbps, P4 at 112 kbps performs much better than P1, especially for speech signals and single instruments. Figure 27 shows the results at the low bit rate of 96 kbps: overall quality degrades seriously, but P4 still obtains better quality than P1, even for complex sound mixtures.

Figure 27: ODG at 96 kbps (ODG of P1 and P4 for each track, es01 to sm03, on a scale from -4.5 to 0).

Figure 28 shows the best, worst, and average ODG of the above tests. P4 obtains better grades than P1 at every bit rate tested; in particular, at 112 kbps P4 improves the average ODG by 0.30 over P1.

Model     Best    Worst   Avg
P1-128K   -0.81   -3.45   -1.76
P4-128K   -0.70   -3.24   -1.49
P1-112K   -1.18   -3.68   -2.43
P4-112K   -1.10   -3.57   -2.13
P1-96K    -1.80   -3.83   -3.18
P4-96K    -1.60   -3.78   -2.97

Figure 28: Illustration of the results at different bit rates for different models (best, worst, and average ODG).

Besides the 12 MPEG songs above, we also test the three hundred critical tracks listed in Table 6.

Table 6: Three hundred critical tracks.

No.   Category        Tracks   Remark
1     ff123           103      Killer bitstream collection from ff123
2     gpsycho         24       LAME quality test bitstream collection
3     HA128KTestV2    12       128 kbps multi-format test bitstreams from the HA forum
4     HA64KTest       39       64 kbps multi-format test bitstreams from the HA forum
5     horrible_song   16       Collection of killer songs among all bitstreams in PSPLab
6     ingets1         5        Bitstream collection from the Ogg Vorbis pre-1.0 listening test
7     Mono            3        Mono test bitstreams
8     MPEG            12       MPEG test bitstream set for 48 kHz
9     MPEG44100       12       MPEG test bitstream set for 44100 Hz
10    Phong           8        Test bitstream collection from Phong
11    PSPLab          37       Collection of bitstreams from the early days of PSPLab; some are good killer samples
12    sjeng           3        Small bitstream collection by sjeng
13    SQAM            16       Sound quality assessment material recordings for subjective tests
14    TestingSong14   14       Test bitstream collection from rshong
15    TonalSignals    15       Artificial bitstreams containing sine waves, etc., specially made to probe encoder quality
16    VORBIS_TESTS    8

Both psychoacoustic models are tested on the above tracks in NCTU-AAC, and Figure 29 shows the ODG for each category.

Figure 29: Different psychoacoustic models in three hundred critical tracks (ODG of P1 and P4 for categories 1 to 16 of Table 6, on a scale from -3.5 to 0).

In terms of the overall average ODG, the proposed model gains 0.04 over P1.

Chapter 5 Psychoacoustic Model based on Energy

From the document: Psychoacoustic Model Design for MPEG-4 AAC and MP3 (pp. 40-47)