Audio Player - Audio Decoding and Composition


4.5 Audio Decoding and Composition

4.5.2 Audio Player

Figure 4.21 shows the declaration of the audio MCI class. Figure 4.22 shows how the audio data is opened, and Fig. 4.23 shows how it is played.

The most important MCI function is mciSendCommand. Its prototype is “MCIERROR mciSendCommand(MCIDEVICEID IDDevice, UINT uMsg, DWORD fdwCommand, DWORD_PTR dwParam).” The following paragraphs are mainly taken from [19].

Figure 4.21: Related code of the audio declaration.

The first parameter, IDDevice, is the device identifier of the MCI device that is to receive the command message. The second parameter, uMsg, is the command message.

The third parameter, fdwCommand, holds the flags for the command message. The last parameter, dwParam, is a pointer to a structure that contains the parameters for the command message. The function returns zero if successful or an error code otherwise. The low-order word of the returned DWORD value contains the error return value. If the error is device-specific, the high-order word of the return value is the driver identifier; otherwise, the high-order word is zero.
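The split of the return value into two words can be illustrated in plain C. This is a minimal sketch, assuming MCIERROR is a 32-bit unsigned value as the DWORD description above implies; the type mci_error_t and the helpers mci_error_code, mci_driver_id and mci_is_device_specific are hypothetical stand-ins so that the code compiles without the Windows headers. In real Windows code, the LOWORD and HIWORD macros do the same extraction.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Windows MCIERROR type (assumed 32-bit DWORD). */
typedef uint32_t mci_error_t;

/* Low-order word: the error return value (0 means success). */
static uint16_t mci_error_code(mci_error_t err) {
    return (uint16_t)(err & 0xFFFFu);
}

/* High-order word: the driver identifier for device-specific errors, else 0. */
static uint16_t mci_driver_id(mci_error_t err) {
    return (uint16_t)((err >> 16) & 0xFFFFu);
}

/* Returns 1 when a nonzero error was reported together with a driver identifier. */
static int mci_is_device_specific(mci_error_t err) {
    return mci_error_code(err) != 0 && mci_driver_id(err) != 0;
}
```

A caller would check `mci_error_code(ret) != 0` before trusting the result; with the real API, mciGetErrorString converts the error value into readable text.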

After the FAAD2 audio decoder decodes the audio stream received from the RTP network interface, the decoded audio is saved as a temporary file in WAV format. As shown in Fig. 4.22, we fill an MCI_OPEN_PARMS structure and pass a pointer to it to mciSendCommand to open the device with this WAV file.

A pointer to an MCI_PLAY_PARMS structure is then passed to mciSendCommand to make the device play the audio file. The program waits four seconds while the device plays. Finally, an MCI_GENERIC_PARMS structure is used to stop and close the device.

For audio composition, we first added the decoded audio streams from the different decoders directly, but the result was poor. The voice from one decoder may act as noise for another decoder, and overflow of the summed sample values also degrades the sound quality. Hence, we currently play only the audio stream from the first user. A better method may exist to solve this problem.
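The overflow problem arises because the sum of two signed 16-bit samples can exceed the int16 range and wrap around to the opposite sign. A common remedy, which is not the method used in this system, is to add in a wider integer type and clamp (saturate) the result. The sketch below assumes 16-bit signed PCM, as is typical for WAV files; the function name mix_saturating is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Alternative to plain addition (not the method used in this system):
   mix two 16-bit PCM buffers sample by sample. The sum is computed in
   32 bits and clamped to the int16_t range, so loud passages clip
   instead of wrapping around to the opposite sign. */
static void mix_saturating(const int16_t *a, const int16_t *b,
                           int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int32_t sum = (int32_t)a[i] + (int32_t)b[i];
        if (sum > INT16_MAX) {
            sum = INT16_MAX;
        } else if (sum < INT16_MIN) {
            sum = INT16_MIN;
        }
        out[i] = (int16_t)sum;
    }
}
```

Clamping removes the wrap-around distortion but not the effect of one speaker acting as noise for another; that would need per-stream gain control or a more elaborate mixing strategy.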

Figure 4.22: Related code for audio data opening.

Figure 4.23: Related code for audio playing.

Chapter 5 Experimental Results

In the last chapter, we discussed how our system composes multiple video and audio streams and displays the result on the desktop. In this chapter, we present experimental results on speed and output in different situations. The test sequence is the brea_cif.cmp file, a commonly used video example with binary shape information.

Figure 5.1 shows the speed of the original decoder software with different numbers of users. The initial values are almost the same in all cases because the program starts the decoders sequentially. With only one video decoder, the frame rate reaches 53 fps (frames per second). With two decoders, it drops to 29 fps; with three decoders, to 19 fps; and with four decoders, to 14 fps.

The frame rate does not decrease in proportion to the number of video decoders. This may be because the hardware and the operating system use resources more efficiently when multiple decoder threads are running.
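The claim can be checked against the measured values. Assuming that the reported frame rate f(n) is achieved by each of the n decoders simultaneously, the aggregate throughput T(n) stays roughly constant instead of falling:

```latex
\begin{aligned}
T(n) &= n \, f(n), \\
T(1) &= 1 \times 53 = 53~\text{fps}, & T(2) &= 2 \times 29 = 58~\text{fps}, \\
T(3) &= 3 \times 19 = 57~\text{fps}, & T(4) &= 4 \times 14 = 56~\text{fps}.
\end{aligned}
```

A strictly proportional slowdown, f(n) = 53/n, would give 26.5, about 17.7 and 13.25 fps for two, three and four decoders, all slightly below the measured 29, 19 and 14 fps.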

Since we need the 4:4:4 format, we must pad the data and calculate the RGB values.
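The text does not give the conversion details, so the following is only a minimal sketch under stated assumptions: the decoder is assumed to output 4:2:0 planes (chroma at half width and half height), each chroma sample is simply replicated over its 2x2 block of luma samples, and the YCbCr-to-RGB coefficients are the common ITU-R BT.601 studio-range ones. The function names clamp_u8, ycbcr_to_rgb and yuv420_to_rgb444 are hypothetical.

```c
#include <stdint.h>

/* Round to the nearest integer and clamp to the 0..255 range. */
static uint8_t clamp_u8(double v) {
    if (v < 0.0) return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* Assumption: BT.601 studio-range coefficients (Y in 16..235). */
static void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr, uint8_t rgb[3]) {
    double yy = 1.164 * (y - 16);
    double db = cb - 128.0;
    double dr = cr - 128.0;
    rgb[0] = clamp_u8(yy + 1.596 * dr);              /* R */
    rgb[1] = clamp_u8(yy - 0.813 * dr - 0.391 * db); /* G */
    rgb[2] = clamp_u8(yy + 2.018 * db);              /* B */
}

/* Expand 4:2:0 planes to an interleaved 4:4:4 RGB buffer (3 bytes per pixel)
   by reusing each chroma sample for the 2x2 luma block it covers. */
static void yuv420_to_rgb444(const uint8_t *y, const uint8_t *cb,
                             const uint8_t *cr, int width, int height,
                             uint8_t *rgb) {
    int cw = (width + 1) / 2; /* chroma plane width, rounded up */
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            int ci = (row / 2) * cw + col / 2;
            int pi = row * width + col;
            ycbcr_to_rgb(y[pi], cb[ci], cr[ci], &rgb[3 * pi]);
        }
    }
}
```

Replicating chroma is the cheapest way to fill the missing samples; interpolation filters give smoother edges at a higher per-pixel cost.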

As shown in Fig. 5.2, adding the padding function lowers the frame rate by about 5–10 fps. Figure 5.3 shows the relative efficiency, which drops by about 20% compared with the original decoder.

Next, we must compose these videos and display the composite video. Figure 5.4 shows four composition cases, and Fig. 5.5 shows the execution performance. We observe that the frame rate drops to 5 fps with only one decoder and to 2.5 fps with two decoders. The decoder becomes too slow because the SetPixelV function used to display the video on the desktop is slow.

Figure 5.1: Performance of the original decoder.

Figure 5.2: Performance of decoder with padding.

Figure 5.3: Relative efficiency.

Figure 5.4: Four composition situations.

Figure 5.5: Performance of the decoder with composition and display.

In the case with composition but without calling SetPixelV (that is, decoding without display), the frame rate reaches 18 fps, so the SetPixelV function dominates the execution time of the decoder. Figure 5.6 shows that image display takes about 70% of the overall execution time.
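The 70% share is roughly consistent with the measured frame rates. Assuming it applies to the single-decoder case at 5 fps, removing the display time would leave about 30% of each frame's time, and Amdahl's law bounds the speedup obtainable by optimizing display alone:

```latex
\begin{aligned}
t_{\text{frame}} &= \frac{1}{5~\text{fps}} = 0.2~\text{s}, \\
t_{\text{no display}} &\approx (1 - 0.70)\, t_{\text{frame}} = 0.06~\text{s}
  \;\Rightarrow\; \frac{1}{0.06~\text{s}} \approx 16.7~\text{fps}, \\
S_{\max} &= \frac{1}{1 - 0.70} \approx 3.3 .
\end{aligned}
```

The estimated 16.7 fps is close to the 18 fps measured without SetPixelV, and the bound S_max means that even free display could speed the composed system up by at most about 3.3 times.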

To reduce the cost of the SetPixelV function, we reduce the number of SetPixelV calls. Figure 5.7 shows the result: the frame rate increases to 12 fps and 6 fps in the two cases, respectively. Compared with the previous case without this reduction, efficiency increases by about 140% (12/5 = 6/2.5 = 2.4). However, according to Fig. 5.3, efficiency is still about 72% lower than that of the original system.
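The text does not say how the number of SetPixelV calls was reduced. One possible approach, shown here purely as an assumption, is to redraw only the pixels whose color changed since the previously displayed frame, which can skip many calls in a largely static conference scene. The sketch only collects the indices of changed pixels; the function name collect_changed_pixels and the packed 32-bit pixel layout (as in a Win32 COLORREF, 0x00BBGGRR) are assumptions, and the actual SetPixelV calls are omitted so that the code compiles without the Windows headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: one possible way to cut per-pixel draw calls.
   Compare the frame about to be shown with the one already on screen and
   record the indices of pixels whose packed color value differs.
   Returns the number of changed pixels, i.e. the number of per-pixel
   draw calls (such as SetPixelV) that would still be needed. */
static size_t collect_changed_pixels(const uint32_t *prev, const uint32_t *cur,
                                     size_t n, size_t *changed) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (prev[i] != cur[i]) {
            changed[count++] = i;
        }
    }
    return count;
}
```

A caller would draw only the pixels listed in `changed[0..count-1]` and then copy `cur` into `prev`. Transferring a whole frame with a single GDI call such as SetDIBitsToDevice is another common way to avoid per-pixel overhead; neither approach is claimed to be the one used in this system.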

Finally, we add the audio decoder to our system. Figure 5.8 shows the performance with the audio decoder for different numbers of video decoders. The frame rates decrease by about 1 fps compared with the video-only system with reduced SetPixelV calls, which means that the overhead of the audio decoder is small, almost negligible.

Figure 5.6: Time analysis of the decoder.

Figure 5.7: Performance of the decoder after reducing the SetPixelV function.

Figure 5.8: Performance after adding audio decoder.

Chapter 6

From the document: Multipoint Videoconferencing Receiver for Object-Based MPEG-4 Video (pp. 67-76)