The Proposed Packetization Scheme and Streaming Framework

In the following discussion, we use the term “block-bitstream segment” to describe a portion of the bitstream bytes of a coding block across spatio-temporal subbands (see Fig. 19). A block-bitstream segment is composed of one or more coding passes. The scalable bitstreams are packed into UDP packets subject to both rate-control and error-control constraints. These constraints aim to meet the following goals:

1. The error protection level of a block-bitstream segment should depend on its entropy: the higher the entropy, the higher the protection level. Note that since a block-bitstream segment is only a small chunk of the data in a coding block, the FEC protection adapts to the content at a very fine granularity.

2. The streaming packet rate of the system should stay as low as possible.

The UDP packet size should be smaller than the MTU (Maximum Transmission Unit) allowed by the network links (the typical size is around 1500 bytes for wired networks and less than 1000 bytes for mobile networks). On the other hand, processing many small packets imposes a very high overhead on the streaming system, especially on the client side. Therefore, a reasonable packet size is slightly smaller than the MTU.

3. Although interleaving with FEC works well for handling packet losses, it does introduce extra delay to the transmission of video data. Therefore, the selection of the interleaving group size must take into account the end-to-end delay of the whole system.

In general, for video streaming, overall delay should be less than 10 seconds.
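The packet-size trade-off in goal 2 can be illustrated with a small calculation. The sketch below assumes RTP over UDP over IPv4 with no IP options and no RTP extensions (20 + 8 + 12 bytes of headers); the function names and the 2000 kbps example rate are illustrative assumptions, not part of the proposed system.

```python
import math

# Assumed header sizes in bytes: IPv4 without options, UDP, and RTP
# without CSRC entries or header extensions.
IPV4_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def max_payload(mtu: int) -> int:
    """Largest RTP payload that fits in one unfragmented IP packet."""
    payload = mtu - IPV4_HEADER - UDP_HEADER - RTP_HEADER
    if payload <= 0:
        raise ValueError("MTU too small for the assumed headers")
    return payload


def packet_rate(bitrate_kbps: float, payload_bytes: int) -> int:
    """Packets per second needed to carry bitrate_kbps of payload."""
    if payload_bytes <= 0:
        raise ValueError("payload size must be positive")
    bytes_per_second = bitrate_kbps * 1000 / 8
    return math.ceil(bytes_per_second / payload_bytes)


if __name__ == "__main__":
    # Compare a wired-like MTU, a mobile-like MTU, and a small MTU.
    for mtu in (1500, 1000, 576):
        payload = max_payload(mtu)
        print(mtu, payload, packet_rate(2000, payload))
```

Under these assumptions, shrinking the payload raises the packet rate roughly in inverse proportion, which is why a payload just below the MTU keeps the per-packet processing overhead low.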

Packetization of FEC-protected data

As mentioned in the previous section, a systematic Reed-Solomon (RS) codeword comprising data symbols and parity symbols is used for content-adaptive FEC protection. The RS coding used to protect the block-bitstream segments is depicted in Fig. 28. Assume that the total number of coding blocks is L. For each coding block i, i = 0, …, L-1, the bitstream is divided into an m-data symbol unit that begins with the first block-bitstream segment C_i,0 and continues through C_i,1, C_i,2, … to C_i,m. An (n, k_x), x = 0, …, m, RS code is then applied to add resiliency to the m-data symbol unit. Since the block-bitstream segments vary widely in size, a variable number of block-bitstream segments must be packed into a data unit to reduce packet overhead. In addition, different levels of protection are allocated to different portions of the coding block, with k_m ≥ k_(m-1) ≥ … ≥ k_0. Furthermore, the data symbols are gathered at the front end of the data unit, and the parity symbols are located at the back end. Each data unit has a header that describes the protection level of the data unit; the header is also protected by RS coding. Note that if the size of a data unit is not a multiple of k, zero-padding is applied at the end of the data. These padding bytes do not have to be transmitted, though.
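The bookkeeping implied by this layout can be sketched as follows. This is one possible reading of the scheme, not the authors' implementation: it assumes 8-bit symbols with a codeword length of n = 255 (the text does not state n), treats each segment C_i,x as being encoded with its own (n, k_x) code, and uses hypothetical names throughout. It performs no actual RS encoding; it only counts codewords, padding, and parity bytes.

```python
import math
from dataclasses import dataclass

N = 255  # assumed codeword length for RS over GF(2^8); not given in the text


@dataclass
class SegmentLayout:
    codewords: int  # number of (n, k) codewords covering the segment
    padding: int    # zero bytes added for encoding only, never transmitted
    parity: int     # parity bytes produced by the RS encoder
    sent: int       # bytes actually transmitted (data + parity)


def layout_segment(length: int, k: int, n: int = N) -> SegmentLayout:
    """Size bookkeeping for one segment protected by an (n, k) RS code."""
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")
    if length < 0:
        raise ValueError("segment length must be non-negative")
    codewords = math.ceil(length / k)
    padding = codewords * k - length
    parity = codewords * (n - k)
    return SegmentLayout(codewords, padding, parity, length + parity)


def layout_unit(lengths, ks, n: int = N):
    """Lay out segments C_{i,0..m} with k_0 <= k_1 <= ... <= k_m."""
    if len(lengths) != len(ks):
        raise ValueError("one k value is needed per segment")
    if any(a > b for a, b in zip(ks, ks[1:])):
        raise ValueError("k values must be non-decreasing along the unit")
    return [layout_segment(length, k, n) for length, k in zip(lengths, ks)]


if __name__ == "__main__":
    # Earlier segments get a smaller k, i.e. more parity per codeword.
    for seg in layout_unit([300, 500, 1200], [191, 223, 239]):
        print(seg)
```

A smaller k_x leaves more room for parity in each codeword, so the ordering k_m ≥ … ≥ k_0 gives the first segments of a coding block the strongest protection.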

[Figure 28 labels: Header; Unit 0, Unit 1, …, Unit L-1; block-bitstream segments C0,0, C0,1, …, C0,m; Reed-Solomon symbols RS0,0, …, RS0,m; example subbands P(LLLL_t, Y), P(LLLH_t, Y), P(LH_t, Y).]

Fig. 28. Packetization for one group of video data.

Since we are dealing with a packet-loss channel, not a bit-error channel, a byte-wise data-interleaving scheme is used to shuffle the RS-coded data among several data packets before transmission. As illustrated in Fig. 29, a block-bitstream segment is spread across many packets (each packet is composed of the group of data enclosed by dashed lines in Fig. 29). In addition to the video data payload, each packet must also carry the highest protection level, temporal subband index, component index, spatial subband index, and block index so that the data can be properly de-interleaved.

When interleaving is used, the interleaving depth must be chosen to handle the worst-case error bursts of the network. A large interleaving depth, however, increases both the packet buffer size required at the client and the end-to-end delay of packet transmission.

As mentioned in Section 2, the number of parity symbols is 2s, where s is the number of symbol errors that an RS decoder can correct. A data unit can be split into r equal-length sub-units, and each interleaved packet is composed of q data symbols from each sub-unit. Hence, q is limited by the number of parity symbols, 2s, and p is limited by the maximum end-to-end delay.
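The effect of byte-wise interleaving on a packet-loss channel can be sketched as follows. This is a minimal illustration, not the system's interleaver: it takes q = 1 (each packet carries one symbol from every codeword), represents lost packets as `None`, and uses hypothetical function names.

```python
def interleave(codewords):
    """Packet j carries symbol j of every codeword (column-wise read-out)."""
    if not codewords:
        return []
    n = len(codewords[0])
    if any(len(cw) != n for cw in codewords):
        raise ValueError("all codewords must have the same length")
    return [bytes(cw[j] for cw in codewords) for j in range(n)]


def deinterleave(packets, n_codewords):
    """Rebuild the codewords; symbols from lost packets stay None (erasures)."""
    codewords = [[None] * len(packets) for _ in range(n_codewords)]
    for j, packet in enumerate(packets):
        if packet is None:  # packet j was lost in transit
            continue
        for i, symbol in enumerate(packet):
            codewords[i][j] = symbol
    return codewords


def max_erasures(codewords):
    """Worst-case number of erased symbols in any single codeword."""
    return max(cw.count(None) for cw in codewords)


if __name__ == "__main__":
    # Four toy codewords of six symbols each.
    cws = [bytes(10 * i + j for j in range(6)) for i in range(4)]
    packets = interleave(cws)
    packets[2] = packets[3] = None  # a burst of two lost packets
    print(max_erasures(deinterleave(packets, len(cws))))
```

With this arrangement, a burst of two lost packets costs every codeword two symbols at known positions instead of wiping out entire codewords, so decoding remains possible as long as the number of erasures per codeword stays within what the parity symbols can recover.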

[Figure 29 labels: Packet 1, …, Packet h; Sub-Unit 1,1.]

Fig. 29. Data-interleaving scheme for one group of video data.

Streaming policy

The proposed framework adapts to fast-varying channel conditions by using real-time network statistics fed back from the client side. Through standard RTCP receiver reports, the server can obtain statistics such as round-trip time (RTT), jitter, short-term packet loss, and cumulative packet loss. The packet loss rate is used to compute the rate-distortion tradeoff information of the content-adaptive FEC-protected data, as described in Section 2. In addition, the server can compute the effective channel bandwidth from the last packet sequence number received by the client and the loss rate. Based on the estimated channel bandwidth and the rate-distortion information, the system performs dynamic rate allocation at discrete transmission times, enhancing the perceived quality whenever the network bandwidth is sufficient for a perceptible quality improvement.
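One way the bandwidth estimate described above might be computed is sketched below. The exact estimator used by the system is not given in the text, so this is an assumption: it takes the highest sequence numbers from two consecutive receiver reports (assumed to be already extended, so wrap-around is not handled), the loss fraction for that interval, a constant packet size, and the time between reports. All names are hypothetical.

```python
def loss_fraction_from_rtcp(fraction_lost_field: int) -> float:
    """Convert the 8-bit fixed-point 'fraction lost' field to a float."""
    if not 0 <= fraction_lost_field <= 255:
        raise ValueError("fraction lost is an 8-bit field")
    return fraction_lost_field / 256


def effective_bandwidth_kbps(prev_seq: int, last_seq: int,
                             loss_fraction: float,
                             packet_bytes: int,
                             interval_s: float) -> float:
    """Estimate the delivered rate between two receiver reports."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss fraction must be within [0, 1]")
    if interval_s <= 0:
        raise ValueError("report interval must be positive")
    expected = last_seq - prev_seq  # packets the client should have seen
    if expected < 0:
        raise ValueError("sequence numbers must not decrease")
    received = expected * (1.0 - loss_fraction)
    return received * packet_bytes * 8 / interval_s / 1000


if __name__ == "__main__":
    # 200 packets of 1400 bytes expected in one second, 5% of them lost.
    print(effective_bandwidth_kbps(1000, 1200, 0.05, 1400, 1.0))
```

The estimate counts only packets that actually arrived, so it drops when either the sending rate or the delivery ratio falls.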

Fig. 30. Redundancy packets for protecting the source packets of one group of video data.

For error correction, parity packets are employed to recover lost data packets. However, because the packets are transmitted over UDP, some of the parity packets may themselves be lost or corrupted. To enhance system performance, error recovery mechanisms such as retransmission or error correction can be applied to handle otherwise uncorrectable errors.

Instead of applying a retransmission scheme to all parity packets, the proposed system delivers more redundant parity packets for the packets carrying important portions of blocks and fewer for the other packets. As seen in Fig. 30, all of the blocks are arranged according to the degree of importance of each spatio-temporal subband. In addition, the parity symbols of the higher protection levels are gathered into one packet to maximize the efficiency of the error recovery scheme.

4. Experiments

This section presents the experimental results of the proposed video streaming system. The block diagram of the proposed streaming system is shown in Fig. 31. The system is based on the MPEG-21 Test Bed for Resource Delivery [31] (the source code of the original test bed can be downloaded from http://clabprj.ee.nctu.edu.tw/~mpeg21tb/).

The test bed includes an IP transmission link emulator (based on the NIST Net [32]) that allows real-time emulation of various network conditions. We have added Reed-Solomon coding modules, a data-interleaving module, and a data de-interleaving module to the original test bed.

[Figure 31 block labels: Server, Client, Media Database, Encoded Media Files, Digital Item Adaptation, RS encoding, Interleaver, Streamer, Server Controller, QoS Decision, Packet Buffer, Deinterleaver, RS decoding, Stream Buffer, Media Decoder, Client Controller; protocols: RTSP, RTP, RTCP.]

Fig. 31. Architecture of the proposed system.

The CIF versions of the standard MPEG test sequences STEFAN and MOBILE are used for the experiments. The sequences are encoded using the MSRA 3-D wavelet video coding software [33] at 15 frames per second, with a GOP composed of 64 frames. Four levels of 5/3 MCTF temporal decomposition and three levels of 9/7 wavelet spatial decomposition are used for subband coding. The luminance (Y) component comprises around 1024 block-bitstream segments, and the chrominance (U and V) components comprise around 608 block-bitstream segments.

To evaluate the performance of the proposed system, a reasonable range of packet loss rates should be used. Over wired links, studies of MPEG compressed video transported over RTP and UDP reported average packet loss rates ranging from 3.0 to 13.5 percent [34]. Over wireless links, Lai et al. [20] reported the characteristics of the MosquitoNet wireless network: the packet loss rate was 25.6% when packets were sent from a mobile host to a router, and 3.6% when packets were sent from a router to a mobile host. Rosa et al. [21] conducted a comprehensive study of handover mechanisms during the disruption time in wireless networks and reported that the packet loss caused by the handover mechanism was below 0.3%. Based on these published studies, we set the packet loss rate of our experiments to 5%.

The proposed content-adaptive FEC protection framework is compared against a streaming system with fixed FEC protection. The PSNR of the luma channel of the reconstructed video sequences is shown in Fig. 32 and Fig. 33. The level of protection for the content-adaptive FEC system is determined by Eq. (2), while the level of protection of the fixed FEC system is determined by the (predicted) average number of packet losses per second. In either case, the maximal protection level can recover only up to 4% packet loss on average. As the figures show, the adaptive FEC protection scheme performs considerably better than the fixed-level protection scheme in terms of average luma PSNR.
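For reference, the per-frame luma PSNR used as the quality metric can be computed as in the sketch below. It assumes 8-bit samples (peak value 255) and takes the luma samples of a frame as flat sequences; how the per-frame values were averaged in the experiments is not stated in the text, so no averaging is shown. The function name is hypothetical.

```python
import math


def psnr(reference, reconstructed, peak: int = 255) -> float:
    """PSNR in dB between two equal-length sequences of luma samples."""
    if len(reference) != len(reconstructed) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed))
    mse /= len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    original = [100, 120, 140, 160]
    decoded = [102, 119, 141, 158]
    print(round(psnr(original, decoded), 2))
```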

[Plot: average PSNR (dB), 32 to 40, versus rate (kbps), 1000 to 4000; curves: Equal Protection and Content-Adaptive FEC.]

Fig. 32. Comparison between fixed and content-adaptive FEC protection for the STEFAN sequence (frame rate: 15, GOP size: 64, packet loss rate: 5%).

[Plot: average PSNR (dB), 32 to 40, versus rate (kbps), 2500 to 5000; curves: Equal Protection and Content-Adaptive FEC.]

Fig. 33. Comparison between fixed and content-adaptive FEC protection for the MOBILE sequence (frame rate: 15, GOP size: 64, packet loss rate: 5%).
