Wavelet Video Bitstream Analysis

For streaming applications, the quality of video is affected by packet loss distortion. One of the most difficult problems in rate-distortion optimized (RDO) streaming is how to measure this distortion. In practice, distortion due to packet loss depends heavily on the source coding method. In this section, the wavelet video coding schemes presented in [4][4] are investigated in detail. In particular, experiments are conducted to show how the loss of packets carrying different wavelet subband data affects the reconstructed video quality.

The block diagram of a wavelet-based video coding system is shown in Fig. 18. In a T+2D wavelet coder, an input video sequence is first temporally decomposed using motion compensated temporal filtering (MCTF) [1]. The output of MCTF is then further decomposed by a 2-D spatial wavelet transform on a frame-by-frame basis. For example, a two-level temporal decomposition has three temporal subbands, namely P(H_t, YUV), P(LH_t, YUV), and P(LL_t, YUV). When the group of pictures (GOP) size is 8, a typical T+2D wavelet coder produces four P(H_t, YUV) frames, two P(LH_t, YUV) frames, and two P(LL_t, YUV) frames. Each frame consists of one luminance component (Y) and two chrominance components (U and V).
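The frame counts quoted above follow from halving the number of low-pass frames at each temporal level. The following is a minimal sketch of that bookkeeping, assuming a plain dyadic decomposition; the function name `temporal_subband_frames` is hypothetical and is not part of any codec.

```python
def temporal_subband_frames(gop_size: int, levels: int) -> dict[str, int]:
    """Count frames per temporal subband for a dyadic temporal decomposition.

    Assumption: each temporal level splits the current low-pass frames
    into equal numbers of low-pass (L) and high-pass (H) frames.
    """
    if levels < 1 or gop_size % (2 ** levels) != 0:
        raise ValueError("gop_size must be a multiple of 2**levels")
    counts: dict[str, int] = {}
    low = gop_size   # frames still in the low-pass branch
    prefix = ""      # one 'L' is added per completed level
    for _ in range(levels):
        low //= 2
        counts[prefix + "H"] = low  # high-pass frames produced at this level
        prefix += "L"
    counts[prefix] = low            # low-pass frames left after the last level
    return counts


if __name__ == "__main__":
    # GOP of 8 with two temporal levels, as in the text above.
    print(temporal_subband_frames(8, 2))
```

With a GOP size of 8 and two levels, this yields four H frames, two LH frames, and two LL frames, matching the structure described in the text.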

After the temporal and spatial subband transforms, the coefficients of the different subbands are logically segmented into coding blocks, based on the structure of Fig. 19, and each coding block is independently coded by an entropy coder. For instance, the coding block in Fig. 19 has block depth 2 (i.e., two frames), block height 36 (= 288/2³), and block width 44 (= 352/2³). Common entropy coding techniques for wavelet video are 3D Embedded Subband Coding with Optimized Truncation (3D-ESCOT) [4] and 3D Set Partitioning in Hierarchical Trees (3D-SPIHT) [28]. The 3D-ESCOT algorithm has higher compression efficiency and better scalability than the 3D-SPIHT algorithm. Therefore, the proposed scheme is based on the 3D-ESCOT coding technique.
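The block dimensions in this example can be checked with a few lines of arithmetic. The sketch below assumes a CIF frame (352 × 288) and three levels of spatial decomposition, which is what the divisor 2³ suggests; the helper name `coding_block_shape` is hypothetical.

```python
def coding_block_shape(frame_width: int, frame_height: int,
                       spatial_levels: int, block_depth: int = 2):
    """Return (depth, height, width) of a coding block.

    Assumption: the block spans a subband whose side lengths are the
    frame dimensions divided by 2**spatial_levels.
    """
    scale = 2 ** spatial_levels
    return block_depth, frame_height // scale, frame_width // scale


if __name__ == "__main__":
    # CIF luminance frame with three spatial levels.
    print(coding_block_shape(352, 288, 3))
```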

Fig. 18. Wavelet Video Coding Block Diagram. (Diagram labels: input video sequence; 1st temporal level; 2nd temporal level; P(H_t, YUV); P(LH_t, YUV); P(LL_t, YUV).)

Fig. 19. Examples of Coding Block in Wavelet Video Coding. (Diagram labels: P(H_t, YUV); block depth; block height; block width.)

During the 3D-ESCOT entropy coding process, the entropy coder (fractional bit-plane coding and context-based arithmetic coding) operates on one coding block at a time. Each coding block consists of N bit-planes in total, where N is the number of bits of the largest-magnitude coefficients. Three coding operations of the context-based arithmetic coder (zero coding, sign coding, and magnitude refinement) are used to characterize the significance of the coefficients in a bit-plane. Following the 3D context modeling, fractional bit-plane coding ensures that the bitstream of each coding block is arranged with fine-grained SNR scalability. The fractional bit-plane coding procedure consists of three distinct passes: the significance propagation pass, the magnitude refinement pass, and the normalization pass. Since the first bit-plane of a coding block is processed with a normalization pass only, a coding block contains 3N–2 coding passes. After entropy coding, each candidate truncation point of a coding block is associated with a rate-distortion slope (R-D slope). To truncate the bitstream at an optimal truncation point, the points that are not on the convex hull are eliminated, and the R-D slopes of the remaining points are λ_0, λ_1, …, λ_(3N-2), where |λ_0| > |λ_1| > … > |λ_(3N-2)|. All coding blocks have an R-D curve similar to the example shown in Fig. 20, and the top coding passes contain the most important video data. Therefore, a higher level of protection is required for the coding passes of the top bit-planes.
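The pass count and the convex-hull pruning described above can be illustrated as follows. This is a minimal sketch and not the 3D-ESCOT implementation: it assumes that cumulative rate and distortion values are available after every coding pass, that index 0 is the "nothing sent" point, and that rates are strictly increasing. The function names and the sample numbers are hypothetical.

```python
def coding_pass_count(num_bitplanes: int) -> int:
    """3 passes per bit-plane, except the first one, which has a single pass."""
    return 3 * num_bitplanes - 2


def convex_hull_truncation_points(rates, distortions):
    """Keep only the truncation points on the lower convex hull of the R-D curve.

    Returns (hull, slopes): the indices of the retained points and the
    distortion reduction per unit rate of each retained segment.
    The returned slopes are strictly decreasing.
    """
    hull = [0]
    slopes = []
    for i in range(1, len(rates)):
        while True:
            last = hull[-1]
            d_rate = rates[i] - rates[last]
            d_dist = distortions[last] - distortions[i]
            if d_rate <= 0 or d_dist <= 0:
                break  # point i brings no gain; it is not a candidate
            slope = d_dist / d_rate
            if slopes and slope >= slopes[-1]:
                # The previous point lies above the hull; drop it and retry.
                hull.pop()
                slopes.pop()
                continue
            hull.append(i)
            slopes.append(slope)
            break
    return hull, slopes


if __name__ == "__main__":
    # Made-up cumulative (rate, distortion) values for illustration only.
    rates = [0, 100, 150, 300, 400]
    distortions = [1000, 600, 550, 200, 150]
    print(coding_pass_count(5))
    print(convex_hull_truncation_points(rates, distortions))
```

In the sample data, the point at rate 150 is removed because the segment that follows it is steeper than the segment leading to it, so truncating there would never be optimal.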

Fig. 20. The R-D curve of coding block 0 of subband P(H_t, Y) of STEFAN. (Plot: distortion versus rate; legend: P(H_t, Y) block 0.)

To gain better insight into the significance of different bitstream segments across different temporal subbands, some experiments are conducted. For example, using a four-level MCTF temporal decomposition, a group of frames is temporally decomposed into the LLLL, LLLH, LLH, LH, and H subbands. In addition, each temporal subband may comprise many spatio-temporal subbands generated by spatial decomposition. As shown in Fig. 21, for a video encoded with a four-level temporal transform and a three-level spatial decomposition, each temporal subband (TSB) is split into nineteen spatial subbands (SSB), numbered from 0 to 18. Losing the first coding block within a higher spatio-temporal subband (e.g., (b), (c), (d)) indeed has a larger impact on distortion than losing the last coding block within a lower spatio-temporal subband (e.g., (e)).

In practice, given an estimated packet loss rate, we want to apply different amounts of error protection to different portions of a coding block based on their importance. Therefore, further analyses of wavelet subband rate versus channel distortion are conducted as follows. Since the size of coding blocks varies widely (see Fig. 22), a coding block is not suitable as a data interleaving unit; it should be split into several smaller units for interleaving. Within each coding block, the first coding pass is usually small (see Fig. 23) and has the highest importance (see Fig. 24 and Fig. 25). To evaluate the performance degradation, 10% corrupted bits are placed in different portions of the coding blocks. When the corrupted bits are located at the beginning of a coding block, they may cause a large perceived degradation of video quality. Hence, different error protection strategies should be used for different portions of a coding block.

Fig. 21. Reconstructed video when a chunk of TSB data is lost. The loss occurs in coding block 0 of SSB 0 for the TSB in (a)-(d), and in coding block 0 of SSB 18 for the TSB in (e)-(f). (Panels: (a) P(LLLL_t, Y); (b) P(LLLH_t, Y); (c) P(LLH_t, Y); (d) P(LH_t, Y); (e) P(LLLL_t, Y); (f) P(H_t, Y).)

Fig. 22. Source data rate in SSB 0 of subband P(H_t, Y) of STEFAN. (Plot: source rate in bytes versus index of blocks; legend: MSRA wavelet.)

Fig. 23. Source rate of coding passes on the convex hull in block 0 of STEFAN. (Plot: source rate in bytes versus index of coding passes; legend: P(H_t, Y) SSB 0.)

Fig. 24. PSNR with coding block loss in SSB 0 of the TSB P(H_t, Y) of STEFAN. (Plot: average PSNR in dB versus rate in bytes; legend: Block 0, Block 1, Block 2.)

Fig. 25. PSNR with coding pass loss in block 0 of SSB 0 of the TSB P(H_t, Y) of STEFAN. (Plot: PSNR in dB versus frames; legend: the top coding pass loss, the near-top coding pass loss, the last coding pass loss.)

Packet loss is a major cause of non-deterministic distortion in video streaming applications. Over fiber networks, for example, bit errors rarely occur; the bit error rate is only 10^-9 [29]. Packet losses are caused mostly by network congestion, which leads to packet drops in the queue buffers of ATM switches [30]. As Fang et al. [29] and Biersack [30] pointed out, an FEC protection scheme is effective in recovering from packet loss with minimum overhead for multimedia streaming. Hence, the proposed framework applies the preceding analysis of wavelet video to the design of a content-dependent interleaved FEC coding scheme for scalable streaming systems.

The basic concept of our content-adaptive FEC streaming scheme is to add different FEC protection levels (subject to the predicted packet loss rate) to different wavelet subband data, based on the R-D slope of the data (or, equivalently, its distortion-reduction rate). Fig. 26 illustrates this concept using some examples from the proposed algorithm. The content-adaptive FEC protection is applied to coding block 0 of temporal subband P(H_t, Y) and spatial subband 0 of the STEFAN sequence. In this plot, the y-axis is the distortion-reduction rate (i.e., the slope of the conventional R-D curve as in Fig. 20) and the x-axis is the bitrate (including source data bits and FEC protection bits). The dashed line is the original subband data without any protection, the solid line with circle markers is the FEC-protected data given a 3% estimated packet loss rate, and the solid line with "plus" markers is the protected data given an 8% estimated packet loss rate. The lower the rate point, the higher the protection level. The exact equation for computing the protection level is described below. Note that the function in Fig. 26 can be used for operational RDO streaming decisions, since it exhibits the tradeoff between rate and source-and-channel distortion.

Fig. 26. Content-adaptive FEC protection examples. (Legend: P(H_t, Y) block 0; P(H_t, Y) block 0 (3% loss); P(H_t, Y) block 0 (8% loss).)

In the proposed framework, for each group of video bitstream, an (n, k) Reed-Solomon (RS) code-based FEC is applied to add resilience to the data. In Fig. 27, n is the codeword length of the RS encoder, k is the number of video data symbols (each symbol is 8 bits of bitstream data in this case), and s is the number of correctable symbols. The number of parity symbols is 2s, where 2s = n–k. If burst errors occur during transmission, the RS decoder can correct up to s symbol errors and detect up to 2s symbol errors per codeword. Each codeword thus consists of k symbols of video data and 2s symbols of parity.
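The relation 2s = n–k can be made concrete with a small helper. The sketch below only computes code parameters and does not encode or decode anything. The function name `rs_parameters` and the example code RS(255, 223) are illustrative assumptions rather than values taken from the proposed scheme. The limit n ≤ 255 reflects the maximum codeword length of an RS code with 8-bit symbols.

```python
def rs_parameters(n: int, k: int) -> dict:
    """Derive basic parameters of an (n, k) Reed-Solomon code with 8-bit symbols."""
    if not 0 < k < n <= 255:
        raise ValueError("need 0 < k < n <= 255 for 8-bit symbols")
    parity = n - k
    if parity % 2 != 0:
        raise ValueError("n - k must be even so that 2s = n - k")
    return {
        "parity_symbols": parity,            # 2s
        "correctable_symbols": parity // 2,  # s
        "code_rate": k / n,                  # fraction of each codeword that is video data
    }


if __name__ == "__main__":
    # Illustrative code only; the text does not state which (n, k) is used.
    print(rs_parameters(255, 223))
```

A larger s for a fixed n leaves fewer data symbols per codeword, which is the overhead that the protection allocation has to trade against importance.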

For 3D-ESCOT, each coding block j has a temporal level index ω_j, a component index ν_j, a spatial subband index τ_j, and a block index ψ_j. Assume that the bitstream of a coding block is divided into l codewords. Then, the importance of the various portions of one coding block can be expressed as in Eq. (1).

Here, T is the maximum temporal level index, Y is the maximum component index, B is the maximum spatial subband index, G is the maximum block index, and U1, U2, U3 are weighting factors. The optimization of the content-adaptive FEC protection problem can […]ter of the coding block c_{j,i}(x) given by (1) subject to the network conditions. In addition, the bitstreams of different coding blocks contain different numbers of coding passes. A larger value of s (the number of correctable symbols) is required for earlier coding passes, since the data are arranged in coding pass order by importance. Therefore, the allocation of the expected protection to the different coding passes is proposed as in Eq. (2) and (3), where λ_0 is the R-D slope of the first coding pass in block j and n_pl denotes the estimated number of packet losses given the current bandwidth R_BW, the average packet size P_s, and the packet loss rate p_l. β is a scale factor determined from empirical analysis. Eq. (2) is designed so that s_{j,0} ≥ s_{j,1} ≥ … ≥ s_{j,l-1}, that is, the level of protection decreases following the coding pass order. Note that n_pl = ⌊p_l · R_BW / P_s⌋, where ⌊·⌋ stands for "the largest integer less than or equal to".

3. The proposed Packetization Scheme
