
4.1 Hierarchical B-picture Coding

In the hierarchical B-picture prediction framework, frames at lower hierarchical levels serve as references for frames at higher hierarchical levels. Due to this dependency, the decoding quality of a frame strongly depends on the quality of the frames at the previous hierarchical level of the same GOP. The lower the level at which a frame is lost, the more frames will be corrupted. For example, in Fig. 1(a), the loss of an I or P picture directly affects 7 other frames, while the loss of a level-1, level-2, or level-3 B-picture directly affects 4, 2, and 0 other frames, respectively.

Based on this observation, the proposed MDC aims at providing unequal redundancy for the hierarchical B pictures, taking into account the unequal importance of the frames at different hierarchical levels.

The proposed MDC model is illustrated in Fig. 2, where a non-dyadic hierarchical B-picture structure with 4 levels is used. We refer to the I/P frames at the lowest hierarchical level as key frames; to the B frames at intermediate levels as reference B frames (RB frames), because they are used as references; and to the B frames at the highest level as non-reference B frames (NRB frames), because they are not used as references. As Fig. 2(a) shows, we apply duplication (denoted by D) to key frames to provide the highest error resilience; spatial splitting (S) to RB frames for modest error resilience; and temporal splitting (T) to NRB frames for the lowest error resilience. The resulting two descriptions are illustrated in Fig. 2(b), where the rectangles with a missing corner represent incomplete frames (due to spatial splitting). It can be seen that, owing to the different MDC methods applied, the frames at different hierarchical levels carry unequal redundancy to provide robustness against errors. Suppose description D0 is lost: the lost key frames (0 and 12) can easily be reconstructed at the decoder using the same frames in description D1; the partially lost level-1 and level-2 frames (3, 6, and 9) can be estimated using the information of their counterparts in description D1; while the lost level-3 frames (1, 4, 7, and 10), which are not in D1, can only be estimated from other frames. The estimation methods are discussed in the next section.

(a) Original sequence (b) The resulting two descriptions

Fig. 2 Proposed MDC based on hierarchical B-picture prediction.
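The three mechanisms (D, S, T) can be summarized with a small sketch. The frame-type rule for the 13-frame segment of Fig. 2 and all helper names are our reading of the figure, not the authors' code:

```python
KEY, RB, NRB = "key", "RB", "NRB"

def frame_type(idx, gop=12):
    """Classify a frame of the 4-level structure in Fig. 2 (our reading)."""
    if idx % gop == 0:
        return KEY      # I/P frames at the lowest level (0, 12, ...)
    if idx % 3 == 0:
        return RB       # level-1/level-2 reference B frames (3, 6, 9)
    return NRB          # level-3 non-reference B frames

def make_descriptions(frames):
    """Form descriptions D0 and D1: duplicate key frames (D), spatially
    split RB frames (S), and alternate NRB frames between the two (T)."""
    d0, d1, toggle = {}, {}, 0
    for i in sorted(frames):
        t = frame_type(i)
        if t == KEY:
            d0[i] = d1[i] = "full"          # duplication
        elif t == RB:
            d0[i] = d1[i] = "half"          # polyphase spatial split
        else:
            (d0 if toggle == 0 else d1)[i] = "full"  # temporal split
            toggle ^= 1
    return d0, d1
```

Running this on frames 0–12 reproduces the allocation described above: both descriptions hold full copies of frames 0 and 12, half of frames 3, 6, and 9, and the NRB frames 1, 4, 7, 10 end up only in D0 while 2, 5, 8, 11 end up only in D1.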

4.2 Estimation of Lost Descriptions

Taking advantage of the different MDC methods applied to the frames at different hierarchical levels, different estimation methods are designed for different frames. Table I summarizes the cases in which each estimation method is applied, where S denotes the spatial method, T the temporal method, and D the duplication method. The columns describe the two loss cases, while the rows describe the three frame types.

Table I. Summary of the cases for different estimation methods

4.2.1 One-Description Loss

In case of one-description loss, the lost key frames can be reconstructed by simply using the duplicated versions in the other description, so this case is marked as D in Table I. As for RB frames, since they are split in the spatial domain, one-description loss causes only partial-frame loss. In this case, the spatial method (marked as S in Table I) is applied to estimate the lost part. After the received description has been entropy-decoded, de-quantized, and inverse-transformed, the Spatial Merger applies the polyphase inverse permutation to the resulting data, so that the residual pixels are distributed like a checkerboard inside the macroblock, as shown in Fig. 3, where each lost residual pixel has four available neighboring pixels. Our spatial estimation uses bilinear interpolation to reconstruct the lost residual pixels, as shown in Equation (1), where f′_{j,i} is the reconstructed value of the residual pixel in column i and row j. Since neighboring pixels have high spatial correlation, spatial estimation should be efficient.

f′_{j,i} = (f_{j+1,i} + f_{j−1,i} + f_{j,i+1} + f_{j,i−1}) / 4    (1)

Fig. 3 Spatial concealment by bilinear interpolation.
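The spatial estimation of Eq. (1) can be sketched as follows. Clamping indices at the frame borders is our assumption; the paper does not specify border handling:

```python
def bilinear_estimate(residual, missing):
    """Recover lost checkerboard residual pixels per Eq. (1).

    residual: 2-D list of residual values; missing: set of lost (row, col).
    Each lost pixel is the average of its four neighbours, which are all
    received because the loss pattern is a checkerboard.
    """
    h, w = len(residual), len(residual[0])
    out = [row[:] for row in residual]
    for j, i in missing:
        neighbours = [(j + 1, i), (j - 1, i), (j, i + 1), (j, i - 1)]
        vals = [residual[min(max(jj, 0), h - 1)][min(max(ii, 0), w - 1)]
                for jj, ii in neighbours]      # clamp indices at borders
        out[j][i] = sum(vals) / 4              # Eq. (1)
    return out
```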

As for NRB frames, one-description loss results in whole-frame loss because they are split in the temporal domain. In this case, a temporal estimation method (marked as T in Table I) is applied to reconstruct the lost frame. Since the temporal method is also adopted for all frame types in case of two-description loss, we describe it in the next subsection.

4.2.2 Two-Description Loss

Two-description loss results in whole-frame loss regardless of frame type. For whole-frame loss, each block in the lost frame is recovered based on temporal correlation, since all the neighboring blocks are also lost. We refer to the pictures whose pixels are used to predict the missing pixels as data prediction frames (DFs), and to the pictures whose block motions are used to predict the motions of the missing blocks as motion prediction frames (MFs). In our method, the DFs can differ from the MFs. Moreover, the proposed method adopts a bi-directional motion-compensated signal to recover missing pixels. Thus, for a lost picture we need to select two DFs, a backward DF and a forward DF (denoted by DF_b and DF_f, respectively), and two MFs, a backward MF and a forward MF (denoted by MF_b and MF_f, respectively). Since the data correlation among the pictures involved weakens considerably as the temporal distances among them grow, it is better to choose the pictures nearest to the lost picture in display order as its DFs. However, serving as a DF requires that a picture be decoded earlier than the lost picture.

Based on the hierarchical B-picture structure, for a lost picture, we select its reference frames in the backward and forward directions as its DF_b and DF_f, respectively.

The MFs are selected differently from the DFs. In case of frame loss, even though the frames later than the lost frame in decoding order cannot be decoded before the lost frame is recovered, their motion information is still obtainable. Therefore, the MFs need not precede the lost picture in decoding order. Instead of using the temporal direct mode (TDM) technique, which adopts reference pictures as MFs, we choose pictures at higher levels, because these pictures are temporally nearer to the lost picture in display order. For example, in Fig. 4(a), if frame 6 is lost, we select its reference frames (0 and 12) as its DFs, but frames 3 and 9 as its MFs. Likewise, if frame 3 is lost, we select its reference frames 0 and 6 as DFs, but frames 2 and 4 as MFs. This selection policy is applied to all frames except the NRB frames, which are at the highest level of the hierarchical structure. For NRB frames, the MF is selected from the reference frames at the previous level of the lost picture. Fig. 4(b) illustrates the case of NRB-frame loss, where frame 8 is the lost frame. In this case, frames 6 and 9 serve as the DFs, and frame 9 (which is at the previous level of frame 8) serves as the MF. Similarly, if frame 10 is lost, its DFs are frames 9 and 12, and its MF is frame 9.

(a) DF and MF selection for RB frames (b) DF and MF selection for NRB frames

Fig.4 DF and MF selection for temporal estimation method.
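The selection policy illustrated in Fig. 4 can be encoded as a small lookup. The level and reference maps below are our reading of the figures for the 13-frame GOP of Fig. 2 (keys 0 and 12; level-1 frame 6; level-2 frames 3 and 9; the rest at level 3), not the authors' code:

```python
LEVEL = {0: 0, 12: 0, 6: 1, 3: 2, 9: 2,
         1: 3, 2: 3, 4: 3, 5: 3, 7: 3, 8: 3, 10: 3, 11: 3}

# Backward/forward reference frames of each B frame (our reading of Fig. 2).
REFS = {3: (0, 6), 6: (0, 12), 9: (6, 12),
        1: (0, 3), 2: (0, 3), 4: (3, 6), 5: (3, 6),
        7: (6, 9), 8: (6, 9), 10: (9, 12), 11: (9, 12)}

def select_df_mf(lost):
    """Return (DFs, MFs) for a lost B frame, following the policy above."""
    dfs = REFS[lost]                        # DFs: the frame's own references
    if LEVEL[lost] < 3:                     # key-adjacent / RB frames:
        # MFs: nearest frame at the next higher level on each side
        higher = [f for f, lv in LEVEL.items() if lv == LEVEL[lost] + 1]
        return dfs, (max(f for f in higher if f < lost),
                     min(f for f in higher if f > lost))
    # NRB frames: a single MF, the reference at the previous level (level 2)
    mf = dfs[0] if LEVEL[dfs[0]] == 2 else dfs[1]
    return dfs, (mf,)
```

This reproduces the worked examples in the text: losing frame 6 yields DFs (0, 12) and MFs (3, 9); losing frame 8 yields DFs (6, 9) and the single MF 9.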

After determining the DFs and MFs, the motion vectors in the MFs are used to estimate the missing motion vectors (pointing to the DFs from the lost frame). When the lost frame is an RB frame, since its MFs are located between the DFs and the lost frame (see Fig. 4(a)), the motion vectors are composed if the block in the MF has two motion vectors, or extrapolated if the block has only one.

The motion vector derivation corresponding to Fig. 4(a) is illustrated in Fig. 5(a), where the two motion vectors of b_x in f_3 are composed and the motion vector of b_y is extrapolated, so that the derived motion vectors point to f_0 from f_6. The motion vectors pointing to f_12 from f_6 can be derived in a similar manner using the motion vectors of f_9. On the other hand, when the lost frame is an NRB frame, since one MF serves two DFs located on opposite sides of the lost picture (see Fig. 4(b)), the motion vectors in the MF are interpolated, as illustrated in Fig. 5(b), where the motion vector of block b_w is interpolated to obtain two motion vectors pointing to f_6 and f_9, respectively, from f_8. Let mv_b and mv_f denote the derived motion vectors pointing to DF_b and DF_f, respectively, from the lost frame. For a lost frame, after all the motion vectors in its MFs have been composed, extrapolated, or interpolated, the missing pixels on the lost frame can be classified into four types: pixels associated with one or more mv_b, pixels with one or more mv_f, pixels with both mv_b and mv_f, and pixels with neither. For a pixel P in the lost picture, we recover it by the predicted signal obtained as follows:

P̂(x) = w0 · DF_b(x + mv_b) + w1 · DF_f(x + mv_f)    (4)

Here, x is the spatial coordinate of P, and w0 and w1 are weighting values set in inverse proportion to the temporal distances of DF_b and DF_f, respectively, from the lost picture.
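Under the stated rule that the weights are inversely proportional to the temporal distances of the two DFs from the lost picture, the bidirectional case of Eq. (4) can be sketched as follows. Normalizing the weights to sum to 1 is our assumption, not stated in the paper:

```python
def blend(pred_b, pred_f, dist_b, dist_f):
    """Blend two motion-compensated predictions per the bidirectional
    case of Eq. (4).

    pred_b/pred_f: pixels fetched from the backward/forward DF via the
    derived motion vectors; dist_b/dist_f: temporal distances of the DFs
    from the lost picture.
    """
    w0 = dist_f / (dist_b + dist_f)   # nearer DF receives the larger weight
    w1 = dist_b / (dist_b + dist_f)
    return w0 * pred_b + w1 * pred_f
```

For instance, with a backward DF one frame away and a forward DF three frames away, the backward prediction receives weight 3/4 and the forward prediction 1/4.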

(a) Motion composition and extrapolation for an RB frame (b) Motion interpolation for an NRB frame


Fig. 5 Temporal estimation using bi-directional predicted signal.
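The composition, extrapolation, and interpolation modes above can be sketched as follows, assuming linear motion so that vectors scale with the ratio of temporal distances; the function names and sign conventions are ours:

```python
def scale(mv, s):
    """Scale a (dx, dy) motion vector by a factor s."""
    return (mv[0] * s, mv[1] * s)

def compose(mv_to_df, mv_to_lost):
    """RB case: chain an MF block's two vectors (one toward the DF, one
    toward the lost frame) into a vector from the lost frame to the DF."""
    return (mv_to_df[0] - mv_to_lost[0], mv_to_df[1] - mv_to_lost[1])

def extrapolate(mv, d_mf, d_lost):
    """RB case: extend a single MF vector spanning d_mf frames so that it
    spans d_lost frames, from the lost frame to the same DF."""
    return scale(mv, d_lost / d_mf)

def interpolate(mv, d_mf, d_b, d_f):
    """NRB case: split one MF vector (MF -> backward DF, spanning d_mf
    frames) into two vectors from the lost frame: one toward the backward
    DF (d_b frames away) and one toward the forward DF (d_f frames away)."""
    return scale(mv, d_b / d_mf), scale(mv, -d_f / d_mf)
```

As a check with constant motion of (2, 0) per frame: the MF vector from f_3 to f_0 is (-6, 0) and from f_3 to f_6 is (6, 0); composing them gives (-12, 0), the vector from f_6 to f_0, and extrapolating (-6, 0) from a 3-frame span to a 6-frame span gives the same result.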
