CHAPTER 4 HYBRID MODEL
4.3 ESTIMATION OF LOST DESCRIPTION
4.3.3 Estimation Method Selection
There are 16 states of the four descriptions, as listed in Table 4.1, where the columns describe the four possible cases for the two descriptions split from prediction loop T0, while the rows describe those for T1. The estimation method to be applied in each case is also shown in this table, where ‘T’ denotes temporal estimation and ‘S’ denotes spatial estimation. ‘S→T’ denotes that the spatial method is performed first and the temporal method is then applied; ‘A’ indicates that either the temporal or the spatial method is applied, with the choice made adaptively according to the content of the video. ‘N/A’ means that no estimation method is applied. As can be seen in the table, ‘S→T’ is applied only for the cases of three-description loss, while ‘T’ is applied only when two descriptions split from the same prediction loop are lost and the other two are received, that is, when a whole frame is lost. For a whole-frame loss, the temporal method B-PMVI is used. For the remaining loss cases, ‘A’ is applied.
Table 4.1 Summary of estimation methods in the corresponding cases.
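The selection rules summarized above can be sketched as a small lookup function. This is only an illustration of the rules as stated in the text, not the thesis implementation: the function names and the boolean encoding of a loss state are hypothetical, and the label returned when all four descriptions are lost is an assumption, since the text does not state it.

```python
from collections import Counter
from itertools import product


def select_estimation(t0_lost, t1_lost):
    """Return the estimation label for one loss state.

    t0_lost, t1_lost: pairs of booleans, one per description of the
    prediction loop; True means that description is lost.
    """
    lost_in_t0 = sum(t0_lost)
    lost_in_t1 = sum(t1_lost)
    total_lost = lost_in_t0 + lost_in_t1

    if total_lost == 0:
        return "N/A"  # nothing is lost, so nothing is estimated
    if total_lost == 4:
        return "N/A"  # ASSUMPTION: this case is not stated in the text
    if total_lost == 3:
        return "S→T"  # spatial first, then temporal
    if total_lost == 2 and 2 in (lost_in_t0, lost_in_t1):
        return "T"    # both descriptions of one loop lost: whole frame
    return "A"        # one loss, or two losses in different loops


def count_labels():
    """Count how often each label occurs over all 16 loss states."""
    states = product([False, True], repeat=4)
    return Counter(select_estimation((a, b), (c, d)) for a, b, c, d in states)
```

Under these rules, `count_labels()` yields 8 states labeled ‘A’, 4 labeled ‘S→T’, 2 labeled ‘T’, and 2 labeled ‘N/A’, which matches the case counts given in the text (four one-description losses, four two-description losses across different loops, two same-loop losses, and four three-description losses).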
Since the Hybrid method splits every two frames into four descriptions using two prediction loops, for two consecutive frames, n and n+1, we refer to their prediction loops as T0 and T1, respectively, and refer to the two descriptions split from frame n as ( ) and ( ), and the two split from frame n+1 as ( ) and ( ). To illustrate the cases in which ‘S→T’ is applied, Figure 4.12 (a) depicts one of the four possible cases in which three descriptions are lost. Descriptions marked with ‘(x)’ are lost. In this case, since ( ) from prediction loop T0 is received, spatial estimation can be applied to reconstruct its counterpart, ( ), as indicated by the dotted arrow labeled ‘S’. After merging ( ) and ( ), the reconstructed frame , together with the backward frame , is used by the temporal method B-PMVI to recover the lost whole frame , as indicated by the dotted arrow labeled ‘T’. Figure 4.12 (b) shows how ‘S→T’ is performed.
Figure 4.12 ‘S→T’ for three missing descriptions: (a) three-description loss; (b) spatial and then temporal estimation.
To illustrate the cases in which ‘T’ is applied, Figure 4.13 (a) and Figure 4.13 (b) depict the two possible cases in which both descriptions from the same prediction loop are lost. In either case, the spatial estimation method cannot be applied because the lost descriptions have no counterpart available from the same prediction loop. For these cases, the temporal method B-PMVI is applied for whole-frame estimation. For example, in Figure 4.13 (a), after merging and inverse polyphase permuting, the full frame from prediction loop T0 can be obtained, which is then used by temporal estimation to recover the lost frame belonging to prediction loop T1.
Figure 4.13 Temporal estimation for two missing descriptions from the same prediction loop: (a) two descriptions of T1 are lost; (b) two descriptions of T0 are lost.
The adaptive estimation method is applied when two descriptions from different prediction loops are lost or when only one description is lost, as labeled by ‘A’ in Table 4.1. Figure 4.14 (a) depicts one of the four possible cases in which one description is lost, and Figure 4.14 (b) shows one of the four possible cases in which two descriptions from different prediction loops are lost. Note that, in these cases, since each lost description can obtain correct motion vectors from its counterpart in the same prediction loop, motion compensation can be performed without estimation. Thus, only the lost residual data needs to be estimated. For these cases, the adaptive method, which is either spatial estimation or the temporal estimation B-PMVE, is applied. As an example, in Figure 4.14 (a), the missing residual of ( ) can be predicted either from ( ) by using spatial estimation, or from two frames, and , by using the temporal estimation method B-PMVE. In Figure 4.14 (b), the lost residuals of ( ) and ( ) can likewise be predicted by either the spatial or the temporal method.
Figure 4.14 Adaptive selection of estimation methods: (a) one description is lost; (b) two descriptions from different prediction loops are lost.
Intuitively, spatial estimation is more beneficial for video with simple texture and high motion, whereas temporal estimation is more beneficial for video with slow motion and complex texture. To select the appropriate estimation method for the above cases, a content-adaptive method is designed for the decoder. It measures the gradient of the lost residual pixels along the spatial and temporal dimensions to determine the characteristics of the video content, and then makes the choice.
The spatial gradient (GS) of a residual pixel is calculated as the average of two differences: the difference between its two adjacent residual pixels in the horizontal direction and the difference between its two adjacent residual pixels in the vertical direction. Let rn(i, j) denote the residual pixel at (i, j) of frame n. The spatial gradient of this residual pixel is defined as:
G_S(i, j) = \frac{1}{2} \left( \left| r_n(i-1, j) - r_n(i+1, j) \right| + \left| r_n(i, j-1) - r_n(i, j+1) \right| \right)    (4.4)
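As a minimal sketch of the spatial gradient described above, the following function averages the two neighbor differences of one residual pixel. It assumes absolute differences, a residual block stored as a list of rows indexed as `residual[i][j]`, and an interior pixel; the function name and the border check are hypothetical and not taken from the thesis.

```python
def spatial_gradient(residual, i, j):
    """Spatial gradient of the residual pixel at (i, j).

    residual: 2-D list of residual values, indexed residual[i][j].
    The value at (i, j) itself is never read, so it may be unknown.
    ASSUMPTION: absolute differences, interior pixels only.
    """
    rows = len(residual)
    cols = len(residual[0])
    if not (1 <= i < rows - 1 and 1 <= j < cols - 1):
        raise ValueError("spatial_gradient needs an interior pixel")

    # Difference between the two neighbors along the first index.
    diff_first = abs(residual[i - 1][j] - residual[i + 1][j])
    # Difference between the two neighbors along the second index.
    diff_second = abs(residual[i][j - 1] - residual[i][j + 1])
    return (diff_first + diff_second) / 2.0


block = [
    [0, 1, 0],
    [4, 9, 10],
    [0, 5, 0],
]
gs = spatial_gradient(block, 1, 1)  # (|1 - 5| + |4 - 10|) / 2
```

Because only the four neighbors are read, the sketch can be evaluated for a lost residual pixel as long as its neighbors are available from the received counterpart.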
The temporal gradient (GT) of a pixel is measured as the difference between the motion-compensated pixel in the reference frame and the pixel at the extrapolated location in the next frame; pixel values, instead of residual-pixel values, are used in the calculation. For a residual pixel at (x, y) of frame n, assume that its forward and backward motion vectors, obtained by pixel-based MV extrapolation, are (fx, fy) and (bx, by), respectively. The temporal gradient of this residual pixel is then defined as:
G_T(x, y) = \left| p_{\mathrm{ref}}(x + f_x, y + f_y) - p_{\mathrm{next}}(x + b_x, y + b_y) \right|    (4.5)
where p_k(i, j) denotes the pixel value at (i, j) of frame k, with ref and next indicating the reference frame and the next frame, respectively. To explore the relation between the estimation methods and the gradient values, experiments were conducted on 1156 frames from 4 different QCIF sequences. All frames were encoded using the proposed Hybrid MDC and simulated with one-description loss. The lost description was reconstructed using temporal estimation on a per-frame basis without error propagation. The PSNR results (denoted by T_PSNR) of all frames are sorted in ascending order and depicted in Figure 4.15, where the average GT of each frame is also shown. Similar experiments were conducted for the spatial estimation method, and the average GS and PSNR (denoted by S_PSNR) are also presented in Figure 4.15. As expected, T_PSNR increases as GT decreases, and S_PSNR increases as GS decreases. The difference between S_PSNR and T_PSNR of the same frame can be more than 10 dB or nearly zero (0.5 dB on average), confirming that, to obtain the best PSNR for each frame, the choice of estimation method must be content adaptive. In addition, the two PSNR curves have a single intersection, and on each side of the intersection one curve is always above the other. A similar phenomenon is observed for the two gradient curves.
By lifting the GS curve by a constant offset σ, the two intersection points fall on the same frame. Then, almost all frames with GS lower than GT have higher S_PSNR than T_PSNR, indicating that spatial estimation is preferred for these frames. Conversely, for frames with GT lower than GS, temporal estimation is preferred. Let e(A) denote the estimation method selected by the adaptive method. Then, for a lost residual pixel at (x, y) in frame n, its e(A) is determined as
e^{(A)}(x, y) =
\begin{cases}
S, & G_S(x, y) + \sigma < G_T(x, y) \\
T, & G_S(x, y) + \sigma > G_T(x, y)
\end{cases}    (4.6)
Figure 4.15 Relation between PSNR and gradient value.
In (4.6), σ is 3.22 for Figure 4.16, in which QP = 28 is used. By conducting more experiments with more QPs, we found that σ is a function of QP. As depicted in Figure 4.17, where 11 different QPs ranging from 18 to 38 are used, the relation between σ and QP can be modeled using a quadratic equation as follows.
= . − . + . (4.7)
Figure 4.16 Relation between PSNR and adjusted gradient value.
Figure 4.17 Relation between σ and QP
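The temporal gradient and the decision rule in (4.6) can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the function names are hypothetical, frames are lists of rows indexed as `frame[x][y]`, motion vectors are integer offsets, the forward vector is taken to point into the reference frame and the backward vector into the next frame, the difference is taken in absolute value, and a tie (GS + σ equal to GT) is resolved in favor of temporal estimation because (4.6) does not define that case.

```python
def temporal_gradient(ref_frame, next_frame, x, y, fwd_mv, bwd_mv):
    """Temporal gradient of the pixel at (x, y).

    ref_frame, next_frame: 2-D lists of pixel values (not residuals).
    fwd_mv, bwd_mv: integer motion vectors (fx, fy) and (bx, by).
    ASSUMPTION: fwd_mv indexes ref_frame, bwd_mv indexes next_frame,
    and the caller keeps the displaced positions inside both frames.
    """
    fx, fy = fwd_mv
    bx, by = bwd_mv
    ref_pixel = ref_frame[x + fx][y + fy]
    next_pixel = next_frame[x + bx][y + by]
    return abs(ref_pixel - next_pixel)


def select_method(gs, gt, sigma):
    """Return 'S' (spatial) or 'T' (temporal) following rule (4.6).

    ASSUMPTION: the tie gs + sigma == gt falls back to 'T'.
    """
    return "S" if gs + sigma < gt else "T"


ref_frame = [[10, 20], [30, 40]]
next_frame = [[11, 22], [33, 44]]
gt = temporal_gradient(ref_frame, next_frame, 0, 0, (1, 1), (0, 1))  # |40 - 22|
choice = select_method(gs=2.0, gt=gt, sigma=3.22)  # sigma for QP = 28
```

Keeping σ as a parameter reflects the observation that it depends on QP; the value 3.22 used here is the one reported for QP = 28.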