Chapter 6 Generalized MIG Derivation and Improved Mode Decision Method

6.2 Generalized MIG Derivation

We now seek the relationship between the residual signal statistics and the motion bitrate. As discussed earlier, $\rho_v$ and $\alpha_v$ denote, respectively, the zero-value probability and the shape parameter in the one-sided ρ-GGD model of the residual signal obtained using motion vector $v$. Thus, $\rho_0$ and $\alpha_0$ are the residual signal statistics when $v = 0$. Substituting (51) into (14) with the corresponding parameters, (14) becomes

. (52)

Equation (52) can be simplified to

. (53)

Interestingly, the target coding rate term, $R_T$, in (14) is eliminated. This elimination implies that (53) is a rate-independent criterion for checking the motion prediction efficiency. Therefore, in theory, this criterion is applicable to situations with multiple operating rates, such as scalable interframe wavelet coding. However, the criterion needs to be adjusted to match real video data.

We can examine (53) from a different perspective. Let the residual signal produced using motion vector $v$ be $x \in X_v$. Similar to the derivation of (49), the differential entropies of $X_0$ and $X_v$ are expressed, respectively, as

. (54)

If motion vector $v$ results in good motion compensation, the differential entropy of the residual signal should be smaller than that obtained using the zero motion vector. The difference between the differential entropies of $X_0$ and $X_v$, which is positive in this case, is as follows.

. (55)

Note that (55) is exactly the numerator of the left-hand term in (53). Thus, (53) reduces to

. (56)

In (18), a similar conclusion was obtained based on the Laplacian source assumption.
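
As a rough numerical illustration of comparing the differential entropies of two residual signals, the sketch below uses the standard two-sided generalized Gaussian density, not the one-sided ρ-GGD model of this chapter, so it does not reproduce (54) or (55). The function names, the scale parameter, and the example values are assumptions made for illustration only. With a shape parameter of 1 the density is Laplacian, which corresponds to the Laplacian source assumption mentioned above.

```python
import math


def ggd_entropy_bits(alpha: float, scale: float) -> float:
    """Differential entropy, in bits, of the two-sided GGD
    f(x) = alpha / (2 * scale * Gamma(1/alpha)) * exp(-(|x| / scale) ** alpha).
    This is the textbook GGD, used here as a stand-in for the rho-GGD model."""
    nats = 1.0 / alpha + math.log(2.0 * scale * math.gamma(1.0 / alpha) / alpha)
    return nats / math.log(2.0)


def entropy_gain_bits(alpha0: float, scale0: float,
                      alpha_v: float, scale_v: float) -> float:
    """Entropy of the zero-motion residual minus entropy of the compensated residual."""
    return ggd_entropy_bits(alpha0, scale0) - ggd_entropy_bits(alpha_v, scale_v)


# Made-up example: Laplacian residuals (alpha = 1) whose scale is halved
# by motion compensation.
gain = entropy_gain_bits(1.0, 8.0, 1.0, 4.0)
print(f"{gain:.3f} bits per sample")  # prints "1.000 bits per sample"
```

In this toy case, halving the scale of a Laplacian residual lowers its differential entropy by exactly one bit per sample. A criterion of the kind discussed here weighs such a reduction against the bits spent on the motion vector.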

However, as discussed in Section 4.2, the result in (56) does not match the real-world situation, for at least two reasons. First, a practical coder cannot achieve the rate-distortion bound predicted by information theory. Second, real video data do not completely satisfy the mathematical assumptions of the theory, such as stationarity and the assumed probability distribution. Thus, the theoretically derived rate-distortion function may not accurately represent the relationship between the produced coding rate and the real distortion. Therefore, we modify (56) to

, (57)

where $C$ is the MIG lower bound in the real world. Because of this model divergence, $C$ is not 1 for a practical wavelet coder applied to the test video data. Therefore, two parameters are introduced into (14) to reflect the model divergence problem. We rewrite (14) as

, (58)

where the “real distortion” is measured from the quantized residual signal compensated using motion vector $v$, and the “ideal distortion” is derived from the rate-distortion function of the source model in (14). A new parameter $\beta_v$ is introduced to compensate for the difference between the real distortion and the ideal distortion. In other words,

. Or,

. (59)

Here, we assume that a (nearly) constant multiplication factor is adequate for compensating the model divergence. Since this factor is introduced to bridge the gap between the ideal case and the real-world case, it must be verified using the test data. The real distortion, the ideal distortion, and $\beta_0$ are defined similarly for the zero motion vector. Hence, (58) can be rewritten as

, (60)

By replacing the ideal distortion with the rate-distortion function in (51), (60) gives

. (61)

Equation (61) is very similar to (56). In the ideal case, the “ideal distortion” would equal the “real distortion”, which makes $\beta_v = 1$ and $\beta_0 = 1$, and (61) would fall back to (56). Therefore, for the real case, the MIG lower bound $C$ becomes

. (62)

Consider now the quantized residual signal. According to (51), the ideal distortion is calculated by

, (63)

where the entropy is that of the quantized residual signal. Using (59) and (63), (62) can be rewritten as

. (64)

Based on (64), the $C$ value can be found using statistical analysis. How to obtain the quantized residual signal and the quantities derived from it is an issue. The scalable encoder does not know the bitstream extraction condition at the MCTF stage. For this reason, it is difficult to select a quantization step size for generating them. However, the purpose of generating the quantized residual signal is to simulate the divergence problem of the rate-distortion function. We conjecture that there exists a certain range of quantization step sizes that are representative. Therefore, we take an engineering approach to find a proper quantization step size for deriving the $C$ value. We ran exhaustive experiments on all sequences and found that 8 is generally a good quantization step size for estimating $C$ in (64).
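
The measurements discussed above (a quantized residual signal, its entropy, and the distortion it incurs) can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not specify the quantizer or the distortion measure, so a uniform quantizer with rounding to the nearest level and the mean squared error are assumed here. The function names and the toy residual samples are hypothetical. Only the step size of 8 comes from the text.

```python
import math
from collections import Counter

STEP = 8  # quantization step size reported in the text


def quantize(residual, step=STEP):
    """Assumed uniform quantizer: index of the nearest reconstruction level."""
    return [math.floor(x / step + 0.5) for x in residual]


def dequantize(indices, step=STEP):
    """Map quantization indices back to reconstruction levels."""
    return [q * step for q in indices]


def entropy_bits(symbols):
    """Empirical first-order entropy of the symbols, in bits per sample."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


residual = [0, 3, -3, 9]                # toy residual samples (made up)
indices = quantize(residual)            # [0, 0, 0, 1]
recon = dequantize(indices)             # [0, 0, 0, 8]
rate = entropy_bits(indices)            # about 0.811 bits per sample
real_distortion = mse(residual, recon)  # 4.75
```

The sketch stops at these ingredients; combining them into a $C$ estimate requires (64), which is not reproduced here.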

Therefore, we design an adaptive C-value updating scheme. The scheme adapts the $C$ value at two levels: the frame level and the GOP level. At the frame level, we collect the statistics of the macroblocks with non-zero motion vectors and calculate the frame-level $C$ value using (64). This new $C$ value is then used for the next frame. If the frame being encoded is the last frame of the GOP, the GOP-level $C$ value is updated by averaging all frame-level $C$ values in that GOP.

The frame-level and GOP-level adaptations are connected as follows. The newly derived frame-level $C$ value is limited to the range $[C_{\mathrm{GOP}} - \Delta,\ C_{\mathrm{GOP}} + \Delta]$, where $C_{\mathrm{GOP}}$ is the current GOP-level $C$ value and $\Delta$ is used to prevent extreme values caused by noise or insufficient data in the adaptation process. The GOP-level $C$ value is limited to the same range in the adaptation process. For example, if the newly derived GOP-level $C$ value is larger than the previous $C_{\mathrm{GOP}}$ plus $\Delta$, the new GOP-level $C$ value is set to $C_{\mathrm{GOP}} + \Delta$. In our experiments, $\Delta$ is empirically chosen to be 0.5.
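
The two-level adaptation described above can be sketched as follows. This is a minimal sketch, not the encoder's actual implementation: the class and method names are hypothetical, the initial value of 7 is taken from the offline-trained value mentioned in this chapter, and the frame-level estimate from (64) is passed in as a number rather than computed. The sketch also assumes that the GOP-level average is taken over the frame-level values before they are limited. If the limited values were averaged instead, the average would always lie inside the allowed range and the GOP-level limit would never take effect.

```python
from dataclasses import dataclass, field
from typing import List


def clamp(value: float, center: float, delta: float) -> float:
    """Limit value to the range [center - delta, center + delta]."""
    return max(center - delta, min(center + delta, value))


@dataclass
class AdaptiveC:
    c_gop: float = 7.0    # GOP-level C (assumed start: the offline-trained value)
    delta: float = 0.5    # half-width of the allowed range (0.5 in the text)
    c_frame: float = 7.0  # C value to use for the next frame (assumed start)
    _raw: List[float] = field(default_factory=list)  # unlimited frame-level values

    def end_of_frame(self, raw_c: float, last_in_gop: bool = False) -> float:
        """raw_c is the frame-level C estimated from the macroblocks with
        non-zero motion vectors; computing it via (64) is outside this sketch."""
        self._raw.append(raw_c)
        # Frame level: limit the new value around the current GOP-level value.
        self.c_frame = clamp(raw_c, self.c_gop, self.delta)
        if last_in_gop:
            # GOP level: average the frame-level values, then limit the step.
            gop_avg = sum(self._raw) / len(self._raw)
            self.c_gop = clamp(gop_avg, self.c_gop, self.delta)
            self._raw.clear()
        return self.c_frame


ctrl = AdaptiveC()
used = [
    ctrl.end_of_frame(9.0),                    # limited to 7.5
    ctrl.end_of_frame(6.8),                    # within range, kept as 6.8
    ctrl.end_of_frame(8.2, last_in_gop=True),  # limited to 7.5
]
```

In this made-up run the GOP average is 8.0, which exceeds the previous GOP-level value plus 0.5, so the new GOP-level value becomes 7.5.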

Table 6-1. The average frame-level C values using the proposed adaptive scheme

| Test sequence | Average C value |
|---|---|
| Tempete | 7.75 |
| Mobile | 7.43 |
| Foreman | 7.37 |
| Container | 7.99 |
| Waterfall | 7.12 |
| Irene | 6.43 |
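
The overall level of the values in Table 6-1 can be checked with a few lines of arithmetic. The snippet below only restates the table entries; the variable names are illustrative.

```python
# Average frame-level C values copied from Table 6-1.
c_values = {
    "Tempete": 7.75,
    "Mobile": 7.43,
    "Foreman": 7.37,
    "Container": 7.99,
    "Waterfall": 7.12,
    "Irene": 6.43,
}

mean_c = sum(c_values.values()) / len(c_values)
print(f"{mean_c:.2f}")  # prints "7.35"

# Every sequence average falls inside the range [4, 10].
all_in_range = all(4.0 <= c <= 10.0 for c in c_values.values())
```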

Table 6-1 shows the average frame-level $C$ values obtained with this adaptive approach. The average $C$ value is around 7, which is consistent with our previous finding in Section 4.3 that $C$ lies in the range [4, 10]. The proposed adaptive scheme thus verifies that our previously used offline-trained $C$ value is adequate.

We now compare the rate-distortion performance of the adaptive $C$ scheme and the fixed $C$ scheme. We pick four CIF test sequences: Mobile, Container, Waterfall, and Irene. The test bitrate points are 256 kbps, 384 kbps, 512 kbps, 800 kbps, 1024 kbps, 1200 kbps, and 1500 kbps. The PSNR results of the two schemes, averaged over the 7 test points, are shown in Table 6-2. As Table 6-2 shows, their PSNR performances are very similar. However, from (64), we can see that the adaptive scheme requires many additional encoding operations. In the experiment section of this chapter, the results are obtained using the offline-trained $C$ value, which is 7, and this setting still outperforms the conventional Lagrangian method.

Table 6-2. Average PSNR results of the two C-value schemes

| Test sequence | Offline-trained C value, PSNR (dB) | Adaptive C value, PSNR (dB) |
|---|---|---|
| Mobile | 33.625 | 33.631 |
| Container | 45.347 | 45.351 |
| Waterfall | 41.038 | 41.046 |
| Irene | 41.441 | 41.461 |
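
How close the two schemes are can be quantified directly from Table 6-2. The snippet below only restates the table entries and computes the per-sequence PSNR difference; the variable names are illustrative.

```python
# (offline-trained C, adaptive C) average PSNR in dB, copied from Table 6-2.
psnr = {
    "Mobile": (33.625, 33.631),
    "Container": (45.347, 45.351),
    "Waterfall": (41.038, 41.046),
    "Irene": (41.441, 41.461),
}

# Gain of the adaptive scheme over the offline-trained value, per sequence.
gains = {name: adaptive - offline for name, (offline, adaptive) in psnr.items()}
mean_gain = sum(gains.values()) / len(gains)
max_gain = max(gains.values())

for name, gain in gains.items():
    print(f"{name}: {gain:+.3f} dB")
```

For these four sequences the gain of the adaptive scheme stays at or below roughly 0.02 dB, with a mean of roughly 0.01 dB.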
