Concept of Operation - Bi-Prediction Combining Template and Block Motion Compensations

Figure 3.1 depicts the basic concept of the proposed scheme. Like conventional bi-prediction, it predicts a target PU from two predictors. These predictors, however, are weighted in a pixel-adaptive manner using POBMC [10]: one is derived from an MV v_t found by TMP [1][2][3], and the other from the usual motion compensation. Since v_t can be inferred on the decoder side, the scheme has to signal motion parameters for only one block MV (denoted v_b). Additionally, we restrict v_b to uni-directional prediction here in order to reduce the motion cost relative to bi-prediction.
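As a rough illustration of the pixel-adaptive weighting described above, the following Python sketch blends a template-derived predictor and a block-derived predictor with per-pixel weights. The function name `blend_predictors`, the list-of-lists sample layout, and the example weights are assumptions made for illustration only; the actual window functions come from POBMC [10].

```python
def blend_predictors(pred_t, pred_b, w_t):
    """Blend two predictors pixel by pixel.

    pred_t: predictor fetched with the template MV v_t (2-D list of samples)
    pred_b: predictor fetched with the block MV v_b (same size)
    w_t:    per-pixel weight of pred_t in [0, 1]; pred_b gets 1 - w_t
            (assumption: the two weights sum to one at every pixel)
    """
    height, width = len(pred_t), len(pred_t[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            wt = w_t[y][x]
            out[y][x] = wt * pred_t[y][x] + (1.0 - wt) * pred_b[y][x]
    return out


# Hypothetical 2x2 example: the template predictor dominates near the
# above-left corner, the block predictor elsewhere.
pred_t = [[100, 100], [100, 100]]
pred_b = [[60, 60], [60, 60]]
w_t = [[0.75, 0.5], [0.5, 0.25]]
blended = blend_predictors(pred_t, pred_b, w_t)
```

Only `pred_b` depends on signaled motion in this picture; `pred_t` and `w_t` can be rebuilt by the decoder from already reconstructed data.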

Chapter 3. Combining Template and Block Motion Compensations

Figure 3.1: Joint application of TMP and BMC.

the target block B by the POBMC framework. Note that we restrict different PUs of the same size to share identical window functions. As a result, v_b is defined as in (1.1).

One way to minimize (1.1) is the least-squares method, which leads to an under-determined problem, since a distinct solution has to be sought for each possible context. Although the least-squares method is feasible and optimal in theory, it remains impractical because the training process is too time-consuming.

Instead of the least-squares method, we resort to the parametric framework in Section 2.5. To proceed, we start with an exploration of its average behavior. According to the motion sampling positions in a target PU B, v_t approximates the true motion v(s_t) of the pixel s_t at the template centroid [12]. However, we avoid making the same approximation for v_b, because the search criterion is no longer to minimize the sum of squared prediction errors¹ (cf. (1.1)). Instead, v_b is approximated as the true motion of some unknown pixel b in B. The problem of determining w_b(s) then becomes the search for an optimal sampling position s_b ∈ B that minimizes the sum of mean squared prediction errors (SMSE) over B:

¹ A block MV approximates the true motion of the pixel at the block center only if its search criterion is to minimize the sum of squared prediction errors [11][9].


To compute the expectation in (3.1), we replace I_k(s) − I_{k−1}(s + v(s_i)) with d(s; v(s_i)), which denotes the residual signal when I_k(s) is predicted from the motion-compensated signal I_{k−1}(s + v(s_i)), and then rewrite (3.1) as

We assume that the prediction errors in the target PU B are uncorrelated with each other, i.e., E[d(s; v(s_b)) d(s; v(s_t))] = 0; then (3.2) is approximated by
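The effect of the uncorrelatedness assumption can be sketched as follows. This is a reconstruction under two assumptions that are not restated in this section: the two window functions sum to one at every pixel, and d_b and d_t abbreviate d(s; v(s_b)) and d(s; v(s_t)). It is meant to show why the cross term disappears, not to reproduce the numbered equations verbatim.

```latex
\begin{align*}
I_k(s) - w_b(s)\, I_{k-1}(s + v(s_b)) - w_t(s)\, I_{k-1}(s + v(s_t))
  &= w_b(s)\, d_b + w_t(s)\, d_t
  && \text{(using } w_b(s) + w_t(s) = 1\text{)} \\
E\big[(w_b(s)\, d_b + w_t(s)\, d_t)^2\big]
  &= w_b^2(s)\, E[d_b^2] + w_t^2(s)\, E[d_t^2]
     + 2\, w_b(s)\, w_t(s)\, E[d_b\, d_t] \\
  &\approx w_b^2(s)\, E[d_b^2] + w_t^2(s)\, E[d_t^2]
  && \text{(using } E[d_b\, d_t] = 0\text{)}
\end{align*}
```

Summing the last line over all s in B gives an SMSE that depends on s_b only through the per-pixel error variances and the weights.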

Due to the non-linear nature of (3.4), s_b must be found numerically, that is, by computing the SMSE for every admissible location of b. Once s_b is found, w_b^∗(s) and w_t^∗(s) follow immediately from (2.8). Then, (1.1) is reformulated as

v^∗
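The numerical search for s_b can be sketched in Python as below. The sketch assumes, in the spirit of the parametric OBMC model, that the mean squared error of predicting pixel s with the motion sampled at s_i grows in proportion to the squared distance between s and s_i, and that the weights are the resulting inverse-variance weights. The 8 × 8 block size and the template centroid at (−2, −2) are hypothetical values, not those of Fig. 3.2, and all function names are illustrative.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two (row, col) positions."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def pixel_mse(s, s_b, s_t):
    """MSE at pixel s under optimal two-hypothesis weighting.

    Assumption: E[d^2(s; v(s_i))] is proportional to |s - s_i|^2 and the two
    prediction errors are uncorrelated, so the minimum of
    w_b^2 * e_b + w_t^2 * e_t subject to w_b + w_t = 1 is
    e_b * e_t / (e_b + e_t).
    """
    e_b, e_t = sq_dist(s, s_b), sq_dist(s, s_t)
    if e_b + e_t == 0:
        return 0.0
    return e_b * e_t / (e_b + e_t)


def smse(s_b, s_t, n):
    """Sum of the per-pixel MSEs over an n x n block B."""
    return sum(pixel_mse((r, c), s_b, s_t) for r in range(n) for c in range(n))


def best_sampling_position(s_t, n):
    """Exhaustively test every pixel of B as the candidate position b."""
    candidates = [(r, c) for r in range(n) for c in range(n)]
    return min(candidates, key=lambda s_b: smse(s_b, s_t, n))


def weights(s, s_b, s_t):
    """Inverse-variance weights (w_b, w_t) at pixel s under the same model."""
    e_b, e_t = sq_dist(s, s_b), sq_dist(s, s_t)
    if e_b == 0:
        return 1.0, 0.0
    if e_t == 0:
        return 0.0, 1.0
    w_b = e_t / (e_b + e_t)
    return w_b, 1.0 - w_b


n = 8
s_t = (-2, -2)  # hypothetical centroid of an above-left template
s_b = best_sampling_position(s_t, n)
w_corner = weights((0, 0), s_b, s_t)
```

Under these assumptions the search places b in the quadrant of B farthest from the template, since pixels near the template are already served well by v_t.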

To verify where s_b should be located, we take the example illustrated in Fig. 3.2 (a). For this case, Fig. 3.2 (b) plots the SMSE surface as a function of b according to (3.4). As can be seen, the SMSE decreases as b approaches the bottom-right quarter. A more precise calculation shows that the optimal location of b (thus


Figure 3.2: (a) Geometric relationship between the TMP centroid and s_b. (b) SMSE surface as a function of the location of pixel b.

bottom-right quarter to minimize the prediction errors in the remaining part of B.

3.3 Window Functions

In this thesis, five different template designs are evaluated for 2N × 2N PUs, as described in the first column of Fig. 3.3, while the second and third columns plot the corresponding window functions w_t^∗(s) and w_b^∗(s) for v_t and v_b. The waveforms for some template shapes, e.g., AL, suggest a special type of geometry motion partitioning [13] with two MVs located on the diagonal running from the above-left to the bottom-right corner of a PU. AR and BL follow the same rationale. Following the same line of derivation, we can obtain the window functions for the rectangular template designs.

In particular, asymmetric-like motion partitionings [13][7] result when the template region is located directly above or to the left of a target PU (cf. Fig. 3.3). Two conceptual differences, however, are to be noted. First, unlike explicit geometry or asymmetric partitions, these implicit “soft” partitions incur less motion cost (only one MV is to be signaled). Second, there is a strong interdependency between the transmitted and inferred MVs because of OBMC (cf. (1.1)).


[Figure 3.3: template designs (rows: Above-Left (AL), Left (L), Above (A), Above-Right (AR)) with the corresponding window functions w_t^∗(s) and w_b^∗(s).]

CHAPTER 4 Experiments

4.1 Experimental Conditions

4.1.1 Common Test Conditions

In this chapter, the experiments are conducted with the HEVC reference software HM-3.0 under the HEVC common test conditions (JCTVC-E700 [14]). The common test conditions provide a well-defined environment for experiments and ease the comparison of their outcomes. JCTVC-E700 defines eight different test conditions, but only four of them relate to bi-directional inter-frame coding:

• Random access, high efficiency (RAHE).

• Random access, low complexity (RALC).

• Low delay, high efficiency (LDHE).

• Low delay, low complexity (LDLC).

Each test condition has a specific configuration of coding tools switched ON or OFF, as summarized in Tab. 4.1. Our proposed scheme is tested under these test conditions in order to compare its BD-rate savings [6] with the HM-3.0 anchor.
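For readers unfamiliar with the metric, the following Python sketch shows one way a BD-rate figure can be computed from four rate-distortion points per codec. It fits log10(bitrate) as a cubic in PSNR and averages the difference between the two fits over the common PSNR interval. With exactly four points the cubic fit is the interpolating polynomial, and Simpson's rule integrates a cubic exactly; both are implementation choices of this sketch. The function names and the sample numbers are hypothetical and do not come from the experiments in this chapter.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mean_log_rate(points, lo, hi):
    """Average of the cubic log10(rate)-vs-PSNR curve over [lo, hi]."""
    psnr = [p for _, p in points]
    log_rate = [math.log10(r) for r, _ in points]

    def curve(x):
        return lagrange_eval(psnr, log_rate, x)

    # Simpson's rule is exact for polynomials up to degree three.
    mid = 0.5 * (lo + hi)
    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve(mid) + curve(hi))
    return integral / (hi - lo)


def bd_rate(anchor, test):
    """BD-rate in percent; negative values mean the test codec saves bits.

    anchor, test: lists of four (bitrate, psnr) pairs, one per QP.
    """
    lo = max(min(p for _, p in anchor), min(p for _, p in test))
    hi = min(max(p for _, p in anchor), max(p for _, p in test))
    diff = mean_log_rate(test, lo, hi) - mean_log_rate(anchor, lo, hi)
    return (10.0 ** diff - 1.0) * 100.0


# Made-up data: the test codec needs 10% fewer bits at every PSNR level.
anchor_points = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
test_points = [(0.9 * rate, psnr) for rate, psnr in anchor_points]
saving = bd_rate(anchor_points, test_points)
```

In this made-up case the result is a BD-rate of about −10%, matching the uniform 10% bitrate reduction built into the data.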

Chapter 4. Experiments

Encoder Configurations        RAHE         RALC         LDHE    LDLC
GOP Size                      8            8            1       1
NumOfReference                L0:2, L1:2   L0:2, L1:2   L0:4    L0:4
Entropy Coder                 CABAC        CAVLC        CABAC   CAVLC
Adaptive Loop Filter (ALF)    Y            N            Y       N
Internal Bit Depth (IBDI)     10           8            10      8
QP                            22, 27, 32, 37
Sequences                     1080p, 832 × 480, 416 × 240, 720p
CU Sizes                      8 × 8 ∼ 64 × 64
Search Range                  ±64
Bi-Prediction Search Range    ±4
Interpolation Filter          8-tap DCT-IF

Table 4.1: Common test conditions.

Rough estimates of complexity are given by the encoding and decoding time ratios relative to the HM-3.0 anchor.

4.1.2 TB-mode

Our proposed template-based bi-prediction scheme (hereafter referred to as TB-mode) is applied only to 2N × 2N PUs. Three (AL, L, and A) or five (AL, L, A, AR, and BL) template shapes are fetched from the reconstructed frame with a template width of 4. For each 2N × 2N PU, one flag switches adaptively between TB-mode and the usual inter mode. When TB-mode is chosen, at most two (three templates) or three (five templates) extra bits are coded to specify the template shape.
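To make the template search concrete, here is a minimal Python sketch of integer-pel template matching with an above-left (AL) template of width 4. The exact template geometry, the SAD criterion, the ±4 search range, and all function names are assumptions of this sketch; the thesis implementation inside HM-3.0 may differ in each of these respects.

```python
import random


def template_pixels(x0, y0, n, width=4):
    """Coordinates (x, y) of an L-shaped AL template around an n x n PU
    whose top-left sample is (x0, y0). The exact shape is an assumption."""
    above = [(x, y) for y in range(y0 - width, y0)
             for x in range(x0 - width, x0 + n)]
    left = [(x, y) for y in range(y0, y0 + n)
            for x in range(x0 - width, x0)]
    return above + left


def template_match(cur, ref, x0, y0, n, search_range=4):
    """Full search for the integer MV (dx, dy) that minimizes the SAD between
    the template in the current reconstructed frame and its displaced copy
    in the reference frame. Frames are indexed as frame[y][x]."""
    pixels = template_pixels(x0, y0, n)
    best_mv, best_sad = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            sad = sum(abs(cur[y][x] - ref[y + dy][x + dx]) for x, y in pixels)
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad


# Synthetic check: the current frame is the reference displaced by (-1, 2).
rng = random.Random(0)
size = 32
ref = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
cur = [[ref[(y + 2) % size][(x - 1) % size] for x in range(size)]
       for y in range(size)]
mv, sad = template_match(cur, ref, x0=12, y0=12, n=8)
```

Because the search reads only reconstructed samples outside the PU, the decoder can repeat it and obtain the same v_t without any signaled motion.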

Moreover, in this chapter, two types of window functions are evaluated with TB-mode: the theoretical window functions described in Section 3.3 and a heuristic design. For the theoretical window functions, the weighting coefficients are rounded offline to 16-bit integers, whereas the weighting coefficients of the heuristic window functions are represented as 3-bit integers. To verify the performance of TB-mode, several experiments featuring different performance and complexity trade-offs are summarized in Tab. 4.2 and discussed in the following sections.


Table 4.2: Experimental settings of TB-mode.

Algo.      Template    Window
T3-C-UU    3 shapes    Theoretical    ±4    N/A    2    N
T5-C-UU    5 shapes    Theoretical    ±4    N/A    2    N
T5-S-UU    5 shapes    Heuristic      ±4    N/A    2    N
T5-S-UB    5 shapes    Heuristic      ±4    N/A    3    N
T5-S-BU    5 shapes    Heuristic      ±4    ±1     3    N
T5-S-BB    5 shapes    Heuristic      ±4    ±1     4    N
T3-S-F     3 shapes    Heuristic      ±4    ±1     4    Y

PU Size Pos w1,1 w1,2 w1,3 w1,4 w1,5 PU Size Pos w2,1 w2,2 w2,3 w2,4 w2,5

Table 4.3: The start position a and the width b for various 2N × 2N PU sizes.

4.2 Heuristic Window Functions

Each PU, when coded with the proposed scheme, has multiple window functions, denoted by w_{n,m} with n = 1, 2 and m = 1, ..., 5. The parameter n is explicitly signaled with one extra flag, and the value of m is inferred from the chosen template shape. The coefficients of each w_{n,m} take values from either the set {0, 1, 4, 7} or the set {0, 1, 4, 6}, so that the multiplication by a floating-point number can easily be replaced by integer arithmetic. Their waveforms, illustrated in Fig. 4.1, partition a PU into four non-overlapping regions, each corresponding to a specific coefficient. Note that the zero coefficients cover more than one half or three-fourths of the region of a window function. Pixels in that region are not compensated by OBMC, which can effectively halve the memory bandwidth expense.
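The integer-only weighting can be sketched as follows. The sketch assumes that each 3-bit coefficient is the weight of the template predictor in units of 1/8, that the block predictor receives the complement, and that results are rounded to nearest; the text above does not state the normalization, so these choices and the names `blend_integer`, `SHIFT`, and `ONE` are illustrative only.

```python
SHIFT = 3          # assumption: 3-bit coefficients are in units of 1/8
ONE = 1 << SHIFT


def blend_integer(pred_t, pred_b, coeff):
    """Weight two predictors using integer coefficients only.

    coeff[y][x] is the integer weight of the template predictor; the block
    predictor receives the complement ONE - coeff[y][x]. Where the
    coefficient is zero the template predictor is never read, which is the
    case that saves memory bandwidth.
    """
    out = []
    for y, row in enumerate(coeff):
        out_row = []
        for x, k in enumerate(row):
            if k == 0:
                out_row.append(pred_b[y][x])  # no OBMC for this pixel
            else:
                acc = k * pred_t[y][x] + (ONE - k) * pred_b[y][x]
                out_row.append((acc + (ONE >> 1)) >> SHIFT)  # round to nearest
        out.append(out_row)
    return out


# Hypothetical 2x2 window using coefficients from the set {0, 1, 4, 7}.
coeff = [[7, 4], [1, 0]]
pred_t = [[80, 80], [80, 80]]
pred_b = [[40, 40], [40, 40]]
result = blend_integer(pred_t, pred_b, coeff)
```

Each weighted pixel costs two integer multiplications, one addition, and one shift, and zero-coefficient pixels cost a single copy.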

To resize a window function according to the size of the considered PU, the start point a and the width b are recorded. Tab. 4.3 lists the values of a and b for every possible 2N × 2N PU size. With this resizing rule, the storage required for the weighting coefficients is considerably reduced.

[Figure 4.1: heuristic window functions w_{n,m} for n = 1, 2 (rows: w_{n,1} (AL), w_{n,2} (L), w_{n,3} (A), w_{n,4} (AR)).]

Table 4.4: BD-rate savings and processing time ratios of TB-mode with 3- and 5-shape-adaptive configurations.

This section illustrates the compression performance and complexity of TB-mode when the two MVs, that of TMP and that of the target PU, are restricted to uni-directional prediction. Both theoretical and heuristic window functions, as well as 3- and 5-shape-adaptive implementations, are evaluated.

4.3.1 Coding Efficiency versus Number of Templates

We first compare the coding efficiency of the 3- and 5-shape-adaptive implementations. Tab. 4.4 presents the average BD-rate savings of T3-C-UU and T5-C-UU. The former is the typical TB-mode with 3-shape-adaptive theoretical window functions, while the latter is the 5-shape-adaptive TB-mode with theoretical window functions. T5-C-UU evaluates two additional templates for each 2N × 2N PU at the encoder; these additional RD comparisons consistently deliver about 0.1% coding gain at the cost of increased encoding complexity. Since the two additional templates are small, the encoding time ratio increases by only 14.3%. Moreover, a 4% increase in decoding time is observed, due to the two additional template searches performed at the decoder side.


Table 4.5: BD-rate savings and processing time ratios of TB-mode with theoretical and heuristic window functions.

4.3.2 Theoretical versus Heuristic Window Functions

Here we compare the theoretical and heuristic window function designs, denoted by T5-C-UU and T5-S-UU in Tab. 4.5. The results reveal that the additional set of heuristic window functions not only compensates for the coding loss caused by simplifying the weighting coefficients, but also increases the coding gain slightly, by 0.1% on average. As for encoding time, although each set of heuristic window functions requires less computation than a theoretical one, the extra set of RD comparisons still increases the encoding time ratio by about 24%. With regard to decoding complexity, since zero weighting coefficients reduce the computation needed for OBMC, the decoding time drops by about 10%.

4.4 Multiple Hypotheses

In this section, we discuss the effect on coding efficiency when multiple hypotheses are enabled for template and block motions. Experiments of multiple hypotheses are tested


Table 4.6: BD-rate savings and processing time ratios of enabling multiple-hypotheses.

Random Access    RAHE                                   RALC
Algo.            T5-S-UU  T5-S-UB  T5-S-BU  T5-S-BB     T5-S-UU  T5-S-UB  T5-S-BU  T5-S-BB
S03/S05/S06      −1.3     −1.4     −1.4     −1.9        −1.4     −1.7     −1.6     −2.2

Algo.            T5-S-UU  T5-S-UB  T5-S-BU  T5-S-BB     T5-S-UU  T5-S-UB  T5-S-BU  T5-S-BB
S03/S05/S06      −2.0     −2.0     −2.2     −2.3        −2.1     −2.3     −2.4     −2.6

• T5-S-UB: Experiment enabling bi-prediction for block motions.

• T5-S-BU: Experiment enabling bi-prediction for template motions.

• T5-S-BB: Experiment enabling bi-prediction for both template and block motions.

On average, T5-S-UB improves the BD-rate saving by 0.2% over T5-S-UU, with a 5% average increase in decoding time. Enabling bi-prediction for the block motion search of the target PU has almost no effect on decoding time. T5-S-BU shows benefits similar to those of T5-S-UB, with a 0.3% BD-rate saving. Nevertheless, since the bi-prediction of TMP is also performed at the decoder side, the decoding time increases dramatically, by about 136% on average. On the other hand, it is interesting that the encoding time increment of T5-S-UB is 30% higher than that of T5-S-BU. The reason is that the templates and the template matching range are generally smaller than the target PUs and the block motion search range.

T5-S-BB enables bi-directional prediction for both the template and block motion searches, and reaches an average BD-rate saving of 2.9% with a maximum of up to 5.2%. Although T5-S-BB performs best among all the TB-mode configurations, its significantly increased encoding and decoding times make this scheme less practical. As a result, a further reduction in TB-mode complexity is necessary.


4.5 Fast Algorithm

As concluded in the previous section, TB-mode needs several enhancements to lower its time complexity while retaining moderate coding gains. To tackle the complexity issue, we start from T5-S-BB, which has the most promising coding gains but also the longest runtimes of all the TB-mode experiments. As summarized below, four major enhancements are applied to speed up TB-mode:

• Reduce the number of template shapes moderately.

• Speed up the mode decision by skipping TB-mode when SKIP mode has the lowest RD cost among all the other modes.

• Limit the number of reference frames to be searched.

• Use a bilinear filter for sub-pel interpolation during the TMP process.

4.5.1 Enhancements for Encoder Only

The major contributor to the encoding time is the extra mode decision process of TB-mode R-D comparisons. Since TB-mode is an additional prediction mode, reducing the number of TB-mode evaluations is one way to reduce the encoding time. According to our observation, the area coded in SKIP mode changes only slightly before and after TB-mode is applied. This observation implies that the encoder is unlikely to choose TB-mode when SKIP mode is the best candidate. As a result, if the best mode is SKIP, we bypass the TB-mode evaluations. Moreover, since the 3-shape-adaptive TB-mode loses only a negligible coding gain compared with the 5-shape-adaptive one, we reduce the number of template shapes to be tested to further reduce the encoding time.
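The SKIP-based bypass can be sketched in Python as below. The mode names, the dictionary of RD costs, and the callable `tb_cost_fn`, which stands in for the expensive TB-mode evaluation, are hypothetical; the actual decision logic lives inside the HM-3.0 encoder and is not reproduced here.

```python
def choose_mode(conventional_costs, tb_cost_fn):
    """Pick the best mode, evaluating TB-mode only when SKIP is not best.

    conventional_costs: dict mapping a mode name (e.g. "SKIP") to its RD cost
    tb_cost_fn:         callable returning the RD cost of TB-mode; it stands
                        in for the template searches and RD comparisons
    """
    best_mode = min(conventional_costs, key=conventional_costs.get)
    best_cost = conventional_costs[best_mode]
    if best_mode == "SKIP":
        return best_mode, best_cost  # bypass TB-mode entirely
    tb_cost = tb_cost_fn()
    if tb_cost < best_cost:
        return "TB", tb_cost
    return best_mode, best_cost


calls = []


def fake_tb_cost():
    calls.append(1)  # count how often TB-mode is actually evaluated
    return 90.0


first = choose_mode({"SKIP": 50.0, "INTER_2Nx2N": 120.0}, fake_tb_cost)
second = choose_mode({"SKIP": 150.0, "INTER_2Nx2N": 120.0}, fake_tb_cost)
```

In the first call SKIP wins, so the TB-mode evaluation never runs; in the second it runs once and TB-mode is selected.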

4.5.2 Enhancements for Encoder and Decoder

TMP performs its motion search on both the encoder and decoder sides, which has a great impact on TB-mode complexity. To diminish the search cost caused by TMP, two approaches are taken. The first is to reduce the number of reference frames


Table 4.7: BD-rate savings and processing time ratios of TB-mode after applying fast algorithms.

TB-mode. These reference frames are derived from the reference indices used in the 2N × 2N MRG mode during the mode decision process. If only one reference index is available, or the reference indices are duplicated, an additional reference frame, namely the one with the lowest QP in the GOP structure, is considered as another candidate to be evaluated in TB-mode.
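The reference frame selection rule can be sketched as follows. The cap of two frames, the function name, and the representation of frames by integer indices are assumptions of this sketch, since the sentence stating the exact number of frames is not available here.

```python
def select_tb_reference_frames(merge_ref_indices, lowest_qp_ref):
    """Choose the reference frames searched by TMP in TB-mode.

    merge_ref_indices: reference indices used by the 2N x 2N MRG candidates
    lowest_qp_ref:     index of the lowest-QP reference frame in the GOP
    """
    distinct = []
    for idx in merge_ref_indices:
        if idx not in distinct:  # drop duplicated reference indices
            distinct.append(idx)
    selected = distinct[:2]      # assumption: at most two frames are kept
    # Fall back to the lowest-QP frame when fewer than two distinct
    # candidates are available.
    if len(selected) < 2 and lowest_qp_ref not in selected:
        selected.append(lowest_qp_ref)
    return selected
```

For example, `select_tb_reference_frames([1, 1], 0)` yields `[1, 0]`: the duplicated index collapses to one candidate and the lowest-QP frame fills the second slot.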

As for the second approach, we replace the interpolation filter used in the TMP fractional-pel motion search with a bilinear filter applied to the reference PU. The bilinear filter brings a considerable complexity reduction; however, it also produces poorer template motions, resulting in a coding loss. Fortunately, this inefficiency can be partially compensated by the other MVs (that is, the block motions) in TB-mode.
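A minimal Python sketch of bilinear sub-pel sampling is given below. It uses floating-point weights and a simple border clamp for readability; a codec implementation would use fixed-point arithmetic, and the function name and frame layout are assumptions of this sketch.

```python
import math


def bilinear_sample(frame, x, y):
    """Sample frame (2-D list, frame[row][col]) at the fractional position
    (x, y) by weighting the four surrounding integer samples. Positions are
    assumed to be non-negative and inside the frame."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(frame[0]) - 1)  # clamp at the right border
    y1 = min(y0 + 1, len(frame) - 1)     # clamp at the bottom border
    fx, fy = x - x0, y - y0
    top = (1.0 - fx) * frame[y0][x0] + fx * frame[y0][x1]
    bottom = (1.0 - fx) * frame[y1][x0] + fx * frame[y1][x1]
    return (1.0 - fy) * top + fy * bottom


frame = [[0, 10], [20, 30]]
half_pel = bilinear_sample(frame, 0.5, 0.5)
quarter_pel = bilinear_sample(frame, 0.25, 0.0)
```

Each output sample reads at most two samples per dimension, compared with eight for the 8-tap DCT-IF listed in Tab. 4.1, which is where the complexity saving comes from.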

4.5.3 Summary

Experiment T3-S-F describes the performance of TB-mode after applying the enhancements introduced in this section. As shown in Tab. 4.7, T3-S-F has a moderate-to-significant average BD-rate saving of 2.2%, with a minimum of 1.1% and a maximum of 4.1% over all test cases. Although T3-S-F has an average coding loss of 0.7% compared with T5-S-BB, its reductions of 131% in encoding time and 237% in decoding time remain impressive.

CHAPTER 5 Conclusion

In this thesis, we propose a bi-prediction scheme that combines predictors found by template and block motions using parametric OBMC window functions. Since the template motion is inferred on the decoder side, the scheme requires only the motion cost of uni-directional prediction. To optimize the motion parameters to be signaled, the motion search criterion is modified to reflect the interdependency between v_b and v_t. The choice of window function is based on the inferred MV constellation, which brings better adaptation and prediction efficiency. According to the experimental results, a promising coding gain (2.9%) comes at the cost of a significant increase in both encoding and decoding times. As a result, several modifications are made to strike a better balance between performance and complexity. After applying those modifications, the best scheme shows moderate-to-significant coding gains (2.2%) with reasonable complexity increments (46% and 33%). This result shows that it is possible to keep

Chapter 5. Conclusion

decoded MVs from neighboring PUs. In this manner, the need to perform TMP is waived at the cost of extra bits. We shall continue these investigations in our future work.

Bibliography

[1] K. Sugimoto et al., “Inter Frame Coding with Template Matching Spatio-Temporal Prediction,” Proc. Int. Conf. Image Processing, 2004.

[2] Y. Suzuki et al., “Inter Frame Coding with Template Matching Averaging,” Proc. Int. Conf. Image Processing, 2007.

[3] S. Kamp et al., “Decoder Side Motion Vector Derivation for Inter Frame Video Coding,” Proc. Int. Conf. Image Processing, 2008.

[4] S. Nogaki and M. Ohta, “An overlapped block motion compensation for high quality motion picture coding,” Proc. IEEE Int. Symp. Circuits and Systems, pp. 184–187, May 1992.

[5] M. T. Orchard and G. J. Sullivan, “Overlapped Block Motion Compensation: An Estimation-Theoretic Approach,” IEEE Trans. on Image Processing, vol. 3, pp. 693–699, May 1994.

[6] G. Bjontegaard, “Improvements of the BD-PSNR Model,” ITU-T SG16 Q.6 Document

[8] M. Winken et al., “Description of Video Coding Technology Proposal by Fraunhofer HHI,” JCTVC-A116, Apr. 2010.

[9] B. Tao and M. Orchard, “A Parametric Solution for Optimal Overlapped Block Motion Compensation,” IEEE Trans. on Image Processing, vol. 10, pp. 341–350, Mar. 2001.

[10] Y. W. Chen and W. H. Peng, “Parametric OBMC for Pixel-Adaptive Temporal Prediction on Irregular Motion Sampling Grids,” IEEE CSVT, 2011.

[11] W. Zheng et al., “Analysis of Space-dependent Characteristics of Motion-compensated Frame Differences based on a Statistical Motion Distribution Model,” IEEE Trans. on Image Processing, vol. 11, pp. 377–386, Mar. 2002.

[12] T.-W. Wang et al., “Analysis of Template Matching Prediction and its Application to Parametric Overlapped Block Motion Compensation,” IEEE ISCAS, 2010.

[13] M. Karczewicz et al., “Video Coding Technology Proposal by Qualcomm Inc.,” JCTVC-A121, Apr. 2010.

[14] F. Bossen, “Common Test Conditions and Software Reference Configurations,” JCTVC-E700, Mar. 2011.
