
3. Mode-Dependent Pixel-Based Weighted Intra Prediction (MPWIP)

3.1.1. Concept of Operations

The proposed scheme essentially combines the ideas of two prior works, [6] and [7]. As depicted in Figure 6, the texture information of the EL intra predictor and of the BL reconstructed block is first decomposed into DC and AC components. The DC component of each input (the EL intra predictor or the BL reconstructed block) is a prediction block whose pixels all take the average value of that input, and the AC component is the residual signal obtained by subtracting the DC component from the input. Each of the four components is then weighted by a separate pixel-based weighting scheme, and the results are summed to form the final predictor.
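The decomposition and weighted sum described above can be illustrated with a minimal Python sketch. The function names (`decompose`, `mpwip_predict`) and the argument layout are illustrative assumptions, not taken from the reference software; the sketch also assumes that the BL reconstructed block has already been brought to the same size as the EL predictor.

```python
import numpy as np

def decompose(block):
    """Split a block into its AC and DC components.

    DC is a flat block filled with the mean of the input;
    AC is the residual left after subtracting DC from the input.
    """
    block = np.asarray(block, dtype=np.float64)
    dc = np.full_like(block, block.mean())
    ac = block - dc
    return ac, dc

def mpwip_predict(el_pred, bl_recon, w_ac_el, w_dc_el, w_ac_bl, w_dc_bl):
    """Form the final predictor as a per-pixel weighted sum of four components.

    Each w_* argument is a weight table with the same shape as the block
    (hypothetical layout: one weight per pixel position and component).
    """
    ac_el, dc_el = decompose(el_pred)
    ac_bl, dc_bl = decompose(bl_recon)
    return (w_ac_el * ac_el + w_dc_el * dc_el
            + w_ac_bl * ac_bl + w_dc_bl * dc_bl)
```

With all four weight tables set to 0.5, the sketch reduces to a plain average of the two inputs; with unit weights on the EL components and zero weights on the BL components, it returns the EL predictor unchanged.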

3.1.2. Least-Squares Solution

Obviously, how different components are weighted has a crucial effect on the resulting prediction performance. We wish to find a set of weighting functions so that the resulting prediction residual can be minimized. This problem can be solved using the well-known Least-Squares (LS) method.

To ease the understanding of the subsequent discussion, we adopt the following notations: bold lower-case letters represent vectors, BOLD UPPER-CASE letters denote matrices, and italicized lower-case letters are scalars. Moreover, we use 𝐚_𝑘 = [𝑎_𝑘(1) 𝑎_𝑘(2) … 𝑎_𝑘(𝑛)]^T and 𝐛_𝑘 = [𝑏_𝑘(1) 𝑏_𝑘(2) … 𝑏_𝑘(𝑛)]^T to represent the predictor values at pixel k that are extracted from the EL intra predictors and the BL reconstructed blocks, respectively, over the n blocks collected during the training process.

Similarly, 𝐚𝐜_𝑘 = [𝑎𝑐_𝑘(1) 𝑎𝑐_𝑘(2) … 𝑎𝑐_𝑘(𝑛)]^T and 𝐝𝐜_𝑘 = [𝑑𝑐_𝑘(1) 𝑑𝑐_𝑘(2) … 𝑑𝑐_𝑘(𝑛)]^T denote, respectively, the corresponding values from the AC and DC components. Thus, we have a weight vector 𝐰_𝑘 whose elements are the weight values associated with the four components for prediction at pixel k. Specifically, the weight vector represents the weighting scheme from the perspective of a single pixel, describing how the corresponding samples from different components contribute to estimating a current pixel’s intensity.

With reference to the notations above, we further denote by 𝐨_𝑘 = [𝑜_𝑘(1) 𝑜_𝑘(2) … 𝑜_𝑘(𝑛)]^T the target pixels at position 𝑘 in the 𝑛 collected blocks, whose intensity values are to be estimated. The problem of determining the optimal weight vector 𝐰^∗_𝑘 in the least-squares-error sense can then be formulated as follows:

\[ \mathbf{w}^{*}_{k} = \arg\min_{\mathbf{w}_{k}} \left\lVert \mathbf{o}_{k} - \mathbf{X}_{k}\mathbf{w}_{k} \right\rVert^{2} \tag{10} \]

Here 𝐗_𝑘 is the matrix whose 𝑛 rows hold the component values at pixel 𝑘, one row per collected block. From linear algebra, this problem has the closed-form solution

\[ \mathbf{w}^{*}_{k} = \left(\mathbf{X}_{k}^{T}\mathbf{X}_{k}\right)^{-1}\mathbf{X}_{k}^{T}\mathbf{o}_{k} \tag{11} \]

By varying the index k and repeating the same process, we can obtain the weight vectors for different pixel positions and thus the weighting functions for all four components.

3.1.3. Training Process

A training process is introduced to collect the data needed to compute the optimal weighting functions. First, to ensure that the collected data are representative, the proposed algorithm is used to produce a new prediction mode, and this mode has to compete with all conventional modes in the rate-distortion optimization (RDO) process at the EL, which selects the mode with the lowest rate-distortion (RD) cost. Then, the blocks coded in the proposed mode are used to compute the optimal weighting functions according to Eq. (11).

According to Eq. (10), initial values of the weighting functions need to be assigned at each iteration of the training process. Specifically, the initial weights used for the first iteration correspond to averaging the EL’s and the BL’s texture information. Then, the updated weighting functions serve as the input weighting functions for the next iteration; the process is repeated for all the sequences in the training set. Finally, the termination criteria follow from the expectation that, once the weighting functions are optimized, the mean square error (MSE) over all the coded pixels in the current iteration should be smaller than that of the other iterations. In particular, the training process is terminated according to two criteria: 1) the weighting functions are stable (that is, they do not vary considerably compared to the previous iteration), and 2) the absolute difference in MSE between the current iteration and the immediately preceding iteration is below 1% of the MSE of that preceding iteration.
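As a rough illustration of the per-pixel least-squares fit of Eqs. (10) and (11) and of the second termination criterion, consider the following Python sketch. It is not the training code used in this work: the array layout, the column order of the component matrix (AC_EL, DC_EL, AC_BL, DC_BL) and all function names are assumptions made for the example, and the encoder-side RDO that selects the training blocks is not modelled.

```python
import numpy as np

def fit_pixel_weights(X_k, o_k):
    """Least-squares weights for one pixel position k.

    X_k: (n, 4) matrix, one row per collected block; assumed column
         order is AC_EL, DC_EL, AC_BL, DC_BL sampled at pixel k.
    o_k: (n,) target pixel values at position k.
    """
    # lstsq minimises ||o_k - X_k w||^2; the result coincides with the
    # closed form (X^T X)^-1 X^T o whenever X^T X is invertible.
    w_k, *_ = np.linalg.lstsq(X_k, o_k, rcond=None)
    return w_k

def fit_weight_tables(X, o):
    """Repeat the fit for every pixel position.

    X: (n, H, W, 4) component values, o: (n, H, W) target blocks.
    Returns an (H, W, 4) array, i.e. one weight table per component.
    """
    n, h, w, c = X.shape
    weights = np.empty((h, w, c))
    for y in range(h):
        for x in range(w):
            weights[y, x] = fit_pixel_weights(X[:, y, x, :], o[:, y, x])
    return weights

def mse_converged(mse_prev, mse_curr, rel_tol=0.01):
    """Criterion 2: |MSE change| below 1% of the previous iteration's MSE."""
    return abs(mse_curr - mse_prev) < rel_tol * mse_prev
```

Using `np.linalg.lstsq` rather than forming the inverse explicitly is a design choice of this sketch: it stays well defined even when a column of the component matrix is identically zero, as happens for the AC component of the EL predictor in DC mode.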

However, the resulting weighting functions are optimized only for a specific iteration of the training process. In addition, the sequences of the training set differ from those of the test set (because it is important to test a model with data different from those used to develop it); therefore, the weighting functions used to measure the bit rate savings on the test sequences are those resulting from the training process.

3.2. Weighting Function

To gain a better understanding of how different components should be weighted in forming a better predictor, this section provides an in-depth analysis of the weighting functions with different components against 1) prediction mode taken by the EL, 2) QP setting of BL and EL, and 3) prediction block size.

Figure 7: Pixel coordinate system showing the pixel at (0,0)

For notation, the QP value of BL and EL is specified by a two-tuple representation QP(QPBL, QPEL), and the coordinate system shown in Figure 7 is used throughout the discussion that follows.

3.2.1. Effect of Intra Prediction Mode

This section investigates the effect of the intra prediction mode on the weighting function. Here the prediction mode refers to the intra prediction direction used to generate the EL predictor. Currently, our weighted scheme is restricted to the cases where the EL predictor is produced with Horizontal, Vertical, DC or Planar mode.

Figure 8 shows the weighting functions for Vertical mode. It can be seen that those associated with the components from the same layer have a similar waveform, although their magnitudes differ considerably. Moreover, the weight value for the DC component of the EL is seen to be mostly lower than that of the BL, which justifies the IDCC algorithm’s substitution of the BL’s DC value for the EL’s. By comparing Figure 9, Figure 10, and Figure 11, we can further observe that the weighting functions vary with the prediction mode with which the EL predictor is formed, i.e., they are mode dependent.


Figure 8: Vertical mode, block size of 16x16, QP(30,30). Each figure corresponds to the weighting function of (a) ACEL, (b) DCEL, (c) ACBL, (d) DCBL

Figure 9: Horizontal mode, block size of 16x16, QP(30,30). Each figure corresponds to the weighting function of (a) ACEL, (b) DCEL, (c) ACBL, (d) DCBL

Figure 10: DC mode, block size of 16x16, QP(30,30). Each figure corresponds to the weighting function of (a) ACEL, (b) DCEL, (c) ACBL, (d) DCBL. The ACEL component of the DC mode is not available and it is not weighted in the experiments.

Figure 11: Planar mode, block size of 16x16, QP(30,30). Each figure corresponds to the weighting function of (a) ACEL, (b) DCEL, (c) ACBL, (d) DCBL.

Another interesting point to be noted in Figure 10 is that, when the EL is coded in DC mode, in which case the EL predictor contains no AC component, the BL’s texture information dominates the creation of the final predictor, but the contribution from the EL is not insignificant.

Finally, Figure 11 shows the weighting functions that result from the Planar mode. We see that the EL predictor is weighted more heavily in forming a prediction of pixels in the top-left corner, while the BL counterpart contributes more to the rest of the pixels, especially those in the bottom-right quadrant. This can be explained by the way the EL predictor is constructed. Specifically, the Planar mode in HEVC produces the EL predictor using bilinear interpolation, with the reconstructed pixels at block boundaries serving as reference. While pixels on the top and to the left of a current block have been reconstructed previously, those at the bottom or to the right are yet to be determined. In the current implementation, their sample values are estimated by replicating those of the bottom-left and top-right pixels, respectively. The fact that the values of these pixels are weakly correlated with those of pixels in the bottom-right quadrant accounts for why the BL is assigned a higher weight value there.
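The construction of the Planar predictor described above can be sketched in Python as follows. The arithmetic follows the planar formula of the HEVC specification; the function name and the argument layout are assumptions made for this example, and reference-sample filtering is omitted.

```python
import numpy as np

def planar_predict(top, left, top_right, bottom_left):
    """HEVC-style planar prediction for an N x N block (N a power of two).

    top[x]      : reconstructed row above the block, x = 0..N-1
    left[y]     : reconstructed column left of the block, y = 0..N-1
    top_right   : single sample replicated along the unavailable right column
    bottom_left : single sample replicated along the unavailable bottom row
    Returns pred[y, x] as integers.
    """
    top = np.asarray(top, dtype=np.int64)
    left = np.asarray(left, dtype=np.int64)
    n = len(top)
    shift = n.bit_length()          # equals log2(N) + 1 for a power of two
    x = np.arange(n)[None, :]       # column index, shape (1, N)
    y = np.arange(n)[:, None]       # row index, shape (N, 1)
    # Horizontal interpolation between the left column and the top-right sample
    horiz = (n - 1 - x) * left[:, None] + (x + 1) * int(top_right)
    # Vertical interpolation between the top row and the bottom-left sample
    vert = (n - 1 - y) * top[None, :] + (y + 1) * int(bottom_left)
    return (horiz + vert + n) >> shift
```

For a 4x4 block with all top and left references equal to 10 and both replicated samples equal to 50, the top-left pixel is predicted as 20 and the bottom-right pixel as 50: the bottom-right quadrant is dominated by the two replicated samples rather than by the neighbouring reconstructed pixels.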

3.2.2. Effect of QP Setting

This section studies the effect of the QP setting (QP values assigned to the BL and the EL) on the weighting functions. Plotted in Figure 12 is the weighting function of the EL’s DC component along the slice Y=10 for various QP settings, with QPBL ranging from 22 to 34 in steps of 4 and delta QP equal to 0 or 2, where delta QP is the difference between QPEL and QPBL. The results shown correspond to the Horizontal mode. The observations made in this case also apply to the other cases.

Figure 12: The curves of the DC component in the EL of Horizontal mode for different QP settings in two sets of coding configurations. (a) delta QP=0: QPEL = QPBL, (b) delta QP=2: QPEL = QPBL + 2

As expected, all these weighting functions decrease with increasing X because of the nature of Horizontal prediction, a result that we have seen before. Of more interest is the observation that the EL tends to be weighted more heavily when the EL is coded at better quality using a smaller QP value (note that given the same QPBL, a smaller delta QP leads to a smaller QPEL and vice versa) or when the BL is coarsely quantized with a larger QP. This agrees with the general observation that when coded at better quality, the EL reference pixels correlate more strongly with the prediction pixels. The same argument is equally applicable to the case of a poorly coded BL. Somewhat to our surprise, the extent to which these weighting functions differ from each other is not that significant.

Another interesting observation from the weight values of the QP settings in the same set is that, even though the difference between the smallest and the largest QP settings is considerable (22 for the former and 34 for the latter), the variation in the magnitude of those waveforms is insignificant.

The experimental results on the effect of the QP setting on the weighting functions, in terms of bit rate savings, are provided in detail in Chapter 4. According to these results, that effect is insignificant. Specifically, the experimental results showed that 1) the weighting functions for different QP settings within the same delta QP group can be unified by sharing the set of weighting functions of any one QP setting; and 2) across the two delta QP groups, the QP settings with the same QP value for the BL can share the same set of weighting functions. These observations therefore offer an opportunity to unify the weighting functions for the QP settings in the same delta QP set and/or for all the QP settings in the common test conditions.

3.2.3. Effect of Prediction Block Size

This last investigation shows how the prediction block size affects the weighting functions. The effect of the prediction block size is somewhat predictable (a smaller block size is expected to give higher weight values to the EL’s components); however, the texture information in the BL may alter this expectation. We therefore need to take the interaction with the BL texture information into account, which makes an analysis of the effect of the prediction block size necessary.

Figure 13: Waveforms of weighting functions for DC component in the EL of Vertical and Planar mode at different block size levels.

To this end, Figure 13 contrasts the weighting function of the EL’s DC component for 4x4 and 16x16 block sizes. Results are given for Vertical and Planar modes. As expected, the simulation results show that the EL predictor tends to have a higher weight value across the entire prediction block when the block size is smaller.

This is understandable given that directional intra prediction usually performs more efficiently with a smaller block size and that the EL reference pixels are subject to less coding error.

3.2.4. Summary

We can summarize our findings so far as follows:

• The weighting functions depend on the direction of the prediction mode taken by the EL intra predictor, i.e. they are mode dependent.

• The DC and AC components from the same layer have sets of weighting functions that are similar in waveform but differ in magnitude, i.e. the separation of DC and AC components seems beneficial.

• The weighting functions of Horizontal and Vertical modes are related to each other mainly by a transpose operation.

• The effect of the QP setting on the weighting functions is insignificant in terms of bit rate savings in the common test conditions.

• The EL tends to be weighted more heavily in the case of smaller block sizes.

CHAPTER 4 Experimental Results

4.1. Test Conditions

Experimental results provided in this chapter are produced following mainly the All Intra (AI) common test conditions specified in [8]. In particular, only the results for mandatory tests, in which the base layer is HEVC coded, are presented. Moreover, for the 2-layer spatial scalability, the ratio of the EL’s resolution to that of BL is limited to 2x and 1.5x, although in principle any resolution ratios between the layers, including the ratio of 1, can be considered. Table 1 details the test sequences used. It is noteworthy that the weight values used in our pixel-based weighted intra prediction scheme are obtained based on a separate set of training sequences, as given in Table 2.

Generally, it is important to test a model against data which is outside of the samples used to develop it.

Table 1: Test set of video sequences

Class Sequence name Frame count

Table 2: Training set of video sequences

Class Sequence name Frame count
A SteamLocomotive 300 60fps 10 1280x800 2560x1600 Spatial 2x
B Blue Sky 217 25fps 8 960x540

Table 3: Test set of quantization parameter values

Scalability     QP of BL          Delta QP (= QP of EL – QP of BL)
Spatial 2x      22, 26, 30, 34    0, 2
Spatial 1.5x    22, 26, 30, 34    0, 2

Table 3 defines the quantization parameter values used for the I-frames in the base and enhancement layers of a sequence for the HEVC base layer case.

Figure 14: An example of rate-distortion curves for two sets of encoding configurations. Four QP values of BL in each set are 22, 26, 30, and 34.

4.2. Coding Performance

The coding performance of the proposed scheme is quantified by measuring the BD-rate savings [2] relative to the SHM 1.0 anchor. In particular, the results presented in Table 4 correspond to the averages taken over two sets of encoding configurations, as suggested by the common test conditions [8]. The method used to calculate the values reported in all performance tables is described below.

First, the BD-rate calculation for single-layer coding is used to compute the BD-rate savings in each set of QP configurations (delta QP=0 and delta QP=2), as depicted in Figure 14. The numbers reported in every coding performance table are then the averages of the BD-rate savings obtained from these two sets.
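The calculation can be sketched in Python as follows. This is the classic cubic-polynomial variant of the Bjøntegaard delta rate, given only as an illustration under stated assumptions: the tool actually used to produce the reported numbers may differ (for example, by using piecewise cubic interpolation), and the function names are hypothetical.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate in percent (negative means bit rate saving).

    Each argument holds the rate-distortion points of one curve,
    typically the four points obtained with BL QPs 22, 26, 30 and 34.
    """
    log_ra = np.log10(np.asarray(rate_anchor, dtype=np.float64))
    log_rt = np.log10(np.asarray(rate_test, dtype=np.float64))
    pa = np.asarray(psnr_anchor, dtype=np.float64)
    pt = np.asarray(psnr_test, dtype=np.float64)
    # Fit log-rate as a cubic polynomial of PSNR for each curve
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)
    # Integrate both fits over the overlapping PSNR interval
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)
    # Average log-rate difference, converted back to a percentage
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0

def average_bd_rate(per_set_bd_rates):
    """Average the BD-rates of the two delta-QP configuration sets."""
    return float(np.mean(per_set_bd_rates))
```

A test curve that needs 10% fewer bits than the anchor at every PSNR point yields a BD-rate of about -10%, which matches the sign convention of the performance tables, where negative values denote savings.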

From Table 4, the overall Y-BD-rate gain over the anchor is 1.0% for the AI-2x case and 0.5% for the AI-1.5x case. The coding gain achieved, however, is highly variable over the test sequences. As an example, the smallest gain is in the ‘ParkScene’ sequence, with only a 0.3% and 0.1% gain for AI-2x and AI-1.5x, respectively, whereas a much higher improvement is observed in the ‘BasketballDrive’ sequence, reaching up to 2.4% and 1.5%, respectively. The observation above can be explained by the statistical study on mode distribution and the characteristics of each sequence.

Table 4: Performance of the MPWIP with respect to SHM 1.0

                             AI HEVC 2x                AI HEVC 1.5x
Class    Sequence            Y       U       V         Y       U       V
Class A  Traffic             -0.7%   -0.6%   -0.6%     N/A
Class A  PeopleOnStreet      -0.6%   -0.7%   -0.4%     N/A
Class B  Kimono              -0.4%   -0.4%   -0.5%     -0.2%   -0.3%   -0.3%
Class B  ParkScene           -0.3%   -0.3%   -0.4%     -0.1%   -0.2%   -0.3%
Class B  Cactus              -1.0%   -1.2%   -1.3%     -0.3%   -0.6%   -0.6%
Class B  BasketballDrive     -2.4%   -2.7%   -2.2%     -1.5%   -1.6%   -1.3%
Class B  BQTerrace           -1.3%   -1.6%   -1.8%     -0.4%   -0.6%   -0.7%
Overall (EL+BL)              -1.0%   -1.1%   -1.0%     -0.5%   -0.7%   -0.6%

Figure 15: The statistical mode distribution of two test sequences at the enhancement layer in SHM-1.0: (a) The ‘ParkScene’ sequence, (b) The ‘BasketballDrive’ sequence

Figure 16: The statistical mode distribution of two test sequences at the enhancement layer in the proposed design: (a) The ‘ParkScene’ sequence, (b) The ‘BasketballDrive’ sequence

Conventionally, in the AI configuration of the common test conditions, there are two prediction modes at the EL: Intra and Intra-BL. In the proposed design, the proposed algorithm creates a new prediction mode, called MPWIP; this mode has to compete with the conventional modes in the rate-distortion optimization (RDO) process, which selects the mode with the lowest rate-distortion (RD) cost. Therefore, generally speaking, the more pixels are coded in this new mode, the higher the expected bit rate savings.

The statistical results are analyzed in this section. Figure 15 shows the mode distribution of the reference software (SHM-1.0), which includes only the two conventional prediction modes. It can be seen that the percentage of intra-coded pixels in the ‘BasketballDrive’ sequence is considerably higher than that in the ‘ParkScene’ sequence. Specifically, 26.54% of the pixels are intra-coded in the ‘BasketballDrive’ sequence, while only about half of that percentage (i.e. 13.87%) is found in the ‘ParkScene’ sequence. This can be explained by the characteristics of each sequence: the ‘ParkScene’ sequence is highly textured, while the ‘BasketballDrive’ sequence contains more homogeneous regions.

Figure 16 shows the mode distribution of the proposed design. The results show that the percentage of pixels coded in the new mode of the ‘ParkScene’ sequence is lower than that of the ‘BasketballDrive’ sequence. As a result, higher bit rate savings can be achieved in the latter sequence.

Table 5: Performance of the IDCC with respect to SHM 1.0

AI HEVC 2x AI HEVC 1.5x

Table 6: Performance of the WIP with respect to SHM 1.0

AI HEVC 2x AI HEVC 1.5x

In particular, for both test sequences, most of the pixels coded in the new mode were previously intra-coded; in other words, the proposed algorithm mainly improves intra-frame prediction. Therefore, it can be concluded that the proposed algorithm works best in sequences that contain more homogeneous regions.

For comparison with the IDCC and WIP algorithms, their respective BD-rate savings relative to the anchor are shown in Table 5 and Table 6. It can be seen that both schemes offer much smaller BD-rate savings than our MPWIP. Specifically, the savings of the IDCC range from 0.0% to 0.5% in all test cases, with an overall savings of no more than 0.2%. The WIP, although performing relatively better in some sequences (such as ‘BasketballDrive’ and ‘BQTerrace’), shows a similar performance to the IDCC in terms of Y-BD-rate.

4.3. Simplification

The superior coding performance of our MPWIP comes at the cost of additional memory requirements in both the encoder and decoder for storing weight tables. In our current implementation, the luminance and chrominance components use separate weight tables for different block sizes (7 block sizes in total). Moreover, according to our analysis in Chapter 3, the weight tables for Horizontal, Vertical, Planar and DC modes are distinct from each other. The situation is further complicated by the QP setting, as we found the weighting functions also vary with the QP setting of the BL and the EL. As a result, a total of 840 weight tables are needed, which calls for a memory space of 795 kilobytes (with an assumption that a single-precision floating-point format occupies 32 bits [4 bytes]). Obviously, 795kB of required memory
