A New Intra Prediction with Adaptive Template Matching through Finite State Machine

Chia-Hung Yeh^a, Shu-Jhen Fan Jiang^a, Chih-Yang Lin^b,c,*, Pei-Lun Suei^d, and Min-Kuan C. Chang^e

^a Department of Electrical Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan

^b Department of Computer Science and Information Engineering, Asia University, Taichung 413, Taiwan

^c Dept. of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan

^d Research Center for Information Technology Innovation, Academia Sinica, Taipei 115, Taiwan

^e Graduate Institute of Communication Engineering, National Chung Hsing University, Taichung 402, Taiwan

Abstract—This paper presents a new approach that aims to improve the performance of the intra block coding of H.264/AVC and HEVC by using a finite state machine. Based on the high correlations between a frame’s neighboring blocks, the finite state machine is employed at both the encoder and decoder to reduce the number of bits required for intra encoding, improving the coding performance of videos. With the adaptive matching template, a better prediction block is found. Through the proposed extra intra prediction modes, the number of bits required to encode a block is reduced significantly, and thus a better intra coding performance is achieved. In addition, an early termination scheme is proposed to speed up the coding process. Experimental results show that with the proposed method, the bit rate can be reduced by 11% on average when compared to H.264/AVC and by 4% on average when compared to HEVC.

Keywords—Intra prediction, H.264/AVC, High Efficiency Video Coding, finite state machine, motion estimation, video coding

1. Introduction

Video coding is one of the most significant fields of multimedia research. In the last two decades, many high-quality video coding techniques have been proposed. Development of the High Efficiency Video Coding (HEVC) standard by the Joint Collaborative Team on Video Coding (JCT-VC) began in 2010, and the standard was completed in 2013. The HEVC standard is very flexible; it can be used in a wide variety of applications over a diversity of networks and systems [1]. The objective of HEVC is to improve on the coding performance of previous standards such as H.264/AVC. To this end, HEVC provides features such as the coding tree unit, intra coding, and an in-loop deblocking filter. Among them, intra coding is one of the most important coding functions, as it prevents error propagation, leading to better video quality in many specific cases. One of the most important features of intra coding is intra prediction, an efficient method to compress intra-coded blocks.

The purpose of intra prediction coding is to find a predicted block, similar to the current block, to compress intra-coded blocks. With intra prediction, the bit usage of the coding residual becomes small. In the history of video coding, MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.264/AVC, and HEVC are coding standards that implement the intra prediction method to reduce the residual data of videos. Among them, H.264/AVC and HEVC improve intra prediction in two ways that are not provided in any previous video coding standard such as MPEG-1, MPEG-2, MPEG-4, H.261, and H.263. One of the most important improvements is that in these two video coding standards, intra prediction works in the spatial domain, while in most previous standards intra prediction works only in the transform domain. More specifically, H.264/AVC and HEVC are capable of predicting the current block from its neighboring boundary pixels in the spatial domain. However, in MPEG-1, MPEG-2, and H.261, only the DC (Direct Current) coefficient is predicted, and in MPEG-4 and H.263, only the DC and partial AC (Alternating Current) coefficients are predicted. Another improvement of H.264/AVC and HEVC lies in the block size. The block size of intra prediction in previous video coding standards is a fixed value, e.g. 8×8, but H.264/AVC and HEVC have variable block sizes. H.264/AVC supports block sizes of 16×16, 8×8, and 4×4, and the block sizes of HEVC’s intra prediction, called Prediction Units (PUs), are 64×64, 32×32, 16×16, 8×8, and 4×4.

Since many intra prediction modes have been adopted by H.264/AVC, the coding performance of H.264/AVC is superior to previous video coding standards. However, in intra coding, more bits are needed to encode the blocks when compared to inter coding. In one study [2], the bit usage of intra prediction mode information is 12% of the total intra coding bits. In order to reduce the bit usage resulting from the encoding of intra modes, Jia et al. utilized the neighboring blocks’ intra prediction mode and the border pixel smoothness to select the most probable mode (MPM) in intra prediction [2]. Zhang et al. proposed a context-adaptive coding scheme based on the Markov random field for encoding intra prediction mode information [3]. These approaches reduce the bit usage resulting from encoding the mode information of intra prediction. Another way of reducing intra coding bit usage is to develop a more accurate intra prediction mode. Zhang et al. proposed three additional prediction modes to increase prediction precision [4]. Wang et al. proposed a weighted cross prediction method that replaces the DC mode [5]. Yu et al. proposed an intra prediction mode based on the motion estimation technique [6]. However, all of these methods still require many bits to transmit the coding table or the motion vector (MV).

A template matching method is another technique applied to intra prediction. In [7], an “L”-shaped template is used to find the best 2×2 prediction block. Tan et al. improve this method in [8] by applying a directional template and averaging predictors. Lan et al. use the reconstructed region, both in the intra prediction and in the adaptive transform, to reduce the residual bits of a video [9]. Gu et al. apply different templates according to the intra prediction mode to further reduce the residual in intra lossless coding [10]. Although some template matching methods do not require any additional bits [7][9], these methods require a huge amount of computation because of the large number of pixels in a template. Hence, an adaptive template matching technique is applied in the proposed method to restrict the number of pixels in a template. By keeping only the most representative pixels, the number of pixels in a template is reduced, and a large amount of computation is saved. Our proposed method applies a finite state machine to both the encoder and decoder for bitrate reduction. In our method, if the prediction block is constructed based on the prediction of the finite state machine, no extra bits need to be transmitted to indicate the position of the most similar block. Thus, the bitrate is reduced significantly. In this paper, we present a substantial theoretical analysis and experimental evaluation of the proposed intra prediction modes.

The rest of this paper is organized as follows: Section 2 introduces the intra prediction mechanisms of H.264/AVC and HEVC. Section 3 reviews the background of an intra prediction method based on the motion estimation technique, and the finite state machine in vector quantization (VQ) coding. Section 4 describes the proposed method, which is based on a finite state machine. Experimental results are presented in Section 5 to demonstrate the effectiveness of the proposed method, while also presenting a comparison between our method and existing methods. Concluding remarks and a discussion of future work are given in Section 6.

2. The intra prediction of video coding standards

2.1. H.264/AVC intra prediction

Intra prediction assumes that correlations exist between neighboring blocks. Pixels from the upper and the left neighboring encoded blocks are extrapolated to construct an H.264/AVC prediction block. An intra-coded macroblock (MB) in H.264/AVC with a size of 16×16 can be coded as one of three types: one 16×16 coded block, four 8×8 coded blocks, or sixteen 4×4 coded blocks, which are called Intra16×16, Intra8×8, and Intra4×4, respectively. Four prediction modes are considered in Intra16×16 (Fig. 1(a)), and both Intra8×8 and Intra4×4 have nine prediction modes used to predict the current block (Fig. 1(b)).

To evaluate which mode is the best, rate-distortion optimization (RDO) through a Lagrange multiplier is used for comparisons. Therefore, the best prediction of an MB for the block currently being encoded is selected from Intra16×16, Intra8×8, and Intra4×4 according to their Lagrange costs. The major advantage of intra prediction is that it is simple and efficient when constructing homogeneous blocks in a frame; however, blocks whose contents have finer directions or more complex textures cannot be compressed efficiently by the line-based prediction in intra coding. Moreover, intra coding needs more bits to encode the residual of the current block when compared to inter coding. In order to enhance the coding performance of intra prediction, we use the finite state machine technique to predict the current block from the frame itself.

2.2. HEVC intra prediction

In HEVC, a frame is divided into Largest Coding Units (LCUs) with size 64×64 instead of the MBs defined by H.264/AVC. Each LCU is recursively partitioned into four Coding Units (CUs), forming a quadtree structure. The block size of a CU varies between 64×64 and 8×8. Two independent partition structures are defined within a CU: the Prediction Unit (PU) and the Transform Unit (TU). PUs control the block size of each prediction block, and TUs control the block size of the integer transform. In intra coding, one 2N×2N CU is divided into one 2N×2N PU or four N×N PUs. Each CU contains one or more TUs with sizes ranging from 4×4 to 32×32. Figure 2 illustrates the relationship between CUs, PUs, and TUs in an intra-coded LCU.

Compared to H.264/AVC, which has nine prediction modes to predict the current block, HEVC extends the prediction modes to 33 directions and two non-directional modes, as shown in Fig. 3, increasing prediction precision. The various block sizes and prediction modes result in a huge computation in the selection of both the block sizes and the prediction modes. To reduce the computation, the Hadamard transform absolute difference (HAD) is used to calculate the distortion between the prediction block and the current coded block in each 2N×2N PU prediction mode. Then, the prediction modes with the minimal HAD are selected as candidates, and the best prediction mode from these candidates is obtained according to the minimal rate-distortion (RD) cost. This process is also applied to the mode of four N×N PUs. Finally, the best PU is selected from one 2N×2N PU or four N×N PUs by using the RD cost.
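As a minimal sketch of the HAD-based rough mode decision described above, the following Python code computes an unnormalized 4×4 Hadamard-transformed absolute difference and keeps the modes with the smallest values. The function names, the dictionary layout, and the absence of any normalization or rate term are assumptions made for illustration; they are not taken from the HEVC reference software.

```python
# Unnormalized 4x4 Hadamard matrix (rows are mutually orthogonal +/-1 patterns).
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, -1, 1],
      [1, -1, 1, -1]]


def hadamard4(block):
    """Return H4 * block * H4^T for a 4x4 block given as a list of rows."""
    tmp = [[sum(H4[i][k] * block[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
    return [[sum(tmp[i][k] * H4[j][k] for k in range(4)) for j in range(4)]
            for i in range(4)]


def had_cost(current, prediction):
    """Sum of absolute Hadamard coefficients of the residual (no normalization)."""
    residual = [[current[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]
    return sum(abs(c) for row in hadamard4(residual) for c in row)


def rough_mode_candidates(current, predictions, keep):
    """Return the `keep` mode names whose prediction has the smallest HAD.

    `predictions` maps a (hypothetical) mode name to its 4x4 prediction block.
    """
    ranked = sorted(predictions, key=lambda m: had_cost(current, predictions[m]))
    return ranked[:keep]
```

Only the surviving candidates would then go through the full RD cost evaluation, which is where the computational saving comes from.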

3. Background review

3.1. Intra prediction using intra-macroblock motion compensation

[...] intra prediction methods. The intra prediction method proposed by Yu et al. compresses a frame efficiently through the motion estimation technique [6]. In video coding, motion estimation improves inter frame coding significantly when predictions are made based on the reference frame, not just the frame itself [11]. Motion estimation can successfully remove temporal redundancy and has been adopted in many well-known video coding standards. Based on these previous research results in the temporal domain, Yu et al. achieved a better intra prediction through motion estimation in the spatial domain. For an input block in intra prediction, Yu et al. look for the most similar block from previously coded data in the current coded frame. Instead of using directional intra prediction as before, Yu et al. encode a block as a position vector that indicates the relative position of the most similar block to the current MB. Through this process, the number of bits required to encode the block is significantly reduced and a high coding efficiency is achieved. Figure 4 illustrates the basic idea of Yu et al.’s work. Here, block C represents the current block ready to be encoded. The non-shaded region (Rf) represents data not yet coded. The reconstructed area Rc is shaded in gray and represents the previously encoded data; all pixels within this area are available to the decoder side for intra prediction. In order to reduce computation, the search window is set at m×m; otherwise the whole area marked Rc would need to be searched. However, in Yu et al.’s method, 2×(log2 m + 1) bits are still required to transmit the motion vector (MV). Assume that (i, j) and (u, v) represent the positions of the current block and the best matching block, respectively. The MV, (u − i, v − j), is required to be encoded and transmitted to the decoder, deteriorating the compression rate.

Inspired by Yu et al.’s method, we propose an intra prediction method that provides the same prediction ability as that of Yu et al. Our method, however, requires less overhead in additional bit information than Yu et al.’s work. For a more convenient description, we refer to a block encoded through Yu et al.’s method as an “IntraMC” block.
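To make the IntraMC idea and its overhead concrete, the following Python sketch performs a full search over previously coded pixels and evaluates the 2×(log2 m + 1) bit cost of the MV. Several simplifications are assumptions for illustration only: the same array stands in for both the original and the reconstructed frame, the causal region is approximated as "entirely above, or entirely to the left and not below", the sum of absolute differences is used as the matching criterion, and the names `intramc_search` and `mv_bits` are hypothetical.

```python
def sad(frame, r0, c0, r1, c1, size):
    """Sum of absolute differences between two size x size blocks of `frame`."""
    return sum(abs(frame[r0 + y][c0 + x] - frame[r1 + y][c1 + x])
               for y in range(size) for x in range(size))


def intramc_search(frame, i, j, size, m):
    """Find the best match for the block whose top-left pixel is (row i, col j).

    Displacements are limited to |u - i| < m and |v - j| < m.
    Returns ((u - i, v - j), cost), or None if no candidate is available.
    """
    height, width = len(frame), len(frame[0])
    best = None
    for u in range(max(0, i - m + 1), min(height - size, i) + 1):
        for v in range(max(0, j - m + 1), min(width - size, j + m - 1) + 1):
            above = u + size <= i   # candidate lies entirely above the block
            left = v + size <= j    # candidate lies entirely to the left
            if not (above or left):
                continue            # would overlap pixels not yet coded
            cost = sad(frame, i, j, u, v, size)
            if best is None or cost < best[1]:
                best = ((u - i, v - j), cost)
    return best


def mv_bits(m):
    """Bits for one MV: two components, each log2(m) magnitude bits plus a sign bit."""
    if m < 1 or m & (m - 1):
        raise ValueError("search range m must be a power of two")
    log2_m = m.bit_length() - 1
    return 2 * (log2_m + 1)
```

For a search range of m = 32, `mv_bits(32)` evaluates to 12, which is the per-block overhead that the proposed method aims to avoid.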

3.2. Finite state machine in vector quantization coding

Vector quantization (VQ) achieves better performance than transform coding under the same compression ratio in the theory of image compression. Foster et al. proposed the finite state vector quantization (FSVQ) scheme to improve traditional memory-less VQ techniques [12]. FSVQ enhances the quality of coded images and reduces bitrate by using the previously encoded blocks to make a selection from a family of codebooks called the “state codebook.” Nasrabadi et al. proposed the dynamic FSVQ (DFSVQ) scheme to solve problems associated with FSVQ [13], such as duplications of the code vectors and memory overhead. The DFSVQ scheme is shown in Fig. 5.

The DFSVQ scheme employs the finite state machine in VQ, which is widely used in image coding and image processing. At the beginning of DFSVQ, the encoder is in a state $S_n$ with an initial dynamic codebook, where $n$ denotes the block index. Then, the input vector $x_n$ is encoded by searching for the best representative code vector in the state codebook, and the corresponding index is sent to the decoder. The state $S_n$ is later determined by the next-state function, which utilizes the neighboring encoded blocks to select the state codebook from the original codebook. Therefore, the DFSVQ encoder can be viewed as a mapping function $\alpha$, which converts the input vector $x_n$ into a channel index $I_n$ defined as

$I_n = \alpha(x_n, S_n).$ (1)

The encoder state $S_n$ in DFSVQ is determined by a next-state function $f(\cdot)$ given by

$S_n = f(I_{n_1}, I_{n_2}, I_{n_3}, I_{n_4}),$ (2)

where $I_{n_1}$, $I_{n_2}$, $I_{n_3}$, and $I_{n_4}$ are the information of neighboring encoded blocks.

If the next-state function can successfully predict the behavior of the next input vector by (2), only the state codebook needs to be searched in the encoding and decoding processes. The DFSVQ decoder performs the following mapping

$\hat{x}_n = \beta(I_n, S_n),$ (3)

where $\hat{x}_n$ is the reproduction code vector and $\beta$ is the decoder mapping function. The DFSVQ decoder then selects the same state codebook as the encoder through the next-state function; therefore, the bitrate of DFSVQ coding can be reduced significantly when compared to previous VQ methods. Among the algorithms proposed to design the next-state function, side-match vector quantization (SMVQ) is one of the most representative schemes [12]-[13] for prediction. SMVQ uses the upper and left border pixels in the previously encoded data as the side-match pattern to form the next-state function that selects the state codebook [13]. In other words, the SMVQ scheme selects the next-state function by forcing spatial continuity across block borders. SMVQ assumes that border pixels are very similar and highly correlated.

VQ provides a high compression ratio because it only uses indices of codewords to represent original blocks. However, because both the encoder and decoder need to hold the same codebook, and the codebook itself needs to be compressed by other image compression tools, VQ has not been adopted in current image/video standards. Our method applies the finite state machine of VQ to intra prediction to save coding bits efficiently without decreasing video quality. The proposed method constructs the state codebook from the previously coded data and reconstructs the current block using the same state codebook. Since no motion vector information is required, the number of bits for each encoded block can be significantly reduced.

4. Adaptive template matching-based intra prediction

4.1. Observation

Intra mode has many advantages, including error propagation prevention and random access. However, more bits are needed to encode an intra-coded block when compared to an inter-coded block. Yu et al.’s intra prediction method uses the motion estimation technique to improve the coding performance of videos. The reconstructed region of the frame is reused for prediction. The benefit of using the reconstructed data is indicated in Fig. 6, which shows a comparison of the average bit usage of the Intra4×4 mode through H.264/AVC and Yu et al.’s work. The x-axis in Fig. 6 represents the number of IntraMC-coded blocks in one MB using Yu et al.’s method. As a comparison, the same block is also encoded using H.264/AVC. Four sequences in CIF (Common Intermediate Format), Akiyo, Foreman, Ice, and Weather, are tested. As more IntraMC-coded blocks are used in Yu et al.’s work, more bits can be saved in the intra encoding process when compared to H.264/AVC. However, as the number of IntraMC-coded blocks increases, more bits are required to transmit the mode information, e.g. the MV. Our study is motivated by the strong demand for bit reduction and the desire to further enhance the intra coding performance.

4.2. Adaptive template selection

Unlike Yu et al.’s work, which requires many additional bits for transmitting the MV, the template matching method in [8] only needs a few bits to indicate which template is selected for the current coded block. In contrast to [8], the methods in [7] and [9] use only one template and require no additional bits. However, the prediction error of using one template is larger than that of using two or more templates. Hence, the goal of the proposed adaptive template selection at this step is to provide a more representative template for each coded block without using any bits to indicate which template is used in the current coded block.

In the area of image processing, the gradient is an important feature for understanding the image structure. Thus, it has been widely used in the field of image/video processing, e.g. spatial error concealment [14] and image inpainting [15]. In this paper, the proposed method makes use of the gradient and selects the pixels with larger gradient values from the candidates to construct a template. The gradient used in this paper is calculated by the Laplacian operator because of its lightweight computation. A candidate should fall within 2L pixels of the current coded block according to the Manhattan distance, where L is the length of the block. The initial candidates are defined as the direct neighbors of the current coded block, as shown in Fig. 7(a) marked in blue. If the pixel with the largest gradient value belongs to the initial candidates, the direct neighbors of this pixel are also selected in the template, as shown in Fig. 7(b) and Fig. 7(e). Fig. 7(c) shows a case where the selected pixel in the candidate pool has the largest gradient value but does not belong to the initial candidates, so only this pixel is included in the template. Considering the computation and the coding performance, the whole process stops when the number of selected pixels is larger than 2L. Figure 7 shows the steps of the proposed adaptive template selection, where the pixels marked in blue are the candidates and the pixels marked in pink constitute the template for the encoding block. Figure 8 shows the flowchart of the proposed template construction.
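The selection procedure above can be sketched in Python under several stated assumptions, since the text does not fix every detail. In this sketch the available (already decoded) region is approximated as the rows above the block plus the pixels to its left, the Laplacian is the 4-neighbor form evaluated only over available pixels, the candidate pool grows by the direct neighbors of every pixel added to the template, and ties are broken by raster order. The function name `select_template` and these rules are hypothetical and may differ from the flowchart in Fig. 8.

```python
def select_template(recon, r0, c0, size):
    """Greedy gradient-driven template selection for the block at (r0, c0).

    recon: 2D list of decoded pixels; size: block length L.
    Returns the list of selected (row, col) template positions.
    """
    height, width = len(recon), len(recon[0])
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))

    def available(r, c):
        inside = 0 <= r < height and 0 <= c < width
        return inside and (r < r0 or (r < r0 + size and c < c0))

    def distance(r, c):
        # Manhattan distance from a pixel to the nearest pixel of the block.
        dr = max(r0 - r, 0, r - (r0 + size - 1))
        dc = max(c0 - c, 0, c - (c0 + size - 1))
        return dr + dc

    def neighbours(r, c):
        return [(r + dr, c + dc) for dr, dc in steps
                if available(r + dr, c + dc)
                and distance(r + dr, c + dc) <= 2 * size]

    def gradient(p):
        # Laplacian magnitude; unavailable neighbours contribute nothing.
        centre = recon[p[0]][p[1]]
        return abs(sum(recon[r][c] - centre for r, c in neighbours(*p)))

    initial = [(r0 - 1, c) for c in range(c0, c0 + size)]
    initial += [(r, c0 - 1) for r in range(r0, r0 + size)]
    initial = {p for p in initial if available(*p)}

    pool, template = set(initial), []
    while pool and len(template) <= 2 * size:
        best = max(sorted(pool), key=gradient)
        # An initial candidate brings its direct neighbours into the template.
        picked = [best] + (neighbours(*best) if best in initial else [])
        for p in picked:
            if p not in template:
                template.append(p)
        for p in picked:  # assumed pool-growth rule
            pool.update(q for q in neighbours(*p) if q not in template)
        pool.difference_update(template)
    return template
```

Because every quantity is derived from decoded pixels only, a decoder running the same routine would arrive at the same template without any side information.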

4.3. Adaptive template matching based intra prediction

Yu et al.’s work finds the best prediction block using the motion estimation technique, at the cost of 2×(log2 m + 1) bits to transmit the MV (where m is the search range, 1 is the sign bit, and 2 represents the horizontal and vertical motion vectors). In our method, we aim to reduce this extra bit usage through a finite state machine. In the proposed finite state machine, the original codebook (super codebook), $B_{ORI}$, which consists of the blocks in the reconstructed area, is dynamically changed for each coded block. Each coded block, denoted as a vector $CB = (cb_1, cb_2, \ldots, cb_{L^2})^T$, uses its upper and left border pixels in the previously encoded data as a match template to search for the N blocks with the smallest template matching distortion. These N blocks comprise the state codebook $B_{STATE}$. The template of the coded block is represented by $CT = (ct_1, ct_2, \ldots, ct_{2L})^T$, the prediction block is denoted as a vector $SB = (sb_1, sb_2, \ldots, sb_{L^2})^T$, and its corresponding searching template is represented by $ST = (st_1, st_2, \ldots, st_{2L})^T$. Figure 9 illustrates the proposed state codebook generation method and shows the pixel representation of CB, CT, SB, and ST; $cb_i$, $ct_i$, $sb_i$, and $st_i$ are the pixels in CB, CT, SB, and ST, respectively. The state codebook is expressed as

$B_{STATE} = \{ SB_i \mid i \in \mathbb{N},\ Rank(TMD_i) \le N \},$
$TMD_i = d(CT, ST_i) = \| CT - ST_i \|^2 = \sum_{j=1}^{2L} (ct_j - st_{ij})^2,$ (4)

where Rank(·) is the ascending sort function, i and j are the two indices of the super code vector, and N is the total number of code vectors in the state codebook. The final state codebook is expressed as $B_{STATE} = \{B_1, B_2, \ldots, B_N\}$.

The proposed adaptive template matching method adopts two options to select one code vector from the state codebook as the prediction block. The two options are called the IntraATM and IntraSTATE modes.

The IntraATM mode selects the first codeword in $B_{STATE}$, as defined in Eq. (5), as the prediction block; this codeword has the smallest template matching distortion in the state codebook $B_{STATE}$. Since the decoder is able to reconstruct the prediction block using the same information as the encoder, no additional intra bits are needed for a block encoded using the IntraATM mode.

$PB_{ATM} = B_1.$ (5)

In our method, the block selected by the IntraATM mode has the minimum template matching distortion. However, it cannot be guaranteed that the internal part of the selected block is similar to the current block. Therefore, we created another mode called IntraSTATE to improve the matching result. We compare the current block, CB, with the state code vectors, and the code vector with the minimum distortion is selected as the prediction block. The distortion, $STATED_k$, between the state code vector $B_k$ and the current block CB is given in Eq. (6), and the prediction block of the IntraSTATE mode, $PB_{STATE}$, is defined by Eq. (7).

$STATED_k = d(CB, B_k) = \| CB - B_k \|^2 = \sum_{l=1}^{M} (cb_l - b_{kl})^2,$ (6)

$PB_{STATE} = B_{k_{\min}}, \quad k_{\min} = \arg\min_{1 \le k \le N} STATED_k,$ (7)

where k is the index of the state code vector and l represents the position. An extra $\log_2 N$ bits are required to represent the index of the prediction block in the IntraSTATE mode. The following is the pseudo code of the state codebook generation and the prediction block selection of the IntraATM and IntraSTATE modes.

FOR each super code vector SB[i] in the super codebook B_ORI
    Compute TMD[i]
    IF TMD[i] < Sort_TMD[N]
        Sort_TMD[N] = TMD[i]
        B[N] = SB[i]
        Rank B and Sort_TMD according to Sort_TMD
    END IF
END FOR

PB_ATM = B[1]

FOR each state code vector B[k] in the state codebook B
    Compute STATED[k]
    IF STATED[k] < Min_STATED
        Min_STATED = STATED[k]
        PB_STATE = B[k]
    END IF
END FOR

Using the finite state machine, there is a high probability that the two modes proposed here will select the same block as IntraMC. However, the number of bits used by the two proposed modes is lower than that of IntraMC. Therefore, the proposed method achieves better coding performance when compared to both Yu et al.’s work and H.264/AVC intra prediction.
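A compact Python rendering of the two modes is given below as an illustrative sketch. It assumes that every candidate in the reconstructed area is supplied as a pair of flat lists (searching template ST and block SB), that squared error is the distortion measure in both Eq. (4) and Eq. (6), and that the function names are hypothetical.

```python
def squared_error(a, b):
    """Sum of squared differences between two equally long pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_state_codebook(ct, candidates, n):
    """Return the n candidate blocks whose templates best match `ct`.

    candidates: list of (st, sb) pairs taken from the reconstructed area.
    The result is ordered by ascending template matching distortion (TMD).
    """
    ranked = sorted(candidates, key=lambda pair: squared_error(ct, pair[0]))
    return [sb for _, sb in ranked[:n]]


def predict_intra_atm(state):
    """IntraATM: the first state code vector; needs no transmitted index."""
    return state[0]


def predict_intra_state(cb, state):
    """IntraSTATE: the state code vector closest to the current block `cb`.

    Returns (index, block); the index is what the encoder would transmit.
    """
    k_min = min(range(len(state)), key=lambda k: squared_error(cb, state[k]))
    return k_min, state[k_min]
```

Only `predict_intra_state` uses the current block itself, which is why that mode must send the index while IntraATM can be reproduced by the decoder from decoded data alone.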

4.4. The early termination of the proposed HEVC

Although the proposed adaptive template matching method can be fully applied to HEVC, the method applied to HEVC is modified to reduce its overhead in computational complexity and additional mode information. Considering computation and efficiency, the proposed prediction modes are skipped when the PU size is larger than 8×8, because the neighboring information outside 8×8 is not effective for predicting the current block. In the proposed H.264/AVC method, quarter-pixel accuracy is used in the IntraMC mode. An additional 4 bits are needed to represent the MV (2 bits for each horizontal/vertical MV). In HEVC, only full-pixel accuracy is adopted to decrease bit overhead. To reduce the computation, two strategies are proposed: adaptive template prediction mode decision and fast intra-coded block motion estimation. In the adaptive template prediction mode decision, the proposed adaptive template matching modes are skipped by using the information of IntraMC. In the fast intra-coded block motion estimation, the search range of IntraMC is reduced without much coding performance degradation.

4.4.1. Adaptive template prediction mode decision

In the proposed method, the IntraMC mode is checked before the selection of the proposed IntraATM and IntraSTATE modes. Thus, the information derived from the IntraMC mode can be used to determine whether the IntraATM and IntraSTATE modes should be skipped, as defined in Eq. (8). The HAD value between the prediction block in IntraMC and the current block ($HAD_{IntraMC}$) is used to predict the RD cost of IntraATM ($RDCost_{Pred}$):

$RDCost_{Pred} = HAD_{IntraMC} + \lambda \cdot (R_{IntraMC} - Bit_{IntraMC}),$ (8)

where $R_{IntraMC}$ and $Bit_{IntraMC}$ are the rate and the additional mode bits in IntraMC, respectively, and $\lambda$ is the Lagrange multiplier. If $RDCost_{Pred}$ is larger than the minimal RD cost derived from HEVC for the current PU, the IntraATM and IntraSTATE modes are skipped. To verify the viewpoint of Eq. (8), Tables 1 and 2 show that over 99% of PUs do not choose the IntraATM or IntraSTATE modes when $RDCost_{Pred}$ exceeds the current minimal RD cost. Given these statistical results, it is reasonable to use $RDCost_{Pred}$ to determine whether IntraATM and IntraSTATE should be skipped.
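The skip decision can be written as a few lines of Python. This is only a sketch: it assumes that Eq. (8) combines the IntraMC HAD value with a Lagrange-weighted rate from which the IntraMC mode bits have been removed, and the function names, argument names, and the multiplier `lam` are hypothetical.

```python
def predicted_rd_cost(had_intramc, rate_intramc, mode_bits_intramc, lam):
    """Predicted RD cost of IntraATM, estimated from the IntraMC result.

    The IntraMC mode bits are subtracted because IntraATM sends no MV.
    """
    return had_intramc + lam * (rate_intramc - mode_bits_intramc)


def skip_template_modes(had_intramc, rate_intramc, mode_bits_intramc, lam,
                        best_rd_cost):
    """True if IntraATM and IntraSTATE should be skipped for this PU."""
    predicted = predicted_rd_cost(had_intramc, rate_intramc,
                                  mode_bits_intramc, lam)
    return predicted > best_rd_cost
```

When the function returns True, the encoder keeps the best conventional mode found so far and avoids building the state codebook for that PU.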

4.4.2. Fast intra-coded block motion estimation

In HEVC, the prediction mode with the same PU size is checked more than once for different CU and TU sizes. Moreover, the whole MV information is required to be transmitted in the IntraMC mode. The computational complexity in this case can be greatly reduced by shrinking the search range of the later motion estimation. In the proposed scheme, the range of the first motion estimation is set to 32 pixels, but the range of the second motion estimation is set to only 10 pixels. Figure 10 shows that over 90% of the second prediction blocks can be found within a distance of 10 pixels from the location of the first prediction block.

4.5. Algorithm summary

4.5.1. The proposed H.264/AVC algorithm summary

We have also added another step to integrate the IntraMC mode into our proposed method in case the two proposed modes cannot find a good prediction block. The search range of the three modes (marked W in Fig. 9) is set at 32 pixels on the top, left, and right of the current coding block. Also, the three modes are implemented with quarter-pixel accuracy. The interpolation method used for the quarter-pixel accuracy is a 6-tap filter. There are a total of 12 intra prediction modes (nine H.264/AVC default intra modes, along with IntraMC, IntraATM, and IntraSTATE) in the proposed method, and the best mode is chosen by the RDO, which is defined as

$\min \{ J(C, r, P \mid QP, \lambda_{mode}) \},$ (9)

$J = D(C, r \mid QP) + \lambda_{mode} \cdot R(C, r, P \mid QP),$ (10)

where J is the Lagrangian cost function, C denotes the original block, r is the prediction block, P is set at 0 if the current mode is equal to the MPM and at 1 otherwise, QP is the quantization parameter, $\lambda_{mode}$ is the Lagrange multiplier, D is a distortion measurement, and R represents the number of bits required to encode the current mode. The MPM is the smaller of the prediction mode numbers of the upper MB and the left MB, which reduces the bit usage of the prediction modes.
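The mode decision of Eqs. (9) and (10) amounts to picking the candidate with the smallest Lagrangian cost. The Python sketch below illustrates this with hypothetical names; the distortion and rate values passed in are assumed to have been measured elsewhere by the encoder.

```python
def lagrange_cost(distortion, rate_bits, lam):
    """J = D + lambda * R for one candidate mode."""
    return distortion + lam * rate_bits


def most_probable_mode(upper_mode, left_mode):
    """MPM: the smaller of the two neighbouring prediction mode numbers."""
    return min(upper_mode, left_mode)


def choose_mode(candidates, lam):
    """Return the mode name with the minimum Lagrangian cost.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    return min(candidates,
               key=lambda mode: lagrange_cost(*candidates[mode], lam))
```

With purely illustrative numbers, `choose_mode({"DC": (400, 1), "IntraMC": (100, 17), "IntraATM": (150, 5)}, 20)` selects `"IntraATM"`: IntraMC has the lowest distortion, but its MV bits make its total cost higher.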

[...] extending the code vector tables from the original tables in Intra8×8 and Intra4×4. The number of coded bits is either 1 or 5 in Intra8×8 and Intra4×4. We also compared these additional bits with H.264/AVC and other methods.

The encoding procedure of the proposed intra prediction method is described below and shown in Fig. 11:

1) In the Intra16×16 mode, an MB chooses the prediction mode with the minimum Lagrange cost, $J_{min\_16\times16}$, from four conventional intra prediction modes.

2) In the Intra8×8 mode, each 8×8 sub-block in an MB chooses the prediction mode with the minimum Lagrange cost from the nine H.264/AVC intra prediction modes, IntraMC, IntraATM, and IntraSTATE. The Lagrange cost of the Intra8×8 mode, $J_{min\_8\times8}$, is the sum of the minimum Lagrange costs of the four 8×8 sub-blocks.

3) In the Intra4×4 mode, each 4×4 sub-block in an MB chooses the prediction mode with the minimum Lagrange cost from the nine H.264/AVC intra prediction modes, IntraMC, IntraATM, and IntraSTATE. The Lagrange cost of the Intra4×4 mode, $J_{min\_4\times4}$, is the sum of the minimum Lagrange costs of the sixteen 4×4 sub-blocks.

4) The best chosen mode is the one with the lowest Lagrange cost among $J_{min\_16\times16}$, $J_{min\_8\times8}$, and $J_{min\_4\times4}$.

In IntraMC, when the search window is 32, each MB requires an extra 16 bits to be encoded and the quarter-pixel accuracy is applied. The proposed IntraATM mode requires no extra bits to be encoded, while the proposed IntraSTATE mode requires four bits to be encoded if N is 16.
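These overhead figures follow from the bit counts given earlier: the MV cost of 2×(log2 m + 1) bits at full-pixel accuracy, the 2 additional bits per MV component for quarter-pixel accuracy, and the log2 N bits of the IntraSTATE index. As a worked check for m = 32 and N = 16:

```latex
\begin{align*}
\text{IntraMC, full-pixel:}\quad & 2\,(\log_2 32 + 1) = 2 \times 6 = 12 \text{ bits},\\
\text{IntraMC, quarter-pixel:}\quad & 12 + 2 \times 2 = 16 \text{ bits},\\
\text{IntraSTATE } (N = 16):\quad & \log_2 16 = 4 \text{ bits},\\
\text{IntraATM:}\quad & 0 \text{ bits}.
\end{align*}
```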

In the proposed HEVC method, the search range settings of IntraMC, IntraATM, and IntraSTATE are the same as those of the proposed H.264/AVC method. Only full-pixel accuracy is adopted in IntraMC, IntraATM, and IntraSTATE, to decrease the bit overhead and the computation of the video coder. Thus, an extra 12 bits are required for each IntraMC block. The additional mode information of IntraATM and IntraSTATE is the same as that of H.264/AVC. IntraMC, IntraATM, and IntraSTATE are evaluated only when the PU size is smaller than or equal to 8×8. Thus, the encoding procedure of the proposed method with a CU size larger than 8×8 is identical to that of HEVC.

1) In 8×8 CU, 8×8 PU is performed first. The search range of IntraMC is set at 32 in

this step. If the RDCostPred_{calculated from (8) is larger than the current minimal} RD cost, IntraATM and IntraSTATE are skipped.

2) The rough mode decision selects eight prediction modes of minimal RD cost value from 38 prediction modes.

3) The search range of IntraMC is set at 10 and the most similar block found in Step 1 is set at the center of the motion estimation. The intra prediction mode decision after the integer transform selects the best intra prediction mode from the eight prediction modes in 8×8 PU.

4) Then, the PU is divided into four 4×4 PUs. The search range of IntraMC is set at

32

 _{. If the}RDCostPred_{calculated from (8) is larger than current minimal RD cost,}

IntraATM and IntraSTATE are skipped.

5) The search range of IntraMC is set at 10. The rough mode decision selects eight prediction modes with minimal RD cost value from 38 prediction modes. And, the intra prediction mode decision after the integer transform selects the best intra prediction mode from the eight prediction modes in 4×4 PU.


6) The best mode of an 8×8 CU is the one with the lowest RD cost between one 8×8 PU and four 4×4 PUs.
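The control flow of these steps can be sketched as follows. This is a simplified illustration and not the HM implementation: the rough and full RD costs are supplied as hypothetical inputs, RDCost_{Pred} is assumed to be computed elsewhere from (8), the IntraMC search-range refinement (32, then 10) is not modeled, and all function and variable names are ours.

```python
from typing import Callable, Dict, List, Tuple

# Modes dropped by the skip test in Steps 1 and 4.
SKIPPABLE = ("IntraATM", "IntraSTATE")


def decide_pu(rough_cost: Dict[str, float],
              full_cost: Callable[[str], float],
              predicted_cost: float,
              current_min_cost: float,
              shortlist_size: int = 8) -> Tuple[str, float]:
    """Choose the best mode for one PU.

    rough_cost       -- rough RD cost of every candidate mode (hypothetical input)
    full_cost        -- RD cost after the integer transform (hypothetical callback)
    predicted_cost   -- RDCost_Pred, assumed to come from (8)
    current_min_cost -- current minimal RD cost used in the skip test
    """
    candidates = dict(rough_cost)
    # Skip test: drop IntraATM and IntraSTATE when the predicted cost
    # is larger than the current minimal RD cost.
    if predicted_cost > current_min_cost:
        for mode in SKIPPABLE:
            candidates.pop(mode, None)
    # Rough mode decision: keep the modes with the lowest rough cost.
    shortlist = sorted(candidates, key=candidates.get)[:shortlist_size]
    # Final decision: lowest RD cost after the transform.
    best = min(shortlist, key=full_cost)
    return best, full_cost(best)


def decide_cu_8x8(pu_8x8: Tuple[str, float],
                  pus_4x4: List[Tuple[str, float]]) -> Tuple[str, List[str], float]:
    """Step 6: compare one 8x8 PU with the total cost of four 4x4 PUs."""
    cost_4x4 = sum(cost for _, cost in pus_4x4)
    if pu_8x8[1] <= cost_4x4:
        return "8x8", [pu_8x8[0]], pu_8x8[1]
    return "4x4", [mode for mode, _ in pus_4x4], cost_4x4


# Made-up costs, for illustration only.
rough = {"DC": 10.0, "Planar": 12.0, "IntraMC": 8.0,
         "IntraATM": 5.0, "IntraSTATE": 6.0}
full = {"DC": 9.0, "Planar": 11.0, "IntraMC": 7.0,
        "IntraATM": 4.0, "IntraSTATE": 5.5}
print(decide_pu(rough, full.get, predicted_cost=20.0, current_min_cost=10.0))
print(decide_pu(rough, full.get, predicted_cost=5.0, current_min_cost=10.0))
```

With the made-up costs, the first call skips IntraATM and IntraSTATE and falls back to IntraMC, while the second keeps them and selects IntraATM.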

5. Experimental results

5.1. Experimental results of H.264/AVC

In this paper, we compare the coding performance of H.264/AVC intra prediction, WCP [5], Yu et al.’s work [6], Lan et al.’s work [9], our previous work [16], and the proposed method in terms of BDBR (Bjontegaard Delta Bit Rate) and BDPSNR (Bjontegaard Delta PSNR) [17]. The search window in each work is set at 32 pixels. Our previous work [16] uses an “L”-shaped template, and all three proposed intra prediction modes are used. BDBR represents the average bitrate difference, in percent, over the whole range of PSNR, while BDPSNR represents the average PSNR difference, in dB, over the whole range of bitrates. All methods are implemented in the H.264/AVC JM 18.4 reference software, including all block types. Two video sequence resolutions, CIF and 4CIF, are used in our simulation. Nine CIF test sequences and three 4CIF test sequences that contain different kinds of textures are used for comparison. Other detailed simulation settings are shown in Table 3.
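For readers unfamiliar with the metric, the following is a minimal sketch of a BDBR computation in the spirit of [17]. It assumes exactly four rate-distortion points per curve, fits a cubic through log10(bitrate) as a function of PSNR, and averages the gap between the two curves over the shared PSNR range. The function names and sample numbers are illustrative and are not taken from the experiments.

```python
import math
from typing import Callable, Sequence


def cubic_through(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the cubic passing through four (xs, ys) points (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def simpson(f: Callable[[float], float], a: float, b: float) -> float:
    """Simpson's rule on [a, b]; exact for polynomials up to degree three."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test) -> float:
    """Average bitrate difference in percent (negative = bitrate saving)."""
    log_ref = [math.log10(r) for r in rate_ref]
    log_test = [math.log10(r) for r in rate_test]
    lo = max(min(psnr_ref), min(psnr_test))   # shared PSNR interval
    hi = min(max(psnr_ref), max(psnr_test))
    area_ref = simpson(lambda p: cubic_through(psnr_ref, log_ref, p), lo, hi)
    area_test = simpson(lambda p: cubic_through(psnr_test, log_test, p), lo, hi)
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Illustrative data: the test codec needs 10% fewer bits at every PSNR.
psnr = [30.0, 33.0, 36.0, 39.0]
rate_ref = [1000.0, 2000.0, 4000.0, 8000.0]
rate_test = [0.9 * r for r in rate_ref]
print(bd_rate(rate_ref, psnr, rate_test, psnr))  # approximately -10
```

BDPSNR follows the same pattern with the roles of the axes swapped: PSNR is fitted as a function of log bitrate, and the average vertical gap is reported in dB.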

5.1.1. BDBR and BDPSNR comparison

Tables 4 and 5 compare the BDBR and BDPSNR performance, respectively, of the H.264/AVC intra mode, WCP [5], Yu et al.’s work [6], Lan et al.’s work [9], our previous work [16], and our proposed method. Compared to H.264/AVC, the proposed method achieves bitrate savings of 5% to 18%, or PSNR improvements of 0.25 dB to 1.57 dB. Moreover, the proposed method outperforms WCP [5], Yu et al.’s work, Lan et al.’s work, and our previous work by 10.68%, 4.90%, 7.85%, and 3.41% in BDBR saving, and by 0.60 dB, 0.27 dB, 0.46 dB, and 0.26 dB in BDPSNR increase, respectively. Figure 12 shows the rate-distortion curves of the


six methods for two test sequences. Based on these results, the proposed method clearly outperforms H.264/AVC intra prediction, WCP, Yu et al.’s method, Lan et al.’s method, and our previous method.

Figs. 13 and 14 compare the coding performance of the proposed method for different numbers of pixels in a template and for different state codebook sizes, respectively. As the number of template pixels and the state codebook size grow, the BDBR decreases linearly while the computation increases. Considering both coding performance and computational complexity, we fix the number of selected pixels in a template to 2L and set the state codebook size to 16. Figure 15 compares the proposed method with our previous work [16] in terms of the SSE (sum of squared errors) between the code vector and the current block. The figure illustrates that the SSE of the proposed adaptive template selection method is smaller than that of our previous work. The results demonstrate that using pixels with larger gradient values as the template predicts the current block more accurately than the fixed L-shaped template applied in our previous method. The results in Tables 4 and 5 further show that the proposed method outperforms our previous work by 3.41% in BDBR and 0.26 dB in BDPSNR, on average. Figure 16 shows the mode usage comparison of Yu et al.’s method, Lan et al.’s method, our previous method, and the proposed method, where the intra prediction mode proposed by Lan et al. is marked as IntraTM. The usage of the IntraMC mode in Yu et al.’s work does not exceed 12% in any test video sequence. In IntraMC, each MB requires an extra 16 bits when quarter-pixel accuracy is applied. In Yu et al.’s work, if the block found by the IntraMC mode differs from the current block, it cannot improve on the nine H.264/AVC intra modes and is thus unlikely to be selected by RDO. However, in


our method, the proposed IntraATM mode requires no extra bits, and IntraSTATE requires only four bits when the number of candidate blocks is 16. The two proposed modes (IntraATM and IntraSTATE) have a high probability of finding the same block as the IntraMC mode while needing fewer bits. Therefore, the combined usage of the three modes (IntraMC, IntraATM, and IntraSTATE) exceeds 20%, compared to the 12% of Yu et al.’s method. As a result, the proposed method achieves better coding performance than Yu et al.’s method.

5.1.2. Computational Complexity Comparison

Table 6 shows the time increase ratio of Yu et al.’s work, Lan et al.’s work, and the proposed method relative to H.264/AVC. The time increase ratio is defined as

$$\Delta Time = \frac{T_{TargetAlgo} - T_{Standard}}{T_{Standard}} \qquad (11)$$

where T_{Standard} and T_{TargetAlgo} represent the encoding time of the standard, H.264/AVC, and of the target method, respectively. The average time increase ratio of Yu et al.’s method, Lan et al.’s work, and the proposed method is 1.89, 69.97, and 8.10, respectively. In Lan et al.’s work, each of the 33 pixels of the current 4×4 block is compared with the reconstructed region. In contrast, our method needs only 8 to 10 pixels in Intra4×4 and therefore requires much less computation time than Lan et al.’s method.
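As a quick numerical illustration of (11), read here as the relative increase (T_TargetAlgo − T_Standard) / T_Standard, the sketch below evaluates the ratio for made-up encoding times. The timings and the function name are hypothetical and are not measurements from Table 6.

```python
def time_increase_ratio(t_target: float, t_standard: float) -> float:
    """Relative encoding-time increase of a target method over the standard."""
    if t_standard <= 0:
        raise ValueError("t_standard must be positive")
    return (t_target - t_standard) / t_standard


# Hypothetical timings in seconds: the standard encoder takes 100 s,
# the target method takes 910 s.
print(time_increase_ratio(910.0, 100.0))
```

Under this reading, a ratio of 8.10 means the target encoder runs for 9.10 times as long as the standard encoder, and a ratio of 0 means no slowdown.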

5.2. Experimental results of HEVC

To evaluate the coding performance of the proposed algorithm, we compare our method with HEVC in terms of BDBR and BDPSNR. Both our proposed method and HEVC are implemented in the HM 10.0 reference software. Test sequences at five resolutions, labeled Class A to Class E, are used in our simulation. Other detailed simulation settings are shown in Table 7.


Table 8 compares the BDBR and BDPSNR performance of the HEVC intra prediction modes and our proposed method with and without early termination. Without early termination, the proposed method achieves an average bitrate saving of 4.3%, or an average PSNR improvement of 0.22 dB, compared to HEVC. With early termination, the proposed method achieves a 2.83% bitrate saving or a 0.14 dB PSNR improvement. Table 9 compares the coding performance of the proposed method applied to different PU sizes. As shown in Table 9, the bitrate saving ranges from 5.9% to 9.7% when the proposed method is applied to PU sizes smaller than 32×32. However, more encoding time is required for PU size <= 32×32; the coding time increase rises from 3.7 to 6.5. At the cost of an acceptable decrease in coding performance, the coding time increase drops from 3.5 to 1.6. Figure 17 shows the rate-distortion curves of the three methods for the BasketballDrill and BQSquare test sequences, which belong to classes B and C, respectively. Based on these results, the proposed method clearly achieves better coding performance than HEVC.

6. Conclusion

A new intra prediction scheme is proposed to enhance the coding performance of intra prediction. The proposed method creates two new intra prediction modes through the concept of a finite state machine, using the adaptive template matching criterion to predict the next-state function at the decoder end. Blocks coded by the proposed modes require fewer bits, and thus the coding performance of intra coding is improved. If the proposed modes are unable to predict the current block satisfactorily, the IntraMC mode is applied to prevent performance degradation. The H.264/AVC and HEVC intra prediction modes are also used, and the optimal mode is


selected through RDO. Experimental results show that the proposed method achieves bitrate savings of 10.980% over H.264/AVC and 4.301% over HEVC.

Acknowledgments

This work was supported in part by the National Science Council under the Grants NSC101-2221-E-110-093-MY2 and NSC102-2221-E-110-032-MY3. Our thanks to Chia-Shiu Wu for executing the program on the test data in our preliminary work; his timely assistance is greatly appreciated.

References

[1] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Trans. Circuits Syst. Video Technol. 13 (7) (2003) 560-576.

[2] J. Jia, D. Yoon, H. K. Kim, An efficient coding of intra picture prediction modes for H.264/AVC, Proc. Third Int. Conf. Multimed. Ubiquitous Eng. (2009) 49-53.

[3] K. Zhang, X. Ji, Q. Huang, D. Zhao, W. Gao, An efficient coding method for intra prediction mode information, Proc. IEEE Int. Symp. Circuits Syst. (2009) 2814-2817.

[4] P. Zhang, D. Zhao, S. Ma, Y. Lu, W. Gao, Multiple modes intra-prediction in intra coding, Proc. IEEE Int. Conf. Multimed. Expo 1 (2004) 419-422.

[5] L. Wang, L.-M. Po, Y.M.S. Uddin, K.-M. Wong, S. Li, A novel weighted cross prediction for H.264 intra coding, Proc. IEEE Int. Conf. Multimed. Expo (2009) 165-168.

[6] S. Yu, C. Chrysafis, New intra prediction using intra-macroblock motion compensation, JVT-C151 (2002).

[7] T. K. Tan, C. S. Boon, Y. Suzuki, Intra prediction by template matching, Proc. IEEE Int. Conf. Image Process. (2006) 1693-1696.

[8] T. K. Tan, C. S. Boon, Y. Suzuki, Intra prediction by averaged template matching predictors, Proc. 4th IEEE Consum. Commun. Netw. Conf. (2007) 405-409.

[9] C. Lan, J. Xu, F. Wu, G. Shi, Intra frame coding with template matching prediction


[10] Z. Gu, W. Lin, B.-S. Lee, C. T. Lau, M.-T. Sun, Mode dependent templates and scan order for H.264/AVC based intra lossless coding, IEEE Trans. Image Process. 21 (9) (2012) 4106-4116.

[11] C. J. Kuo, C.-H. Yeh, S. F. Odeh, Polynomial search algorithm for motion estimation, IEEE Trans. Circuits Syst. Video Technol. 10 (5) (2000) 813-818.

[12] J. Foster, R. M. Gray, M. D. Dunham, Finite-state vector quantization for waveform coding, IEEE Trans. Inf. Theory 31 (1985) 348-359.

[13] N. M. Nasrabadi, C. Y. Choo, Y. Feng, Dynamic finite-state vector quantization of digital images, IEEE Trans. Commun. 42 (1994) 2145-2154.

[14] W. Kumwilaisak, C.-C. J. Kuo, Spatial error concealment with sequence-aligned texture modeling and adaptive directional recovery, J. Vis. Commun. Image Rep. 22 (2) (2011) 164-177.

[15] D. Liu, X. Sun, F. Wu, Inpainting with image patches for compression, J. Vis. Commun. Image Rep. 23 (1) (2012) 100-113.

[16] C.-S. Wu, S.-J. F. Jiang, C.-H. Yeh, New intra prediction with finite state machine for H.264/AVC, Proc. SPIE Vis. Commun. Image Process. 7744 (2010) 774414-1-774414-8.

[17] G. Bjontegaard, Calculation of average PSNR differences between RD-curves (VCEG-M33), VCEG Meet. (ITU-T SG16 Q.6) (2001).