The early termination mechanism stops the search process as soon as the block-matching error produced by an MV in the search area falls below a pre-chosen threshold; that MV is then accepted as the best MV. Clearly, there is a trade-off between MV quality (matching error) and computational speed. The challenge is therefore to find a termination threshold that maximizes the speed gain while minimizing the quality degradation. In this section, we set up a systematic method to find a nearly optimal early termination threshold (ETT) [50].
The most commonly used block-matching error is the sum of absolute differences (SAD). Exploiting the correlation among spatially and temporally neighboring blocks, [14] proposed a general form (4.27) for the ETT. It suggests that the threshold is a function of the SADs and MVs of the neighboring blocks, with Tmin and Tmax denoting the lower and upper bounds of the threshold, respectively. In practice, most studies use only the SAD predictor. For example, [16] suggests (4.28) and [30] suggests (4.29).
\[ T = a \times \min(\mathrm{SAD}_1, \mathrm{SAD}_2, \ldots, \mathrm{SAD}_n) + b, \tag{4.28} \]
where a, b are fixed values.
\[ T = \mathrm{SAD}_p + \delta, \tag{4.29} \]
where SADp is the SAD of the co-located block in the previous frame (Fig. 4-14) and δ is a bias parameter.
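The two threshold rules (4.28) and (4.29), and the way a threshold cuts a search short, can be illustrated with a minimal Python sketch. The function names, the candidate ordering and the toy SAD values are hypothetical and serve only as illustration; the default values a = 1.2, b = 128 and δ = 50 are the settings quoted later in this section.

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple

MV = Tuple[int, int]


def threshold_min_rule(neighbor_sads: Sequence[float], a: float = 1.2, b: float = 128.0) -> float:
    """Threshold in the form of (4.28): T = a * min(SAD_1, ..., SAD_n) + b."""
    return a * min(neighbor_sads) + b


def threshold_colocated_rule(sad_p: float, delta: float = 50.0) -> float:
    """Threshold in the form of (4.29): T = SAD_p + delta."""
    return sad_p + delta


def search_with_early_termination(
    candidates: Iterable[MV],
    sad_of: Callable[[MV], float],
    threshold: float,
) -> Tuple[Optional[MV], float, int]:
    """Visit candidate MVs in order and stop once a SAD falls below the threshold.

    Returns (best MV, its SAD, number of search points checked).
    """
    best_mv, best_sad, checked = None, float("inf"), 0
    for mv in candidates:
        sad = sad_of(mv)
        checked += 1
        if sad < best_sad:
            best_mv, best_sad = mv, sad
        if sad < threshold:
            break  # early termination: accept this MV as the best MV
    return best_mv, best_sad, checked


# Toy example with made-up SAD values (illustration only).
toy_sads = {(0, 0): 900, (1, 0): 300, (0, 1): 100, (-1, 0): 50}
t = threshold_min_rule([400, 250, 600])
result = search_with_early_termination(list(toy_sads), toy_sads.__getitem__, t)
print(t, result)
```

In this toy run the search stops at the second candidate, which shows the trade-off discussed above: two candidates with smaller SADs are never visited, but only two search points are spent.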
To find the best threshold predictor, we use the correlation coefficient between the SAD predictor (SADpred) and the best SAD acquired using FS (SADC, as shown in Fig. 4-14) as the measure of a predictor's effectiveness. First, we perform FS on the test sequences in Table 3-1 to obtain the SAD values of all blocks. For each SAD predictor, we calculate its correlation with the actual SAD (SADC) of the corresponding block. The predictor with the highest correlation coefficient (closest to 1) is the best SAD predictor. Using regression, we then find an approximation function that best describes the relation between the predicted SAD and SADC. We also set an upper bound on the threshold estimate to prevent quality loss when the ETT is high. Finally, we fine-tune the predictor coefficients (slope and offset) to achieve the desired trade-off between speed and quality. This fine-tuned function serves as the early termination threshold.
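[Figure 4-14: block layout. Current frame: SADC (current block) with neighbors SADUL, SADU, SADUR and SADL. Previous frame: SADP with neighbors SADPUL, SADPU, SADPUR, SADPL, SADPR, SADPDL, SADPD and SADPDR. Frame before the previous frame: SADPP.]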
Fig. 4-14 The SAD candidates in the current frame, the previous frame and the frame before the previous frame.
An ETT predictor often consists of two elements: 1) a selected set of SADs of nearby blocks, and 2) a mathematical function operating on the selected SAD set. The most commonly used mathematical functions are mean(.), median(.), min(.) and max(.). The 14 most commonly used neighboring SADs are shown in Fig. 4-14. Combining them, there are 65532 possibilities:
\[ \left(\sum_{i=1}^{14} C^{14}_{i}\right) \times 4 = \left(2^{14} - C^{14}_{0}\right) \times 4 = 65532. \]
Moreover, we can apply a different weight to each block SAD, which leads to an enormous number of possible SAD predictors. In our study, we select some representative SAD predictors ((4.30)-(4.43)). Table 4-16 shows the correlation coefficients between a few selected SAD predictors and SADC. Limited by space, only the better ones are shown there. Among the 55 SAD predictors under consideration, SADpred15 (mean SAD of the upper and left blocks) is the best predictor in the 2D cases, and SADpred35 (median SAD of the upper, left, and two previous blocks) is the best predictor over all cases (2D and 3D). Here, the 2D cases use only the SADs of blocks in the same frame, whereas the 3D cases may also use the SADs of blocks in the previous frames.
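\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}), \tag{4.31} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.32} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.33} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.34} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.35} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.36} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.37} \]
\[ \mathrm{SAD}_{\mathrm{pred}} = \{\mathrm{mean}, \mathrm{median}, \min, \max\}(\mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}, \mathrm{SAD}), \tag{4.38} \]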
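The construction of a SAD predictor (a function applied to a selected set of neighboring SADs) and the count of 65532 possible forms can be checked with a short Python sketch. The dictionary keys reuse the block labels of Fig. 4-14 without the SAD prefix; the helper names and the sample SAD values are hypothetical, and reading the "two previous blocks" of SADpred35 as SADP and SADPP is our assumption.

```python
import math
import statistics
from typing import Dict, Sequence

# The four functions considered in the text.
FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
}


def count_predictor_forms(num_candidates: int = 14, num_functions: int = 4) -> int:
    """Number of (non-empty subset, function) pairs."""
    subsets = sum(math.comb(num_candidates, i) for i in range(1, num_candidates + 1))
    return subsets * num_functions


def sad_predictor(function_name: str, selected: Sequence[str], neighbor_sads: Dict[str, float]) -> float:
    """Apply one function to the SADs of the selected neighboring blocks."""
    values = [neighbor_sads[name] for name in selected]
    return FUNCTIONS[function_name](values)


# Made-up neighbor SADs (illustration only).
neighbors = {"U": 400, "L": 600, "P": 500, "PP": 900}

pred_2d = sad_predictor("mean", ["U", "L"], neighbors)               # SADpred15-style predictor
pred_3d = sad_predictor("median", ["U", "L", "P", "PP"], neighbors)  # assumed SADpred35-style predictor
print(count_predictor_forms(), pred_2d, pred_3d)
```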
Table 4-16 The correlation coefficients between the selected SAD predictors and the actual block SAD.
SAD predictor CT256 CT40 HL40 MD96 CG112 FM512 FM1024 FB1024 FG768 ST1024 Average All
pred15 0.725 0.809 0.760 0.711 0.767 0.748 0.743 0.727 0.926 0.844 0.776 0.886
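The ranking criterion used for Table 4-16 is the correlation coefficient between each predictor and SADC. A minimal sketch of such a ranking is given below, assuming the Pearson correlation coefficient is meant; the function names and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(
    predictions: Dict[str, Sequence[float]],
    actual_sads: Sequence[float],
) -> List[Tuple[str, float]]:
    """Sort predictors by their correlation with the actual SAD, best first."""
    scores = {name: pearson_correlation(vals, actual_sads) for name, vals in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up per-block values (illustration only).
sad_c = [310, 520, 410, 980, 150]
candidates = {
    "predA": [300, 500, 450, 900, 200],
    "predB": [700, 200, 650, 400, 500],
}
ranking = rank_predictors(candidates, sad_c)
print(ranking[0][0])
```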
To obtain a better predictor of SADC, we also tried multi-dimensional regression, but found that linear regression already gives a sufficiently accurate approximation. Consequently, (4.44) is the predictor of choice.
\[ \mathrm{SAD}_{\mathrm{thLinear\_predicted}} = K_1 \times \mathrm{SAD}_{\mathrm{pred}} + K_2, \tag{4.44} \]
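The slope K1 and offset K2 of a linear predictor of the form (4.44) can be obtained by ordinary least squares. The sketch below is one possible way to do this, not necessarily the exact procedure used to produce Table 4-17; the function name and the sample data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_linear_predictor(sad_pred: Sequence[float], sad_c: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of sad_c ~ k1 * sad_pred + k2; returns (k1, k2)."""
    n = len(sad_pred)
    mean_x = sum(sad_pred) / n
    mean_y = sum(sad_c) / n
    sxx = sum((x - mean_x) ** 2 for x in sad_pred)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sad_pred, sad_c))
    k1 = sxy / sxx             # slope
    k2 = mean_y - k1 * mean_x  # offset
    return k1, k2


# Made-up (SADpred, SADC) pairs lying exactly on SADC = 0.9 * SADpred + 100.
pred = [100, 200, 300, 400]
actual = [190, 280, 370, 460]
k1, k2 = fit_linear_predictor(pred, actual)
print(k1, k2)
```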
Table 4-17 shows the coefficients of the best 2D/3D predictors for various test sequences.
The ‘Average’ row denotes the average values of all sequences. The ‘All’ row shows the values calculated using all sequences as data samples. To check the effectiveness of these predictors, we calculate the mean and the standard deviation (STD) of both the best 2D and 3D SAD prediction errors. In Fig. 4-15 and Fig. 4-16, each dot represents the SAD pair (SADpred, SADC) of a block.
The star mark at the center of a vertical bar represents the mean of SADC, and the bar length represents the standard deviation of the prediction errors. The standard deviation clearly grows as SADpred increases, which implies that the prediction accuracy is lower for large predicted SAD values. Hence, to ensure a high MV quality, we propose an upper bound in (4.45) using the average SAD of all coded blocks in the same frame.
3
where SADi is the SAD of the i-th block in the current frame, K3 is the allowed maximum early termination error offset, and Nc denotes the current block index in a frame. Finally, the early termination threshold (ETT) is defined below by (4.46).
\[ T = \mathrm{SAD}_{\mathrm{th}} \equiv \min(\mathrm{SAD}_{\mathrm{thLinear\_predicted}}, \mathrm{SAD}_{\mathrm{thUpper\_bounded}}), \tag{4.46} \]
The parameter values are determined empirically: K1 is set to 1, K2 to 384 and K3 to 512. Under this setting, we achieve a good balance between speed and quality.
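Putting the pieces together, the threshold of (4.46) can be sketched as follows. The exact form of the upper bound (4.45) is assumed here to be the average SAD of the blocks already coded in the current frame plus the offset K3, which is our reading of the description above; treating the first block of a frame (no coded blocks yet) as unbounded is also an assumption. The function names and sample values are hypothetical.

```python
from typing import Sequence


def linear_threshold(sad_pred: float, k1: float = 1.0, k2: float = 384.0) -> float:
    """Linear predictor in the form of (4.44)."""
    return k1 * sad_pred + k2


def upper_bound(coded_sads: Sequence[float], k3: float = 512.0) -> float:
    """Assumed form of (4.45): mean SAD of the coded blocks in this frame plus K3."""
    if not coded_sads:
        return float("inf")  # assumption: no bound before any block is coded
    return sum(coded_sads) / len(coded_sads) + k3


def early_termination_threshold(
    sad_pred: float,
    coded_sads: Sequence[float],
    k1: float = 1.0,
    k2: float = 384.0,
    k3: float = 512.0,
) -> float:
    """Threshold in the form of (4.46): the smaller of the two estimates."""
    return min(linear_threshold(sad_pred, k1, k2), upper_bound(coded_sads, k3))


# Made-up SADs of the blocks coded so far in the frame (illustration only).
coded = [300, 500, 700]
low = early_termination_threshold(500, coded)    # linear estimate is the smaller one
high = early_termination_threshold(2000, coded)  # clipped by the upper bound
print(low, high)
```

The second call shows the purpose of the bound: a large predicted SAD, for which the prediction is less reliable, does not translate into an equally large threshold.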
Table 4-17 Regression coefficients for the best 2D and 3D SAD predictors.
Predictor Pred15 (best 2D) Pred35 (best 3D)
Sequence K1 K2 K1 K2
CT256 0.84 77.20 0.98 13.83
CT40 0.92 95.43 1.02 -11.95
HL40 0.88 90.38 1.04 -27.50
MD96 0.80 83.77 0.96 27.77
CG112 0.86 320.99 0.98 70.65
FM512 0.81 249.69 0.85 192.79
FM1024 0.79 239.74 0.83 200.46
FB1024 0.69 549.12 0.64 660.09
FG768 0.99 216.38 0.97 146.71
ST1024 0.95 165.75 0.93 202.48
Average 0.85 208.85 0.92 147.53
All 0.97 66.65 0.96 76.53
Fig. 4-15 Best 2D SAD predictor versus SADC
Fig. 4-16 Best 3D SAD predictor versus SADC
The computational overhead of the proposed early termination mechanism is negligible compared to the speed gain. In terms of memory, it only needs to record the SADs of roughly one row of blocks in the 2D case and of roughly one frame of blocks in the 3D case. In terms of computation, it needs a few ‘compare’ operations, one ‘shift’, one ‘multiply’, and one ‘divide’ for each block.
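The memory claim for the 2D case can be illustrated with a small Python sketch: one array with one entry per block column is enough to supply both the upper and the left SAD for a SADpred15-style predictor. The class name, the raster-scan coding order and the handling of frame borders (falling back to whichever neighbor exists) are assumptions made for illustration.

```python
from typing import List, Optional


class RowSadBuffer:
    """Keeps the latest final SAD per block column (one row of storage)."""

    def __init__(self, blocks_per_row: int) -> None:
        self.row: List[Optional[int]] = [None] * blocks_per_row

    def predict(self, col: int) -> Optional[int]:
        """Mean of the upper and left SADs, using a shift for the division by 2."""
        up = self.row[col]                              # still holds the block above
        left = self.row[col - 1] if col > 0 else None   # already holds the block to the left
        available = [s for s in (up, left) if s is not None]
        if not available:
            return None  # first block of the frame: no prediction available
        if len(available) == 2:
            return (available[0] + available[1]) >> 1
        return available[0]

    def record(self, col: int, sad: int) -> None:
        """Store the final SAD of the block just coded at this column."""
        self.row[col] = sad


# Toy frame that is 3 blocks wide, coded in raster-scan order (made-up SADs).
buf = RowSadBuffer(3)
first = buf.predict(0)
buf.record(0, 400)
second = buf.predict(1)
buf.record(1, 600)
buf.record(2, 800)
next_row_first = buf.predict(0)
buf.record(0, 200)
next_row_second = buf.predict(1)
print(first, second, next_row_first, next_row_second)
```

Because each entry is overwritten only after the block below it has been predicted, the buffer never grows beyond one row, in line with the memory estimate above.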
Table 4-18 shows the performance of DL AGPS with SPS and several early termination mechanisms. As suggested by their proponents, parameter a is set to 1.2 and b to 128 in (4.28), and δ is set to 50 in (4.29). DL AGPS with our best 2D ETT outperforms the plain DL AGPS scheme by 154% in average search points (0.02 dB PSNR gain); it outperforms (4.28) by 10% (0.01 dB PSNR loss) and (4.29) by 11% (0 dB PSNR gain). DL AGPS with our best 3D ETT outperforms the plain DL AGPS scheme by 162% in average search points (0.02 dB PSNR gain); it outperforms (4.28) by 14% (0.01 dB PSNR loss), (4.29) by 15% (0 dB PSNR gain), and our best 2D ETT by 4% (0 dB PSNR gain).
Table 4-18 The performance of DL AGPS with SPS and various early termination mechanisms.
Method Normal (no ETT) (4.28) (4.29) Best 2D ETT Best 3D ETT
Sequence ASP PSNR ASP PSNR ASP PSNR ASP PSNR ASP PSNR
CT256 5.24 39.62 1.59 39.59 1.57 39.54 1.38 39.55 1.36 39.63
CT40 5.72 32.67 2.00 32.85 1.63 32.83 1.71 32.89 1.63 32.87
HL40 6.25 34.54 2.14 35.05 2.01 34.90 1.64 35.05 1.56 35.03
MD96 5.91 40.12 1.89 40.24 1.78 40.25 1.52 40.23 1.48 40.21
CG112 5.88 29.12 2.64 29.02 2.91 29.09 2.43 29.01 2.23 28.99
FM512 6.78 34.09 3.27 33.98 3.34 34.00 2.70 33.91 2.54 33.93
FM1024 6.61 36.57 3.15 36.52 3.34 36.53 2.57 36.48 2.48 36.49
FB1024 10.55 35.02 5.44 34.78 5.88 34.95 5.06 34.79 5.09 34.85
FG768 6.14 26.20 3.08 26.18 2.81 26.18 3.98 26.17 3.84 26.18
ST1024 7.11 29.46 3.40 29.47 3.76 29.33 3.15 29.52 2.97 29.40
Average 6.62 33.74 2.86 33.77 2.90 33.76 2.61 33.76 2.52 33.76
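The speed gains quoted for Table 4-18 appear consistent with the ratio of average search points (ASP) minus one, expressed in percent; this formula is our reading of the numbers and is not stated explicitly in the text. A short sketch using the 'Average' row of the table (the function name is hypothetical):

```python
def speedup_percent(asp_baseline: float, asp_method: float) -> float:
    """Relative reduction in search effort: (baseline / method - 1) * 100."""
    return (asp_baseline / asp_method - 1.0) * 100.0


# 'Average' row of Table 4-18 (ASP columns only).
average_asp = {
    "no ETT": 6.62,
    "(4.28)": 2.86,
    "(4.29)": 2.90,
    "best 2D ETT": 2.61,
    "best 3D ETT": 2.52,
}

for name in ("no ETT", "(4.28)", "(4.29)"):
    gain = speedup_percent(average_asp[name], average_asp["best 2D ETT"])
    print(f"best 2D ETT vs {name}: {gain:.1f}%")
```

For example, 6.62 / 2.61 is about 2.536, which corresponds to the 154% gain of the best 2D ETT over the plain scheme.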
We also test our proposed ETT on outside sequences, which are sequences not in the training set. These 4 extra sequences and their settings are in Table 4-19. The performance of DL AGPS with SPS and various early termination mechanisms on these sequences is shown in Table 4-20.
Table 4-19 The extra sequences and their settings.
Abbreviation Sequence Bitrate (kbps) Frame rate (fps) Number of frames
st96 silent 96 10 300
tt512 table tennis 512 30 300
mb1024 mobile calendar 1024 30 300
ne40 news 40 7.5 90
In Table 4-20, DL AGPS with our best 2D ETT outperforms the plain DL AGPS scheme by 151.1% in average search points (0.03 dB PSNR loss) and outperforms (4.28) by 11.6% (0.03 dB PSNR gain); it performs about the same as (4.29) in both speed and quality. DL AGPS with our best 3D ETT outperforms the plain DL AGPS scheme by 166.8% in average search points (0.06 dB PSNR loss); it outperforms (4.28) by 18.5% (0.00 dB PSNR loss), (4.29) by 6.3% (0.03 dB PSNR loss), and our best 2D ETT by 6.3% (0.03 dB PSNR loss). Overall, the results on the outside sequences are consistent with those on the training sequences; the proposed ETT is therefore effective beyond the training set.
Table 4-20 The performance of DL AGPS with SPS and various early termination mechanisms on the extra sequences in Table 4-19.
Method Normal (no ETT) (4.28) (4.29) Best 2D ETT Best 3D ETT
Sequence ASP PSNR ASP PSNR ASP PSNR ASP PSNR ASP PSNR
st96 5.86 35.25 2.34 35.26 2.06 35.27 2.00 35.28 1.98 35.28
tt512 6.02 35.15 2.23 35.02 2.31 35.08 2.07 35.05 2.03 35.01
mb1024 5.32 27.54 2.93 27.52 3.02 27.53 2.87 27.52 2.53 27.51
ne40 5.40 34.49 2.54 34.40 1.61 34.42 2.06 34.46 1.93 34.39
Average 5.65 33.11 2.51 33.05 2.25 33.07 2.25 33.08 2.12 33.05