Statistical Model for Pattern-based Block Motion Estimation

Based on the problem formulation in Section 3.1 and Section 3.2, the total average search points (ASP) for a sequence can be represented by (3.16). The ASP depends on both the search algorithm (SA) and the video sequence. It is the sum, over all locations within the search area, of the product of the number of search points and the motion vector probability at each location, where SPFSA(x,y) denotes the number of search points, PDFSA(x,y) denotes the motion vector distribution acquired by a specific algorithm, and A is the search area.

$$ASP = \sum_{(x,y) \in A} SPF_{SA}(x,y) \cdot PDF_{SA}(x,y) \qquad (3.16)$$

When we apply a specific algorithm to a specific sequence, we obtain the ASP directly from the experiments without needing to evaluate (3.16), which requires knowledge of PDFSA(x,y) and SPFSA(x,y). Our goal is to construct a generic model in which the dependencies on the SA and on the video sequence are separable. In other words, the PDF should be sequence dependent but not SA dependent, and the search point function should be SA dependent but not sequence dependent. That is, we would like to replace PDFSA(x,y) by PDFFS(x,y), and SPFSA(x,y) by WFSA(x,y) in (3.16). Thus, (3.16) becomes (3.17).

$$ASP \approx \sum_{(x,y) \in A} WF_{SA}(x,y) \cdot PDF_{FS}(x,y) \qquad (3.17)$$

Herein, PDFFS(x,y) denotes the PDF of the motion vectors acquired by FS, and WFSA(x,y) denotes the weighting function of a specific algorithm discussed previously. With (3.17), we can predict the ASP before actually applying a search algorithm to a video sequence, as long as we know the motion vector PDF acquired by FS and the WF of the specific SA.
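The weighted sum shared by (3.16) and (3.17) can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function name `average_search_points` and the 3x3 toy arrays are hypothetical, and the weight values are made up for the example rather than taken from any real search algorithm.

```python
import numpy as np

def average_search_points(weights, pdf):
    """Sum over the search area of (search points at a location) times
    (probability that the motion vector lands on that location)."""
    weights = np.asarray(weights, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    if weights.shape != pdf.shape:
        raise ValueError("weights and pdf must cover the same search area")
    return float(np.sum(weights * pdf))

# Hypothetical 3x3 search area (x, y in {-1, 0, 1}); values are illustrative only.
toy_wf = np.array([[13, 11, 13],
                   [11,  9, 11],
                   [13, 11, 13]])
toy_pdf = np.array([[0.02, 0.08, 0.02],
                    [0.08, 0.60, 0.08],
                    [0.02, 0.08, 0.02]])  # sums to 1

asp = average_search_points(toy_wf, toy_pdf)
print(f"ASP = {asp:.2f}")
```

Passing an empirical SPF and the algorithm's own PDF corresponds to (3.16), while passing a theoretical WF and the FS-based PDF corresponds to the separable form (3.17).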

However, (3.17) differs from (3.16) for a few reasons. First, because the block-matching cost surface is typically not globally monotonic in the search area, the actual search process does not always take the shortest path to the location of the best-matched motion vector. Thus, the average number of actual search points, SPFSA(x,y), is higher than WFSA(x,y), the shortest-path (minimal) number of search points. Second, the motion vectors found by a specific algorithm sometimes differ from the ones found by FS. Consequently, the motion vector PDF of this specific algorithm, PDFSA(x,y), is not identical to that of the full search, PDFFS(x,y).

[Figure: two panels, each titled "CG112 PDF Cross Section", plotting probability (0 to 0.45) against displacement (-30 to 30) along the x-axis and the y-axis, for PDF_FS and PDF_EHS.]

Fig. 3-4 PDF shift between PDF acquired by FS and that acquired by EHS (CG112).

Fig. 3-4 shows the cross sections of the PDFs acquired by FS and by EHS for the video sequence CG112. It is clear that these two PDFs are not identical. PDF shift refers to the phenomenon that PDFFS(x,y) differs from PDFSA(x,y). The main causes are: 1) the search pattern is relatively small, so the search is trapped at a local optimum; 2) the early decision mechanism terminates the search when a near-optimal solution is found; and 3) the starting point of the SA differs from that of FS, the PMV in our formulation.
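To make the notion of PDF shift concrete, the sketch below builds empirical motion vector PDFs from two lists of motion vectors and measures how far apart they are. The helper names and the toy vectors are hypothetical, and the total variation distance is only one possible measure chosen here for illustration; the source does not specify a particular metric.

```python
import numpy as np

def empirical_mv_pdf(motion_vectors, search_range):
    """Normalised 2-D histogram of (x, y) motion vectors over [-R, R] x [-R, R]."""
    size = 2 * search_range + 1
    counts = np.zeros((size, size), dtype=float)
    for x, y in motion_vectors:
        if abs(x) > search_range or abs(y) > search_range:
            raise ValueError("motion vector outside the search area")
        counts[y + search_range, x + search_range] += 1
    return counts / counts.sum()

def pdf_shift(pdf_a, pdf_b):
    """Total variation distance: half the summed absolute difference (0 = identical)."""
    return 0.5 * float(np.abs(pdf_a - pdf_b).sum())

# Hypothetical motion vectors for the same four blocks.
fs_mvs = [(0, 0), (0, 0), (1, 0), (2, 1)]   # found by full search
sa_mvs = [(0, 0), (0, 0), (1, 0), (1, 0)]   # fast search stops early on the last block

pdf_fs = empirical_mv_pdf(fs_mvs, search_range=2)
pdf_sa = empirical_mv_pdf(sa_mvs, search_range=2)
shift = pdf_shift(pdf_fs, pdf_sa)
print(f"PDF shift (total variation) = {shift:.2f}")

# Cross section along the x-axis at y = 0, analogous to one panel of Fig. 3-4.
x_cross_section_fs = pdf_fs[2, :]
```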

Fig. 3-5 shows the theoretical WF and the empirical SPF obtained by applying EHS to the video sequence FB1024, in the region [-10~+10, -10~+10]. On the left plot (the theoretical WF), the value on a contour represents the shortest-path number of search points for EHS to move from the origin to a point (location) on the contour. On the right plot (the empirical SPF), it represents the average number of actual search points. For the empirical SPF, the contour is not continuous, because some motion vectors never occur when we apply an SA to a specific sequence. There are indeed differences between the theoretical WF and the empirical SPF, which we call WF drift. It happens because the search algorithm does not always follow the shortest path in the search process, as discussed earlier.
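The empirical SPF and the resulting WF drift can be sketched as follows, assuming each block's search is logged as a pair of the final motion vector and the number of points visited. The record format, function names, and toy values are assumptions made for this example. Locations where no motion vector ever occurs are left as NaN, which mirrors the discontinuous contours of the empirical SPF.

```python
import numpy as np

def empirical_spf(records, search_range):
    """Average number of actual search points per final motion vector location.

    records: iterable of ((x, y), search_points), one entry per block.
    Locations that never occur stay NaN.
    """
    size = 2 * search_range + 1
    total = np.zeros((size, size), dtype=float)
    count = np.zeros((size, size), dtype=float)
    for (x, y), points in records:
        total[y + search_range, x + search_range] += points
        count[y + search_range, x + search_range] += 1
    spf = np.full((size, size), np.nan)
    seen = count > 0
    spf[seen] = total[seen] / count[seen]
    return spf

def wf_drift(spf, wf):
    """Excess of actual search points over the shortest-path count (NaN where unseen)."""
    return spf - np.asarray(wf, dtype=float)

# Hypothetical shortest-path weights and search logs on a 3x3 area.
toy_wf = np.array([[13.0, 11.0, 13.0],
                   [11.0,  9.0, 11.0],
                   [13.0, 11.0, 13.0]])
toy_records = [((0, 0), 9), ((0, 0), 9), ((1, 0), 12), ((1, 0), 14)]

spf = empirical_spf(toy_records, search_range=1)
drift = wf_drift(spf, toy_wf)
print("mean drift over observed locations:", np.nanmean(drift))
```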

Fig. 3-5 The contour plots of the theoretical WF and the empirical SPF by applying EHS to FB1024 (partial).

Fig. 3-6 PDF differences between PDFFS(x,y) and SFS(x,y) of CG112.

Moreover, Section 3.1 suggests that the distribution S(x,y) best approximates PDFFS(x,y). We can thus substitute S(x,y) for PDFFS(x,y) in (3.17) as long as its variances are known. S(x,y) then becomes SFS(x,y), the S(x,y) that matches the motion vectors acquired by FS. However, this substitution also introduces a new PDF matching error. Fig. 3-6 shows the PDF differences between SFS(x,y) and PDFFS(x,y) for the video sequence CG112.

Therefore, Eq.(3.17) needs adjustment to compensate for the various shifts, drifts and model errors. Eq.(3.18) is a modified formula for modeling ASP, in which two additional terms, C1 and C2, are included. We propose that ASP is a linear function of the sum of the products of SFS(x,y) and WFSA(x,y):

$$ASP = C_1 \cdot \sum_{(x,y) \in A} S_{FS}(x,y) \cdot WF_{SA}(x,y) + C_2 \qquad (3.18)$$

By tuning the values of C1 and C2, we can reduce the WF drift error, the PDF shift error and the PDF mismatch error. Consequently, with the pre-analysis of WFSA(x,y) for a specific SA and the pre-calculation of SFS(x,y) for a specific sequence, one may use Eq.(3.18) to estimate the ASP values of another SA when it is applied to this specific sequence.
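As a rough sketch of how the corrected model might be used, the snippet below applies a linear correction to the sum of products for several candidate algorithms on one sequence. It assumes C1 acts as the slope and C2 as the offset of the linear function described above. The function name, the algorithm labels `SA_A` and `SA_B`, the weight maps, and the C1 and C2 values are all hypothetical.

```python
import numpy as np

def predict_asp_for_algorithms(c1, c2, s_fs, wf_by_algorithm):
    """Predict ASP per algorithm as c1 * sum(S_FS * WF_SA) + c2.

    s_fs: modeled motion vector distribution of one sequence (sums to 1).
    wf_by_algorithm: dict mapping an algorithm label to its weighting function.
    """
    s_fs = np.asarray(s_fs, dtype=float)
    return {
        name: c1 * float(np.sum(s_fs * np.asarray(wf, dtype=float))) + c2
        for name, wf in wf_by_algorithm.items()
    }

# Hypothetical distribution and weighting functions on a 3x3 search area.
toy_s_fs = np.array([[0.02, 0.08, 0.02],
                     [0.08, 0.60, 0.08],
                     [0.02, 0.08, 0.02]])
toy_wfs = {
    "SA_A": [[13, 11, 13], [11, 9, 11], [13, 11, 13]],
    "SA_B": [[10, 8, 10], [8, 5, 8], [10, 8, 10]],
}

predictions = predict_asp_for_algorithms(1.1, -2.0, toy_s_fs, toy_wfs)
for name, value in predictions.items():
    print(f"{name}: predicted ASP = {value:.3f}")
```

Because the sequence enters only through the distribution and the algorithm only through its weighting function, a new algorithm can be evaluated by supplying another weight map, without rerunning the search on the sequence.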

We need to verify that the above model is valid for real data. There are two methods to decide C1 and C2. In the first method, we apply a fixed SA to a set of training sequences and compute C1 and C2 by regression. Our aim is that the model with the trained C1 and C2 can accurately predict the ASP of a new sequence. In the second method, we apply a few search algorithms (the training algorithms) to a specific sequence, and then calculate C1 and C2 based on the acquired data. In this case, the goal is that the model with the trained C1 and C2 can predict the ASP values of a new search algorithm applied to that sequence.

Fig. 3-7 The actual ASP and the predicted ASP pairs for 4 popular search algorithms (1st method).

In the first method, C1 and C2 are acquired from a set of training sequences with one specific search algorithm. Fig. 3-7 shows the pairs of actual ASP and predicted ASP of various sequences for the four popular search algorithms. Each training sequence is represented by a plus-sign mark, and the solid line represents the case in which the predicted ASP is exactly the same as the actual ASP. The X-axis represents the predicted ASP and the Y-axis represents the actual ASP.

Table 3-4 displays the C1 and C2 values for each search algorithm. The last column is the correlation coefficient between the actual ASP and the predicted ASP. One may notice that the correlation coefficients for all algorithms are very close to 1, which indicates that the predicted ASPs track the actual ASPs closely.

Table 3-4 Regression parameters (C1 and C2) and the correlation coefficients between the model-predicted ASP and the real data (1st method).

BME     C1      C2       ASP correlation
FSS     0.42    10.38    0.98
DS      0.46     7.59    0.98
EHS     0.42     5.63    0.99
ERPS    0.44     2.97    0.98
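The regression step behind such a table could look like the sketch below, assuming an ordinary least-squares line fit with C1 as slope and C2 as offset. The source only says "the regression method", so the exact procedure is an assumption. The function names and the four data points are hypothetical and are not taken from the experiments.

```python
import numpy as np

def fit_c1_c2(model_sums, actual_asps):
    """Least-squares fit of actual_asp ~ c1 * model_sum + c2."""
    c1, c2 = np.polyfit(model_sums, actual_asps, deg=1)  # highest power first
    return float(c1), float(c2)

def asp_correlation(predicted, actual):
    """Pearson correlation coefficient between predicted and actual ASP."""
    return float(np.corrcoef(predicted, actual)[0, 1])

# Hypothetical training data: one point per training sequence.
model_sums = np.array([10.0, 20.0, 30.0, 40.0])   # sum of S_FS * WF_SA per sequence
actual_asps = np.array([8.2, 12.8, 18.1, 22.9])   # measured ASP per sequence

c1, c2 = fit_c1_c2(model_sums, actual_asps)
predicted_asps = c1 * model_sums + c2
corr = asp_correlation(predicted_asps, actual_asps)
print(f"C1 = {c1:.3f}, C2 = {c2:.3f}, correlation = {corr:.3f}")
```

The same fit applies to the second method by letting each data point stand for one training algorithm on a fixed sequence instead of one training sequence under a fixed algorithm.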

In the second method, C1 and C2 are acquired by applying a set of search algorithms (the training algorithms) to a specific sequence. We then predict the ASP value of a new algorithm by using the proposed model. Fig. 3-8 shows the actual ASP versus the predicted ASP pairs for 10 sequences. Each training algorithm is represented by a cross-sign mark, and the dashed line shows the case in which the predicted ASP is exactly the same as the actual ASP. The X-axis represents the predicted ASP and the Y-axis represents the actual ASP. Table 3-5 displays the C1 and C2 values for the 10 sequences and the correlation coefficients between the predicted ASP and the actual ASP. The correlation coefficients are close to 1 (0.91 or higher) for all sequences except FB1024, which has a value of 0.73. This may be due to the high-motion nature of FB1024. In spite of the small number of training algorithms, the agreement between the predicted ASP and the actual ASP is high for nearly all of the 10 sequences.

Fig. 3-8 The actual ASP and predicted ASP pairs for 10 training sequences (2nd method).

Table 3-5 Regression parameters (C1 and C2) and the correlation coefficients between the model-predicted ASP and the actual data (2nd method).

Sequence    C1      C2       ASP correlation
CT256       1.07    -1.42    1.00
CT40        1.17    -4.70    0.98
HL40        1.19    -4.35    0.99
MD96        1.17    -4.52    0.97
CG112       1.05    -1.05    1.00
FM512       1.15    -3.60    0.99
FM1024      1.10    -2.36    1.00
FB1024      0.62     1.66    0.73
FG768       1.15    -3.76    0.98
ST1024      1.08    -5.82    0.91

The first method and the second method are designed for different scenarios. The first method is used to predict the ASP of a new sequence (for a given search algorithm), while the second method is used to predict the ASP of a new search algorithm (for a given sequence). Because the two methods differ in training sample size and in purpose, a comparison of their accuracy may not be meaningful.

In the following two sections, we will show how this model, (3.18), can be used both to inspire the design of a new search algorithm and to predict the search performance for a new video sequence or a new search algorithm.

Section 3.4 Application I: Pattern-based Search Algorithm
