
3.4 Experimental Results for Performance Evaluation

3.4.3 Evaluation on Transformation Types

Since the elliptical region is already normalized into a circular image, the normalized region is affine invariant. Nevertheless, it is not necessarily invariant to rotation. Thus, for most of the descriptors, including SIFT, the SIFT variants, and the steerable filters, the image rotation problem must be solved first by finding a dominant gradient orientation. Similarly, the intensity normalization of the circular image makes the region descriptor robust to intensity scaling and offset, but not to image blur, image noise, image compression, or illumination change.

In image registration the two images can be taken by a single camera or different cameras, and the images can be taken during a short period or on different days. These shooting scenarios determine the type of image transformation encountered. For instance, if the two images are shot by different cameras or at different periods, the photometric conditions of the two shootings will be different, not to mention the possible viewpoint change. In general, a geometric transformation is accompanied by some sort of photometric change due to differences in the camera setting and the surface reflection angles.

A) Robustness under Photometric Transformations

To focus on the effects of photometric transformations, we limit the effect of geometric transformation by setting the region overlap error threshold Ot to a small value (0.2~0.3). Overall, the ZM phase obtains the best performance for all textured scenes under all types of photometric transformations, and for the structured scenes under image blur and nonlinear lighting. The performances of the ZM phase, SIFT, GLOH and PCA-SIFT are comparable for the structured scenes under affine lighting change, image noise and JPEG compression when the value of 1 – precision is very small. The analysis of these performance results will be given later.

(i) Image Blur

The performance is measured under image blur introduced by changing the camera focus setting. Figs. 3.8(a)-3.8(b) show the respective PR curves for the bike structured scene (see Fig. 3.3(a)) with minor blur and severe blur, while Figs. 3.8(c)-3.8(d) show the respective PR curves for the tree textured scene (see Fig. 3.3(b)) with minor blur and severe blur. The performance ranking indicates that the best descriptor is ZM phase for both the structured and textured scenes considered. On the other hand, SIFT performs better than its variants, GLOH and PCA-SIFT, for the textured scene, while its variants perform better for the structured scene, as reported in [12]. The complex moments rank last. This is because of their low-dimensional feature vector (15 in this case) and their exclusive use of the moment magnitudes without the phase information.

Fig. 3.8: The PR curves for the structured bike scene with (a) minor blur and (b) severe blur, and for the textured tree scene with (c) minor blur and (d) severe blur, all with Ot = 0.3.

To show the performance discrepancies among the top three descriptors (ZM phase, GLOH and SIFT) under image blur, Table 3.3 gives the matching statistics for the bike structured scene and the tree textured scene with a region overlap error of 0.3 and a recall value of 0.6. Fig. 3.9 depicts the correct and false region matches for the tree textured scene when using ZM phase, GLOH and SIFT, respectively. There are 0, 11 and 42 false matches (shown by red lines) for ZM phase, SIFT and GLOH, respectively. Each of these descriptors has 112 correct matches (shown by green lines).
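The match counts quoted for Fig. 3.9 can be converted into 1 – precision with a short sketch. It assumes the commonly used definition 1 – precision = #false matches / (#correct matches + #false matches), which is not restated in this section; the function name is illustrative.

```python
def one_minus_precision(correct: int, false: int) -> float:
    """Fraction of returned matches that are false (assumed standard definition)."""
    total = correct + false
    return false / total if total else 0.0

# Match counts quoted for the tree textured scene (recall = 0.6, Ot = 0.3).
counts = {"ZM phase": (112, 0), "SIFT": (112, 11), "GLOH": (112, 42)}
for name, (correct, false) in counts.items():
    print(f"{name}: 1 - precision = {one_minus_precision(correct, false):.3f}")
```

Under that assumed definition the three descriptors sit at roughly 0, 0.09 and 0.27 on the 1 – precision axis at the same recall.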

TABLE 3.3 THE MATCHING STATISTICS FOR THE BIKE STRUCTURED SCENE AND TREE TEXTURED SCENE, ALL WITH

Fig. 3.9: The correct matches (in green) and false matches (in red) obtained by ZM phase, GLOH and SIFT, respectively, all with recall = 0.6 and Ot = 0.3.

(ii) Illumination Change

a) Affine Lighting Change

To evaluate the descriptor performances under illumination changes, a collection of images has been taken by changing the camera iris settings. Figs. 3.10(a) and 3.10(d) show the PR curves for the Leuven structured scene and the bush 1 textured scene shown in Figs. 3.3(c) and 3.3(d), respectively. The best three descriptors in order are ZM phase, SIFT, and GLOH for the bush 1 textured scene and the situation remains the same

b) Nonlinear Lighting Change

Nonlinear lighting is quite common in practice. Figs. 3.10(b) and 3.10(c) show the PR curves under underexposure and overexposure lighting for the Leuven structured scene shown in Fig. 3.3(e). Figs. 3.10(e) and 3.10(f) show the corresponding PR curves for the bush 1 textured scene shown in Fig. 3.3(f). Compared with the PR curves in Figs. 3.10(a) and 3.10(d) under affine lighting change, the performances of the SIFT-based descriptors become significantly worse. In contrast, the performance of the ZM phase changes only slightly, especially for the textured scene. This will be explained later.

Fig. 3.10: The PR curves for the Leuven structured scene with (a) affine lighting change, (b) non-linear lighting change (underexposure), and (c) non-linear lighting change (overexposure). The PR curves for the bush 1 textured scene with (d) affine lighting change, (e) non-linear lighting change (underexposure), and (f) non-linear lighting change (overexposure), all with Ot = 0.3.

(iii) Image Noise

The performances are evaluated by adding different amounts of Gaussian noise to the images. Figs. 3.11(a) and 3.11(b) show the PR curves for the structured Chinese compound scene (see Fig. 3.3(g)) at two noise levels (SNR = 20 dB and 10 dB), respectively. Figs. 3.11(c)-(d) show the PR curves for the Japanese garden textured scene (see Fig. 3.3(h)). The ZM phase has the best overall result among all the descriptors for the textured scene and is comparable to the SIFT-based descriptors for the structured scene.

Fig. 3.11: The PR curves for the Chinese compound structured scene under image noise with (a) SNR = 20 dB and (b) SNR = 10 dB. The PR curves for the Japanese garden textured scene under image noise with (c) SNR = 20 dB and (d) SNR = 10 dB, all with Ot = 0.3.

(iv) JPEG Compression

Fig. 3.12 depicts the PR curves under JPEG compression for the structured UBC scene shown in Fig. 3.3(i) and the textured garden scene shown in Fig. 3.3(j), respectively. The qualities of the compressed images range from 10 to 30 percent of the original one. The performance ranking is similar to that under image noise.

Fig. 3.12: The PR curves for the structured UBC scene under JPEG compression with quality = (a) 30%, (b) 10%. The PR curves for the textured garden scene with quality = (c) 30%, (d) 10%, all with Ot = 0.3.

B) Robustness under Geometric Transformations

To focus on the effects of geometric transformations, we intentionally keep the photometric conditions unchanged. As shall be seen, under all geometric transformations the ZM phase performs best for all textured scenes, but is only comparable to the SIFT-based descriptors for the structured scenes when the value of 1 – precision is less than 0.05.

(i) Viewpoint Change

We use six images of the textured and structured scenes taken under a viewing angle ranging from 10 to 50 degrees. Figs. 3.13(a) and 3.13(b) give the PR curves for the structured graffiti scene (see Fig. 3.3(k)) and the textured brick scene (see Fig. 3.3(l)), respectively.

The ranking of the best four descriptors remains unchanged over the specified range [10°, 50°] of the viewing angle. The ZM phase descriptor clearly outperforms the five other descriptors for the textured scene, but not for the structured scene.

(ii) Rotation Change

The images considered are taken by rotating the camera axis from 30° to 45°. The descriptors are evaluated under image rotation for the structured castle scene (Fig. 3.3(m)) and the flower textured scene (Fig. 3.3(n)). Figs. 3.13(c)-3.13(d) show the PR curves for these scenes, respectively. The ranking of the top three descriptors remains the same throughout the range of rotation angles and is similar to the case of viewpoint change.

(iii) Scale Change

Figs. 3.13(e)-3.13(f) show the performance measures for the descriptors under scale change using the Pentagon structured scene (Fig. 3.3(m)) and the textured bush 2 scene (Fig. 3.3(n)), respectively. The scaling factor is close to 2. The performance rankings are similar to those of the above two geometric transformations.

Fig. 3.13: The PR curves under geometric transformations: (a) viewpoint (structured), (b) viewpoint (textured), (c) rotation (structured), (d) rotation (textured), (e) scaling (structured), (f) scaling (textured), all with Ot = 0.3.

3.4.4 Evaluation on Feature Dimensionality

To extend the SIFT descriptor, both GLOH and PCA-SIFT increase the feature size and then apply PCA to reduce the feature dimensionality. The features of these descriptors are originally correlated and become orthogonal after the application of PCA. However, their optimal dimensions are determined by the training images in the database.

Using Zernike moments up to a higher order generally leads to a more accurate estimate of the region rotation angle and a better image representation power. Fig. 3.14 depicts the PR curves for two structured scenes under two different attacks when the ZM descriptor uses moments of order N up to 10, 12, and 16, respectively. The corresponding feature dimensions are 30, 42, and 72. It can be seen that the descriptor performance improves as the feature dimension increases. The selection of order N = 12 is a tradeoff between the computational complexity and the descriptor performance.
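The quoted feature dimensions can be reproduced by counting the Zernike index pairs (n, m) with n ≤ N, n − m even, and 0 < m ≤ n. Excluding the m = 0 moments is an assumption made here (their phase carries no rotation information), but it matches the dimensions 30, 42, and 72 stated above. The function name is illustrative.

```python
def zm_feature_dimension(max_order: int) -> int:
    """Count Zernike indices (n, m) with n <= max_order, n - m even, 0 < m <= n."""
    return sum(
        1
        for n in range(max_order + 1)
        for m in range(1, n + 1)
        if (n - m) % 2 == 0
    )

for order in (10, 12, 16):
    print(order, zm_feature_dimension(order))
```

Under this counting rule the dimension grows roughly quadratically with N, which is one way to read the complexity side of the tradeoff behind choosing N = 12.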

Fig. 3.14: The PR curves for ZM phase with the maximum order N = 10, 12 and 16, together with the associated PR curves of SIFT, for two structured scenes under two different attacks: (a) the graffiti scene (viewpoint change) and (b) the castle scene (rotation change), all with Ot = 0.3.

3.5 Analysis on Performance Evaluation Discrepancies and Time Complexity Analysis

Since the complex moments and the steerable filters are never ranked first in any of the experiments, owing to the low feature dimensions chosen for them, they are excluded from further consideration. SIFT, GLOH, and PCA-SIFT have similar performance results under all the transformations reported. In the following, it is therefore sufficient to compare the performances of SIFT and the ZM phase.

A) The Effect of Image Intensity Fluctuation on the Descriptor Performance

We give a rule of thumb, or a simplified explanation, of why the ZM phase descriptor performs better than other existing descriptors under non-uniform image intensity fluctuation, since an exact analysis varies with the underlying image and is therefore rather complicated.

First of all, the transformed image is obtained from the reference image according to a given photometric or geometric transform, so their image pattern structures are correlated. After the affine intensity normalization, their image intensity distributions become closer and intertwined.

Next, the phase difference of the ZM phase descriptor is computed from two terms of the form tan⁻¹(Im(·)/Re(·)), which involve the real and imaginary ZM components of the difference image between the reference and transformed images. Since the image structures of the transformed and reference images are similar, it is likely that the phase angles of the reference and transformed images are in phase (i.e., no phase difference after the image rotation alignment), especially when their ZM magnitudes are both large. The weighted sum of the absolute phase differences is, therefore, close to zero. Conversely, the probability that the reference and transformed images are out of phase (a significant phase difference) is small. Consequently, most of the ZM moment counterparts of the image pair support a single majority estimate of the rotation angle, even though there is some fluctuation in the ZM magnitudes. This leads to accurate rotation angle estimation when using the ZM phase.
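The robustness to magnitude fluctuation can be illustrated with a small sketch. It relies on the standard property that rotating an image by α multiplies the Zernike moment Z_nm by exp(−jmα), and it estimates α from a magnitude-weighted sum over the m = 1 moments only. This simplified estimator is an assumption made for illustration, not the weighting scheme of the ZM phase descriptor itself; all names and values are hypothetical.

```python
import cmath
import random

def estimate_rotation(ref: dict, trans: dict) -> float:
    """Estimate the rotation angle (radians) from the m = 1 moments.

    ref and trans map (n, m) -> complex moment. Each product
    conj(Z_trans) * Z_ref has phase m * alpha and magnitude |Z_trans||Z_ref|,
    so summing the products acts as a magnitude-weighted phase vote.
    """
    vote = sum(trans[k].conjugate() * ref[k] for k in ref if k[1] == 1)
    return cmath.phase(vote)

random.seed(0)
alpha = 0.6  # true rotation in radians (hypothetical)
ref = {(n, 1): complex(random.gauss(0, 1), random.gauss(0, 1)) for n in (1, 3, 5, 7)}
# Rotate, then scale each magnitude by a different positive factor to mimic
# non-uniform magnitude fluctuation; the phases themselves are left untouched.
trans = {k: z * cmath.exp(-1j * k[1] * alpha) * random.uniform(0.5, 1.5)
         for k, z in ref.items()}
alpha_hat = estimate_rotation(ref, trans)
print(alpha_hat)  # close to 0.6
```

The sketch covers only pure magnitude fluctuation. Real intensity changes also perturb the phases, which is why the text argues in terms of a majority of moments remaining in phase rather than all of them.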

On the other hand, the SIFT-based methods utilize the gradient information. The local gradient angles in the transformed image remain largely unchanged (except under image blur, which corrupts the gradient angles), but the gradient magnitudes change somewhat non-uniformly. Besides, there are generally several different gradient angles in an image, especially in a textured image. (This may not be the case for structured scenes with a distinct edge orientation.) Therefore, the 36-bin orientation histogram will contain multiple candidate peaks. When the gradient magnitudes change non-uniformly, the vote counts of these candidates change. This leads to a change of the dominant orientation in the transformed image. It, in turn, triggers further non-linear changes in the 128-dimensional SIFT feature vector, regardless of the renormalization of the feature vector to unit length at the end. This is why the performance of the SIFT-based methods generally degrades under a given transformation, especially for the textured scenes.
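A toy example of the vote change described above: a 36-bin orientation histogram weighted by gradient magnitude, in which two orientations compete. Attenuating the two groups by different factors flips the winning bin. The gradient values are invented for illustration, and the histogram omits the Gaussian weighting and peak interpolation used in SIFT.

```python
def dominant_orientation(gradients, bins=36):
    """Return the centre (degrees) of the peak bin of a magnitude-weighted histogram.

    gradients is a list of (angle_degrees, magnitude) pairs.
    """
    width = 360.0 / bins
    hist = [0.0] * bins
    for angle, magnitude in gradients:
        hist[int((angle % 360.0) // width) % bins] += magnitude
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * width

# Two competing edge directions, as in a textured patch (hypothetical values).
reference = [(22.0, 1.0)] * 10 + [(137.0, 0.9)] * 10
# Non-uniform change: the first group is attenuated more than the second.
transformed = [(22.0, 0.7)] * 10 + [(137.0, 0.8)] * 10

print(dominant_orientation(reference))    # 25.0
print(dominant_orientation(transformed))  # 135.0
```

No gradient angle changed between the two patches, yet the dominant orientation jumps by 110 degrees, which would misalign every bin of a descriptor built relative to it.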

We use examples to support the above reasoning.

Figs. 3.15 to 3.18 present four experimental results for the performance comparison between ZM phase and SIFT under non-linear lighting change (a power-law (gamma) transform with gamma = 3), JPEG compression (the quality of the transformed image is 5 percent of the reference one), viewpoint change and scaling change, respectively. The four figures share the same format. Part (a) of each figure shows the region pair before and after affine intensity normalization, in gray levels or in pseudo color for better visualization, along with their difference images and difference intensity histograms. We can observe that the image structures of the transformed and difference images look similar to that of the reference image. This likely leads to nearly equal real and imaginary parts of the ZM moments for the region pair, except for a few components under the non-uniform intensity change, as indicated in part (b) of the figures. Therefore, the majority of the weighted phase differences are nearly zero, as shown in part (c) of the figures. On the other hand, the non-uniform intensity fluctuation causes the dominant orientation histogram and the 128-dimensional SIFT feature vectors to change non-uniformly, resulting in the expected greater dissimilarity between the two images, as shown in part (d) of the figures.

Fig. 3.15: A performance comparison of ZM phase and SIFT under non-linear lighting change. The detected ellipse-shaped regions are normalized to a circular patch through the affine normalization process beforehand.

Fig. 3.16: A performance comparison of ZM phase and SIFT under JPEG compression. The detected ellipse-shaped regions are normalized to a circular patch through the affine normalization process beforehand.

Fig. 3.17: A performance comparison of ZM phase and SIFT under viewpoint change. The detected ellipse-shaped regions are normalized to a circular patch through the affine normalization process beforehand.

Fig. 3.18: A performance comparison of ZM phase and SIFT under scaling change. The detected ellipse-shaped regions are normalized to a circular patch through the affine normalization process beforehand.

In summary, noise, lighting change, compression, and blurring belong to the photometric transformation type, which causes the image intensities to vary. On the other hand, viewpoint change, scaling and rotation belong to the geometric transformation type, which first relocates the positions of the image points and then requires some sort of intensity interpolation to compute the image intensities at the new image points; the new image intensities contain some non-uniform fluctuation (except for the rotation transformation, which generally causes only a very minor intensity fluctuation). We can apply the above reasoning to conclude that the ZM phase descriptor is generally more robust than the SIFT-based methods under these transformations, especially for the textured scenes, which generally contain complex edge orientation information.

B) Rotation Angle Error Statistics and Its Effect on the Descriptor Performance

The descriptor performance discrepancy can be attributed to the different rotation angle estimation errors of the descriptors. The dominant orientation of the SIFT-based descriptors relies on peak detection in the 36-bin histogram of the gradient directions obtained from the region image, while the ZM phase descriptor computes the image rotation angle from the weighted sum of the ZM phase differences. Table 3.4 breaks down the estimated rotation angle errors (εangle) under the categories of 5, 10, 20, and 30 degrees for both textured scenes and structured scenes under all transformations except the viewpoint change. The rotation angle errors are evaluated by computing the estimated rotation angle for all normalized corresponding region pairs and comparing it with the actual angle. The actual angle can be obtained from the ground truth homographies provided in [30], which are almost similarity transforms. The rotation angle error statistics are not available under the viewpoint change, since the associated rotation angle between two regions under viewpoint change is not fixed.

TABLE 3.4 THE ROTATION ANGLE ESTIMATION ERRORS FOR ALL CORRESPONDING REGION PAIRS SPECIFIED BY Ot = 0.3.

Transform type                       Scene type               Method  εangle < 5°       εangle < 10°      εangle < 20°      εangle < 30°
                                                                      Avg      %        Avg      %        Avg      %        Avg      %
Blur                                 Textured (tree)          ZM      1.801    86.022   2.333    97.312   2.502    98.925   2.502    98.925
Blur                                 Textured (tree)          SIFT    2.484    39.785   4.134    59.140   6.090    72.581   7.787    80.108
Blur                                 Structured (Bikes)       ZM      1.663    92.547   2.027    98.758   2.162    100      2.162    100
Blur                                 Structured (Bikes)       SIFT    2.147    51.553   3.605    72.671   4.543    80.124   5.080    82.609
Affine lighting                      Textured (bush 1)        ZM      0.764    96.755   0.947    99.705   0.979    100      0.979    100
Affine lighting                      Textured (bush 1)        SIFT    1.747    68.437   2.724    84.366   3.366    89.676   3.426    89.971
Affine lighting                      Structured (Leuven)      ZM      1.506    93.662   1.641    98.775   2.115    100      2.115    100
Affine lighting                      Structured (Leuven)      SIFT    1.726    62.676   2.631    78.873   3.313    83.803   4.071    86.620
Non-linear lighting (underexposure)  Textured (bush 1)        ZM      1.099    94.561   1.284    97.908   1.363    98.745   1.363    98.745
Non-linear lighting (underexposure)  Textured (bush 1)        SIFT    2.002    53.556   3.115    69.874   4.327    79.498   4.532    80.335
Non-linear lighting (underexposure)  Structured (Leuven)      ZM      1.220    93.662   1.481    95.592   1.584    99.296   1.715    100
Non-linear lighting (underexposure)  Structured (Leuven)      SIFT    1.906    63.380   2.944    80.282   3.308    83.803   3.463    84.507
Noise                                Textured (Japan garden)  ZM      1.564    92.857   1.838    98.352   1.893    98.901   1.893    98.901
Noise                                Textured (Japan garden)  SIFT    2.423    42.857   4.071    65.385   5.859    80.121   6.765    83.516
Noise                                Structured (Compound)    ZM      1.349    93.293   1.666    99.085   1.763    100      1.763    100
Noise                                Structured (Compound)    SIFT    1.781    69.207   2.814    85.671   3.377    90.244   3.377    90.244
JPEG                                 Textured (garden)        ZM      1.318    93.817   1.654    100      1.654    100      1.654    100
JPEG                                 Textured (garden)        SIFT    2.107    50.269   3.722    73.387   4.948    83.871   5.272    85.215
JPEG                                 Structured (UBC)         ZM      1.112    93.158   1.326    96.842   1.552    98.947   1.552    98.947
JPEG                                 Structured (UBC)         SIFT    1.852    68.947   2.724    82.632   3.162    86.316   3.419    87.368
Rotation                             Textured (flower)        ZM      1.310    97.692   1.370    99.231   1.370    99.231   1.370    99.231
Rotation                             Textured (flower)        SIFT    2.346    54.483   3.910    81.379   4.662    89.655   4.973    91.034
Rotation                             Structured (castle)      ZM      1.061    98.755   1.117    100      1.117    100      1.117    100
Rotation                             Structured (castle)      SIFT    1.777    74.274   2.544    87.552   2.963    91.286   2.963    91.286
Scaling                              Textured (bush 2)        ZM      1.414    92.623   1.625    97.541   1.840    99.180   1.840    99.180
Scaling                              Textured (bush 2)        SIFT    2.222    53.279   3.519    70.492   4.340    76.230   5.390    80.328
Scaling                              Structured (Pentagon)    ZM      0.913    98.551   0.999    100      0.999    100      0.999    100
Scaling                              Structured (Pentagon)    SIFT    1.356    78.261   2.154    90.580   2.529    93.478   2.529    93.478

From Table 3.4, the average rotation angle errors of the ZM phase are smaller than those of SIFT for both the structured scenes and the textured scenes when εangle < 30°. More importantly, the coverage percentage is more than 86% for ZM phase and around 40% to 78% for SIFT when εangle < 5°. Here the coverage percentage is the ratio between the number of corresponding pairs with rotation angle estimation error (εangle) less than a specific value (εt = 5°, 10°, 20° or 30° in Table 3.4) and the total number of correspondences:

\[
\text{coverage percentage} = \frac{\#\,\text{corresponding pairs with } \varepsilon_{angle} < \varepsilon_{t}}{\#\,\text{correspondences}}. \tag{3.18}
\]
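Eq. (3.18) and the "Avg" columns of Table 3.4 can be computed as in the sketch below. It assumes that "Avg" is the mean error over the pairs that fall under the threshold, which the table values suggest but the text does not state explicitly; the error values are made up.

```python
def coverage_stats(errors_deg, threshold_deg):
    """Return (mean error of covered pairs, coverage percentage) for one threshold.

    errors_deg holds the absolute rotation angle error of every corresponding pair.
    """
    covered = [e for e in errors_deg if e < threshold_deg]
    coverage = 100.0 * len(covered) / len(errors_deg)
    mean_error = sum(covered) / len(covered) if covered else float("nan")
    return mean_error, coverage

errors = [0.5, 1.2, 3.0, 4.1, 7.5, 12.0, 18.0, 45.0]  # hypothetical errors in degrees
for t in (5, 10, 20, 30):
    avg, pct = coverage_stats(errors, t)
    print(f"error < {t} deg: avg = {avg:.3f}, coverage = {pct:.1f}%")
```

Both quantities can only grow or stay level as the threshold is raised, which matches the way the Avg and % columns behave from left to right in Table 3.4.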

The large rotation angle errors of SIFT are caused by the ambiguity among multiple dominant orientation peaks. This is the main reason why the SIFT performance becomes poor.

Lowe [7] suggested solving the multiple dominant orientation problem by creating multiple keypoints at the same location, each with one of the dominant orientations. (In this case there is no clear rule for counting the multiple keypoints as correct or false matches when generating the PR curves.) In Fig. 3.19 the PR curves for the tree textured scene under image blur are plotted after removing the region pairs with a rotation angle error no less than 10°, 20°, 30°, and 360°, respectively. The ZM phase performs better than SIFT when the retained rotation angle errors are below 20°, 30°, and 360°, but not when they are below 10°, where SIFT does not face the multiple dominant orientation problem, as described previously.

Fig. 3.19: The PR curves for the tree textured scene under image blur after the removal of region pairs with a rotation angle error no less than a specified level of 10°, 20°, 30°, and 360°, respectively.

C) The Effects of Feature Dimensionality and Feature Orthogonality on the Descriptor Performance

Generally speaking, the high dimensional feature vector contains more descriptive information at the expense of memory space. For example, PCA-SIFT and GLOH start with a feature dimension of 3042 and 272, respectively. However, the components of these feature vectors are correlated and partially redundant. By the application of PCA (principal component analysis) a subset of eigenvectors associated with the larger eigenvalues can be extracted and the projection of the original feature vector to the sub-eigenspace reduces the original dimension down to 128 or even smaller. The dimensionality reduction can be determined based on the percentage of the sum of eigenvalues retained.
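The eigenvalue-based choice of the reduced dimension can be sketched as follows. The retained fraction and the eigenvalue spectrum are hypothetical, and the function name is illustrative; the sketch shows only the selection rule, not the PCA projection itself.

```python
def components_to_keep(eigenvalues, retained=0.95):
    """Smallest number of leading components whose eigenvalue sum reaches `retained`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running >= retained * total:
            return k
    return len(ordered)

spectrum = [5.0, 2.5, 1.0, 0.8, 0.4, 0.3]  # hypothetical covariance eigenvalues
print(components_to_keep(spectrum, retained=0.90))  # 4
```

Because the eigenvalues come from the training images, a different training set can give a different reduced dimension for the same retained fraction.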

The ZM phase applies a set of orthogonal ZM moments to design the feature vector, so that the feature components are mutually independent and more informative. With the same dimensionality (or the same memory space), a set of orthogonal features generally has better descriptive power to distinguish the different image patterns embedded in the textured scenes. However, when the image patterns in the scenes are highly similar, a higher feature dimensionality is required in order to reflect the subtle pattern differences, as indicated previously in Fig. 3.14.

D) Time Complexity Analysis

The computation time for evaluating the descriptor performance consists of the region extraction time, the descriptor feature vector construction time and the region matching time.

Because all descriptors use the same set of detected regions of interest, their region extraction times are the same. As for the feature vector construction time, the numbers of
