Camera Pose Estimation with RANSAC Outlier Rejection

4.1 Stereo Camera Localization and Mapping

4.1.4 Camera Pose Estimation with RANSAC Outlier Rejection

In the previous section, distinctive feature points are detected in each step, and the corresponding feature points are matched between two consecutive frames. The relative camera pose can then be estimated from the spatial relation between these matching pairs using the SVD method. However, using the matching pairs without any selection leads to inaccurate or incorrect localization results. Although the SIFT feature is quite robust compared to most recent feature techniques, wrong matches can still occur, for example because of repeating features or similar objects in the scene. In addition, the uncertainty of each stereo camera measurement point may contribute some drift to the final relative pose. To overcome these problems, the Random Sample Consensus (RANSAC) outlier rejection framework is applied to find the best transformation matrix. The RANSAC algorithm modified for this case is listed in Algorithm 4.2.

[Figure: camera poses accumulated over time steps t = 0, 1, 2, ..., k: C_0 = I_4, C_1 = M_1 C_0, C_2 = M_2 C_1 = M_2 M_1 C_0, ..., C_k = M_k C_{k-1} = M_k M_{k-1} ... M_1 C_0.]

The best transformation matrix is assumed to be the model with the largest number of inliers. In each iteration, several matching pairs are randomly selected as sample data to estimate a transformation matrix M_current by SVD (as in Algorithm 4.1). To determine which matching pairs are inliers, the feature points in the current frame are transformed by M_current into the previous camera coordinate frame, and the spatial error of each matching pair is then calculated as the Euclidean distance, which can be written as:

$$\varepsilon = d\left(PFP^{*}_{i,k},\, PFP_{i,k-1}\right) = \sqrt{\left(x^{*}_{i,k} - x_{i,k-1}\right)^{2} + \left(y^{*}_{i,k} - y_{i,k-1}\right)^{2} + \left(z^{*}_{i,k} - z_{i,k-1}\right)^{2}} \tag{4.14}$$

$$\begin{bmatrix} PFP^{*}_{i,k} \\ 1 \end{bmatrix} = M_{current} \begin{bmatrix} PFP_{i,k} \\ 1 \end{bmatrix} \tag{4.15}$$

where PFP*_{i,k} = (x*_{i,k}, y*_{i,k}, z*_{i,k})^T is the coordinate of PFP_{i,k} after it has been transformed by M_current, as shown in Equation (4.15). If the Euclidean distance between PFP*_{i,k} and PFP_{i,k-1} is less than a predefined threshold, the matching pair is considered to be an inlier. In this way, a transformation matrix and its inlier set are obtained in each iteration. After N_Iteration iterations, there are N_Iteration transformation matrices M_i and corresponding inlier sets Inliers_i, where i = 1...N_Iteration. The best transformation M_best is determined by choosing the transformation matrix M_i with the largest number of inliers in Inliers_i. This can be done iteratively, without storing every trial model M_i and its inlier set Inliers_i, as in lines 11-21 of Algorithm 4.2.
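The inlier test of Equations (4.14) and (4.15) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the original implementation: the function names, the (N, 3) array layout and the sample values are assumptions made for the example.

```python
import numpy as np

def spatial_errors(M_current, pts_k, pts_km1):
    """Euclidean error of Eq. (4.14) for every matching pair.

    pts_k, pts_km1: (N, 3) arrays of matched 3D feature points in the
    k-th and (k-1)-th camera frames (assumed layout).
    M_current: 4x4 homogeneous transformation matrix.
    """
    homog = np.hstack([pts_k, np.ones((len(pts_k), 1))])  # (N, 4)
    pts_star = (M_current @ homog.T).T[:, :3]             # Eq. (4.15)
    return np.linalg.norm(pts_star - pts_km1, axis=1)     # Eq. (4.14)

def find_inliers(M_current, pts_k, pts_km1, threshold):
    """Indices of the matching pairs whose error is below the threshold."""
    errors = spatial_errors(M_current, pts_k, pts_km1)
    return np.flatnonzero(errors < threshold)

# Illustrative data: a 0.1 m shift along x, with the third pair corrupted.
M = np.eye(4)
M[0, 3] = 0.1
pts_k = np.array([[0.0, 0.0, 1.0], [0.5, 0.2, 2.0], [1.0, -0.3, 1.5]])
pts_km1 = pts_k + np.array([0.1, 0.0, 0.0])
pts_km1[2] += np.array([0.0, 0.5, 0.0])   # simulate a wrong match
inliers = find_inliers(M, pts_k, pts_km1, threshold=0.05)
```

In this example the first two pairs agree with the transformation and the third is rejected, because its error (0.5 m) exceeds the 0.05 m threshold chosen for the illustration.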

Algorithm 4.2: Feature-based localization with RANSAC outlier rejection

Input:
  Feature matching pairs FP_{k,k-1}
  Number of iterations N_Iteration
  Number of sample pairs N_sample

Output:
  Transformation matrix M_{k,k-1}
  Inlier list Inliers_best

1: Initialize best transformation matrix M_best ← ∅
2: Initialize best inlier set Inliers_best ← ∅
3: Initialize best inlier count N_Inliers_best = 0
4: Calculate the number of matching pairs N_Match = size(FP_{k,k-1})
5: for iteration = 1 : N_Iteration
6:   SampleSet ← randomly select N_sample matching pairs in FP_{k,k-1}
7:   Compute the current transformation matrix M_current from SampleSet using SVD (Algorithm 4.1)
8:   Inliers_current ← ∅
9:   for all FP_{k,k-1,i} in FP_{k,k-1}
10:    Compute the spatial error between M_current PFP_{i,k} and PFP_{i,k-1} using the Euclidean distance, that is, ε = Euclidean(M_current PFP_{i,k}, PFP_{i,k-1})
11:    if ε < threshold
12:      Inliers_current ← Inliers_current + FP_{k,k-1,i}
13:    end if
14:  end for
15:  Count the inliers: N_Inliers_current = size(Inliers_current)
16:  Recompute the transformation matrix M_current from Inliers_current using SVD (Algorithm 4.1)
17:  if N_Inliers_current > N_Inliers_best
18:    M_best ← M_current
19:    Inliers_best ← Inliers_current
20:    N_Inliers_best = N_Inliers_current
21:  end if
22: end for
23: M_{k,k-1} ← M_best
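The loop of Algorithm 4.2 can be sketched in NumPy as follows. Algorithm 4.1 is not reproduced in this section, so the SVD step below uses the standard least-squares rigid alignment (Kabsch/Arun style) as an assumed stand-in. The function names, default parameters and synthetic data are also assumptions made for illustration.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """4x4 least-squares rigid transform mapping src to dst via SVD.

    Assumed stand-in for Algorithm 4.1 (standard Kabsch/Arun alignment).
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # d = -1 removes a reflection
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = dst_c - R @ src_c
    return M

def ransac_pose(pts_k, pts_km1, n_iteration=100, n_sample=3,
                threshold=0.05, rng=None):
    """Estimate M_{k,k-1} and the inlier indices from matched 3D points."""
    rng = np.random.default_rng() if rng is None else rng
    n_match = len(pts_k)
    homog = np.hstack([pts_k, np.ones((n_match, 1))])
    M_best, inliers_best = None, np.array([], dtype=int)
    for _ in range(n_iteration):
        # Lines 6-7: random sample and model estimation.
        sample = rng.choice(n_match, size=n_sample, replace=False)
        M_current = estimate_rigid_transform(pts_k[sample], pts_km1[sample])
        # Lines 8-15: Euclidean error and inlier set for all pairs.
        errors = np.linalg.norm((M_current @ homog.T).T[:, :3] - pts_km1, axis=1)
        inliers_current = np.flatnonzero(errors < threshold)
        # Lines 16-21: refit on the inliers and keep the best model. The refit
        # is done only on improvement, which gives the same result as refitting
        # every iteration. The size guard is an addition of this sketch.
        if (len(inliers_current) > len(inliers_best)
                and len(inliers_current) >= n_sample):
            M_best = estimate_rigid_transform(pts_k[inliers_current],
                                              pts_km1[inliers_current])
            inliers_best = inliers_current
    return M_best, inliers_best

# Synthetic check: 40 pairs, 0.1 m shift along x, first 10 pairs corrupted.
rng = np.random.default_rng(0)
pts_k = rng.uniform(-1.0, 1.0, size=(40, 3)) + np.array([0.0, 0.0, 3.0])
M_true = np.eye(4)
M_true[0, 3] = 0.1
pts_km1 = pts_k + M_true[:3, 3]
pts_km1[:10] += rng.uniform(0.3, 0.6, size=(10, 3))   # simulated wrong matches
M_est, inliers = ransac_pose(pts_k, pts_km1, rng=rng)
```

On this noise-free synthetic data the sketch should recover the 0.1 m translation and keep only the 30 uncorrupted pairs as inliers. With real stereo measurements the threshold and iteration count would need tuning to the sensor noise.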

As an example, the 2nd and 3rd frames are taken as the (k−1)-th and k-th steps. The given relative motion is a pure translation of +0.1 m along the x-axis without any rotation, and therefore the transformation matrix is as follows:

Figure 4.11(a)-(b) are two target images captured by the right CCD of the stereo camera, with the corresponding feature points in the (k−1)-th and k-th steps, respectively. The green circles indicate the features in the (k−1)-th step, while the red dots represent the features in the k-th step. Figure 4.11(c) and (d) show the k-th step features projected from 3D coordinates onto the image plane by the pin-hole model, using the different transformation matrices estimated in two iterations. In the iteration with the better estimate, the features in the k-th step are transformed by the following matrix:

1.0000 0.0019 0.0033

Most of the red dots align with the green circles, as shown in Figure 4.11(c) and (e). The aligned pairs correspond to the 3D spatial inliers, since the projection by the camera pin-hole model is a degenerate (dimension-reducing) mapping. This means that each k-th step feature in the matching pairs is transformed correctly to the corresponding feature in the (k−1)-th step. On the other hand, an incorrect transformation matrix is estimated in the second iteration, with the feature projection shown in Figure 4.11(d) and (f). The corresponding transformation matrix in the second iteration is as follows:

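$$\begin{bmatrix} 0.9933 & 0.0536 & 0.1021 & 0.0177 \\ 0.0584 & 0.9973 & 0.0441 & 0.0605 \\ 0.0995 & 0.0498 & 0.9938 & 0.0068 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{4.18}$$

[Entry magnitudes only: the source shows minus signs for some entries, but their positions are not recoverable.]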

The number of inliers in the second iteration is dramatically smaller than in the first iteration. This example therefore illustrates the relation between the best transformation matrix and the number of corresponding inliers.
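The projection used to draw the red dots in Figure 4.11(c)-(d) can be sketched as follows: the k-th step 3D features are first transformed by the estimated matrix and then projected with the pin-hole model. The intrinsic parameters (fx, fy, cx, cy), the function names and the sample points are hypothetical values chosen for illustration. Lens distortion is ignored, and all points are assumed to lie in front of the camera (Z > 0).

```python
import numpy as np

def transform_points(M, points):
    """Apply a 4x4 homogeneous transform to (N, 3) points."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (M @ homog.T).T[:, :3]

def project_pinhole(points_cam, fx, fy, cx, cy):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.column_stack([u, v])

# Hypothetical intrinsics for a 640x480 image.
fx = fy = 500.0
cx, cy = 320.0, 240.0

# Two k-th step features and a 0.1 m translation along x.
pts_k = np.array([[0.0, 0.0, 2.0], [0.4, -0.2, 2.0]])
M = np.eye(4)
M[0, 3] = 0.1
pixels = project_pinhole(transform_points(M, pts_k), fx, fy, cx, cy)
```

Projected points that land close to the (k−1)-th step features in the image correspond to pairs that are also close in 3D, which is why the aligned dots in the figure can be read as inliers.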

The final result of the RANSAC outlier rejection algorithm is shown in Figure 4.12(b). Only the green lines are used as inputs to the camera pose estimation step, whereas the red lines are outliers and are excluded from it.

Figure 4.11: Illustration of estimating the relative motion with the RANSAC algorithm, using two iterations as an example. Green circles indicate the feature points in the (k−1)-th step, while the red dots indicate the feature points in the k-th step. Feature points in the k-th step are projected by the pin-hole camera model with a given transformation matrix.

(a)-(b) Feature points in the (k−1)-th and k-th steps, respectively.

(c) Feature points projected with the better-estimated transformation matrix.

(d) Feature points projected with the incorrect transformation matrix. Many of the red dots transformed by the incorrect transformation matrix clearly do not align with the green circles.

(e)-(f) The corresponding features plotted without the images, to make the points easier to distinguish.

Figure 4.12: The result of applying the RANSAC outlier rejection algorithm to the matching pairs.

(a) By comparing similarity, each previous feature is linked to the current feature as the same landmark.

(b) With RANSAC outlier rejection, some wrong matching pairs are removed.

Green lines indicate the inliers, whereas the red lines represent the outliers, which are not considered in the motion estimation process.

Source document: 3D Environment Reconstruction and Object Tracking Using Color and Depth Sensing with a Stereo Camera (pages 69-76)