Three-Dimensional Environment Reconstruction

In the last decade, many researchers have investigated how to reconstruct an environment map precisely using an RGB-D sensor. According to [1: Henry et al. 2012], to build a complete 3D environment map, a mapping system should address three components: spatial alignment (localization), loop closure detection, and global consistency.

The first component, spatial alignment, is the most important element of a mapping system because it localizes the sensor poses. As mentioned in Section 1.3, if the sensor's position is not known accurately, its measurements cannot be mapped to the correct positions in the global 3D model. Many methods have been developed to align two consecutive data frames for this purpose.

The traditional and most popular way to align two point clouds is the Iterative Closest Point (ICP) method [9: Besl et al. 1992]. ICP associates each point in one point cloud with its closest point in the other and iteratively computes the rigid transformation that minimizes the mean-square error over the associated point pairs. However, because noise in the range data degrades the correctness of point association, many ICP variants have been proposed. For example, [10: Turk et al. 1994] proposed eliminating point pairs that are too far apart or in which either point lies on a mesh boundary, which reduces the effect of outliers. [11: Chen et al. 1991] proposed a point-to-plane error metric in place of the point-to-point metric and obtained better results when registering two surfaces.
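The ICP loop with distance-based pair elimination described above can be illustrated with a minimal sketch. It is not the implementation of any cited work: it is reduced to 2D so that the optimal rotation has a closed form, it uses brute-force closest-point search, and the names `icp_2d`, `best_rigid_2d`, `transform`, and the threshold `max_pair_dist` are assumptions made for illustration.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(q[0] for q in dst) / n
    cyd = sum(q[1] for q in dst) / n
    s_dot = s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cxs, py - cys          # centred source point
        bx, by = qx - cxd, qy - cyd          # centred target point
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)       # closed-form optimum in 2D
    c, s = math.cos(theta), math.sin(theta)
    tx = cxd - (c * cxs - s * cys)
    ty = cyd - (s * cxs + c * cys)
    return theta, tx, ty

def transform(points, theta, tx, ty):
    """Apply a 2D rigid transformation to a list of (x, y) points."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(source, target, max_pair_dist=1.0, iterations=30):
    """Align source to target; returns the transformed source points.

    max_pair_dist is a hypothetical threshold: pairs farther apart than
    this are discarded, in the spirit of distance-based pair elimination.
    """
    current = list(source)
    for _ in range(iterations):
        pairs = []
        for p in current:
            # Brute-force closest point in the target cloud.
            q = min(target, key=lambda t: (t[0] - p[0]) ** 2 + (t[1] - p[1]) ** 2)
            if math.dist(p, q) <= max_pair_dist:
                pairs.append((p, q))
        if len(pairs) < 2:
            break                            # too few pairs to estimate a motion
        theta, tx, ty = best_rigid_2d([p for p, _ in pairs], [q for _, q in pairs])
        current = transform(current, theta, tx, ty)
    return current
```

In 3D the same loop applies, but the rotation is usually obtained from a singular value decomposition of the cross-covariance matrix, and a spatial index such as a k-d tree replaces the brute-force search.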

Both of these variants consider only spatial information. For sensors that generate colored point clouds, performing ICP with a color constraint makes the data association problem easier to solve. For example, [12: Johnson et al. 1997] proposed using the hue of each point (the hue channel of the HSV color space) as a filter that constrains the closest-point search in every ICP iteration.
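A hue-filtered closest-point search of this kind might look like the following sketch. The point layout `(x, y, z, hue)` with hue in degrees, the function names, and the 15-degree threshold are assumptions for illustration and are not taken from the cited paper.

```python
import math

def hue_difference(h1, h2):
    """Smallest angular difference between two hues in degrees (hue is circular)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)

def closest_point_with_hue(query, candidates, max_hue_diff=15.0):
    """Index of the nearest candidate whose hue is compatible, or None.

    query and candidates are (x, y, z, hue) tuples. max_hue_diff is a
    hypothetical threshold in degrees; candidates whose hue differs by
    more than this are skipped before distances are compared.
    """
    qx, qy, qz, qh = query
    best_index, best_dist = None, math.inf
    for i, (x, y, z, h) in enumerate(candidates):
        if hue_difference(qh, h) > max_hue_diff:
            continue                          # rejected by the color constraint
        d = (x - qx) ** 2 + (y - qy) ** 2 + (z - qz) ** 2
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index
```

The filter only changes which pairs are formed; the transformation is still estimated from the spatial coordinates alone.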

The method in [13: Men et al. 2011] not only uses the hue of each point as an elimination constraint but also includes the hue in the error metric, giving 4D-ICP, where the four dimensions are the x, y, z coordinates and an additional hue value. Although many ICP variants address the data association problem, both the original ICP and its variants suffer from the initial guess problem, because ICP converges only to a local minimum. To address the initial guess, [14: Makadia et al. 2006] proposed a method that automatically estimates the initial guess and refines the alignment by converting the distribution of point cloud surface normal vectors into orientation histograms.

Image feature-based localization, often called visual odometry, is the most popular approach for RGB-D sensors, since the initial guess is easily obtained by using image features such as the Scale-Invariant Feature Transform (SIFT) [20: Lowe 2004] or Speeded-Up Robust Features (SURF) as landmarks [16: Scaramuzza et al. 2011].
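To make the 4D-ICP idea mentioned above concrete, the sketch below adds a weighted hue term to the squared point distance and uses it for association. It is an illustration only: the hue is assumed to be normalized to [0, 1), and `hue_weight` and both function names are hypothetical rather than taken from the cited work.

```python
def squared_distance_4d(p, q, hue_weight=1.0):
    """Squared distance between (x, y, z, hue) points with a weighted hue term.

    Hue is assumed to lie in [0, 1) and is treated as circular.
    hue_weight is a hypothetical factor balancing color against geometry.
    """
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    dh = abs(p[3] - q[3]) % 1.0
    dh = min(dh, 1.0 - dh)
    return dx * dx + dy * dy + dz * dz + hue_weight * dh * dh

def associate_4d(source, target, hue_weight=1.0):
    """For each source point, the index of its nearest target point in 4D."""
    return [
        min(range(len(target)),
            key=lambda j: squared_distance_4d(p, target[j], hue_weight))
        for p in source
    ]
```

With `hue_weight=0.0` this reduces to ordinary 3D closest-point association; increasing the weight makes a spatially close point with a very different hue a worse match than a slightly farther point of the same hue.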

However, because outliers such as wrong feature matches corrupt the pose estimate, the Random Sample Consensus (RANSAC) outlier rejection algorithm is applied [17: Nister et al. 2004]. Moreover, in binocular stereo vision the two image planes are fixed relative to each other, so the feature coordinates in the reference image plane can serve as a constraint for checking each matching pair in the target image plane during feature matching. This concept was proposed in [18: Kitt et al. 2010], which uses the so-called trifocal tensor to describe the relationship among three images (the two images from the previous step and the target image in the current step). In addition, [1: Henry et al. 2012] proposed a two-stage RGB-D localization method that fuses feature tracking with RANSAC outlier rejection and ICP. However, the authors report that the feature-based method alone is good enough and that applying ICP refines the result only slightly. Since image feature-based localization with RANSAC handles both the initial guess and outlier rejection, gives a precise localization result, and is easy to implement, this thesis adopts it for localization.
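The role of RANSAC in rejecting wrong feature matches can be sketched as follows. This is a simplified 2D illustration under stated assumptions, not the pipeline of the thesis or of the cited works: each match is a pair of already-matched 2D points, the minimal sample is two matches, and `threshold`, `trials`, `seed`, and the function names are hypothetical. In the RGB-D case the minimal sample would instead be three 3D correspondences.

```python
import math
import random

def rigid_from_two_matches(m1, m2):
    """2D rigid motion (theta, tx, ty) that maps p1 to q1 and aligns p1->p2 with q1->q2."""
    (p1, q1), (p2, q2) = m1, m2
    theta = (math.atan2(q2[1] - q1[1], q2[0] - q1[0])
             - math.atan2(p2[1] - p1[1], p2[0] - p1[0]))
    theta = math.atan2(math.sin(theta), math.cos(theta))   # wrap to (-pi, pi]
    c, s = math.cos(theta), math.sin(theta)
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return theta, tx, ty

def residual(match, model):
    """Distance between the transformed source point and its matched target point."""
    (px, py), (qx, qy) = match
    theta, tx, ty = model
    c, s = math.cos(theta), math.sin(theta)
    return math.hypot(c * px - s * py + tx - qx, s * px + c * py + ty - qy)

def ransac_rigid_2d(matches, threshold=0.05, trials=200, seed=0):
    """Return (best_model, inliers) over random two-match hypotheses."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(trials):
        m1, m2 = rng.sample(matches, 2)
        if m1[0] == m2[0]:
            continue                          # degenerate sample: identical source points
        model = rigid_from_two_matches(m1, m2)
        inliers = [m for m in matches if residual(m, model) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

In practice the winning hypothesis is usually re-estimated from all of its inliers by least squares, and that refined pose can then serve as the initial guess for ICP.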

The second and third components, loop closure detection and global consistency, are used to reduce the error accumulated during frame-by-frame localization. To detect loop closures, keyframes are selected and compared with each incoming data frame [1: Henry et al. 2012]. After a loop closure is detected, an optimization method is used to minimize the error. For example, [1: Henry et al. 2012] implements and compares two methods: the first is the tree-based network optimizer (TORO), which uses stochastic gradient descent to maximize the likelihood of the node parameters subject to the constraints; the second is sparse bundle adjustment (SBA), which globally minimizes the re-projection error of the feature points matched across all data frames. Loop closure detection and global consistency are essential when reconstructing large-scale environment models. However, the scenarios in this thesis do not involve loop closure, so these problems are left as future work.
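How a loop closure constraint spreads accumulated drift over a trajectory can be shown with a toy example. The sketch below is neither TORO nor SBA: it assumes one-dimensional poses, equally weighted constraints, and plain (non-stochastic) gradient descent, and the function name, `step`, and `iterations` are hypothetical.

```python
def optimize_pose_graph_1d(num_poses, constraints, step=0.1, iterations=2000):
    """Minimize the sum of (x[j] - x[i] - z)^2 over constraints (i, j, z).

    Toy assumptions: poses are scalars, all constraints have equal weight,
    and pose 0 is held fixed at 0.0 to anchor the map. The step size must
    be small relative to the graph connectivity for the descent to converge.
    """
    x = [0.0] * num_poses
    for _ in range(iterations):
        grad = [0.0] * num_poses
        for i, j, z in constraints:
            r = x[j] - x[i] - z               # residual of this constraint
            grad[j] += 2.0 * r
            grad[i] -= 2.0 * r
        for k in range(1, num_poses):         # pose 0 stays fixed
            x[k] -= step * grad[k]
    return x

# Four odometry steps that should return to the start but drift by -0.1,
# plus one loop closure stating that pose 4 coincides with pose 0.
odometry = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, -1.0), (3, 4, -1.1)]
loop_closure = [(0, 4, 0.0)]
poses = optimize_pose_graph_1d(5, odometry + loop_closure)
```

With equal weights the drift is shared evenly among the five constraints, so each residual has a magnitude of about 0.02 instead of the whole error appearing at the last pose.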

Figure 2.1: Sensor localization categories.

Localization
    Spatial Information
        3D ICP [8: Besl]
        3D ICP Variant [9: Turk] [10: Chen]
        3D ICP with Surface Normal Histogram [13: Makadia]
    Spatial + Color Information
        Combine Color Information ICP
            4D-ICP [12: Men]
            ICP with Point Association Constraint from Color [11: Johnson]
        Image Feature Tracking [16: Nister] [17: Kitt]
        Image Feature and ICP Fusion [1: Henry]
