
Chapter 4 Construction of 3D Images from KINECT Images

4.3 Review of a Method for Geometric Correction of 3D Images

4.3.2 Correction Algorithm and Experimental Results

Based on the above idea, an approximating paraboloid equation may be derived according to the criterion of minimum sum of squared errors (MSSE) in the following way.

(1) Let the equation of the paraboloid be written as:

where A and B are the quadratic coefficients and C is an intercepted length from the KINECT device to the apex of the paraboloid, as shown in Fig. 4.8.

(2) The sum of squared errors, denoted SSE, is computed by the following equation:

where (xi, yi, zi) are the values of a 3D image pixel computed by Algorithm 4.1 and h and w are the height and width of this depth image, respectively.

(3) To find the coefficients A, B and C, according to the minimum SSE criterion, the partial derivatives of Equation (4.16) with respect to variables A, B and C, respectively are derived, yielding the following equations:

(4) The above simultaneous equations may be solved to obtain analytic solutions for the values of the coefficients A, B, and C.


(5) The paraboloid is obtained by substituting A, B, and C back to (4.16).
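The least-squares fit in steps (1) through (5) can be sketched in code. The sketch assumes the paraboloid has the axis-aligned form z = Ax² + By² + C, which matches the description of A, B, and C above but is not confirmed by the equations in this section. The function names fit_paraboloid, det3, and solve3 are hypothetical. Because the model is linear in A, B, and C, setting the three partial derivatives of the SSE to zero gives a 3×3 linear system, which is solved here by Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, r):
    """Solve the 3x3 linear system m * x = r by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set: normal equations are singular")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = r[i]
        solution.append(det3(replaced) / d)
    return tuple(solution)


def fit_paraboloid(points):
    """Fit z = A*x^2 + B*y^2 + C (assumed form) to (x, y, z) points.

    Returns (A, B, C) minimizing the sum of squared errors.
    """
    m = [[0.0] * 3 for _ in range(3)]
    r = [0.0] * 3
    for x, y, z in points:
        phi = (x * x, y * y, 1.0)  # z is modeled as A*phi[0] + B*phi[1] + C*phi[2]
        for i in range(3):
            r[i] += phi[i] * z
            for j in range(3):
                m[i][j] += phi[i] * phi[j]
    # m * [A, B, C] = r are the normal equations dSSE/dA = dSSE/dB = dSSE/dC = 0.
    return solve3(m, r)
```

When every input point shares the same x and y, the system is singular and the sketch raises an error instead of returning unreliable coefficients.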

Finally, we show two examples of the experimental results of applying the above scheme of geometric correction to 3D images in Figs. 4.9 and 4.10.


Fig. 4.9 The 3D image of a wall seen from above. (a) Before correction. (b) After correction.

Fig. 4.10 The 3D image of another wall seen from the top view. (a) Before correction. (b) After correction.

Chapter 5 Modeling of Around-car Objects

5.1 Introduction

In this chapter, we describe how we combine data acquired with multiple KINECT devices and show the results on the same screen as a single view, such as a front view, a side view, or a rear view. Before this, we have to perform calibration in order to associate the parameters of the real world with those of the virtual world constructed in this study.

Calibration in 3D space is time-consuming, so we use a k-d tree to speed up our calibration steps. After this process, we know the relationship between each pair of neighboring KINECT devices. Then, we merge all the data to see a single view not only as a sparse point cloud, but also as a complete model constructed from many polygons. Since the 3D rendering bottleneck for model construction is the number of polygons, we try to reduce the number of polygons in our data without breaking its geometric properties. The method of reduction is based on the use of the quadric error matrix (QEM). Before using the QEM, we have to construct polygons in the acquired depth images first. As a consequence, we can see a 3D image constructed from the data of multiple KINECT devices via real-time rendering.

Moreover, the objects on the screen that we want to tag, such as cars or people, are an important kind of information in video surveillance. Once they are tagged, we can add to each frame an index which includes the object information, so that the search for objects or scenes of interest can be made automatic without human involvement.

5.2 KINECT Camera Calibration

5.2.1 Review of Calibration of a Single KINECT

In this section, we introduce how we obtain the focal length of the RGB camera in the KINECT device, which was mentioned in the previous chapter and is used in the calibration process. First, we measure an object on the screen to obtain the object's height py in pixels. Next, using the pinhole camera model again as shown in Fig. 5.1, we measure the distance dz from the object to the KINECT device, as well as the object's height h in the real world in mm. Then, we can apply the similar-triangle principle again to derive the following equation:

$$\frac{p_y}{f} = \frac{h}{d_z}.$$

Finally, the focal length f can be obtained by solving the above equation to be

$$f = \frac{p_y d_z}{h}. \tag{5.1}$$

Fig. 5.1 The pinhole camera model for calibration of the focal length of the camera.
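As a small worked example of Equation (5.1), the computation can be written as below. The function name focal_length_px and the measurement values are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from this study.

```python
def focal_length_px(p_y, d_z, h):
    """Equation (5.1): f = p_y * d_z / h.

    p_y: object height on the screen, in pixels
    d_z: distance from the object to the device, in mm
    h:   real-world object height, in mm
    """
    if h <= 0:
        raise ValueError("real-world height h must be positive")
    return p_y * d_z / h


# Hypothetical measurement: an object 500 mm tall, placed 2000 mm away,
# spans 131 pixels on the screen.
f = focal_length_px(p_y=131.0, d_z=2000.0, h=500.0)
```

Since d_z and h are both in mm, their units cancel and f is expressed in pixels.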

5.2.2 Transformation of Coordinates

In this section, we introduce the method of coordinate transformation with 6 degrees of freedom, which consists of two parts: one is translation, and the other is rotation.

First, we define a translation vector T used in the transformation. Next, we define matrices for the three kinds of rotation, namely pan, tilt, and swing, and combine them into a rotation matrix R whose entries are products of the sines and cosines of the three rotation angles.
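A 6-degree-of-freedom transformation of this kind can be sketched as follows. The axis assignments (pan about the Y-axis, tilt about the X-axis, swing about the Z-axis), the composition order, and the column-vector form P' = RP + T are assumptions made for illustration and may differ from the convention used in this study. All function names are hypothetical.

```python
import math


def rot_x(angle):
    """Tilt: rotation about the X-axis (assumed axis assignment)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    """Pan: rotation about the Y-axis (assumed axis assignment)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(angle):
    """Swing: rotation about the Z-axis (assumed axis assignment)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation(pan, tilt, swing):
    """Combined rotation; the order swing * tilt * pan is an assumption."""
    return matmul(rot_z(swing), matmul(rot_x(tilt), rot_y(pan)))


def transform(point, r, t):
    """Apply P' = R P + T to one 3D point."""
    return tuple(sum(r[i][k] * point[k] for k in range(3)) + t[i]
                 for i in range(3))
```

The three angles and the three components of T together give the six degrees of freedom.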

5.2.3 Review of Iterative Closest Point (ICP) Algorithm

The ICP algorithm is a method useful for 3D model alignment [6-8]. In this section, we will introduce the operation of the ICP step by step, and describe the bottleneck of the algorithm.

First, the original ICP algorithm is basically a brute-force method for aligning two clusters of points which are used to construct a scene or object model. The core idea of this algorithm is that “each point in one group finds a closest point in the other group.” The more points that satisfy this rule, the more likely it is that the corresponding transformation (including the translation and rotation) is correct. The details of this algorithm are shown below.

Algorithm 5.1: aligning two groups of points in a 3D space.

Input: Two groups of points, Pa and Pb.

Output: A translation matrix T and a rotation matrix R for aligning Pa and Pb.

Steps:

Step 1. Make a small change to T and R, and compute Pb' = PbR + T, where R and T are as described in Section 5.2.2 (leading to Equation 5.7).

Step 2. For each point b in Pb', find the closest point s to it in Pa, and calculate the Euclidean distance dbs between b and s.

Step 3. After all the points in Pb' have been processed, sum up all the distances dbs to get the total distance for the current T and R.

From the algorithm described above, we know that the speed bottleneck is Step 2 since for each point in one group, we have to check all the points in the other group by brute force. An improved method is proposed in the next section.
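The brute-force distance computation that causes this bottleneck can be sketched as below. The sketch uses the column-vector form P' = RP + T as an assumption, and the helper best_translation is a simplification introduced only for illustration: it tries a given list of candidate translations with the rotation fixed to the identity, rather than making small changes to both T and R. All names are hypothetical.

```python
import math


def apply_rigid(points, r, t):
    """Apply P' = R P + T (assumed column-vector form) to every point."""
    return [tuple(sum(r[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
            for p in points]


def alignment_error(pa, pb_moved):
    """Sum, over every point b, of the distance to its closest point in pa.

    Every b is compared with every point of pa, so the cost is
    O(len(pa) * len(pb_moved)).
    """
    total = 0.0
    for b in pb_moved:
        total += min(math.dist(b, s) for s in pa)
    return total


def best_translation(pa, pb, candidates):
    """Return the candidate translation with the smallest alignment error."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    return min(candidates,
               key=lambda t: alignment_error(pa, apply_rigid(pb, identity, t)))
```

The inner `min` over all points of pa is the exhaustive search that a spatial data structure is meant to replace.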

5.2.4 Calibration by the ICP algorithm Using Speeded-up k-d Tree

In this section, we introduce how a k-d tree can be utilized to speed up Algorithm 5.1. First, it is noted that the k-d tree is a data structure for “partitioning the 3D space.” After such partitioning, we can speed up Algorithm 5.1. In Step 2 of Algorithm 5.1, for every point b in Pb', the closest point s to it in Pa must be found exhaustively, which requires computing the Euclidean distances dbs between b and every point s in Pa. Instead, we can maintain a better data structure, the k-d tree, which keeps track of neighboring points, so that the exhaustive search is avoided. The method for constructing the k-d tree is described as an algorithm below:

Algorithm 5.2: Construction of a k-d tree.

Input: A model M consisting of points, and a pre-defined value S which imposes a limit on the number of elements in each node of the k-d tree.

Output: A k-d tree T_kd which stores M.

Steps:

Step 1. Create a node for storing M, and label its level as 0.

Step 2. If the current level mod 3 is 0, then load the X-axis value of M;

If the current level mod 3 is 1, then load the Y-axis value of M;

If the current level mod 3 is 2, then load the Z-axis value of M.

Denote each of these values of M as Mc.

Step 6. If the number of points in the left child is larger than S, then go back to Step 2.

Step 7. If the number of points in the right child is larger than S, then go back to Step 2.

Step 8. If the numbers of points in both the left and the right children are no larger than S, then construction of the desired tree T_kd is completed.

Note that in the above algorithm, if S equals the number of points in M, the space is not partitioned. With the above algorithm, the search becomes a binary one in the k-d tree T_kd, which speeds it up. The original algorithm conducts an exhaustive search with a time complexity of O(N²), where N is the number of points in M, while the new algorithm using the k-d tree search has a time complexity of O(N + N log N + S), where S is a constant. Since N log N dominates N, the final time complexity is just O(N log N).
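A k-d tree of this kind, together with the nearest-point search it enables, can be sketched as follows. Splitting each node at the median of the current axis is an assumption, since the splitting rule is not stated in the steps above. The dictionary-based node layout and the function names build_kdtree and nearest are hypothetical.

```python
def build_kdtree(points, s, level=0):
    """Partition 3D points, cycling through the X, Y, and Z axes by level.

    A node becomes a leaf once it holds at most s points.
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    if len(points) <= s:
        return {"leaf": True, "points": list(points)}
    axis = level % 3
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2  # median split (an assumption)
    return {
        "leaf": False,
        "axis": axis,
        "split": ordered[mid][axis],
        "left": build_kdtree(ordered[:mid], s, level + 1),
        "right": build_kdtree(ordered[mid:], s, level + 1),
    }


def nearest(node, q, best=None):
    """Return (squared_distance, point) for the stored point closest to q."""
    if node["leaf"]:
        for p in node["points"]:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if best is None or d2 < best[0]:
                best = (d2, p)
        return best
    diff = q[node["axis"]] - node["split"]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, q, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if best is None or diff * diff < best[0]:
        best = nearest(far, q, best)
    return best
```

In Step 2 of Algorithm 5.1, the tree would be built once over Pa and then queried with `nearest` for each point b, in place of the scan over all of Pa.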

5.2.5 Calibration of Relation between Neighboring KINECT Devices

The ICP algorithm was described in the previous sections. We can “calibrate” the relation between neighboring KINECT devices by using the ICP algorithm together with some of the knowledge mentioned in Chapter 2. The details of the algorithm implementing this idea are shown below. Note that we use a box as the calibration target in the algorithm.

Algorithm 5.4: Calibration of the geometric relation between two neighboring KINECT devices

Input: Two depth images I_d1 and I_d2 which are acquired with two neighboring KINECT devices.

Output: the relationship (T, R) between these KINECT devices where T denotes a translation and R denotes a rotation.

Steps:

Step 1. Put a chair on the ground to elevate a box which is used as the calibration target.

Step 2. Take depth images from the two neighboring KINECT devices, and conduct Algorithm 4.1 without the color-association steps, i.e., transform only the 2D depth images into non-colored 3D images.

Step 3. Segment out the calibration target in the acquired images by some pre-learned knowledge like the position of the calibration target and the height of the KINECT devices.

Step 4. Use the two 3D models resulting from Step 3 to conduct Algorithm 5.1 and get the precise relationship (T, R), where T denotes a translation and R denotes a rotation, both from one model to the other.

Step 5. Repeat Step 1 through Step 4 to find the relationship for each pair of neighboring KINECT devices until done.

After the above process of calibration is completed, a series of rotation and translation parameters are obtained, which may be used to merge the data from all the KINECT devices. More details will be described in Section 5.3 next, and the result is shown in Section 5.4.

5.3 Merge 3D Images from Multiple KINECT Devices

5.3.1 Coordinate Mapping between Local and Global

The coordinate mapping between the local and the global coordinate systems is a serial process which can be written as a matrix, as described in Section 5.2.2. In our system, to merge 3D images, the global coordinate system (indicated by the green lines in Fig. 5.2) is placed on the front KINECT device, which faces straight ahead of the car. The coordinate system built on another KINECT device (indicated by the red lines in Fig. 5.2) is mapped into the global coordinate system using the relationship computed by Algorithm 5.4. By this mapping, the 3D images corresponding to the two KINECT devices can be merged into a single one.


Fig. 5.2 Transformation between two coordinate systems built on KINECT devices affixed on the car.

Furthermore, to merge the 3D images of a cluster of more than two KINECT devices, the transformations (T, R), which are acquired by executing Algorithm 5.4 and appear in Equation (5.7), should be applied in series. The key to this series of transformations is to work “backward,” since in a product of matrices the last multiplied matrix affects the result of all the previously multiplied ones.

For instance, when we want to map the 3D image of Device 3 into the global coordinate system which is built on Device 1, as shown in Fig. 5.3, we first map each point P₃ in the 3D image of Device 3 to a point P₂ in the 3D image of Device 2 by the following equation according to Equation (5.7):

$$P_2 = R_2 P_3 + T_2. \tag{5.8}$$

In the same way, we map a point P₂ in the 3D image of Device 2 to a point P₁ in the 3D image of Device 1 by the following equation:

$$P_1 = R_1 P_2 + T_1. \tag{5.9}$$

Merging the two equations (5.8) and (5.9), we get:

$$P_1 = R_1 R_2 P_3 + R_1 T_2 + T_1. \tag{5.10}$$

That is, every point P₃ in the 3D image of Device 3 can be mapped directly into a point P₁ in the 3D image of Device 1 by Equation (5.10).

Equation (5.10) may be extended easily to cases involving more than three KINECT devices. The derivations are similar and omitted.
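The extension to longer chains of devices can be sketched as below, following the form of Equations (5.8) through (5.10). The function names and the representation of a chain as a list of (R, T) pairs ordered from Device 1 outward are hypothetical choices made for illustration.

```python
def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(r, p):
    """Product of a 3x3 matrix and a 3-vector."""
    return tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3))


def compose(outer, inner):
    """Return the single (R, T) equal to applying `inner` first, then `outer`.

    With outer = (R1, T1) and inner = (R2, T2), this gives
    (R1 R2, R1 T2 + T1), as in Equation (5.10).
    """
    r1, t1 = outer
    r2, t2 = inner
    r = matmul(r1, r2)
    t = tuple(a + b for a, b in zip(matvec(r1, t2), t1))
    return r, t


def to_global(point, chain):
    """Map a point of the last device into the frame of Device 1.

    chain = [(R1, T1), (R2, T2), ...], where (Rk, Tk) maps points of
    Device k+1 into the frame of Device k.
    """
    r, t = chain[0]
    for link in chain[1:]:
        r, t = compose((r, t), link)
    return tuple(a + b for a, b in zip(matvec(r, point), t))
```

Precomputing the combined (R, T) once per device means each point needs only one matrix-vector product at merge time.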


Fig. 5.3 The transformations involved in merging three 3D images taken by three neighboring KINECT devices.

5.3.2 Reduction of Merged Data Using Mesh Structure

The data structure “mesh” may be used to reduce large amounts of image data. This data structure describes how points are connected, which is important information for reducing data. In this section, we introduce the construction of a mesh on the depth image.

The mesh on a depth image may be drawn in the way shown in Fig. 5.4. Each time, we construct two triangles with three points per triangle. One is the triangle with corners (i, j), (i, j+1), and (i+1, j); the other is the triangle with corners (i+1, j), (i, j+1), and (i+1, j+1), where 1 ≤ i < I and 1 ≤ j < J, with I and J, as shown in Fig. 5.4, being the width and height of the depth image, respectively.

(i, j) (i+1, j)

(i, j+1) (i+1, j+1)

Fig. 5.4 Constructing a mesh on the depth image for each pixel (i, j).

When the depth image is of size w × h, the desired mesh is composed of (w − 1) × (h − 1) × 2 triangles (or polygons), as can be easily verified.
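The construction of Fig. 5.4 and the resulting triangle count can be checked with a short sketch. The function name grid_mesh is hypothetical, and pixel coordinates are taken to be 1-based as in the text.

```python
def grid_mesh(w, h):
    """Return the mesh triangles of a w x h depth image.

    Each triangle is a triple of 1-based (i, j) pixel coordinates.
    """
    triangles = []
    for j in range(1, h):        # j = 1 .. h-1
        for i in range(1, w):    # i = 1 .. w-1
            # Upper-left triangle of the pixel quad.
            triangles.append(((i, j), (i, j + 1), (i + 1, j)))
            # Lower-right triangle of the pixel quad.
            triangles.append(((i + 1, j), (i, j + 1), (i + 1, j + 1)))
    return triangles
```

The length of the returned list is (w − 1) × (h − 1) × 2, which is why polygon reduction matters before rendering.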

5.3.3 Review of Quadric Error Matrix (QEM)

In this section, we review a surface simplification method using quadric error metrics [9]. By this method, we can remove some polygons in a 3D model, and keep the shape in the model as much as possible, as we have done in this study.

In this method, the error of removing each edge is calculated, and each time only the edge which has the minimal error is removed. In other words, removing the minimal-error edge will not change greatly the shape property of the model. The detail of an algorithm implementing the method is as follows.

Algorithm 5.5: simplification of 3D model shape using quadric error metrics.

Input: A model with points P.

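Output: A simplified model.

Steps:

Step 1. Construct a mesh as a set of triangles in the following way.

1.1 For each triangle p, find its plane equation as ax + by + cz + d = 0 where a² + b² + c² = 1.

1.2 Construct a 4 × 4 matrix Kp for this triangle p as follows:

$$K_p = \begin{bmatrix} a^2 & ab & ac & ad \\ ab & b^2 & bc & bd \\ ac & bc & c^2 & cd \\ ad & bd & cd & d^2 \end{bmatrix}.$$

Step 2. For each point v, sum the matrices Kp of the triangles which contain v, calculating a 4 × 4 matrix Qv as follows:

$$Q_v = \sum_{p \in P_v} K_p,$$

where P_v denotes the set of triangles which contain v.

Step 3. Start edge contraction and use the notation w to express the union of v₁ and v₂, so that the new Qw for the new point w is simply Qv1 + Qv2, with w being the position of the contracted point.

Step 4. Calculate the error of contracting each edge as $w^{T} Q_w w$, with w written in homogeneous coordinates, where w is obtained from Step 3.

Step 5. Select the edge with the minimal error and remove it, then perform the edge contraction described in Step 3 and update the errors of all the edges which involve the affected points.

Step 6. Repeat the above steps until the model is sufficiently simplified.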

This algorithm creates a simplification effect on the mesh each time, as illustrated in Fig. 5.5.

Fig. 5.5 The contraction of two vertices. (a) Before contraction. (b) After contraction.

By using this algorithm, we can remove some polygons and keep the shape as much as possible since we remove an edge with the minimal error each time.
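The error computation behind this edge selection can be sketched as follows, using the quadric error metrics of [9]. Placing the contracted point w at the midpoint of the edge is an assumption made to keep the sketch short, and the function names are hypothetical. Vertices are represented as 3-tuples and triangles as triples of vertices.

```python
import math


def plane_of(tri):
    """Return (a, b, c, d) with a^2 + b^2 + c^2 = 1 for the triangle's plane."""
    p0, p1, p2 = tri
    u = [p1[k] - p0[k] for k in range(3)]
    v = [p2[k] - p0[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("degenerate triangle")
    a, b, c = (x / length for x in n)
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return (a, b, c, d)


def quadric(plane):
    """4x4 matrix K_p, the outer product of (a, b, c, d) with itself."""
    return [[plane[i] * plane[j] for j in range(4)] for i in range(4)]


def add(q1, q2):
    """Element-wise sum of two 4x4 matrices."""
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def vertex_quadric(vertex, triangles):
    """Q_v: sum of K_p over the triangles that contain the vertex."""
    q = [[0.0] * 4 for _ in range(4)]
    for tri in triangles:
        if vertex in tri:
            q = add(q, quadric(plane_of(tri)))
    return q


def error(q, w):
    """w^T Q w with w = (x, y, z, 1): sum of squared distances to the planes."""
    wh = (w[0], w[1], w[2], 1.0)
    return sum(wh[i] * q[i][j] * wh[j] for i in range(4) for j in range(4))


def edge_cost(v1, v2, triangles):
    """Cost of contracting edge (v1, v2) to its midpoint (assumed placement)."""
    q = add(vertex_quadric(v1, triangles), vertex_quadric(v2, triangles))
    w = tuple((a + b) / 2 for a, b in zip(v1, v2))
    return error(q, w), w
```

Edges lying in flat regions get a cost near zero and are contracted first, which is how the shape is preserved while polygons are removed.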

5.3.4 Merge Algorithm

In this section, we summarize the above-described algorithms to present an algorithm for merging all the 3D images constructed from the color and depth images acquired by the multiple KINECT devices. In the algorithm, we first acquire depth and color images with the KINECT devices in real time. Then, a series of processing steps is carried out offline. Finally, the processed data are used for rendering the shape of the object model in real time. The details of the algorithm are shown below. A corresponding flow chart is shown in Chapter 2.

Algorithm 5.6: Merge of multiple 3D images.

Input: Multiple color images IC and depth images Id acquired with the KINECT devices.

Output: An around-car model M for display on the screen.

Steps:

Step 1. For each KINECT device, convert its images Id and IC into a 3D image I3D using Algorithm 4.1.

Step 2. Transform each 3D image I3D into the global coordinate system using the method described in Section 5.3.1 where the relationship (R, T) is acquired by Algorithm 5.4.

Step 3. Construct a mesh on the depth image Id as described in Section 5.3.2.

Step 4. Reduce the polygons on the mesh with Algorithm 5.5, choosing the edge with the global minimal error each time.

Step 5. Save the processed data into a file (including the coordinates of each point and the corresponding color).

Step 6. Render the file with OpenGL.

As can be seen from Step 6 of the above algorithm, the data processing and rendering works are separated.

5.4 Experimental Results

First, a result of calibration is shown in Fig. 5.6. We get the relationship by aligning the two boxes which appear in two images acquired by two KINECT devices.

We show the calibration target by using a method of “lighting” depth data, since the calibration uses the depth image only.

Next, a result of merging multiple 3D images to construct an around-car image is shown in Figs. 5.7 and 5.8.

Fig. 5.6 The calibration of two neighboring KINECT devices. (a) Before alignment. (b) After alignment.

Fig. 5.7 The constructed around-car model. (a) Front view. (b) Rear view. (c) Right side view. (d) Left side view.

Fig. 5.8 The constructed around-car model. (a) Front view. (b) Rear view. (c) Right side view. (d) Left side view. (Cont'd).

5.5 Object Detection

5.5.1 Object Detection in Depth Images

Finding objects in a depth map is an important task, as mentioned at the beginning of this chapter. It is also important for constructing our long-range view, since nearby objects affect the algorithm described in the next chapter.

Because we have not only depth images but also color images, object detection may be conducted differently from traditional object detection, which uses color images only. Our method of object detection is based on depth images.

Since the method of transforming the depth image into a 3D image was described in Chapter 4, our method may be regarded as a cross-layer method. Because the depth image alone does not provide more detailed information, the 3D image is also needed in the proposed method. The details of an algorithm implementing the method are described in the next section.

5.5.2 Component Labeling by Region Growing on Depth Images for Object Detection

In this section, we describe the method proposed for object detection. First, we apply region growing to the input depth image, where the growing condition is based on the depth difference between two pixels. The object segmentation operation is controlled by a threshold on this difference, and the 3D information is used to infer the object size or the ground height in the real world.

By using this method, we can label each component (or object) in the depth image. The details implementing the proposed method are described as an algorithm in the following.

Algorithm 5.7: Object detection.

Input: A depth image Id and the related 3D image I3D.

Output: The label of each component C.

Steps:

Note that the use of a very small threshold will lead to broken objects, while the use of a very large threshold will yield just one object, i.e., all objects will be merged into a single one. In general, a threshold value of 300 mm is considered suitable for segmenting normal-sized objects. An example of nearby object detection by Algorithm 5.7 is shown in Fig. 5.9.


Fig. 5.9 The object detection result. (a) Original color image. (b) The detected object part in the color image.
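Component labeling by region growing on a depth image can be sketched as below. Since the steps of Algorithm 5.7 are not listed above, the details here are assumptions: 4-connected neighbors, a breadth-first growing order, and a depth value of 0 marking pixels without a valid reading. The function name label_components is hypothetical, and the use of the 3D image to infer object size or ground height is not included.

```python
from collections import deque


def label_components(depth, threshold=300):
    """Label regions of a depth image (values in mm) by region growing.

    Two neighboring pixels join the same component when their depth
    difference is at most `threshold`. Returns (labels, count), where
    label 0 marks pixels with no valid depth.
    """
    h, w = len(depth), len(depth[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] or depth[sy][sx] <= 0:
                continue  # already labeled, or no depth reading
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and depth[ny][nx] > 0
                            and abs(depth[ny][nx] - depth[y][x]) <= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Raising the threshold far above the depth gaps between objects makes the whole valid area grow into one component, which mirrors the behavior described for very large thresholds.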

Chapter 6 Long-Range View Construction and Display

6.1 Ideas of Proposed Techniques

In this chapter, we will describe the proposed method for long-range view construction and display. We have introduced the method of constructing 3D images around a car in the previous chapters. In addition, we found that the far views of scenes which we are also interested in have no depth information, but we still need them to enrich our 3D images and to facilitate inspection of nearby objects against far backgrounds.

Therefore, we propose a method for stitching the respective 2D color images acquired by the KINECT devices to compose a long-range view of the background scene by putting them in an identical frame. The resulting long-range view will appear to be a single panoramic image which is also a color image. We want to make this 2D panoramic image “sit” at the back of the nearby objects which have been modeled by the techniques described in the previous chapter. The model is a 3D image, but our panoramic image is 2D, so the relation must be constructed properly so that we can
