Thesis Organization - 3D Monitoring of Car Surroundings Using Multiple KINECT Devices

Chapter 1 Introduction

1.5 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 introduces the configuration of the proposed system and describes the system processes in detail. Chapter 3 describes the method for constructing 3D images. Chapter 4 introduces the proposed methods for image matching based on the 3D DWC and for modeling around-car events. Chapter 5 describes the proposed methods for real-time response to dangerous around-car events, including ramp monitoring, height-limit monitoring, and computation for collision-avoiding driving. Chapter 6 presents experimental results and discussions. Finally, conclusions and some suggestions for future work are given in Chapter 7.

Chapter 2 System Design and Processes

2.1 Ideas of Proposed System

Because this study requires around-car imaging and monitoring, we affix multiple KINECT devices to the car body, as shown in Figure 2.1. After reading the image data collected by the devices, we found that four KINECT devices affixed around the car, as shown in Figure 2.1(a), are not enough to cover the entire surroundings of the car, because the horizontal viewing angle of a KINECT device is only 57 degrees. To collect complete around-car information, the number of KINECT devices affixed to the car body was later increased to 14. To acquire data from the front top of the car, we also affix one more KINECT device there. In total, 15 KINECT devices are affixed to the vehicle used in this study.

Figure 2.1 Proposed KINECT-based around-car monitoring system installed on a vehicle. (a) With 4 KINECT devices. (b) With 15 KINECT devices.
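The coverage argument can be checked with a little arithmetic. The sketch below is an idealized bound only: it assumes all devices sit at one point with non-overlapping fields of view, which is not how they are mounted on the car, and the function names are illustrative rather than part of the system.

```python
import math

H_FOV_DEG = 57.0  # horizontal viewing angle of one KINECT device, as stated in the text

def total_coverage_deg(num_devices, fov_deg=H_FOV_DEG):
    """Upper bound on the horizontal angle covered, assuming no overlap."""
    return num_devices * fov_deg

def min_devices_for_full_circle(fov_deg=H_FOV_DEG):
    """Smallest device count whose viewing angles sum to at least 360 degrees."""
    return math.ceil(360.0 / fov_deg)

print(total_coverage_deg(4))          # 228.0
print(min_devices_for_full_circle())  # 7
```

Four devices cover at most 228 of the 360 degrees. The count of 14 used in this study is larger than the idealized minimum of 7, which is plausible because devices spread along the car body leave near-range gaps between neighboring views.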

In more detail, each KINECT device in the proposed system occupies one USB port of the laptop or desktop computer used as the controller of the monitoring system. Although a laptop or desktop computer has many USB ports, the hardware and speed limitations of the controller computer do not allow too many KINECT devices to work simultaneously, because every KINECT device needs to be connected to a distinct USB port. Normally, a desktop computer contains two USB controllers, while a laptop contains only one. To increase the number of USB ports without using more computers, a USB expansion card is used. As a result, nine KINECT devices can work simultaneously with one computer as the host. Section 2.2 describes the configuration of the proposed system, including the hardware and software. The system processes, including the learning process and the monitoring process, are described in Section 2.3.

2.2 System Configuration

To record and monitor around-car conditions, KINECT devices are affixed to the car body. Besides the KINECT devices, an SDK for acquiring color and depth information and a software program for displaying the collected data are needed. Section 2.2.1 introduces the hardware equipment, including the iron brackets for the KINECT devices, the desktop computer used for the experiments, and the structure of the KINECT device. Section 2.2.2 introduces the software, including OpenNI, OpenGL, and the programming language used.

2.2.1 Hardware Configuration

First, we introduce the central component of the system, the KINECT device. It includes a color VGA video camera, a depth sensor, a multi-array microphone, and a motor for tilting the device. The hardware specifications of the KINECT device [6] are listed in Table 2.1. The appearance of the KINECT device [7] is shown in Figure 2.2.

This section also describes some experience gained while using KINECT devices in this study. According to Table 2.1, the range of the depth sensor is 1.2 to 3.5 meters. In practice its working range is larger, from 0.8 to about 6.0 meters. In addition, the accuracy of the depth image is at the millimeter level. The specified tilt range of the KINECT device is ±27 degrees; in our own tests, the device could tilt by about 30 degrees.

Table 2.1 Hardware specifications of a KINECT device.

Item                          Specification
Horizontal viewing angle      57 degrees
Vertical viewing angle        43 degrees
Tilt range of the device      ±27 degrees
Range of the depth sensor     1.2-3.5 meters
Resolution of color images    640 x 480 pixels
Resolution of depth images    320 x 240 pixels

Next, we introduce the desktop computer used as the monitoring controller. The processor is an Intel Core i7-3770 quad-core CPU with 16 GB of Transcend DDR3 RAM, and the graphics card is a Gigabyte GV-R7750C 2GI. We chose such a powerful processor and such a large memory so that large amounts of image data can be analyzed.

Furthermore, iron brackets were designed for holding the KINECT devices around the car, as shown in Figure 2.3. Three bases are affixed to each of the four sides of the car, and two bases are affixed to the rear-view mirrors of the car, one on each side, as shown in Figure 2.4.

Figure 2.2 Appearance of the KINECT device.

Figure 2.3 A base for the KINECT device. (a) Without a KINECT device. (b) With a KINECT device.

Figure 2.4 Around-car bases. (a) A front view. (b) A back view. (c) A lateral view. (d) A rear-view mirror.

2.2.2 Software Configuration

For the software configuration, we first introduce OpenNI, one of the SDKs for the KINECT device. OpenNI is an open-source framework that enables the KINECT device to work with a computer and helps us acquire the color and depth information captured by KINECT devices. Furthermore, it drives the motor in the KINECT device to adjust the angle of elevation.

OpenGL is an open graphics API whose libraries can draw points in 3D space. Thus, we can display the constructed 3D images easily, as shown by the example in Figure 2.5.

Figure 2.5 A 3D image drawn by OpenGL.

Last, we introduce the programming language and development tool used for our program. C++ is used as the programming language, to be consistent with the language used in the OpenNI library. The development tool is the widely used Microsoft Visual Studio 2010, whose compiler is used to build the program.

2.3 System Processes

2.3.1 Learning Process

Before the monitoring process, a learning process is necessary. To merge the 3D images taken by neighboring KINECT devices, the relationship parameters of the devices need to be calibrated. Because a KINECT device can tilt in the vertical direction, at the beginning of the calibration process we use the OpenNI libraries to adjust the device angle by controlling its motor, so that its viewing direction is parallel to the ground. Then, an ICP scheme is adopted to compute the horizontal angle and the displacement of the device with respect to its neighbor.

Another learning task is computing the environmental parameters. When driving on roads of different widths, different thresholds need to be set to filter out unrelated scenes so that the monitoring target is obtained accurately. This is also done in the learning process. A flowchart of the proposed learning process is shown in Figure 2.6.

2.3.2 Monitoring Process

While the car is driven on the road, once a ramp is discovered, the system displays the slope of the ramp in degrees as a warning to the driver. When the car approaches a place with a height limit, such as the entrance of a parking lot or a road under a bridge, the system also signals to the driver whether the car can pass. Furthermore, when a passing car is encountered, the driver is notified whether a collision is about to happen.

Finally, after the car is parked, the system is used offline to model around-car scenes and analyze them to determine responsibility if legal cases about car accidents arise. A flowchart of the monitoring process is shown in Figure 2.7.

Figure 2.6 A flowchart of the learning process.

[Flowchart nodes: Start of learning; motor control by OpenNI; ICP; relationship parameters; 3D images; OpenGL; filter threshold; Remove unrelated scene? (Yes/No); End of learning.]

Figure 2.7 A flowchart of the monitoring process.

[Flowchart nodes: Start of monitoring; Driving? (Yes); around-car monitoring; ramp monitoring; height monitoring; collision monitoring.]

Chapter 3 Construction of 3D Images from KINECT Images

3.1 Review of Pinhole Camera Model

The pinhole camera is a simple camera model with an aperture of only the pinhole size. It may be regarded as an opaque box with a pinhole on one side. The light passing the pinhole will produce an upside-down projection of the scene in front of the pinhole, as illustrated in Figure 3.1 [8].

Figure 3.1 An illustration of a pinhole camera model.

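The pinhole camera model describes the mathematical relationship between a 3D point and its projection onto the image plane of the pinhole camera. The geometry of the pinhole camera model is illustrated in Figure 3.2. Let (x₁, x₂, x₃) be the coordinates of a 3D point P, let (y₁, y₂) be the coordinates of its projection Q on the image plane, and let f be the focal length. From the two similar triangles appearing in Figure 3.2(b), we can derive the following equation according to the similar-triangle principle:

\[ \frac{-y_1}{f} = \frac{x_1}{x_3}, \quad \text{i.e.,} \quad y_1 = -\frac{f\,x_1}{x_3}. \tag{3.1} \]

When we look in the negative direction of the X1-axis, the following equation can be derived similarly:

\[ y_2 = -\frac{f\,x_2}{x_3}. \tag{3.2} \]

Summarizing these two equations, we get the following vector equation:

\[ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = -\frac{f}{x_3} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{3.3} \]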



Figure 3.2 The geometry of a pinhole camera. (a) Seen from a 3D point. (b) Seen from the X1-axis.
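As a small numerical illustration of the pinhole relationship, the sketch below projects one 3D point onto the image plane. It assumes the textbook inverted-image convention y = -(f / x3) x with x3 measured along the optical axis; the function name and the sample values are illustrative only.

```python
def project_pinhole(point, f):
    """Project a 3D point (x1, x2, x3) onto the image plane of an ideal pinhole camera.

    Assumes the inverted-image convention y = -(f / x3) * x, with x3 > 0.
    """
    x1, x2, x3 = point
    if x3 <= 0:
        raise ValueError("the point must lie in front of the pinhole (x3 > 0)")
    return (-f * x1 / x3, -f * x2 / x3)

y1, y2 = project_pinhole((100.0, 50.0, 2000.0), f=600.0)
print(y1, y2)  # -30.0 -15.0
```

The negative signs reflect the upside-down projection described above: a point to the right of and above the optical axis lands to the left of and below the image center.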

3.2 Construction of 3D Images

3.2.1 Coordinate Conversion

From Equation 3.3, we can get:

And from Figure 3.2(a) and using the similar-triangle principle again, we have the equation:

where $\sqrt{y_1^2 + y_2^2 + f^2}$ is the length of the line segment OQ, and $\sqrt{x_1^2 + x_2^2 + x_3^2}$ is the length of the line segment OP, which is the depth captured by the KINECT device and is denoted as d in the sequel. Let R represent the center of the depth image. It is located at coordinates (320, 240) in a depth image of resolution 640 × 480 acquired by the KINECT device. Let Q be located at image coordinates (x_p, y_p), and let y₁ and y₂ represent the distances from Q to the center R in the vertical and horizontal directions, respectively. The letter f denotes the focal length of the KINECT device, with a value of 600. Equations (3.7), (3.4), (3.5), and (3.6) can be rewritten, according to the mentioned parameter values, as:

The coordinates x_p and y_p are measured in pixels, and x₁, x₂, and x₃ in millimeters. With the above equations, coordinates can be converted between the coordinate systems for 3D image construction, as described next.
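As a rough illustration of the conversion, the sketch below back-projects one depth pixel to a 3D point. It is only a sketch under stated assumptions: d is taken as the length of OP along the viewing ray, y₁ = y_p - 240 and y₂ = x_p - 320 are the vertical and horizontal offsets from the center R, and the inverted-image relation y = -(f / x₃) x is used. The function name and the sign choices are hypothetical and are not taken from the thesis.

```python
import math

F = 600.0               # focal length stated in the text
CX, CY = 320.0, 240.0   # image center R stated in the text

def depth_pixel_to_point(xp, yp, d, f=F):
    """Back-project depth pixel (xp, yp) with ray length d (mm) to (x1, x2, x3)."""
    y1 = yp - CY  # vertical offset from R (sign is an assumption)
    y2 = xp - CX  # horizontal offset from R (sign is an assumption)
    # Similar triangles: x3 / |OP| = f / |OQ|, with |OP| = d.
    x3 = d * f / math.sqrt(y1 * y1 + y2 * y2 + f * f)
    # Inverted pinhole relation y = -(f / x3) * x, solved for x.
    x1 = -y1 * x3 / f
    x2 = -y2 * x3 / f
    return (x1, x2, x3)

x1, x2, x3 = depth_pixel_to_point(400, 300, 2000.0)
# The reconstructed point lies at distance d from the device.
print(math.isclose(math.sqrt(x1 * x1 + x2 * x2 + x3 * x3), 2000.0))  # True
```

At the image center both offsets vanish, so the point lies on the optical axis with x₃ = d.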

3.2.2 Idea of 3D Image Construction

The basic idea of the 3D image construction proposed in this study is to convert the coordinates of the depth image into 3D points, as described in Section 3.2.1. Then, we draw these 3D points in 3D space with OpenGL. Finally, the RGB value of each pixel in the color image is “attached” to the corresponding 3D point. Colored point clouds are thus formed in the resulting 3D image.

3.2.3 Construction Algorithm

In the proposed 3D image construction algorithm, assume first that the depth image coordinates are already converted into 3D points by coordinate conversion as described in Section 3.2.1. Then, it is desired to find the color information corresponding to each 3D point. By Equation (3.3) and the definitions given in Section 3.2.1, we have:

Therefore, the space coordinates (x₁, x₂, x₃) of a 3D point can be used to find the coordinates (x_p, y_p) of the corresponding image point. A flowchart of the resulting 3D image construction algorithm is shown in Figure 3.3. A detailed description of the algorithm is as follows.

Algorithm 3.1: 3D image construction.

Input: a depth image I_d and a color image I_c captured by a KINECT device.

Output: a 3D image I_3D formed from I_d and I_c with colored 3D points.

Steps:

Step 1 Convert the coordinates of the depth image I_d into 3D points by the coordinate conversion described in Section 3.2.1.

Step 2 Substitute the coordinates of each 3D point into Equations (3.12) and (3.13), respectively, to find the image pixel corresponding to the 3D point and get the pixel’s color.

Figure 3.3 A flowchart of the 3D image construction algorithm.
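The two steps of Algorithm 3.1 can be sketched as follows. This is a minimal illustration, not the thesis implementation: images are plain nested lists, a depth value of 0 is assumed to mean "no reading", the depth and color images are assumed to be aligned, and the conversion uses the same assumed sign conventions (inverted pinhole relation, center (320, 240), f = 600). All function names are hypothetical.

```python
import math

F, CX, CY = 600.0, 320.0, 240.0  # focal length and image center from the text

def to_point(xp, yp, d):
    """Step 1: depth pixel -> 3D point (assumed conventions, see lead-in)."""
    y1, y2 = yp - CY, xp - CX
    x3 = d * F / math.sqrt(y1 * y1 + y2 * y2 + F * F)
    return (-y1 * x3 / F, -y2 * x3 / F, x3)

def to_pixel(x1, x2, x3):
    """Step 2: 3D point -> pixel coordinates (inverse of the relation above)."""
    xp = CX - F * x2 / x3
    yp = CY - F * x1 / x3
    return int(round(xp)), int(round(yp))

def build_colored_cloud(depth, color):
    """Return a list of (x1, x2, x3, rgb) tuples for every valid depth pixel."""
    cloud = []
    for yp, row in enumerate(depth):
        for xp, d in enumerate(row):
            if d <= 0:  # assumed encoding for a missing depth reading
                continue
            x1, x2, x3 = to_point(xp, yp, d)
            u, v = to_pixel(x1, x2, x3)
            if 0 <= v < len(color) and 0 <= u < len(color[v]):
                cloud.append((x1, x2, x3, color[v][u]))
    return cloud

depth = [[1000, 0], [1500, 2000]]
color = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 0)]]
cloud = build_colored_cloud(depth, color)
print(len(cloud))  # 3
```

Each entry of the resulting list can be drawn as one colored vertex; the rendering step itself is outside this sketch.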

3.2.4 Experimental Results

Figure 3.4 shows the depth image and the color image that are the inputs to the 3D image construction algorithm. After the algorithm is executed, we obtain the 3D image shown in Figure 3.5. To confirm that the output of the algorithm is three-dimensional, Figure 3.5(b) shows the top view of the 3D image, in which the depth of the car can be seen clearly.

Figure 3.4 Images acquired by a KINECT device. (a) The depth image. (b) The color image.

Figure 3.5 A constructed 3D image. (a) A perspective view. (b) A top view.

3.3 Review of a Method for Geometric Correction of 3D Images

3.3.1 Idea of Geometric Correction

From Figure 3.5(b) in Section 3.2.4, we see that the farthest points of the depth image form an arc rather than a straight line. This problem arises because the infrared rays sent out by the KINECT device for depth sensing do not travel in parallel, as mentioned in Section 1.3. This affects the accuracy of the depth, because the measured depth is no longer the perpendicular distance. To solve the problem, a method for geometric correction of 3D images [9] is adopted.

The idea behind the geometric correction method is to use a paraboloid to approximate the curved surface formed by the farthest points of the depth image. Once the approximating paraboloid is found, the x and y coordinates of each space point are substituted into the equation of the paraboloid to correct the computed distance, as shown in Figure 3.6.

Figure 3.6 The paraboloid seen from the direction of the Y-axis (from the top view).

3.3.2 Correction Algorithm and Experimental Results

In this section, the method for finding the approximating paraboloid is described and some experimental results are shown. The criterion of minimum sum of squared errors (MSSE) is used to decide the parameters of the approximating shape which is a paraboloid. The following is the detail of this process.

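First, let the equation of the paraboloid be described by:

\[ z = A x^2 + B y^2 + C, \tag{3.14} \]

where C is the distance between the KINECT device and the apex of the paraboloid, as shown in Figure 3.6. The equation for computing the SSE is:

\[ \mathrm{SSE} = \sum_i \left( A x_i^2 + B y_i^2 + C - z_i \right)^2, \tag{3.15} \]

where (x_i, y_i, z_i) are the coordinates of a sample point. To find the coefficients A, B, and C that minimize the SSE, the partial derivatives of the SSE with respect to A, B, and C are set to zero:

\[ \frac{\partial\,\mathrm{SSE}}{\partial A} = 2 \sum_i x_i^2 \left( A x_i^2 + B y_i^2 + C - z_i \right) = 0, \tag{3.16} \]

\[ \frac{\partial\,\mathrm{SSE}}{\partial B} = 2 \sum_i y_i^2 \left( A x_i^2 + B y_i^2 + C - z_i \right) = 0, \tag{3.17} \]

\[ \frac{\partial\,\mathrm{SSE}}{\partial C} = 2 \sum_i \left( A x_i^2 + B y_i^2 + C - z_i \right) = 0. \tag{3.18} \]

Equations (3.16), (3.17), and (3.18) become three linear equations in three unknowns after all the values of x_i, y_i, and z_i are substituted. By solving these three simultaneous equations, the desired paraboloid is obtained. Finally, we show some examples of the experimental results in Figures 3.7 and 3.8.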

Figure 3.7 A correction result of Figure 3.5(b). (a) Before correction. (b) After correction.

Figure 3.8 A 3D image seen from above. (a) Before correction. (b) After correction.
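The least-squares fit described in Section 3.3.2 can be sketched as below. The sketch assumes the paraboloid has the form z = A·x² + B·y² + C and solves the three resulting linear equations by Cramer's rule; the function names are illustrative, and the step that applies the fitted surface to correct each depth value is not shown.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, r):
    """Solve the 3x3 linear system m * x = r by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("singular system: the samples do not determine the fit")
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        solution.append(det3(mc) / d)
    return tuple(solution)

def fit_paraboloid(points):
    """Least-squares fit of z = A*x^2 + B*y^2 + C to (x, y, z) samples."""
    suu = suv = svv = su = sv = suz = svz = sz = 0.0
    n = 0
    for x, y, z in points:
        u, v = x * x, y * y
        suu += u * u; suv += u * v; svv += v * v
        su += u; sv += v
        suz += u * z; svz += v * z; sz += z
        n += 1
    # Normal equations obtained by setting dSSE/dA, dSSE/dB, dSSE/dC to zero.
    m = [[suu, suv, su], [suv, svv, sv], [su, sv, float(n)]]
    return solve3(m, [suz, svz, sz])

samples = [(x, y, 0.5 * x * x - 0.25 * y * y + 3.0)
           for x in range(-2, 3) for y in range(-2, 3)]
A, B, C = fit_paraboloid(samples)
```

For the synthetic samples above, the fit recovers A, B, and C close to 0.5, -0.25, and 3.0. With real depth data in millimeters the sums become large, so a numerically more robust solver may be preferable.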

Chapter 4 Modeling of Around-car Events

4.1 Construction of Around-car 3D Images

In this chapter, we describe the method we propose for modeling of around-car events. In Section 4.1.1, the iterative closest point (ICP) algorithm and the K-D tree algorithm [10] used in this study are reviewed. In Section 4.1.2, how the calibration work is done by using the ICP algorithm to find the geometric relationship between every two KINECT devices is presented. With the calibration information, the idea and algorithm for constructing the around-car model are proposed in Sections 4.1.3 and 4.1.4, respectively. Finally, we show some experimental results in Section 4.1.5.

4.1.1 Review of Iterative Closest Point (ICP) Algorithm and K-D Tree Algorithm

The iterative closest point (ICP) algorithm can be employed to minimize the difference between two groups of points. The concept of the algorithm is simple. It revises the transformation, including rotation and translation, from one object to the other, iteratively in order to minimize the total distance measure between the two groups of points.

ICP is often used to match objects in order to compute their similarity. It is useful for constructing 2D or 3D images from different views, because object registration or stitching, which is needed in this study, requires shape matching before the work is conducted.

A K-D tree algorithm is also employed in this study to reduce the amount of computation needed to search for the closest point in the ICP algorithm. Simply speaking, a K-D tree is constructed by recursively splitting the group of points into two parts at the median of the X, Y, and Z coordinates in turn. Searching a K-D tree for the closest point takes less time than searching all the points: the search complexity drops from O(N) to O(N^(2/3)).
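The construction and search just described can be sketched in a few lines. This is a generic 3D K-D tree written for illustration, with a dictionary-based node layout chosen for brevity; it is not the implementation used in this study.

```python
def build_kdtree(points, depth=0):
    """Recursively split points at the median, cycling through X, Y, Z."""
    if not points:
        return None
    axis = depth % 3
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((ai - bi) * (ai - bi) for ai, bi in zip(a, b))

def nearest(node, query, best=None):
    """Return the stored point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], query) < sq_dist(best, query):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff * diff < sq_dist(best, query):
        best = nearest(far, query, best)
    return best

tree = build_kdtree([(0, 0, 0), (5, 5, 5), (1, 2, 3), (9, 1, 4)])
print(nearest(tree, (1, 2, 2)))  # (1, 2, 3)
```

The pruning test in `nearest` is what saves work: whole subtrees are skipped whenever the splitting plane is farther away than the best match found so far.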

4.1.2 Calibration between Neighboring KINECT Devices Using ICP

In Section 4.1.1, it was mentioned that the ICP algorithm can be employed to minimize the difference between two groups of points. This suggests that the ICP algorithm can be used to “register” two objects if the two groups of points composing the objects are similar in their features. Accordingly, a method for calibrating the geometric relationship between two neighboring KINECT devices using the ICP algorithm is proposed in this study and described in this section.

First, a carton is placed between every two neighboring KINECT devices as the calibration target. Then, depth images are acquired with the two KINECT devices. At this point, we have two similar groups of points that come from the same object, the carton, but appear in two different views. After the ICP algorithm is applied to register them, the geometric relationship parameters, including the horizontal included angle and the displacements along the x-, y-, and z-axes, are obtained. This completes the calibration of the geometric relationship between two neighboring KINECT devices. A detailed description of this process as an algorithm is as follows.

Algorithm 4.1: calibration between two neighboring KINECT devices using the ICP algorithm.

Input: depth images I₁ and I₂ of a calibration target (a carton) acquired with two neighboring KINECT devices D₁ and D₂, respectively.

Output: the geometric relationship parameters of D₁ and D₂, namely a horizontal included angle θ_i and a 3D displacement d_i.

Steps:

Step 1 Convert the coordinates of the depth images I₁ and I₂ into two groups G₁ and G₂ of 3D points, respectively, by the coordinate conversion scheme described in Section 3.2.1.

Step 2 Move group G₁ of 3D points iteratively through a series of transformations T_i, each including a horizontal included angle θ_i and a 3D displacement d_i = (d_ix, d_iy, d_iz) along the x-, y-, and z-axes, respectively.

Step 3 For each transformation T_i = (θ_i, d_i), compute the Euclidean distances D_jk from each point P_j of G₁ to every point P_k of G₂.

Step 4 For each point P_j, choose the minimum distance D_jm from the distances D_jk.

Step 5 Sum up all the D_jm as the distance D_i for T_i between G₁ and G₂.

Step 6 Find the transformation T_i = (θ_i, d_i) with the minimum D_i as the desired relationship parameters.
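The steps of Algorithm 4.1 can be sketched as a search over candidate transformations. The sketch below makes several assumptions that are not stated in the text: the horizontal angle is a rotation about the vertical y-axis, the rotation is applied before the displacement, and the candidate list is supplied by the caller (the `candidates` parameter is hypothetical). The closest-point search here is brute force; a K-D tree would replace the inner `min`.

```python
import math

def transform(points, theta_deg, d):
    """Rotate points about the vertical (y) axis by theta_deg, then translate by d."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    dx, dy, dz = d
    return [(c * x + s * z + dx, y + dy, -s * x + c * z + dz) for x, y, z in points]

def total_closest_distance(g1, g2):
    """Steps 3-5: sum, over points of g1, of the distance to the closest point of g2."""
    return sum(min(math.dist(p, q) for q in g2) for p in g1)

def calibrate(g1, g2, candidates):
    """Steps 2 and 6: try each candidate (theta, d), keep the smallest total distance."""
    best = None
    for theta, d in candidates:
        cost = total_closest_distance(transform(g1, theta, d), g2)
        if best is None or cost < best[0]:
            best = (cost, theta, d)
    return best[1], best[2]

g1 = [(0.0, 0.0, 1000.0), (300.0, 0.0, 1000.0), (300.0, 200.0, 1200.0), (0.0, 200.0, 1500.0)]
g2 = transform(g1, 30.0, (100.0, 0.0, -50.0))
candidates = [(a, d) for a in (0.0, 15.0, 30.0, 45.0)
              for d in ((0.0, 0.0, 0.0), (100.0, 0.0, -50.0), (50.0, 0.0, 0.0))]
print(calibrate(g1, g2, candidates))  # (30.0, (100.0, 0.0, -50.0))
```

Note that this exhaustive search follows the steps as listed; the standard ICP procedure instead alternates between pairing closest points and re-estimating the transformation.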

4.1.3 Idea of Proposed Method for Around-car Scene Model Construction

According to the discussions in Section 3.2, we can construct a partial 3D scene model using a single KINECT device. To construct the around-car 3D scene model, a technique for using multiple KINECT devices simultaneously is needed. The relationship information for every two neighboring KINECT devices, obtained during calibration as described previously, is also needed. With both available, we can construct the around-car scene model.

4.1.4 Construction Algorithm

In Section 2.1, we discussed how a desktop computer can control multiple KINECT devices working simultaneously via a USB expansion card. As for calibration, the proposed method for acquiring the geometric relationship parameters between neighboring KINECT devices was described in Section 4.1.2. A flowchart of the proposed around-car model construction algorithm is shown in Figure 4.1. A detailed description of the algorithm is as follows.

Algorithm 4.2: around-car scene model construction.

Input: images I₁ through I₁₄ acquired with the 14 around-car KINECT devices, respectively.

Output: an around-car model M_car.

Steps:

Step 1 Construct 3D models M₁ through M₁₄ using images I₁ through I₁₄, respectively, by the 3D image construction scheme described in Section 3.2.3.

Step 2 For i = 1 to 13, compute the geometric relationship parameters R_i between the 3D models M_i and M_{i+1} by the calibration algorithm (Algorithm 4.1) described in Section 4.1.2, with each R_i including a horizontal angle θ_i and a 3D displacement d_i = (d_ix, d_iy, d_iz) along the x-, y-, and z-axes, respectively.

Step 3 For i = 1 to 13, perform the following steps.

Figure 4.1 A flowchart of the around-car model construction algorithm.
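One plausible way to combine the partial models with the pairwise parameters is to chain the transformations so that every model ends up in a single coordinate frame. The sketch below is an assumption for illustration only, not the procedure of Algorithm 4.2: it supposes that each R_i = (θ_i, d_i) maps points of M_i into the frame of M_{i+1}, that θ_i is a rotation about the vertical y-axis applied before the displacement, and that the merged model is expressed in the frame of the last device. The function names are hypothetical.

```python
import math

def apply_params(points, theta_deg, d):
    """Rotate points about the vertical (y) axis by theta_deg, then translate by d."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    dx, dy, dz = d
    return [(c * x + s * z + dx, y + dy, -s * x + c * z + dz) for x, y, z in points]

def merge_models(models, params):
    """Bring every model into the frame of the last one and concatenate the points.

    models: point lists M_1..M_n; params: (theta_deg, d) pairs R_1..R_{n-1},
    where R_i is assumed to map M_i into the frame of M_{i+1}.
    """
    if len(params) != len(models) - 1:
        raise ValueError("expected one parameter set per neighboring pair")
    merged = []
    for i, pts in enumerate(models):
        for theta, d in params[i:]:
            pts = apply_params(pts, theta, d)
        merged.extend(pts)
    return merged

models = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)]]
params = [(0.0, (1.0, 0.0, 0.0)), (0.0, (0.0, 0.0, 2.0))]
print(merge_models(models, params))
# [(1.0, 0.0, 2.0), (0.0, 0.0, 2.0), (0.0, 0.0, 0.0)]
```

Chaining many pairwise transformations accumulates calibration error from one device to the next, which is one reason the choice of reference frame matters in practice.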

4.1.5 Experimental Results

In this section, some experimental results of applying Algorithm 4.2 for around-car scene model construction are shown. First, the comparative illustrations for Section 4.1.2 are shown in Figure 4.2. From Figure 4.2(b), it can be seen that after calibration, the 3D models of the identical cartons constructed from images acquired with two neighboring KINECT devices overlap each other well. Then, the constructed around-car model is shown from each of the three sides of the car
