
Chapter 3 Learning of Outdoor Environment Features

3.3 Learning of Outdoor Guidance Parameters and Landmark Features

3.3.3 Learning of Landmark Detection and Ground-truth Parameters

In this study, we use line following to navigate the vehicle along a sidewalk path that is bordered by a curb line. The vehicle keeps a similar pose relative to the curb while it travels, so a line-segment landmark such as the curb is usually projected into a fixed region of the image. We therefore need to process only that part of the image, which reduces the computation time. Accordingly, we define a region of interest (ROI) in the image, also called a detection window, as shown in Fig. 3.3.
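The following minimal Python sketch shows how a detection window limits the work: it crops an image to a learned ROI. The (left, top, width, height) layout of the ROI and the name `crop_roi` are assumptions made for illustration. They are not the thesis's data format.

```python
from typing import List, Tuple

Image = List[List[int]]          # row-major gray-level or depth image
ROI = Tuple[int, int, int, int]  # (left, top, width, height) in pixels, hypothetical layout


def crop_roi(image: Image, roi: ROI) -> Image:
    """Return only the pixels inside the detection window."""
    left, top, width, height = roi
    rows = len(image)
    cols = len(image[0]) if rows else 0
    # Clip the window to the image bounds so a badly learned ROI cannot index out of range.
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(cols, left + width), min(rows, top + height)
    return [row[x0:x1] for row in image[y0:y1]]
```

Take a 640×480 image with a window covering its bottom third, `(0, 320, 640, 160)`. Only one third of the pixels then has to be examined.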

Using the same property, we also record which KINECT device is used to detect each landmark along the path. The KINECT devices are labeled with serial numbers, as shown in Fig. 3.4. In the navigation stage, the serial number recorded in a path node is retrieved to decide which KINECT device should detect the target landmark. That device is then used continuously until the target landmark is detected. Without such a record, the images from all KINECT devices would have to be processed, and the computation load of the navigation process would be considerable. Once the relevant parameters are learned, however, we need to handle only the data acquired by the specified KINECT device. We therefore do not have to use more than one KINECT device to detect the landmark at the same time unless desired. In this way, the computation is sped up and the navigation speed can be increased.
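The device lookup can be sketched as follows. Two details are hypothetical choices made for this example: the `PathNode` fields, and the fallback to all three devices when no device number was learned.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

ALL_KINECTS = (1, 2, 3)  # device serial numbers as labeled in Fig. 3.4


@dataclass
class PathNode:
    node_id: int
    landmark: Optional[str]   # e.g. "hydrant"; None if the node has no landmark
    kinect_no: Optional[int]  # device learned for this landmark, if recorded


def devices_to_process(node: PathNode) -> Tuple[int, ...]:
    """Return the KINECT devices whose frames must be examined at this node."""
    if node.landmark is None:
        return ()             # no landmark to look for at this node
    if node.kinect_no is None:
        return ALL_KINECTS    # nothing learned: scan every device (the costly case)
    return (node.kinect_no,)  # learned case: only one device's data is handled
```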

In addition, some ground-truth data are measured in the learning process, such as the slope angle of each ramp and the distance from the vehicle to the curb along the sidewalk.

We will describe in Chapter 6 how these parameters are used in this study.

3.3.4 Learning of Landmark Features in Color and Depth Images

To learn the selected landmarks, we design a user interface that helps users specify the landmarks they want to use. First, the user drives the vehicle to a position beside a landmark to be learned. The user then selects one of the KINECT devices to acquire the color and depth images, and manually drags a rectangle as an ROI to segment out the landmark in the color image. Next, the SURF extraction algorithm [20], described in detail in Chapter 5, is applied to obtain the feature set of the ROI. Then, the following are saved into the learned data set: the depth data provided by the KINECT device, the feature set of the landmark, the KINECT device number, and the ROI. The process is illustrated by the flowchart in Fig. 3.5 and detailed in the following algorithm.

Figure 3.3 Curb line in the detection window. (a) Color image. (b) Depth image.

Figure 3.4 An illustration of KINECT device numbers (devices No. 1, No. 2, and No. 3 mounted on the vehicle).

Figure 3.5 A flowchart of the landmark learning process.

Algorithm 3.2. Learning of a selected landmark.

Input: the position P of a selected landmark M.

Output: information data of landmark M.

Steps.

Step 1. Control the vehicle to position P beside the landmark M.

Step 2. Select one of the KINECT devices, as specified in the path, to acquire a color image I and a depth image D.

Step 3. Drag a rectangle on image I as an ROI R.

Step 4. Apply the SURF extraction algorithm on the ROI to extract a feature set S.

Step 5. Save the depth image D, the KINECT device number, the feature set S, and the ROI R in the record of the current path node corresponding to landmark M.
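Steps 3 to 5 of Algorithm 3.2 can be sketched as below. Steps 1 and 2, driving the vehicle and capturing the images, are outside the sketch. SURF itself is not reimplemented here. Instead, the feature extractor is passed in as a hypothetical callback, `extract_features`. The record layout, `LandmarkRecord` stored in a dictionary keyed by node number, is also an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]
ROI = Tuple[int, int, int, int]                   # (left, top, width, height), hypothetical
Feature = Tuple[float, float, Tuple[float, ...]]  # (x, y, descriptor)


@dataclass
class LandmarkRecord:
    depth_image: Image       # depth image D
    kinect_no: int           # serial number of the KINECT device used
    features: List[Feature]  # feature set S
    roi: ROI                 # ROI R dragged by the user


def learn_landmark(path_nodes: Dict[int, LandmarkRecord], node_id: int,
                   color: Image, depth: Image, kinect_no: int, roi: ROI,
                   extract_features: Callable[[Image], List[Feature]]) -> LandmarkRecord:
    """Steps 3-5 of Algorithm 3.2 for one landmark M at the current path node."""
    left, top, width, height = roi
    patch = [row[left:left + width] for row in color[top:top + height]]  # Step 3: ROI R
    features = extract_features(patch)                                   # Step 4: set S
    record = LandmarkRecord(depth, kinect_no, features, roi)
    path_nodes[node_id] = record                                         # Step 5: save
    return record
```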

In this study, gray-level depth images composed of the depth data provided by the KINECT device are used as inputs to the SURF extraction algorithm. An example of such a depth image is shown in Fig. 3.6. Strictly speaking, the above landmark learning algorithm is not well suited to this type of image. Far fewer feature points can be extracted from a gray-level depth image than from a color image. However, our experiments with depth images for SURF feature extraction and landmark localization show that the depth image alone yields acceptable results. More detailed experimental results and vehicle navigation schemes will be described in Chapter 6.
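The text does not state how raw depth values are mapped to gray levels. One plausible scheme, shown here only as an assumption, has two steps:

- Clip depth in millimeters to a working range.
- Scale it linearly to 0-255.

The range limits `near` and `far` and the treatment of 0 as a missing reading are hypothetical defaults.

```python
from typing import List


def depth_to_gray(depth_mm: List[List[int]], near: int = 500,
                  far: int = 4000) -> List[List[int]]:
    """Map depth in millimeters linearly to 0-255 gray levels (hypothetical scheme).

    Depths are clipped to [near, far]; 0, assumed to mean "no reading", maps to 0.
    """
    span = far - near
    gray = []
    for row in depth_mm:
        out_row = []
        for d in row:
            if d <= 0:
                out_row.append(0)
            else:
                d = min(max(d, near), far)
                out_row.append(round(255 * (d - near) / span))
        gray.append(out_row)
    return gray
```

In this simple mapping, the nearest depths and missing readings both become black. A real system might reserve a separate value for missing data.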

Figure 3.6 A hydrant landmark in a depth image.

Chapter 4 Navigation in Outdoor Environments

4.1 Introduction

When the learning process is finished, we obtain the learned environment information, including a set of landmark features, ground-truth data, ROI images, and a navigation path. In this chapter, we introduce our approach to navigating the vehicle in outdoor environments with this information and describe how we implement it.

Some strategies for conducting the navigation work will be described in Section 4.2.1.

First, two main ideas for guiding the vehicle along the learned path are described. The detailed algorithm of the proposed navigation process is then introduced in Section 4.3.

4.1.1 Strategy of Vehicle Guidance on Learned Paths

In the task of vehicle navigation, a navigation path like the one shown in Fig. 3.2 is established in advance. The path has a starting point and an end point, as well as some spots of interest that the vehicle passes between them. In this study, we chose a starting point and an end point on a path along part of the sidewalk at National Chiao Tung University as our experimental environment. We then recorded the features and positions of some pre-selected landmarks along the path. We also “learned” some environment parameters to help the vehicle navigate along the path successfully, as shown in Fig. 4.1. These include the speed of the vehicle, the angle of each path turn, and the ground-truth data of a ramp and a curb segment. When these tasks are finished, the vehicle is able to navigate along the learned path.

Besides learning the above parameters, however, a vehicle navigation strategy is also important in this study. The proposed strategy for conducting the navigation work is introduced in Section 4.2. The detailed algorithm of the proposed navigation process is introduced in Section 4.3.

Figure 4.1 Two types of landmarks selected for use in this study. (a) Curb line. (b) Ramp.

4.1.2 Localization by Sequential Landmarks

As mentioned previously, the vehicle navigation process usually produces mechanical errors, resulting in imprecise estimates of the vehicle position. To solve this problem, the vehicle is guided to constantly localize itself based on the sequentially learned landmarks. Specifically, the vehicle first detects and localizes a landmark in the acquired KINECT images using the proposed methods (introduced in Chapter 5). It then obtains its position relative to the landmark. Finally, we adjust the vehicle’s position and orientation to match those learned at the current spot in the learning phase.

In addition, the learned path runs along a sidewalk, and we use the concept of sequential-node visiting to conduct vehicle localization. Using the curb-line feature of the sidewalk is therefore practical in this study. We use the learned curb-line parameters for line following, which corrects the vehicle’s orientation as it navigates along the learned path on the sidewalk.

4.2 Proposed Navigation Process

4.2.1 Strategies for Proposed Navigation Process

In this section, we introduce the strategies proposed in this study for vehicle navigation on the learned path. First, the navigation process reads the learned navigation path and the related guidance parameters stored on the laptop computer. The navigation path consists of several nodes that were labeled in sequential order in the learning process. Following the concept of sequential-node visiting, the vehicle visits the nodes one by one and conducts vehicle localization at them. The following strategies are proposed to guide the vehicle to the pre-selected destination successfully.

1. The vehicle always follows the curb line on the sidewalk if possible. After detecting the curb line, the vehicle modifies its orientation to keep a safe distance from the curb line.

2. The vehicle localizes itself according to the learned sequential landmarks along the path. Its position in the GCS is adjusted according to two landmark positions: the learned one, and the current one computed from the images acquired at the vehicle’s current location.

3. An object detection process runs continuously to detect nearby objects. When a suspicious object appears in the detection window, the vehicle stops moving forward and matches the object against the recorded landmark.

With the above strategies, the vehicle can be expected to navigate to the desired destination successfully. A flowchart of the navigation process based on these three strategies is shown in Fig. 4.2.

4.2.2 Idea of Vehicle Localization by Learned Sequential Landmarks

The odometer readings provide the vehicle’s position and direction during the navigation phase. However, they are usually too imprecise to guide the vehicle correctly to the next position. The main task therefore becomes localizing the vehicle’s position and orientation with the learned landmarks, as shown in Fig. 4.3. In this study, these landmarks are light poles, hydrants, sidewalk curb lines, and tree trunks. The sequential landmarks and the characteristics of the curb along a sidewalk path are used to obtain the vehicle position on the path. The learned odometer readings help judge whether the vehicle has arrived at the correct position for detecting the landmark. The process of vehicle localization is illustrated in Fig. 4.4. Fig. 4.5 illustrates two different positions of the vehicle at a node in the navigation path, together with the relation among the vehicle, the curb, and the light pole.

The proposed vehicle localization technique consists of two major steps. First, after the navigation process starts, an object detection process continuously checks for the next landmark to visit. When the detection process detects the landmark at the correct node, the KINECT device acquires the landmark’s depth data relative to the vehicle. Second, the recorded depth data are retrieved from the learned environment parameters. The two sets of depth data are matched with the ICP algorithm to estimate the correct position and orientation of the vehicle under the MSE criterion, as illustrated in Fig. 4.6. The result is a set of 3D space parameters in the CCS: a pair of translation parameters (X_mse, Z_mse) and a rotation angle θ_mse. The technique adopted to adjust the vehicle to the correct position is described in the following algorithm.

Figure 4.2 Flowchart of the navigation process.

Algorithm 4.1 Vehicle localization and position adjustment by learned landmarks.

Input: a color/depth image and recorded landmark depth data D.

Output: None.

Steps.

Step 1. Use the SURF extraction algorithm (described in Chapter 5) to recognize a learned landmark in the input color/depth image. If the learned landmark is recognized successfully, retrieve the corresponding node data from the learned path information; otherwise, go to Step 1.

Step 2. At the vehicle’s current position, obtain new depth data D' of the landmark with whichever of the three KINECT devices is specified in the learned path. Match the coordinates of D' with those of the recorded data D in the CCS using the ICP technique (the detailed method will be described in Chapter 5).

Step 5. Convert the coordinates (X, Y, Z) in the CCS into the coordinates (V_X, V_Y) in the VCS as follows:

$$V_X = X, \qquad V_Y = Z. \tag{4.1}$$

First, two things are selected in the learning stage: a region serving as the detection window, and a threshold value thr for detecting landmark objects in the depth image. After the navigation process starts, the detection process examines the detection window in each acquired depth image and decides whether any object of concern is present. An object counts as present if its distance to the vehicle is smaller than the pre-selected threshold thr. If this condition is satisfied, the SURF extraction algorithm extracts the object’s feature points from the color image. These points are then matched with the learned feature set to recognize the learned landmark. The detection process is illustrated in Fig. 4.7.

Figure 4.3 Three types of landmarks selected for vehicle localization in this study. (a) Light pole. (b) Hydrant. (c) Curb line.

Figure 4.4 The vehicle localization process.

Figure 4.5 Illustration of the learned position and the current position of the vehicle in the GCS (with the VCS attached to the vehicle), relative to the curb line and a light pole.

Figure 4.6 The depth data of a light pole recorded at position L are matched with the depth data newly acquired in the navigation process at position L′. (a) A recorded feature position with respect to the vehicle. (b) A current feature position with respect to the vehicle.

4.3 Algorithm of Navigation in Outdoor Environments

In this section, we describe the detailed process of vehicle navigation in the outdoor environment. The vehicle uses the learned information and the concept of sequential-node visiting to visit each recorded node consecutively. It performs the specified tasks at the learned positions until it reaches the end point of the learned path. The entire navigation process is described in the following algorithm, and a flowchart of the complete process is shown in Fig. 4.8.

Figure 4.7 Flowchart of the proposed detection process.

Algorithm 4.2 Navigation Process.

Input: a learned navigation path N_path with relevant guidance parameters, and learned data of environment parameters.

Output: none.

Steps.

Step 1. Choose a start node N_start and an end node N_end from the learned navigation path N_path, and initialize vehicle navigation from N_start.

Step 2. Read from N_path the next navigation node N_next and its related guidance parameters.

Step 3. Move the vehicle forward to node N_next and detect the learned landmark.

Step 4. If the sidewalk-following mode is adopted, detect the curb line with the curb line detection process (the detailed method will be described in Chapter 6). If the detection succeeds, modify the vehicle direction accordingly; otherwise, drive the vehicle in the blind navigation mode.

Step 5. If the detection process detects an object of concern in the detection window and its distance with respect to the vehicle is smaller than a threshold thr, then stop the vehicle and go to the next step; else, go to Step 7.

Step 6. If there exists a light pole or hydrant landmark at the current node N_next, capture a color/depth image with the KINECT device. Then perform Algorithm 4.1, with the color/depth image and the learned landmark depth data D as inputs, to localize the vehicle, and go to Step 2.

Step 7. If there exists a ramp landmark at the current node N_next, adopt the blind navigation mode, adjust the vehicle direction, and then go to Step 2.

Step 8. If there exists a tree trunk landmark at the current node N_next, compute the position of the landmark center in the image in terms of 3D space coordinates. Use these coordinates to localize the vehicle, and then go to Step 2.

Step 9. If there exists the landmark pre-selected for the terminal node N_end (recognized by its landmark type and number), stop the vehicle and finish the navigation.

Step 10. Repeat Steps 3 through 9.
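The control flow of Algorithm 4.2 can be summarized in a simplified sketch. Several parts are hypothetical:

- the vehicle interface (`move_to`, `follow_curb`, `object_in_window`, and so on);
- the node dictionary keys;
- the `LoggingRobot` stand-in, which records commands instead of driving.

Waiting, retries, and the repeated detection inside each node are omitted.

```python
from typing import List


class LoggingRobot:
    """Hypothetical stand-in for the vehicle: records commands instead of driving."""

    def __init__(self, curb_found: bool = True, object_seen: bool = True):
        self.curb_found = curb_found
        self.object_seen = object_seen
        self.log: List[str] = []

    def move_to(self, i: int) -> None:
        self.log.append(f"move {i}")

    def follow_curb(self) -> bool:
        self.log.append("curb")
        return self.curb_found

    def blind_mode(self) -> None:
        self.log.append("blind")

    def object_in_window(self) -> bool:
        return self.object_seen

    def stop(self) -> None:
        self.log.append("stop")

    def localize_icp(self, i: int) -> None:
        self.log.append(f"icp {i}")

    def adjust_direction(self, i: int) -> None:
        self.log.append(f"turn {i}")

    def localize_center(self, i: int) -> None:
        self.log.append(f"center {i}")


def navigate(path: List[dict], start: int, end: int, robot) -> None:
    """Visit path[start..end] in order, loosely mirroring Steps 2-9 of Algorithm 4.2."""
    for i in range(start, end + 1):              # Steps 2-3: sequential-node visiting
        node = path[i]
        robot.move_to(i)
        if node.get("sidewalk") and not robot.follow_curb():
            robot.blind_mode()                    # Step 4: no curb found
        kind = node.get("landmark")
        if kind in ("light pole", "hydrant") and robot.object_in_window():
            robot.stop()                          # Step 5: object closer than thr
            robot.localize_icp(i)                 # Step 6: Algorithm 4.1
        elif kind == "ramp":
            robot.blind_mode()                    # Step 7
            robot.adjust_direction(i)
        elif kind == "tree trunk":
            robot.localize_center(i)              # Step 8
    robot.stop()                                  # Step 9: terminal node reached
```

Passing the vehicle in as an object keeps the sequencing logic testable without hardware: the log of a `LoggingRobot` shows which steps fired at each node.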

Figure 4.8 Flowchart of the detailed proposed navigation process.

Chapter 5 Landmark Detection and Localization Using Depth and Color Images

5.1 Introduction

Vehicle localization is an important task in building the autonomous vehicle navigation system of this study, since it allows the vehicle to move to a pre-selected destination successfully. For this purpose, we use pre-selected landmarks to provide the current vehicle position in the learned map during navigation. However, deciding which landmarks to use is also an issue. The KINECT device provides 3D information through depth images, so we select objects with prominent 3D shapes as landmarks for localization, as illustrated in Fig. 5.1. We choose rectangle-like objects, such as the one in Fig. 5.1(a), because they provide translation and rotation information simultaneously. By contrast, a plane gives no cue for sliding along itself, and a cylinder looks the same after rotation about its axis. A method of feature extraction and matching for recognizing the landmark is described in Section 5.2. Unfortunately, not all of the landmarks on the learned path can provide 3D information, so some other techniques of object detection and localization will be introduced in Chapter 6. In addition, Section 5.3 describes how we convert depth images of landmarks into 3D space coordinates to localize the vehicle. A series of algorithms for landmark detection and localization will be described in detail in Section 5.4.

Figure 5.1 Top views of three different types of objects in a depth image captured in front of the KINECT device. (a) A rectangle. (b) A plane. (c) A cylinder.

5.2 Review of the Method of Matching by Speeded Up Robust Features (SURFs)

The SURF extraction method proposed by Herbert Bay et al. [20] in 2006 includes four major stages of computation to generate a set of features, and part of its idea is based on concepts from SIFT [21]. In this section, we give a brief review of SURF in two parts: detection of interest points, and description and matching of these points.

5.2.1 Detection of Feature Points of Interest

A main difference between SURF and SIFT is that SURF is based on an approximation of the Hessian matrix computed with integral images. The integral image allows box-type convolution filters of any size to be evaluated in constant time, which reduces the computation time drastically. The Hessian-based detector, in turn, provides good accuracy.
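The constant-time box sum that makes this possible can be illustrated in a few lines. This is the standard summed-area-table construction rather than code from the thesis. The zero-padded layout is a choice made here so that boxes touching the image border need no special cases.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Summed-area table with an extra zero row and column at the top and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Everything above the current row plus the running sum of this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h box whose top-left pixel is (x, y): four lookups for any size."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
```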

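In more detail, in SURF the Hessian matrix of a pixel of an image I(x, y) is represented as follows:

$$H(I(x, y)) = \begin{bmatrix} \dfrac{\partial^2 I}{\partial x^2} & \dfrac{\partial^2 I}{\partial x\,\partial y} \\[4pt] \dfrac{\partial^2 I}{\partial x\,\partial y} & \dfrac{\partial^2 I}{\partial y^2} \end{bmatrix}, \tag{5.1}$$

Using convolutions with Gaussian second-order derivatives, the Hessian matrix H(x, σ) at point x and scale σ is defined as follows:

$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}, \tag{5.2}$$

where L_xx(x, σ) is the convolution of the Gaussian second-order derivative $\partial^2 g(\sigma)/\partial x^2$ with the image I at point x, and L_xy(x, σ) and L_yy(x, σ) are interpreted similarly.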

For the choice of filter, Bay et al. argue that Gaussian filters are non-ideal in any case once they are discretized and cropped, so they approximate the Hessian matrix with box filters. The 9×9 box filters in Fig. 5.2 used to compute the blob response maps are denoted by D_xx, D_yy, and D_xy. The determinant of the approximated Hessian matrix can then be written as:

$$\det(H_{\mathrm{approx}}) = D_{xx} D_{yy} - (0.9\,D_{xy})^2. \tag{5.3}$$

Figure 5.2 Left to right: the box-filter approximations used by SURF for the second-order Gaussian partial derivatives in the y-direction (D_yy) and the xy-direction (D_xy). The gray regions are equal to zero.

Scale spaces are usually implemented as an image pyramid: the images are repeatedly smoothed with a Gaussian filter and then sub-sampled to reach higher levels of the pyramid. In SIFT, adjacent pyramid layers are subtracted to obtain DoG (Difference of Gaussians) images. In SURF, by contrast, the scale space s is analyzed by up-scaling the filter size rather than iteratively reducing the image size, as shown in Fig. 5.3. On the integral image, box filters of any size cost the same to evaluate. SURF therefore avoids repeated smoothing and sub-sampling, which speeds up the overall computation.

Figure 5.3 Illustration of SIFT and SURF. (a) Iteratively reducing the image size. (b) Up-scaling the filter size according to the scale s.

SURF localizes the points of interest with a technique similar to that of SIFT. Each point in the response maps is compared with its 8 neighbors at the same scale and with the 9 corresponding neighbors at each of the two adjacent scales, 26 neighbors in total, as shown in Fig. 5.4. If the point is a local maximum, it is selected as a candidate feature point. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space by fitting a 3D quadratic function.
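The 26-neighbor test can be written directly. This sketch assumes that the determinant responses are stored as a list of equally sized 2D maps indexed [scale][row][column], and that the point is not on a border. It uses a strict comparison so that flat plateaus are not reported, a choice made here rather than something stated in the text.

```python
from typing import List


def is_local_max(stack: List[List[List[float]]], s: int, y: int, x: int) -> bool:
    """True if stack[s][y][x] exceeds all 26 neighbors in the 3x3x3 scale-space cube."""
    value = stack[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == dy == dx == 0:
                    continue  # skip the point itself
                if stack[s + ds][y + dy][x + dx] >= value:
                    return False
    return True
```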

Figure 5.4 Maxima values are detected by comparing a pixel, as marked with X, with its 26 neighbors,

Source document: A Study on Campus Security Patrolling by an Autonomous Vehicle Using Multiple KINECT Devices.