Thesis Organization - A Study on Security Patrolling in Park Areas Using Multiple KINECT Devices and an Autonomous Vehicle

Chapter 1 Introduction

1.5 Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 describes the configuration and operation processes of the proposed system. Chapter 3 describes the proposed learning technique for use in outdoor environments. Chapter 4 describes the proposed navigation strategy for use in outdoor environments, including the underlying ideas, the localization techniques, and the detailed navigation algorithms. Chapter 5 describes the proposed methods for landmark detection and localization using 3D image data. Chapter 6 describes the proposed techniques for curb-line following, landmark detection, and vehicle localization using depth data only. Experimental results are shown in Chapter 7. Finally, conclusions and suggestions for future work are given in Chapter 8.

Chapter 2 System Configuration and Processes

2.1 Introduction

For video surveillance, we designed a vision-based autonomous vehicle system and trained it to monitor an area of interest in a park. In this study, we chose a path on a sidewalk in National Chiao Tung University for the training. To conduct security patrolling along this path quickly and stably, we installed three KINECT devices on an autonomous vehicle to construct a mobile security-monitoring system, which serves as the experimental platform in this study. The KINECT devices make the acquisition of 3D data easier, and a small, flexible vehicle is a good choice for this task. We also need to design processes to control the KINECT-device system and the vehicle system, together with a communication mechanism that connects the two systems so that their data can be analyzed. The entire system configuration, including hardware and software, is introduced in Section 2.2, and the structure of the KINECT devices is described in Section 2.3.

Furthermore, to navigate in an unknown environment, a learning strategy is needed to “teach” the vehicle where to navigate, what to monitor, and how to adjust its location in each navigation session. Finally, a good navigation strategy that can lead the autonomous vehicle to the goal safely also needs to be designed. We describe the learning and navigation processes for the adopted vehicle and the associated principles in Section 2.4.

2.2 System Configuration

In this study, we use the Pioneer 3-DX vehicle made by MobileRobots, Inc. as the platform for our experiments. The vehicle is equipped with three KINECT devices facing different directions (front, left-forward, and right-forward), as shown in Fig. 2.1. Because the KINECT device is a relatively new sensor designed by Microsoft, we describe it in detail in Section 2.3, including the structure of the sensor and the coordinate calibration process. The hardware architecture and the software, including the application programming interfaces and development tools we use, are described in Sections 2.2.1 and 2.2.2, respectively.

2.2.1 Hardware Configuration

The hardware architecture of the proposed autonomous vehicle system is shown in Fig. 1.1. It can be divided into three major components: the vehicle system, the KINECT-device system, and the control system. We describe these components in turn below.

The vehicle has an aluminum body measuring 44 cm × 38 cm × 22 cm, with two 19 cm wheels and a caster. It can climb a 25% grade and sills of 2.5 cm.

On flat floors, the vehicle can reach a forward speed of 160 cm per second and a rotation speed of 300 degrees per second. Moreover, the vehicle has 16 ultrasonic sensors and three 12 V rechargeable lead-acid batteries, which supply power for 18 to 24 hours when fully charged. The system can return its status parameters in each navigation cycle, including the position and orientation of the vehicle with respect to its initial pose. The vehicle is shown in Fig. 2.1.

The second major component is the KINECT-device system, which includes three KINECT devices facing three directions as mentioned previously, as shown in Fig. 2.2. In this study, we aim to navigate the vehicle in outdoor environments quickly and stably. Therefore, we use multiple KINECT devices to reduce the 3D data computation time and the hardware operation time. The structure of the KINECT-device system is described in more detail in the next section.

Figure 2.1 The vehicle Pioneer3-DX used in this study. (a) A front view. (b) A back view.

Figure 2.2 Three different directions of the hardware configuration, which includes a vehicle and three KINECT devices. (a) The right side. (b) The front side. (c) The left side.

Finally, for the third major component, the control system, we use a laptop computer as the control unit. It is a model R840 produced by TOSHIBA Computer, Inc. We use an RS-232 interface to connect the laptop computer to the autonomous vehicle, and USB connections to connect the laptop computer to the KINECT devices.

2.2.2 Software Configuration

MobileRobots, Inc. provides an application programming interface called Advanced Robotics Interface for Applications (ARIA), an object-oriented interface written in the C++ language that may be used to control the vehicle. The lowest-level data and information of the vehicle can also be retrieved easily by means of ARIA. Therefore, we use ARIA as the interface for communicating with the embedded system of the vehicle. In addition, we use Borland C++ Builder 6.0 as the development tool for controlling the vehicle.

For the KINECT devices to function under Windows, Microsoft provides a development tool called the Kinect for Windows Software Development Kit (SDK). We use this SDK to capture 3D images and calibrate the KINECT devices. The SDK requires not only the Windows 7 operating system but also two development tools, .NET Framework 4.0 and Microsoft Visual Studio 2010.

Therefore, to develop the KINECT-device system, we use Microsoft Visual C++ 2010 with .NET Framework 4.0 under the Windows 7 operating system.

2.3 Structure of Microsoft KINECT Device

We first introduce the hardware architecture of the KINECT device. The KINECT device includes a color VGA video camera, a depth sensor, a multi-array microphone, and a tilt motor for sensor operations and adjustments. The horizontal field of view of the KINECT device is 57 degrees, the vertical field of view is 43 degrees, and the physical tilt range is ±27 degrees. The major difference between common cameras and the KINECT device is that the KINECT device has a depth detection sensor. The depth detection sensor is composed of an infrared projector and a monochrome complementary metal-oxide-semiconductor (CMOS) sensor, which work together to obtain the distance between the depth sensor and the objects in front of the KINECT device. The resolution of the image acquired with the color VGA video camera is 1280×960, and that of the image acquired with the depth sensor is 640×480. The depth range provided by the KINECT sensor through the Kinect for Windows SDK is from 800 mm to 4000 mm. However, the effective range of distances between the KINECT device and the user, as advised by the official KINECT development website, is from 1200 mm to 3600 mm. Other hardware specifications are not described in detail here due to the page limit.

An illustration of the KINECT device is shown in Fig. 2.3.

Figure 2.3 Hardware of the KINECT device. (a) External structure. (b) Internal structure.
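The two depth ranges quoted above can be captured in a small helper. The following Python sketch is only illustrative: the function names `classify_depth_mm` and `keep_usable` and the three labels are assumptions made here, not part of the Kinect for Windows SDK. Only the numeric limits (800 mm to 4000 mm and 1200 mm to 3600 mm) come from the text.

```python
# Illustrative sketch only; names and labels are assumptions, not SDK calls.
SENSOR_MIN_MM, SENSOR_MAX_MM = 800, 4000      # depth range reported through the SDK
ADVISED_MIN_MM, ADVISED_MAX_MM = 1200, 3600   # effective range advised for use


def classify_depth_mm(depth_mm):
    """Label a depth reading (in millimetres) by the range it falls into."""
    if depth_mm < SENSOR_MIN_MM or depth_mm > SENSOR_MAX_MM:
        return "out_of_range"
    if ADVISED_MIN_MM <= depth_mm <= ADVISED_MAX_MM:
        return "advised"
    return "usable"


def keep_usable(depths_mm):
    """Drop readings that fall outside the sensor's depth range."""
    return [d for d in depths_mm if classify_depth_mm(d) != "out_of_range"]
```

A filter of this kind could be applied to depth readings before any later processing, so that values outside the sensor's range are not treated as valid distances.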

Furthermore, calibration of the camera parameters before vehicle navigation is necessary. In this process, a rotation problem and a shifting problem arise in the 3D space coordinates. A solution to the rotation problem is to calibrate the related parameters before each navigation session. The KINECT device can be calibrated with calibration functions and parameters provided in the Kinect for Windows SDK. In this study, we tilt the field of view of each KINECT device to the zero-angle position using the tilt motor in the sensor before vehicle navigation. To solve the shifting problem between the color image and the depth image, we use functions provided by the Kinect for Windows SDK [19].

After the above two problems are solved, we can obtain a 3D image in which the original color and depth images are in the same image coordinate system. However, these 3D image data are just depth values combined with 2D image coordinates, so they must be transformed into the 3D space coordinate system. For this purpose, we apply the pinhole camera model to convert the 3D image coordinates into 3D space coordinates.

As shown in Fig. 2.4, a space point G at coordinates (X, Y, Z) in the 3D space is projected through the lens center of the camera onto the image plane, where the image plane may be that of the depth image or the color image. The depth value D is provided by the KINECT device, but the coordinates of G in the 3D space are not directly available. Therefore, we compute the direction vector from the image plane to the lens center by using the image coordinates (u, v) and the focal length f of the depth image provided by the Kinect for Windows SDK [19]. Then, we can calculate the 3D space coordinates (X, Y, Z) of point G according to the similar-triangle principle using the depth value D. Specifically, by following the direction vector that starts from the image plane, goes through the lens center of the camera, and finally reaches the 3D space point G, we can compute the 3D space coordinates (X, Y, Z) as follows.

First, as can be seen from Fig. 2.4, we can calculate the distance d between the image point (u, v) on the image plane and the lens center by the following equation:

$$d = \sqrt{u^2 + v^2 + f^2}, \tag{2.1}$$

then, according to the similar-triangle principle, since the two triangles OCI and GG'O are similar, we can know the following equalities:

$$\frac{X}{u} = \frac{Y}{v} = \frac{Z}{f} = \frac{D}{d}, \tag{2.2}$$

from which we can derive the following equations to describe the relation between the image coordinates (u, v) and the corresponding space coordinates (X, Y, Z):

$$X = \frac{D\,u}{\sqrt{u^2 + v^2 + f^2}}; \quad Y = \frac{D\,v}{\sqrt{u^2 + v^2 + f^2}}; \quad Z = \frac{D\,f}{\sqrt{u^2 + v^2 + f^2}}.$$

Figure 2.4 A pinhole camera model.
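The conversion given by Eqs. (2.1) and (2.2) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this study: the function name `image_to_space` is hypothetical, and the sketch assumes that (u, v) is measured from the image center and that the focal length f is expressed in the same pixel units as u and v.

```python
import math


def image_to_space(u, v, depth, focal_length):
    """Back-project image point (u, v) with depth value D to (X, Y, Z).

    Follows Eqs. (2.1) and (2.2): d = sqrt(u^2 + v^2 + f^2) and
    X/u = Y/v = Z/f = D/d. Assumes (u, v) is measured from the image
    center and that focal_length uses the same units as u and v.
    """
    d = math.sqrt(u * u + v * v + focal_length * focal_length)
    scale = depth / d  # common ratio D/d of the similar triangles
    return (scale * u, scale * v, scale * focal_length)
```

For example, with u = 3, v = 4, f = 12, and D = 26, Eq. (2.1) gives d = 13, so the ratio D/d is 2 and the point is (6, 8, 24). A point at the image center (u = v = 0) maps to (0, 0, D), as expected for a point on the optical axis.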

2.4 System Processes

2.4.1 Learning Process

To conduct security patrolling in an outdoor environment, a learning process is necessary. We now describe in detail the information that the vehicle should record in this process. First, we bring the vehicle to a selected path in an outdoor environment, which is a part of the National Chiao Tung University campus. Because the goal is security patrolling, we use the vehicle to patrol along a path on a sidewalk in that part of the campus. Furthermore, we propose a “curb line following” technique in the proposed system for vehicle guidance. Finally, the environment information and camera parameters are recorded at different positions on the path. The entire learning process is shown in Fig. 2.5.

To help users guide the vehicle, a user interface has been designed for controlling the vehicle and selecting landmarks to be learned. Specifically, via the interface, the user controls the vehicle to navigate on the sidewalk and move to an appropriate position with respect to each pre-selected landmark. Then, the features of the landmark are extracted from the 3D images acquired by one of the three KINECT devices using a SURF extraction algorithm, and the relative position between the vehicle and the landmark is computed by use of the depth image. Meanwhile, relevant information is also recorded, including the camera number, the distance to the curb, the region of the detection window, and the vehicle parameters (the odometer readings).

Finally, we classify the recorded data into two categories: path-dependent data and landmark-dependent data. As soon as the learning process ends, the learned information is organized into a navigation path composed of several path nodes with guidance parameters. All of the data are stored on the computer so that they can be modified and used repeatedly.
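One possible way to organize the learned data into path nodes is sketched below in Python. The thesis does not specify a storage format, so every class name, field name, and the use of JSON here is an assumption made for illustration; only the kinds of recorded items (camera number, distance to the curb, detection window, odometer readings, landmark features, and relative position) come from the text.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional


@dataclass
class LandmarkData:
    """Landmark-dependent data recorded at a node (field names are hypothetical)."""
    camera_number: int                 # which of the three KINECT devices saw it
    detection_window: List[int]        # [left, top, width, height] in pixels
    relative_position_mm: List[float]  # landmark position relative to the vehicle
    descriptors: List[List[float]] = field(default_factory=list)  # SURF features


@dataclass
class PathNode:
    """Path-dependent data of one node of the learned navigation path."""
    odometer_pose: List[float]         # [x_mm, y_mm, heading_deg] from the odometer
    curb_distance_mm: Optional[float]  # distance to the curb, if measured here
    landmark: Optional[LandmarkData] = None  # None for nodes without a landmark


def path_to_json(nodes):
    """Serialize the learned path so that it can be stored and edited later."""
    return json.dumps([asdict(n) for n in nodes], indent=2)


def path_from_json(text):
    """Rebuild the list of path nodes from its JSON form."""
    nodes = []
    for item in json.loads(text):
        lm = item.pop("landmark")
        nodes.append(PathNode(landmark=LandmarkData(**lm) if lm else None, **item))
    return nodes
```

Keeping the landmark-dependent data in a separate optional record mirrors the two categories described above, and a text format such as JSON makes the stored path easy to modify by hand between sessions.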

2.4.2 Navigation Process

Before the vehicle starts to navigate, the system reads the path and environment information created in the learning process described previously. To guide the vehicle along the learned path, the vehicle is instructed to move from each node to the next sequentially. A flowchart of the proposed vehicle navigation process is shown in Fig. 2.6.

In more detail, when the vehicle is navigating to the next node, it first checks the navigation mode to determine whether it has to detect the curb line and follow it. If the curb-line detection-and-following process fails, the system enters the blind navigation mode and reconfirms the navigation mode in the next loop. Also, the navigation process keeps trying to detect the target landmark until the correct landmark appears in the acquired image. When the vehicle navigates to the desired node successfully, it obtains the navigation information of the next node from the learned path kept in the system.

In addition, when the navigation process finds the target landmark successfully, the vehicle adjusts its position and loads the relevant parameters for navigation to the next node. However, some nodes, which we call “tuning nodes,” provide navigation information only. This kind of node helps the vehicle navigate to the terminal node successfully.
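The control flow described above can be summarized by the following Python sketch. It is a simplified illustration under stated assumptions, not the navigation algorithm of this study: the function `navigate`, the dictionary keys, the two callbacks, and the loop limit are all hypothetical, and nodes that do not use curb-line following are assumed here to be traversed in the blind navigation mode.

```python
def navigate(path, follow_curb, detect_landmark, max_loops_per_node=10):
    """Walk through the learned nodes in order and return a log of actions.

    path            -- list of dicts with hypothetical keys "name",
                       "use_curb" (bool) and "landmark" (str or None)
    follow_curb     -- callable(node) -> bool, True if the curb line was
                       detected and followed in this loop
    detect_landmark -- callable(node) -> bool, True if the target landmark
                       was found in this loop
    """
    log = []
    for node in path:
        for _ in range(max_loops_per_node):
            # Check the navigation mode first, as described in the text.
            if node["use_curb"] and follow_curb(node):
                log.append((node["name"], "curb_following"))
            else:
                # Blind navigation; the mode is checked again in the next loop.
                log.append((node["name"], "blind"))
            if node["landmark"] is None:
                # A node that carries navigation information only.
                log.append((node["name"], "reached"))
                break
            if detect_landmark(node):
                # Landmark found: the vehicle can adjust its position here.
                log.append((node["name"], "localized"))
                break
        else:
            log.append((node["name"], "landmark_not_found"))
    return log
```

The callbacks stand in for the curb-line and landmark detection procedures, which are described in later chapters. The returned log makes it easy to check in a simulation which mode was used in each loop.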

Figure 2.5 Flowchart of proposed learning process.


Figure 2.6 Flowchart of proposed navigation process.

Chapter 3 Learning of Outdoor Environment Features

3.1 Introduction

To use an autonomous vehicle to navigate in an outdoor environment, it is necessary to build complete path information to guide the vehicle. Therefore, creating a path map and selecting appropriate landmarks are primary tasks for successful security patrolling by vehicle navigation. In this chapter, we introduce our ideas for selecting landmarks and learning guidance parameters in outdoor environments. Several coordinate systems, including the image coordinate system, the camera coordinate system, the vehicle coordinate system, and the global coordinate system, are defined in Section 3.2. The learning techniques and strategies are described in Section 3.3. Finally, a detailed algorithm for the learning process is given in Section 3.4.

3.1.1 Selection of Sequential Landmarks for Learning

During vehicle navigation, mechanical errors accumulate and affect the odometer readings of the vehicle location and orientation. To solve this problem, we adopt an approach of “vehicle localization using landmarks.” For this purpose, some objects should first be selected as landmarks for conducting the vehicle localization task. In this study, we select objects sequentially along the pre-selected path as landmarks. Because of this sequential selection, we can estimate the approximate position of the vehicle on the sidewalk without depending excessively on the odometer readings.

The main types of landmarks selected for localization in this study are light poles, hydrants, and tree trunks. Two other types of landmarks, namely ramps and curbs, which provide environment parameters for vehicle guidance, are also selected.

With more categories of landmarks selected, we can utilize more information along the path for vehicle localization, which reduces the chance of going astray or falling off the sidewalk and guides the autonomous vehicle to the terminal point more reliably. The proposed methods of vehicle localization using landmarks are described in detail in Chapters 5 and 6.

3.1.2 Idea of Learning Guidance Parameters and Landmark Features in Outdoor Environments

To navigate in an unknown outdoor environment, some environment parameters or features should be learned for use in the navigation stage. The first feature learned in the proposed system is the navigation path data. We can obtain the position of the vehicle from the odometer readings, but mechanical errors usually make these readings imprecise. Therefore, it becomes an important task to correct the position of the vehicle and the odometer readings. In Section 3.3.2, we describe how to collect path data for vehicle localization by controlling the vehicle to navigate along a pre-selected path in an outdoor environment.

The features to be learned next are camera and vehicle guidance parameters. Some of these parameters require manual measurement and are taken as inputs to the process of learning other features; we refer to such feature data as “prior knowledge.” More details of these parameters are introduced in Sections 3.3.2 and 3.3.3.

The last features to be learned are the landmarks. To use the landmarks for vehicle localization, it is necessary to “train” the vehicle to “know” what to detect and how to recognize landmarks. That is, the vehicle must learn which features of each selected landmark should be detected, and it should then be able to recognize each landmark by matching the learned features against those computed in the navigation phase. For this purpose, we adopt in this study a powerful approach, SURF [20], to extract such features from the images of the selected landmarks. Meanwhile, we also record the vehicle location with respect to each selected landmark in terms of depth data. The detailed learning process is described in Section 3.3.4.
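The idea of recognizing a landmark by matching learned features against features computed during navigation can be illustrated with a small Python sketch. The matching rule is not given in this section, so the nearest-neighbor ratio test used below is a commonly used stand-in rather than the method of this study. The function names, the ratio of 0.7, and the minimum number of matches are assumptions. Each descriptor is treated as a plain list of numbers; real SURF descriptors are typically 64-dimensional.

```python
import math


def match_descriptors(learned, observed, ratio=0.7):
    """Match each learned descriptor to its nearest observed descriptor.

    Returns (learned_index, observed_index) pairs that pass the ratio test:
    the nearest distance must be smaller than ratio times the second nearest.
    """
    matches = []
    for i, a in enumerate(learned):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(observed))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches


def is_same_landmark(learned, observed, min_matches=3):
    """Accept the landmark when enough descriptors match (threshold assumed)."""
    return len(match_descriptors(learned, observed)) >= min_matches
```

The ratio test rejects a match when the best and second-best candidates are almost equally close, which helps to avoid ambiguous matches on repetitive outdoor textures.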

3.2 Coordinate Systems

In this section, we introduce the coordinate systems used in this study, which describe the relations between the devices used and the selected landmarks in the navigation environment. The coordinate systems are illustrated in Fig. 3.1 and defined as follows.

1. Image coordinate system (ICS), denoted as (u, v): the u-v plane coincides with the image plane, and the origin O_I of the ICS is located at the center of the image plane.

2. Vehicle coordinate system (VCS), denoted as (V_X, V_Y): the origin O_V of the VCS is located at the center of the vehicle, and the V_X-V_Y plane coincides with the ground.

3. Camera coordinate system (CCS), denoted as (X, Y, Z): the origin O_C of the CCS is located at the lens center of the KINECT device, the X-Z plane is parallel to the ground, and the Y-axis is perpendicular to the ground.

4. Global coordinate system (GCS), denoted as (G_X, G_Y): the origin O_G of the GCS is always placed at the start position of the vehicle in the navigation path, and the G_X-G_Y plane coincides with the ground.

During navigation, we have to know the relationships among the coordinate systems. The VCS and the CCS move with the vehicle, and at the beginning of each navigation process the VCS coincides with the GCS.

The coordinate systems are illustrated in Fig. 3.1.
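The relation between the VCS and the GCS can be written as a 2D rigid transformation on the ground plane. The Python sketch below is illustrative only: the function names are hypothetical, and the pose convention (x, y, theta), with theta measured counter-clockwise from the G_X axis in radians, is an assumption, since the axis directions are not fixed in this section.

```python
import math


def vcs_to_gcs(point_v, vehicle_pose):
    """Map a ground-plane point from the VCS to the GCS.

    point_v      -- (V_X, V_Y) coordinates in the vehicle frame
    vehicle_pose -- (x, y, theta): vehicle position in the GCS and heading
                    in radians, counter-clockwise from the G_X axis
                    (this sign convention is an assumption)
    """
    vx, vy = point_v
    x, y, theta = vehicle_pose
    gx = x + vx * math.cos(theta) - vy * math.sin(theta)
    gy = y + vx * math.sin(theta) + vy * math.cos(theta)
    return (gx, gy)


def gcs_to_vcs(point_g, vehicle_pose):
    """Inverse mapping: express a GCS point in the vehicle frame."""
    gx, gy = point_g
    x, y, theta = vehicle_pose
    dx, dy = gx - x, gy - y
    vx = dx * math.cos(theta) + dy * math.sin(theta)
    vy = -dx * math.sin(theta) + dy * math.cos(theta)
    return (vx, vy)
```

With the pose (0, 0, 0), both functions reduce to the identity, which corresponds to the start of a navigation process when the VCS coincides with the GCS. As the vehicle moves, the pose reported by the odometer would be passed in as `vehicle_pose`.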


In this study, we use three KINECT devices mounted on the vehicle to sense the environment. When we bring the vehicle to a place that is a path node, the proposed system records the position of the vehicle in the 3D space with respect to the selected landmarks. Then, when the vehicle moves along the path in a navigation session, we can adjust the vehicle location according to the learned position at the currently visited path node.

3.3 Learning of Outdoor Guidance Parameters and Landmark Features

3.3.1 Learning of Outdoor Guidance Parameters

For the vehicle to navigate in an outdoor environment, a trainer of the proposed vehicle system should guide the system to learn and record parameters or features of the environment. The parameters to be learned in this study include depth data, landmark features, detection windows, KINECT device numbers, and some other ground-truth parameters. The proposed techniques for learning these environment parameters are described in the following.

3.3.2 Learning of Navigation Paths Composed of Nodes

In general, the vehicle navigates in an outdoor environment under the control of
