
CHAPTER 2. PROBABILITY-BASED LOCATION AWARE DESIGN

2.4 Summary

This chapter presents a novel method for localizing a ZigBee node in a WSN. The probability-based approach takes the uncertainties of RSSI-based distance estimation into account. The method is independent of the sensor node distribution and environment configuration, and can therefore minimize the calibration burden in practical applications. The performance of the proposed algorithm has been evaluated by experiments in three different environments. The experimental results show that the average error of the implemented location aware system is 1.7 m, despite the much larger range uncertainty of the raw RSSI values. This accuracy can effectively support service robotic applications, especially when properly combined with a vision system. For instance, the localization system can provide the probable area of a user calling the robot for service or of an emergency situation.
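As a point of reference for the range uncertainty mentioned above, the sketch below inverts the common log-distance path-loss model to turn RSSI readings into distance estimates and shows how a few decibels of shadowing noise spread the estimates. The reference power, path-loss exponent, and noise level are illustrative assumptions, not the calibrated values of the implemented system.

```python
import numpy as np

def rssi_to_distance(rssi_dbm, rssi_at_1m=-45.0, path_loss_exp=2.5):
    """Invert the log-distance path-loss model RSSI = RSSI(1 m) - 10*n*log10(d)."""
    return 10.0 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exp))

# Illustrate the range uncertainty caused by shadowing/fading on the raw RSSI.
# All parameters below are placeholders, not the system's calibrated values.
rng = np.random.default_rng(0)
true_distance = 5.0                                           # metres
nominal_rssi = -45.0 - 10 * 2.5 * np.log10(true_distance)
noisy_rssi = nominal_rssi + rng.normal(0.0, 4.0, size=1000)   # 4 dB shadowing
estimates = rssi_to_distance(noisy_rssi)
print(f"true {true_distance:.1f} m, estimate mean {estimates.mean():.2f} m, "
      f"std {estimates.std():.2f} m")
```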

Table 2-1: Configurations of Test Environments

Test Environment    Type        Test Point #    Area (m²)    Beacon #
EE1                 Room        26              49           8
EE2                 Corridor    21              30           6
EE3                 Hallway     60              600          8

Fig. 2-6: (a) Experimental environment 1 (EE1): lab 622 and the corridor; (b) the locations of the deployed ZigBee beacons in the experiment (filled circles).

Table 2-2: Estimation Error Comparison: EE1

Method      Average Error (m)    Max Error (m)    Standard Deviation (m)
Min-Max     2.29                 3.4              0.77

Fig. 2-7: (a) Experimental environment 2 (EE2): the corridor of the lab building; (b) the locations of the deployed ZigBee beacons in the experiment (filled circles).

Table 2-3: Estimation Error Comparison: EE2

Method                          Average Error (m)    Max Error (m)    Standard Deviation (m)
Min-Max                         2.46                 3.2              1.14
ROCRSSI                         2.76                 3.1              1.22
CC2431                          1.82                 3.5              1.5
Proposed (2D Map)               1.32                 3.0              0.73
Proposed (with Filtered Map)    1.21                 3.2              0.92

Fig. 2-8: (a) Experimental environment 3 (EE3): the lobby of the lab building; (b) the locations of the deployed ZigBee beacons in the experiment (filled circles).

Table 2-4: Estimation Error Comparison: EE3

Method                          Average Error (m)    Max Error (m)    Standard Deviation (m)
Min-Max                         3.7                  6.3              1.29
ROCRSSI                         3.1                  5.2              1.61
CC2431                          2.2                  7.58             1.74
Proposed (2D Map)               1.5                  4.2              0.89
Proposed (with Filtered Map)    1.36                 3.21             0.73

Chapter 3. Monocular Visual Navigation Design

3.1 Introduction

Obstacle detection and avoidance are essential for mobile robot navigation. This chapter proposes a monocular vision-based obstacle avoidance design for mobile robot navigation. A novel image processing procedure is developed to estimate the distance between the robot and obstacles based on inverse perspective transformation (IPT) in the image plane. A robust image processing solution is proposed to detect and segment the drivable ground area within the camera view. The proposed method integrates robust feature matching with adaptive color segmentation for plane estimation and tracking, in order to cope with variations in illumination and camera view. After IPT and ground region segmentation, distance measurements similar to those of a laser range finder are obtained for mobile robot obstacle avoidance and navigation. The merit of this algorithm is that the mobile robot gains the capability of path finding and obstacle avoidance using a single monocular camera. The rest of this chapter is organized as follows. The concept of the system is described in Section 3.2. Section 3.3 details the algorithm for ground plane and obstacle detection. The calibration experiment for distance measurement is presented in Section 3.4. Section 3.5 shows simulation results using synthetic images. Experimental results using a mobile robot are reported in Chapter 5.

3.2 System Overview

In order to estimate range information for mobile robot navigation control using merely a monocular image sequence, the basic idea is to find the world coordinates of pixels belonging to the ground plane from the image plane by applying IPT. For common indoor environments, we can make the following assumptions: (1) the ground is a plane; (2) the robot moves parallel to the ground; and (3) the camera is fixed on the robot. The distance between the camera and the ground plane is thus fixed, and the IPT of the ground plane can be determined beforehand. With the pinhole camera model, applying IPT can be simplified by using a plane-to-plane homography [38], as shown in Fig. 3-1. There exists a 3-by-3 homography matrix H such that

$\mathbf{p} \cong H\mathbf{q}$,

where $\mathbf{p}$ is the homogeneous coordinate of a point on the ground plane relative to the camera, $\mathbf{q}$ is the corresponding coordinate in the image plane, and $\cong$ denotes equality up to a scale factor. The homography matrix H can be found from the relative coordinates of four points on the ground plane and their corresponding coordinates in the image. In the calibration phase, several sets of corresponding points are used to estimate a precise H. After calibration,

Fig. 3-1: Homography of coplanar points between image plane and ground plane.


the projected images can then provide the relative position of each pixel on the ground in world coordinates [33,34]. Off-ground objects, on the other hand, will be distorted after the transformation. A recalibration will be necessary if the position or angle of the camera is changed.
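A minimal sketch of this calibration and mapping step is given below, assuming OpenCV is available. H is estimated from a handful of ground-point correspondences (the coordinate values shown are placeholders, not measured calibration data), and each ground pixel is then mapped to metric, camera-relative ground coordinates.

```python
import numpy as np
import cv2

# Calibration data: pixel coordinates of markers on the ground (image plane q)
# and their known positions on the ground plane relative to the camera (p),
# in metres.  The numbers below are placeholders, not the thesis' measurements.
image_pts = np.array([[210, 430], [430, 428], [395, 310], [245, 312]], dtype=np.float32)
ground_pts = np.array([[-0.5, 1.0], [0.5, 1.0], [0.5, 2.5], [-0.5, 2.5]], dtype=np.float32)

# With more than four correspondences, RANSAC yields a more precise, outlier-robust H.
H, _ = cv2.findHomography(image_pts, ground_pts, cv2.RANSAC, 3.0)

def pixel_to_ground(u, v):
    """Inverse perspective mapping of one ground-plane pixel to metric coordinates."""
    q = np.array([[[u, v]]], dtype=np.float32)
    p = cv2.perspectiveTransform(q, H)
    return p[0, 0]          # (x, y) on the ground plane, relative to the camera

print(pixel_to_ground(320, 400))
```

A recalibration of H, as noted above, simply means repeating this estimation with fresh correspondences after the camera pose changes.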

Due to these assumptions, IPT can only map pixels on the ground plane. In practical obstacle avoidance applications, the pixels that belong to the ground plane must be found on the fly in order to determine the distance to obstacles. The proposed system therefore aims to solve this problem via multiple visual cues. Fig. 3-2 illustrates the architecture of the proposed monocular visual navigation system (MonoVNS). In this design, two parallel procedures are launched simultaneously after the image preprocessing step. In the ground feature detection process, the system extracts all the features in the current image at time step t and matches them to features observed in the previous N frames, where N is the number of frames that can be processed in one second. According to the correspondences and the pre-determined ground features, the ground homography and the features on the ground in the newly acquired image can be found. The system then uses the homography matrices among consecutive frames to classify all other features as on or off the ground. Meanwhile, in a parallel process, images are segmented into patches according to their color homogeneity. These segments are then classified into ground or off-ground regions by comparison with warped images. The detailed algorithm is described in Section 3.3.

Once the ground and off-ground areas are segmented, the coordinates of the ground area (relative to the camera) can be estimated by applying IPT to the acquired images. The images are virtually rectified into a downward-looking pose. The range information can then be obtained by a ray-casting operation. Ray casting is a procedure that mimics a laser range finder (LRF) using the rectified image. A set of virtual rays is cast from the camera center, one per pixel, until the pixel belongs to an off-ground area, as shown in Fig. 3-3. The distance of a

Fig. 3-2: The architecture of the proposed MonoVNS.


ray to an object can thus be estimated by the two-norm of the pixel coordinates relative to the camera center. The distances of these rays then represent the distances to obstacles, just like those of a typical LRF, but with a smaller view angle.

This information can be applied to any LRF-based obstacle avoidance scheme, making it work in a similar manner.
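The following sketch illustrates such a ray-casting operation on an IPT-rectified binary ground mask. The camera-center location, angular field of view, number of rays, and metric resolution are illustrative assumptions rather than the parameters of the implemented system.

```python
import numpy as np

def cast_rays(ground_mask, origin, fov_deg=60.0, n_rays=61, meters_per_px=0.01):
    """Mimic a laser range finder on an IPT-rectified ground mask.

    ground_mask: 2-D array after IPT, nonzero where the pixel belongs to the ground.
    origin: (row, col) of the camera centre in the rectified image (assumed known).
    Returns one range in metres per ray; each ray stops at the first off-ground pixel.
    """
    h, w = ground_mask.shape
    angles = np.deg2rad(np.linspace(-fov_deg / 2, fov_deg / 2, n_rays))
    ranges = np.full(n_rays, np.inf)
    for i, a in enumerate(angles):
        step = np.array([-np.cos(a), np.sin(a)])       # assume "forward" is decreasing row
        pos = np.array(origin, dtype=float)
        for _ in range(max(h, w)):
            pos += step
            r, c = int(round(pos[0])), int(round(pos[1]))
            if not (0 <= r < h and 0 <= c < w):
                break                                   # ray left the image: no obstacle seen
            if not ground_mask[r, c]:
                ranges[i] = np.hypot(*(pos - origin)) * meters_per_px
                break
    return ranges
```

The resulting range vector can then be fed to an LRF-based obstacle avoidance scheme in place of real laser scans.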

3.3 Ground Plane and Obstacle Detection

3.3.1 Ground Feature Detection

In order to obtain robust matching results for corresponding points, the feature extractor should be invariant to rotation, scale, illumination, and image noise. In previous related works, the Harris and FAST [39] corner detectors have been widely adopted [28-31], but the robustness of these descriptors is relatively limited in practical applications. In the last decade, there have been many advances in scale/rotation-invariant feature extractors, such as the Scale-Invariant Feature Transform (SIFT) [40], in which an array of image gradient histograms is used as the descriptor instead of raw image patches. However, the heavy computation

Fig. 3-3: An example of ray casting in an image after IPT. The white area indicates the ground region; the dark areas indicate obstacles. The distances between the camera center and the obstacles can be estimated by measuring the lengths of the rays.

load of the SIFT algorithm makes it unsuitable for real-time navigation applications. In this work, a combination of robustness and execution speed has been pursued. In the current design, the FAST corner detector is first applied to find interesting feature points. These points are then described with Speeded Up Robust Features (SURF) [41] for its superior robustness and execution speed. FAST provides high-speed corner detection and can reach full-frame feature detection at up to 400 Hz. While preserving robustness, SURF is about three times faster than SIFT in the feature extraction step. SURF also provides an indexing value for fast matching. Nevertheless, these methods may still fail under excessive motion blur or abrupt changes in view angle. The speed of the robot, and therefore the camera motion, is limited accordingly in this study.
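A sketch of this FAST-plus-SURF combination using OpenCV is shown below. SURF is patented and lives in the non-free xfeatures2d contrib module, so its availability depends on how OpenCV was built; the ratio-test matching shown is a common choice rather than necessarily the matcher used in this work, and the thresholds are illustrative.

```python
import cv2

fast = cv2.FastFeatureDetector_create(threshold=25)           # illustrative threshold
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)      # requires non-free contrib build

def describe(gray):
    """Detect FAST corners and describe them with SURF descriptors."""
    keypoints = fast.detect(gray, None)
    keypoints, descriptors = surf.compute(gray, keypoints)
    return keypoints, descriptors

def match(desc_prev, desc_curr, ratio=0.7):
    """Nearest-neighbour matching with Lowe's ratio test to reject ambiguous pairs."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(desc_prev, desc_curr, k=2)
    return [m for m, n in pairs if m.distance < ratio * n.distance]
```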

Once features are found and matched, the next step is to determine whether these feature points are on the ground. Consider a ground plane projected into two views taken at different positions, as shown in Fig. 3-4. With the pinhole camera model, two views of the same plane are related by a unique homography [20]. That is, for a plane $\pi = [\mathbf{n}^T\ 1]^T$, a ray corresponding to a point p in image I and its corresponding point p' in image I' meet at point

Fig. 3-4: Homography of coplanar points observed from two images.

P in the plane $\pi$. Therefore, if a set of coplanar points $\mathbf{p}_i$ with homogeneous image coordinates $(x_i, y_i, 1)^T$ is found, and their correspondences $\{\mathbf{p}_i \leftrightarrow \mathbf{p}'_i\}$ in the two images are also found, there exists a 3-by-3 homography matrix H such that

$\lambda_i \mathbf{p}'_i = H\mathbf{p}_i, \qquad H = K\left(R - \frac{\mathbf{t}\,\mathbf{n}^T}{d}\right)K^{-1}, \qquad (3)$

where K is the intrinsic camera parameter matrix, R is the rotation matrix, $\mathbf{t}$ is the translation vector, $\lambda_i$ is a scale factor, and d is the distance between the plane and the camera. To determine H, four non-degenerate corresponding points are required, since each point correspondence provides two independent constraints. In the proposed system, the features on the ground are determined initially, i.e., a subset of $\mathbf{p}_i$ is known to lie on the ground plane. The homography of the ground plane can thus be determined initially. To further reduce possible matching errors, RANdom SAmple Consensus (RANSAC) is applied to eliminate outliers and robustly determine H [42-43]. Note that the H matrix here is only used to find a robust plane relationship between two frames; it does not affect the homography matrix applied in the IPT computation. Once the homography matrix is determined, the remaining corresponding points can also be classified as on or off the ground by back projection: the smaller the back-projection error of a feature under H, the more confident its ground classification should be. Since the mobile robot can run over certain small obstacles, a small range of tolerance is allowed in the feature classification, which also lets the mobile robot navigate on somewhat uneven surfaces. A fixed tolerance value is determined beforehand in the current

implementation. Fig. 3-5 illustrates an example of ground feature classification from two images. Note that homography estimation has its limitations. For instance, in the case of near-zero camera translation, i.e., $\mathbf{t} \approx 0$ in (3), no information on coplanarity can be inferred, since the plane normal $\mathbf{n}$ can be arbitrary [30]. This condition must be checked, since in this case all the points would be classified as ground. A simple test is to determine whether all observed points conform to the same homography. This test fails only when there is just a single scene plane visible, or when the camera intrinsic parameters are changed. Both of these cases can be

Fig. 3-5: An example of ground feature detection. (a) and (b) are images taken from different views; (c) is the classification result of the matched features observed from (a) and (b). The features labeled in blue are on the ground, while the features labeled in orange are off the ground.

avoided in the current design.
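The sketch below outlines this frame-to-frame procedure under the assumptions stated above: RANSAC estimates the ground homography from the features already known to lie on the ground, the remaining matches are classified by their transfer (back-projection) error against a fixed pixel tolerance, and the near-zero-translation degeneracy is flagged when every point happens to fit the same homography. The function name and the tolerance value are illustrative, not taken from the thesis implementation.

```python
import numpy as np
import cv2

def classify_ground_features(pts_prev, pts_curr, ground_idx, tol_px=3.0):
    """Estimate the ground homography between two frames with RANSAC and
    classify the remaining matched features by their back-projection error.

    pts_prev, pts_curr: (N, 2) float32 arrays of matched feature coordinates.
    ground_idx: indices of features already known to lie on the ground plane.
    tol_px: reprojection tolerance, allowing small obstacles the robot can run over.
    """
    H, _ = cv2.findHomography(pts_prev[ground_idx], pts_curr[ground_idx],
                              cv2.RANSAC, tol_px)
    projected = cv2.perspectiveTransform(pts_prev.reshape(-1, 1, 2), H).reshape(-1, 2)
    errors = np.linalg.norm(projected - pts_curr, axis=1)
    on_ground = errors < tol_px

    # Degeneracy check: with near-zero camera translation every point fits the
    # same homography, so coplanarity cannot be inferred from this frame pair.
    if on_ground.all():
        return None
    return on_ground
```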

3.3.2 Ground Region Segmentation

In Section 3.3.1, the feature points that belong to the ground plane are determined. Pixels that are not selected as feature points, however, still need to be classified as ground or off-ground. In this work, this problem is resolved by segmenting the image into regions beforehand and then classifying whether each region is on the ground.

The quality of segmentation depends not only on the algorithm used, but also on the targeted application. In this design, the segmentation method should be able to distinguish objects from image frames in a fast and robust way under various environmental conditions. In the current design, an image frame is segmented by adopting a multi-scale Mean-shift algorithm in the HSI (Hue-Saturation-Intensity) color space. The Mean-shift algorithm [44] is a nonparametric, iterative clustering technique which requires no prior knowledge of the number of clusters and does not constrain the shape of the clusters. It is therefore suitable for unsupervised color segmentation.

The proposed multi-scale Mean-shift algorithm is summarized as follows:

Step 1. Choose a search window with proper bandwidth.

Step 2. Choose the initial location of the search window.

Step 3. Compute the mean value (centroid) of the data in the search window.

Step 4. Move the center of the search window to the mean location computed in Step 3.

Step 5. Repeat Steps 3 and 4 until convergence.

Note that in Step 1 the bandwidth of the kernel needs to be determined. In particular, the image is under-segmented when the bandwidth is too large, and over-segmented when the bandwidth is too small. In this work, a proper bandwidth is determined dynamically, according to a frequency analysis of the image [45]. For instance, a cluttered image, such as a crowd, usually has more energy at high frequencies and requires a smaller bandwidth value. On the contrary, a simple image, such as a white wall, will be given a larger bandwidth value.

Additionally, since the hue values of pixels are unstable under low intensity and saturation, these colorless pixels should be segmented separately. In the current implementation, the hue value of a pixel is set to 2 if the pixel has a saturation value < 0.1 or an intensity value < 0.1.

The proposed segmentation algorithm takes 0.1 to 0.5 seconds to process an image of 640×480 pixels. In order to further boost the speed for real-time applications, images are scaled down to one-tenth beforehand. Small objects may be neglected in this case. Therefore, the proposed algorithm estimates the purity of each segment at the original resolution and segments again if needed. The modified method is on average 10 times faster than using the original scale.
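A compact sketch of this segmentation stage is given below. It downscales the image, converts it to an HSV representation (standing in for the HSI space above), applies the colorless-pixel handling just described, and clusters the pixels with scikit-learn's Mean-shift. The fixed bandwidth used here is a placeholder for the frequency-based adaptive bandwidth selection of the actual design, and the multi-scale purity re-check is omitted.

```python
import cv2
import numpy as np
from sklearn.cluster import MeanShift

def segment_ground_candidates(bgr, scale=0.1, bandwidth=0.15):
    """Cluster pixels of a downscaled image in a normalised HSV space with Mean-shift.

    Hue is unreliable for nearly colourless pixels, so pixels with low
    saturation or intensity receive a sentinel hue value before clustering.
    """
    small = cv2.resize(bgr, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    hsv = cv2.cvtColor(small, cv2.COLOR_BGR2HSV).astype(np.float32)
    h = hsv[..., 0] / 179.0                  # OpenCV hue range is 0..179 for 8-bit images
    s = hsv[..., 1] / 255.0
    v = hsv[..., 2] / 255.0
    h[(s < 0.1) | (v < 0.1)] = 2.0           # sentinel hue for colourless pixels
    features = np.stack([h, s, v], axis=-1).reshape(-1, 3)
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(features)
    return labels.reshape(small.shape[:2])   # one segment label per downscaled pixel
```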

3.3.3 Obstacle Detection

After the ground/off-ground features are determined and the image is segmented into regions, it is now possible to classify the ground region in a pixel-wise manner. As mentioned in Section 1, many previous works warp the image using the homography matrix and calculate the Sum of Absolute Differences (SAD) between the warped or rectified image and the current image. However, it is difficult to determine a proper threshold value since the SAD value is correlated with the environment. Furthermore, homogeneous obstacles may be neglected.
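For reference, that warped-image SAD test can be sketched as follows; the per-pixel SAD map makes clear why any single global threshold is scene dependent. The function name is illustrative, and the same SAD map is also what the proposed system later uses when updating per-segment probabilities.

```python
import cv2
import numpy as np

def sad_ground_score(prev_gray, curr_gray, H):
    """Warp the previous frame with the ground homography and compute a per-pixel
    Sum of Absolute Differences against the current frame.  Ground pixels warp
    consistently and score low; off-ground pixels score high, but the threshold
    separating the two depends on the scene, which is the weakness noted above."""
    warped = cv2.warpPerspective(prev_gray, H, curr_gray.shape[::-1])
    return cv2.absdiff(warped, curr_gray).astype(np.float32)
```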

The proposed system instead determines whether a region is on the ground according to the displacement and the distortion of each segment separately. Similar techniques have been used in stereo vision for estimating disparity maps [46]. To do so, corresponding segments need to be found. While segments containing already-matched feature points can be matched easily, homogeneous segments are matched using both feature points and their color distributions. The proposed segment matching method finds the maximum of an overall likelihood function over multiple cues, that is,

$L(a \mid b) = L_{color}(a \mid b)\, L_{motion}(a \mid b).$

The color likelihood model is defined so as to estimate the similarity between the color histograms of two segments. The Bhattacharyya coefficient has been widely used to determine the similarity of two discrete probability distributions [47]. It lies in the interval [0, 1]; the larger the coefficient, the more similar the two histograms. Assuming two objects a and b with color histograms $h_a = \{h_{a,1}, \ldots, h_{a,N}\}$ and $h_b = \{h_{b,1}, \ldots, h_{b,N}\}$, the color cue likelihood function between them is taken as the Bhattacharyya coefficient, $L_{color}(a \mid b) = \sum_{n=1}^{N} \sqrt{h_{a,n}\, h_{b,n}}$.
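A small sketch of this color cue, and of its combination with a motion cue as in the likelihood above, is given below; the motion likelihood is left as an input since its exact form is not reproduced in this section.

```python
import numpy as np

def color_likelihood(hist_a, hist_b):
    """Bhattacharyya coefficient of two colour histograms (normalised internally);
    the result lies in [0, 1], and larger values mean more similar histograms."""
    a = hist_a / hist_a.sum()
    b = hist_b / hist_b.sum()
    return float(np.sum(np.sqrt(a * b)))

def segment_likelihood(hist_a, hist_b, motion_likelihood):
    """Overall matching likelihood L(a|b) = L_color(a|b) * L_motion(a|b)."""
    return color_likelihood(hist_a, hist_b) * motion_likelihood
```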

Segments that fail to be matched are labeled as undetermined. Each matched segment is then assigned an initial probability value of 0.5, and the probability value of each segment is updated on the basis of SAD. These segments are finally classified as ground or off-ground, so both static and moving obstacles can be detected. Fig. 3-6 shows the segmentation result and the ground plane labeling based on the images in Fig. 3-5. Note that in the image a piece of paper on the ground is classified as ground, as expected.

In summary, the proposed ground plane detection algorithm is able to segment ground regions from two images. The feature classification step determines the features on the ground plane while also finding the homography matrix H between the ground planes in the two images. The second image can thus be rectified with H. The two images are also segmented according to color homogeneity. These segments are matched with multiple cues and finally classified as ground or off-ground. In practice, the key image for matching is not obtained merely from a single frame, but from a collection of tracked features and segments over the N frames that can be processed in the previous second. Therefore, this method can be applied even when the robot is rotating or the camera is temporarily occluded.

3.4 Calibration Experiment of Distance Measurement

In this experiment, the distance measurement of the proposed MonoVNS was calibrated. The

Fig. 3-6: (a) Color segmentation result and (b) ground region labeling result based on the images in Fig. 3-5.

experiment was performed with a static object on the ground, while the camera was positioned to take snapshots and measure the distance to the object, as shown in Fig. 3-7. Snapshots were taken at distances from 1.6 m to 6 m, at intervals of 0.4 m, and the estimated distances were compared with the ground truth values. To examine the influence of scaling in the segmentation step, we experimented with both the original images and the scaled images. The IPT homography was initially estimated by manually assigning known points on the ground within three meters of the camera. The acquired original images were 320×240 pixels, while the scaled images were 32×24 pixels. The average processing time using the original images is 251 ms, while that using the scaled images is 190 ms; the processing time varies depending on the extracted features and the complexity of the input imagery. Table 3-1 shows the calibration results. The measurement error is within 3% over a range of 4.8 m, even when the image is scaled down to one-tenth. The errors are mainly due to the imperfect segmentation results and the quantization error in IPT. These errors are especially significant when the object is far away and cannot be distinguished due to the limited resolution of the imagery.

Note that the minimum and maximum sensing ranges are both related to the height and the tilt angle of the camera.

Fig. 3-7: The obstacle used in the experiment. (a) The robot is 6 m away from the obstacle; (b) the robot is 1.6 m away from the obstacle.

Table 3-1: Experimental Results for Distance Measurement Using the Proposed MonoVNS

3.5 Simulation Results of Synthetic Images

In order to test the proposed system under different environments, a synthetic scene, as shown in Fig. 3-8(a), was established using Matlab to examine the classification rate against a known ground truth. The synthetic scene contains three walls with the same texture. In each trial of the experiment, the path of the robot is randomly generated to observe 10 consecutive views, and the texture of the walls is also changed randomly in each trial. Fig. 3-8(b) shows an example of the classification result. The maximum false detection rate of features is 0.96%, and the average false detection rate of features is 0.84%. The experiment also shows that


the maximum false detection rate of regions is 2.91%. The average false detection rate of regions is
