Organization of the Thesis - 利用色彩資訊及深度資訊之室內行動機器人對行人偵測與追蹤

Chapter 1 Introduction

1.4 Organization of the Thesis

This thesis includes six chapters. The remainder of this thesis is organized as follows. Chapter 2 states the literature of past research. This chapter includes two sections: simultaneous localization and mapping (SLAM), and pedestrian detection and tracking. Chapter 3 states robot localization and map construction. The tasks have the LRF usage, the robot localization, and the map construction in specific environment. Chapter 4 states the omnidirectional camera structure, the pedestrian detection by LRF scan spatial continuity, image color feature, and image edge feature, and target tracking by image color histogram, image LBP, and LRF scan. Chapter 5 shows the experimental result and analysis. In addition, the PSO algorithm compares to the ICP algorithm in SLAM. Both conclusions and feature works are presented in Chapter 6.

Chapter 2 Literature Survey

This chapter states the literature survey in the mobile robotic field. Section 2.1 states the simultaneous localization and mapping including self-localization and mapping. Self-localization and mapping are two of the fundamental capabilities for mobile robot [39: Birk & Carpin 2006]. In addition, pedestrian detection and target pedestrian tracking are researchable topics for the mobile robot. Section 2.2 states pedestrian detection and target pedestrian tracking. Figure 2.1 shows the SLAM categories in Section 2.1. And Figure 2.2 shows pedestrian detection and target pedestrian tracking categories in Section 2.2.

2.1 Simultaneous Localization and Mapping

SLAM is an important topic for a mobile robot in unknown indoor environment.

Although many sensors can be selected, the laser range finder (LRF) often is used to SLAM [15: Wu et al. 2013], [16: Rusdinar et al. 2010]. LRF is a sensor commonly used owing to its accuracy in distance measurement [15: Wu et al. 2013].

In terms of methods, iterative close point (ICP) algorithm is commonly used [17:

Zhang 1994], [18: Lu & Milios 1994]. However, local optimal solution is a problem

in scan matching. To overcome the problem relating to the local optimal solution, particle filter (PF) is proposed to correct the error pose [16: Rusdinar et al. 2010]. And extended Kalman filter (EKF) is used to decrease odometric error of the robot [20:

Kang et al. 2010]. To overcome the problem relating to the outliers, random sample

consensus (RANSAC) algorithm is used to filter outliers [19: Tong & Barfoot 2011].

To build the map, the occupancy grid map is used [21: Moravec & Elfes 1985], [39: Birk & Carpin 2006]. Since arbitrary data can be mapped, occupancy grid map is

focused [55: Winner et al. 2012]. The occupancy grid map needs a resolution to discretize the environment [55: Winner et al. 2012]. Therefore, the occupancy grid map can be chosen depending on the requirements of the precision of the data [55:

Winner et al. 2012]. The Bayesian probability grid map is used in [23: Thrun et al.

2005], [55: Winner et al. 2012]. The Bayesian probability grid map expresses the

possibility of grid is occupied and there is a lot of merits of calculation [55: Winner et al. 2012].

In the dynamic environment, the occupancy grid map is difficultly built for the full of people. The inverse observation model is used to build static occupancy grid map [22: Wolf & Sukhatme 2004].

In this thesis, the PSO algorithm is proposed to overcome the local minimum

solution with ICP in [24: Chang & Lian 2012]. Figure 2.1 demonstrates simultaneous localization and mapping categories.

Figure 2.1 Simultaneous Localization and Mapping Categories

2.2 Pedestrian Detection and Tracking

Pedestrian detection and target pedestrian tracking play an important role in the robotic field. Many sensors are implemented for pedestrian detection and target pedestrian tracking. The sensors include laser-based sensor and vision-based sensor.

The laser-based sensor is used for distance measurement application [55: Winner et al.

2012]. And the vision-based sensor is used for color image application [55: Winner et

al. 2012].

For distance information such as LRF scan, many approaches are presented in pedestrian detection and target pedestrian tracking. For pedestrian detection, inverse

observation model is used to differentiate between the dynamic objects and static objects [22: Wolf & Sukhatme 2004]. However, the method detects the moving object.

In [26: Sung & Chung 2011] and [9: Chung et al. 2012], the clustering legs into a pedestrian is presented. And using LRF scan in a two-layered arrangement to detect features is presented in [27: Carballo et al. 2010]. For target pedestrian tracking, K nearest neighbor (KNN) algorithm [3: Chen et al. 2011] and multiple hypothesis tracking (MHT) algorithm [24: Chang & Lian 2012] are used with LRF scan in target pedestrian tracking.

In the color image, background subtraction method acquires the moving objects in static scene [29: Stauffer & Grimson 1999], [28: Lin & Huang 2011]. In [30: Lee et al. 2003], background model updates based on Gaussian mixture model. In dynamic

scene, optical flow algorithm is applied [31: Enzweiler et al. 2008]. For pedestrian detection, the image feature includes corner [34: Xu & Xu 2013], edge geometry [33:

Zhao et al. 2008], texture [32: Leithy et al. 2010], [41: Kun et al. 2012], and color

distribution [33: Zhao et al. 2008]. The image features are regarded as condition judgments for pedestrian detection. What is more, Dalal and Triggs [35: Dalal &

Triggs 2005] present histograms of oriented gradients (HOG) feature vectors to detect

the pedestrian. Moreover, there are various approaches to track target pedestrian with color image. For handle the occlusion case, Lin and Huang [28: Lin & Huang 2011]

use either Kalman-filter or mean-shift algorithm in different conditions. Cox and Hingorani [25: Cox & Hingorani 1996] enumerate multiple models of targets from the latest three frames through multiple hypothesis tracking (MHT) algorithm.

For fusion of LRF scan and color image, a recognition method to track running pedestrians is presented [12: Ueda et al. 2011]. In [37: Kristou et al. 2011], the target pedestrian tracking uses the LRF scan. However, the pedestrian detection uses the color image.

In this thesis, the pedestrian extraction uses LRF scan spatial continuity, image color feature, and image edge feature. For tracking target pedestrian, color distribution is regard as a conditional judgment. The mobile robot robustly detects pedestrian and tracks target pedestrian. Figure 2.2 shows the pedestrian detection and target pedestrian tracking categories.

Figure 2.2 Pedestrian Detection and Target Pedestrian Tracking Categories

Chapter 3 Simultaneous Localization and Mapping

Self-localization and mapping are two of the fundamental capabilities for mobile robot [39: Birk & Carpin 2006]. The combination of the self-localization and the mapping is referred to as the simultaneous localization and mapping (SLAM) [39:

Birk & Carpin 2006]. In unknown indoor environment, SLAM is a researchable task

in this chapter. In this thesis, LRF is used owing to its accuracy in distance measurement [15: Wu et al. 2013]. The LRF operation principle and the LRF limitation are presented in Section 3.1. For self-localization, skidding and slipping can induce that the odometry is not equal to the real distance. To solve the problem, PSO algorithm [38: Li et al. 2011] is presented in Section 3.2. In the SLAM, once odemetry of mobile robot is known, building a map is also a task which can be effectively solved at the same time [39: Birk & Carpin 2006]. To build the map, the occupancy grid map is used in Section 3.3.

3.1 Laser Range Finder Usage and Limitation

LRF is a sensor measuring the distance. To use the LRF, the LRF operation principle and the LRF limitation should be known. In Section 3.1.1, the operation principle of LRF is presented. Section 3.1.2 states the limitation of LRF in specific scene.

3.1.1 Introduction of Laser Range Finder

For mobile robot, LRF is a common sensor. Compared with other sensors, LRF is a sensor commonly used owing to its accuracy in distance measurement. So the LRF is prevailing in the mobile robot.

The operation principle of LRF uses time of flight (ToF) to estimate the distance

from specific angle [4: Okubo et al. 2009]. Figure 3.1 shows the operation principle of LRF. First, the laser emits an infrared beam and rotating mirror changes the beam’s

direction [4: Okubo et al. 2009]. Then the laser hits the surface of an object and is reflected [4: Okubo et al. 2009]. ToF is proportional to distance measurement. Since the infrared beam is rapid, the scan rate can achieve at least ten scans per second.

From the direction of mirror, the phase of emitted can be estimated. Finally, the position of object is calculated.

Figure 3.1 The Operation Principle of Laser Range Finder

3.1.2 The Limitation of Usage in the Glass Environment

LRF is a sensor commonly used owing to its accuracy in distance measurement [15: Wu et al. 2013]. However, the limitation of LRF is about environment texture. In

the environment with glass, the light may refract the ray of light in the environment with glass like Figure 3.2(a). The incident light, laser beam, can be divided into diffusive reflection, specular reflection, and refraction, as shown in Figure 3.2(a). The diode only absorbs the diffusive reflection. Therefore, the missing data may occur.

The real-scene is Ming-Da Building 2F having two glass window, as shown in Figure 3.2(b). In this scene, the data may miss.

(a) (b)

Figure 3.2: The Texture Influence in Glass Environment (a) Theorem of the optical relation

(b) Scene in Ming-Da Building 2F

3.2 Robot Localization

Self-localization is an important task for mobile robot in unknown indoor environment. Although the mobile robot estimates the position with encoder, the scan matching technique can acquire precise position of the mobile robot. Generally, iterative closest point (ICP) algorithm [43: Besl & McKay 1992] is widely used to the scan matching. However, local optimal solution is a problem for ICP algorithm. The scan matching technique needs another algorithm.

PSO algorithm [44: Kennedy & Eberhart 1995] is a feasible method in scan matching technique. The PSO algorithm applies to minimize the distance energy function in the approximately global optimization problem [45: Eberhart & Shi 1998]. The particle swarm model sets the particles in the -dimensional problem space. The -th particle owns the self-position , self-velocity , and distance

Incident Light

Diffusive Reflection

Refraction

Specular Reflection

Glass

energy function in the search domain at time . Let each particle know the best its position and the best position in particles before time . Therefore, the position and the velocity of each particle with N particles are expressed as follows:

= ₁, ₂, … , _𝑁 (3.1)

= ₁, ₂, … , _𝑁 (3.2)

The -th particle is expressed as a point owning the position ¹ and velocity

1 at time t+1 according to the following equations:

1= 𝜔. + 𝑐₁. 𝑟𝑎𝑛𝑑 . − + 𝑐₂. 𝑟𝑎𝑛𝑑 . − (3.3)

1 = + . ∆ (3.4)

where ω is an inertia weight, 𝑐₁ is a cognitive coefficient, 𝑐₂ is a social coefficient,

and rand is a random probability in [0, 1]. ¹ updates through Equation (3.4).

However, to avoid overshooting the global solution, sets the threshold value _𝑚𝑎𝑥.

Figure 3.4 shows the illustration of particle motion and the process of particle motion.

In Figure 3.3(a), the plane represents the distance energy function. The curve represents the pass path for each particle by time t. The circle represents current position for each particle. The square presents the best position for each particle. And the triangle represents best position for all particles. Each particle owns its position and velocity. The velocity of each particle is determined by Equation (3.3) and the position of each particle is determined by Equation (3.4). Every particle can affect

each other. Figure 3.4(b) shows the flow chart of particle motion. For all particles, the

position and the velocity are set. The energy function is calculated. Then the pbes and the gbes are picked in the particles. And the particles move to next

position. Until the particles stay the same positions, the particles are global optimal positions. If the particles do not stay the same positions, the steps are iterative.

(a)

(b)

Figure 3.3 The Illustration and Process of Particle Selection (a) The illustration of each particle moving

i. The line represents the pass path;

ii. The circle represents current position.

iii. The triangle represents best position in all particles.

iv. And the square is the best position in its particle.

(b) The particle selections of the PSO algorithm

Figure 3.4 shows scan matching points of two data sets results with the LRF. The

scan matching points of two data sets include the red points and the blue points, as

shown in Figure 3.4. When the measurement error is not eliminated in time, it causes the inaccurate map. Figure 3.4(a) and (c) show the LRF scan based on odometry in Ming-Da 5F and Ming-Da 4F. Generally, PSO algorithm can reduce the measurement error to enhance the map accuracy, as shown in Figure 3.4 (b) and (d). The other experimental results in scan matching are shown in Section 5.3.

To acquire the more accurate robot position in populated environment, the pedestrian need to be detected. Section 3.3 states dynamic map concept. And Section 4.3 states pedestrian detection based on dynamic map.

(a) (b)

Figure 3.4 Scan Matching 2 Data Result (Target and Source are in the LRF scans) (unit: meter)

(a) Encoder in Ming-Da Building 5F (b) PSO algorithm process from (a) (c) Encoder in Ming-Da Building 4F (d) PSO algorithm process from (c)

3.3 Map Construction

In map construction, occupancy grid map is an important method. Section 3.3.1 uses the occupancy grid map to construct the map. The inverse observation model is used to build static occupancy grid map and dynamic occupancy grid map [22: Wolf &

Sukhatme 2004] in Section 3.3.2.

-8 -6 -4 -2 0 2 4 6

3.3.1 Grid Map Construction

To make mobile robot move arbitrarily, mapping is a necessary task. However, map construction is difficult for the mobile robot in dynamic environment. What is more, the inaccuracy measurement of LRF scan may cause the map false.

Occupancy grid map is a main method in map construction [23: Thrun et al.

2005]. The occupancy grid map needs a resolution to discretize the environment [55:

Winner et al. 2012]. Therefore, the occupancy grid map can be chosen depending on

the requirements of the precision of the data [55: Winner et al. 2012]. In occupancy grid map, three states include free, occupancy, and unknown. Bayesian probability update robustly applies to the occupancy grid map state as follows:

p 𝑆 |𝑍 , 𝑆⁻¹ = α. p 𝑆 |𝑍 . p 𝑆⁻¹|𝑍⁻¹, 𝑆⁻² (3.5)

Bel 𝑆 = α. p 𝑆 |𝑍⁻¹ . Bel 𝑆⁻¹ (3.6)

where the Bayesian probability p 𝑆 |𝑍 , 𝑆⁻¹ represents Bel 𝑆 . 𝑆 is map state in the specific position at time , and 𝑍 is the measurement in the specific position at time . α is a normalization coefficient.

By Equation (3.6), the occupancy grid map updates the state through iterative method. With this method, the grid occupancy probability only knows the previous occupancy grid map Bel 𝑆⁻¹ and the inverse observation probability p 𝑆 |𝑍⁻¹ .

3.3.2 Static Map and Dynamic Map

For the occupancy grid map, both the static map concept and the dynamic map concept are proposed [22: Wolf & Sukhatme 2004]. The dynamic map can be estimated from the following equation:

𝐷 |𝑍¹, … 𝑍 , 𝑆⁻¹

1 − 𝐷 |𝑍¹, … , 𝑍 , 𝑆⁻¹ = 𝐷 |𝑍 , 𝑆⁻¹

1 − 𝐷 |𝑍 , 𝑆⁻¹ .1 − 𝐷

𝐷 . 𝐷⁻¹

1 − 𝐷⁻¹ (3.7) where 𝑆 is the state at time t, 𝑍 is the measurement at time t, and 𝐷 is the

dynamic state at time t.

However, the p 𝐷 |𝑍 , 𝑆⁻¹ needs the inverse observation model to update.

Table 3.1 shows the occupancy probability p S in map construction with state.

Three states include free, occupancy, and unknown. The threshold is set to 0.2 and 0.8.

Table 3.1 The Occupancy Probability of Each State S

State Occupancy probability

Free p S ≤ 0.2

Unknown 0.2 < p S < 0.8

Occupied 0.8 ≤ p S

With Table 3.1 result, the inverse observation model establishes in Table 3.2 [22:

Wolf & Sukhatme 2004]. In [22: Wolf & Sukhatme 2004], Wolf and Sukhatme

propose static map and dynamic map. The static map includes many dynamic

obstacles owing to inverse observation model. Since inverse observation model predicts that the state from unknown to occupied is static object, the pedestrian detected with LRF scan by robot may be regarded as static object. The probability is low in dynamic objects. It means that the objects are static. p 𝐷 |𝑍 , 𝑆⁻¹ is estimated as follows:

Table 3.2 Inverse Observation Model

𝑆⁻¹ 𝑍 p 𝐷 |𝑍 , 𝑆⁻¹

Free Free Low

Unknown Free Low

Occupied Free Low

Free Occupied High

Unknown Occupied Low

Occupied Occupied Low

Figure 3.5 demonstrates an example of the inverse observation model analysis of

two consecutive LRF scan. The state of pillar and the state of pedestrian are unknown for the robot at time t-1, while the pillar and the pedestrian are detected for robot at time t. Therefore, the pillar and the pedestrian are regarded as static objects from Table 3.2. In fact, the pillar should be regarded as a static object and the pedestrian

should be regarded as a dynamic object. Table 3.2 is not obviously sufficient. In this scene, the moving object is only pedestrian. The pedestrian detection in Section 4.4 can solve the problem of inverse observation model in Table 3.2.

(a)

(b)

Figure 3.5 The Inverse Observation Model Problem (with LRF Scan Grid Map) (a) The pillar and the pedestrian are unknown for the robot at time t-1.

(b) Either the pillar or the pedestrian is regarded as the static in inverse observation model in LRF scan at time t. The pedestrian detection judgment describes in Section 4.3.

Chapter 4 Pedestrian Detection and Target Pedestrian Tracking

For pedestrian detection and target pedestrian tracking, both LRF scan and color image are used in this chapter. Section 4.1 states the operation principle of omnidirectional camera and states the problem of the equipment. However, the combination of LRF and omnidirectional camera is difficult since the sensors are not calibrated. The calibration between LRF and omnidirectional camera can be divided into horizontal adjustment, translation, and rotation. In the calibration, the rotation calibration of the combination of LRF and omnidirectional camera is a researchable question. Section 4.2 states rotation calibration of the combination of LRF and omnidirectional camera. For pedestrian detection, the non-pedestrian needs to be filtered out. In this thesis, the methods with the LRF scan and the color image are presented in Section 4.3. LRF scan roughly judges pedestrian. Then the Hough circle transform and the color distribution are the judgment with the color image. With the above methods, the pedestrian detection can be implemented. In target pedestrian tracking, owing to pillars hindering or new pedestrians appearing, the data association

may be error between two consecutive LRF scans. The color distribution and the local binary pattern (LBP) algorithm are used in the problems of Section 4.4.

4.1 The Operation Principle of Omnidirectional Camera

In the pedestrian detection problem and the target pedestrian tracking problem of Section 1.2, the LRF scan and the color image should be used. Owing to wide field of

view (FOV), the omnidirectional camera is necessarily used in the problems. Section 4.1.1 states the operation principle of omnidirectional camera. What is more, Section

4.1.2 presents the histogram equalization for low-light omnidirectional camera image.

4.1.1 Introduction of Omnidirectional Camera

The color image is widely used in the mobile robotic field. The color feature plays an important role in pedestrian detection and target pedestrian tracking.

Therefore, the camera is often mounted on the mobile robot.

In this thesis, one of the problems is pillars hindering. The data association may be error between two consecutive LRF scans. To search the target pedestrian, the omnidirectional camera is necessarily used. The omnidirectional camera’s field of view is 360 degree. Therefore, the omnidirectional camera is regarded as an available

tool to track the target pedestrian in the pillars hindering problem.

The structure of omnidirectional camera includes a hyperbolic mirror and a camera under the mirror like Figure 4.1 [5: Yagi et al. 2005]. The horizontal passing through the virtual center line (HPVCL) maintains the same height in projection [53:

Yang & Lian 2012], [5: Yagi et al. 2005]. The operation principle makes a light flight

to the upper center 0, c . When the light touches the hyperbolic mirror, the light reflects to the other center 0, −c . The image appears in the process.

Figure 4.1 The Omnidirectional Camera Structure

4.1.2 The Lightness of Omnidirectional Camera

Although the omnidirectional camera owns many advantages, it still overcomes a low-light problem in Figure 4.2. Since the light does not directly flight to image plane,

Camera

the color image is dark. The low-light causes the image dark, as shown in Figure 4.3(a). However, the real scene from digital camera is bright, as shown in Figure

4.3(b). That the lighting sources in the beginning put on the floor seems unworkable

because of the unknown environment. The caution leads to two influences. One is the edge threshold value sets small. As a result, the noise easily interfaces the results. The other is each color channel distribution is dense. Therefore, using the color space to tracking target pedestrian is more difficult. Two methods are present to improve the influences. One uses the histogram equalization [6: Gonzalez & Woods 2008] stated in the next paragraph. The other enhances light through the aperture. In this thesis, the histogram equalization is used. With the process, the color image results obtain more robust.

Figure 4.2 The Light Depends on the Active Lighting Source

(a) (b)

Figure 4.3 Display Image with Lightness Problem in Ming-Da Building 2F (a) The omnidirectional image has the light problem

(b) The real-scene with digital camera

Histogram Equalization is a method making the intensity in image uniform [6:

Gonzalez & Woods 2008]. The variables are shown in Figure 4.4. Let 𝑃_𝑟 𝑟 be the probability of the intensity. Assume the output intensity s, and the definition of s is

the following:

s = T r = ∫ 𝑃^𝑟 _𝑟 𝑤 𝑑𝑤

(4.1)

In this transform, the probability of s is the cumulative distribution function (CDF) of the input 𝑟. And that is proved in [6: Gonzalez & Woods 2008]. The

definition of 𝑃_𝑠 is the following:

𝑃_𝑠 = { 1 0 ≤ ≤ 1

0 𝑜 ℎ 𝑟𝑤𝑖 (4.2)

where the probability of 𝑃_𝑠 is a uniform function. Owing to digital signal, the intensity _𝑘 of image from Equation (4.1) is a discontinuous function in the image

在文檔中利用色彩資訊及深度資訊之室內行動機器人對行人偵測與追蹤 (頁 26-0)