Robust visual tracking control system of a mobile robot based on a dual-Jacobian visual interaction model


Robotics and Autonomous Systems


Chi-Yi Tsai a, Kai-Tai Song a,∗, Xavier Dutoit b, Hendrik Van Brussel b, Marnix Nuttin b

a Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan, ROC

b Department of Mechanical Engineering, Division PMA, K. U. Leuven, Celestijnenlaan 300B, B-3001 Leuven, Belgium

Article info

Article history: Received 26 March 2007; received in revised form 6 January 2009; accepted 7 January 2009; available online 21 January 2009

Keywords: Visual tracking control; Visual estimation; Visual interaction model; Kalman filter

Abstract

This paper presents a novel design of a robust visual tracking control system, which consists of a visual tracking controller and a visual state estimator. This system facilitates human–robot interaction of a unicycle-modeled mobile robot equipped with a tilt camera. Based on a novel dual-Jacobian visual interaction model, a robust visual tracking controller is proposed to track a dynamic moving target. The proposed controller not only possesses some degree of robustness against system model uncertainties, but also tracks the target without its 3D velocity information. The visual state estimator estimates the optimal system state and the target image velocity, which are used by the visual tracking controller. To achieve this, a self-tuning Kalman filter is proposed to estimate the parameters of interest and to overcome the temporary occlusion problem. Furthermore, because the proposed method works entirely in the image space, the computational complexity and the sensor/camera modeling errors can be reduced. Experimental results validate the effectiveness of the proposed method in terms of tracking performance, system convergence, and robustness.

1. Introduction

An intelligent robot uses its on-board sensors to collect information from its surroundings and to react to changes in its immediate environment. In recent years, vision systems have been widely used in intelligent robots, and research on visual tracking control has gained increasing attention in robotics [1–25]. In robotics, visual tracking control means vision-based robot motion control to track a target of interest. Based on the motion constraints of the robot, visual tracking control can be classified into two categories: visual servoing for holonomic manipulators and visual tracking for nonholonomic mobile robots. Although visual servoing of holonomic manipulators has been discussed extensively and many results can be found in the literature [1–3], mobile robots are commonly nonholonomic, and the visual servoing results for holonomic manipulators are unsuitable for a mobile platform [4].

∗ Corresponding author. Tel.: +886 3 5731865; fax: +886 3 5715998. E-mail addresses: [email protected] (C.-Y. Tsai), [email protected] (K.-T. Song), [email protected] (X. Dutoit), [email protected] (M. Nuttin).

This paper addresses the problem of visual tracking control of unicycle-modeled mobile robots, commonly termed wheeled mobile robots, equipped with an on-board monocular vision system. Because a large number of mobile robot visual tracking control methods have been reported, we classify them into four groups based on the type of target to be tracked. Many efforts focus on the first group, which aims to track a static target, such as a ground line, landmark, or reference image, for the purpose of mobile robot navigation or regulation (so-called homing) [5–17]. For a mobile robot to track a ground line, Ma et al. formulated the visual tracking control problem as controlling the shape of a ground curve in the image plane and proposed a closed-loop vision-guided control system for a nonholonomic mobile robot [5]. Coulaud et al. proposed a simple and stable feedback controller design, which avoids sophisticated image processing and control algorithms, for a mobile robot equipped with a fixed camera to track a line on the ground [6]. In the case of tracking a landmark, the reported controllers usually modify the visual servoing technique to satisfy the nonholonomic constraint on the motion of the mobile robot [7–10]. In [11], Zhang and Ostrowski utilized an optimal control method to solve the visual motion-planning problem by generating a virtual trajectory in the image plane and the corresponding optimal control signals for the robot to follow. Nierobisch et al. proposed a visual tracking control method for a mobile robot with a pan-tilt camera to track visual reference landmarks in the acquired views during autonomous navigation [12]. Recently, homography-based [13,14] and epipolar-based [4,15–17] visual tracking control approaches were proposed for a mobile robot equipped with a pinhole or an omni-directional (so-called central catadioptric) camera to track a reference image toward a desired configuration. These two approaches treat the mobile robot visual tracking control problem as a visual servoing regulation or visual homing problem. In [13], Chen et al. developed a visual tracking controller based on the Euclidean homography to track a desired time-varying trajectory defined by a prerecorded image sequence of a stationary target viewed by the on-board camera as the mobile robot moves. However, the stability of their result is restricted by the non-zero reference velocity condition on the desired trajectory. To overcome this drawback, Fang et al. exploited Lyapunov-based techniques to construct a homography-based visual servoing regulation controller and proved asymptotic regulation of the mobile robot [14]. In [15], Mariottini et al. exploited the epipolar geometry defined by the current and desired camera views to develop a two-step visual servoing regulation controller. They also extended this design to the visual servoing regulation control of a mobile robot with a central catadioptric camera [16]. In [17], Goedemé et al. developed a vision-only navigation and homing system for mobile robots with an omni-directional camera. Their method divides the visual homing operation into two phases and computes the visual homing vector based on epipolar geometry estimation. Although the approaches of the first group provide appropriate solutions to the static target visual tracking control problem, they are not guaranteed to solve the moving (non-static) target visual tracking control problem.

The second group aims to track other robot teammates in a robot team for the purpose of formation control [18,19]. The approaches in this group are usually based on a central catadioptric camera model in order to detect all robot teammates at the same instant. The third group aims to track a predictable moving target, such as a projectile or a ball moving in a straight line, for the purpose of mobile robot interception [20,21]. In [20], Borgstadt et al. utilized a human vision-based strategy to guide a mobile robot to intercept a projectile ball. Similarly, Capparella et al. extended the concept of a human-like strategy to develop a vision-based two-level interception approach, which contains a lower level controller to control the on-board pan-tilt camera and a higher level controller to operate the mobile robot platform, for intercepting a ball moving in a straight line [21]. A common point of the second and third groups is that the motion of the target of interest is known and predictable. However, in many robotic applications, a mobile robot is required to track a target with dynamic and unpredictable motion, such as a human face, for the purpose of pursuit or interaction. Thus, the existing methods of these two groups are not suitable for solving the dynamic moving target visual tracking control problem.

The purpose of the fourth group is to solve the problem of tracking a dynamic moving target [22–25]. In [22], Wang et al. proposed an adaptive backstepping control law based on an image-based camera-target visual interaction model to track a dynamic moving target with an unknown height parameter. Although the approach in [22] guarantees the asymptotic stability of the closed-loop visual tracking control system when tracking a dynamic moving target, the case of tracking a static target cannot be guaranteed because of the non-zero restriction on the reference velocity of the mobile robot. In [23], Malis et al. integrated template-based visual tracking algorithms and model-free vision-based control techniques to build a flexible and robust visual tracking control system for various robotic applications. Because their visual tracking result is based on homography estimation, which requires two images of the target pattern to estimate the optimal homography, the reported system overcomes only the partial occlusion problem and fails under full occlusion. In [24], Han et al. proposed an image-based visual tracking control scheme for a mobile robot to estimate the position of the target in the next image and to track the target to the central area of the image. Since their method uses a differential approximation to estimate the velocity of the target in the image plane, the estimation result is very sensitive to image noise. Recently, a visual interaction controller was proposed for a unicycle-modeled mobile robot to track a dynamic moving target such as a human face [25]. The drawback of this method is that the controller requires the target's 3D motion velocity, which is difficult to estimate when only a monocular camera is used.

From the literature survey, we note that the challenge in mobile robot visual tracking control design is to develop a robust tracking control system that estimates the motion of the moving target and tracks the target based on a stability criterion. This problem motivates us to derive a new model for developing a robust visual tracking control system that solves the tracking problem of a dynamic moving target in the image plane directly and efficiently. To do so, the visual interaction model described in [25] is extended to derive a novel dual-Jacobian visual interaction model for designing a robust mobile robot visual tracking control system, which encompasses a visual state estimator and a visual tracking controller. The visual state estimator is built on a real-time self-tuning Kalman filter and estimates the optimal system state and target motion in the image plane directly for later use by the visual tracking controller. The visual tracking controller then calculates the robot's control velocities in the image plane directly. The main differences between the proposed method and other existing approaches are summarized as follows:

(1) The proposed dual-Jacobian visual interaction model considers not only the effect of mobile robot motion, but also the effect of target motion. Thus, based on the proposed model, the visual tracking control problem of a unicycle-modeled mobile robot for tracking a dynamic moving target can be solved with asymptotic convergence using a single controller. Moreover, the proposed model also considers the kinematics of a tilt camera platform mounted on the mobile robot. Therefore, the applicability of the proposed method is greatly increased.

(2) The proposed visual tracking control system not only possesses some degree of robustness against the system model uncertainties, but also overcomes the unmodeled quantization effect in the velocity commands and the occlusion effect during the visual tracking process. This advantage enhances the reliability of the proposed method in practical applications.

(3) The proposed visual tracking control system works fully in the image space. Therefore, compared with position-based [23], homography-based [13,14], and epipole-based [4,15–17] visual tracking control approaches, the computational complexity and the sensor/camera modeling errors can be much reduced due to the advantages of image-based visual servo control [2].

(4) The proposed self-tuning Kalman filter can automatically choose a suitable observation covariance matrix in varying environmental conditions. This helps to improve the estimation performance when there is noise in the system observation.

To validate the performance and robustness of the proposed control system, computer simulations and experimental studies of tracking a moving target have been conducted. Simulation and experimental results will be presented and discussed to verify the effectiveness of the proposed control system in terms of tracking performance, system convergence, and robustness. Note that a brief version of the research results has been published in [26]. This paper presents the complete design of the proposed tracking control system, including robustness analysis, computer simulation, and experimental validation.

The rest of this paper is organized as follows. Section 2 describes the proposed dual-Jacobian visual interaction model. Section 3 presents the results of the visual tracking controller design. Section 4 develops the visual state estimator using a Kalman filter with a self-tuning algorithm. Simulation and experimental results are reported in Section 5, together with an extended discussion of several interesting observations. Section 6 concludes the contributions of this paper.


Fig. 1. (a) A model of the unicycle-modeled mobile robot and target in the world coordinate frame. (b) Side view of the mobile robot with a tilt camera mounted on top of its head to track a dynamic target.

2. Dual-Jacobian visual interaction model

As shown in Fig. 1, the considered system is a unicycle-modeled mobile robot equipped with a tilt camera mounted on top of it to track a moving target, such as a human face, in the image plane. The robot can keep the target in the camera's field of view such that the optical axis of this camera faces the target of interest. Fig. 1(a) illustrates the model of the unicycle-modeled mobile robot and target in the world coordinate frame $F_f$, in which the motion of the target is supposed to be holonomic. Fig. 1(b) is the side view of the scenario under consideration, in which the tilt angle $\phi$ gives the relationship between the camera coordinate frame $F_c$ and the mobile coordinate frame $F_m$. The kinematics of the unicycle-modeled mobile robot and the target can be described [27], respectively, by

$$
\begin{bmatrix} \dot z_f^m \\ \dot x_f^m \\ \dot\theta_f^m \\ \dot\phi \end{bmatrix}
=
\begin{bmatrix} \cos\theta_f^m & 0 & 0 \\ \sin\theta_f^m & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} v_f^m \\ w_f^m \\ w_t^m \end{bmatrix},
\quad \dot y_f^m = 0
\quad\text{and}\quad
\begin{bmatrix} \dot z_f^t \\ \dot x_f^t \\ \dot y_f^t \end{bmatrix}
=
\begin{bmatrix} v_f^z \\ v_f^x \\ v_f^y \end{bmatrix},
\tag{1}
$$

where $(x_f^m, y_f^m, z_f^m)$ and $(x_f^t, y_f^t, z_f^t)$ are, respectively, the positions of the mobile robot and the target in Cartesian coordinates. $(\theta_f^m, \phi)$ are the orientation angle of the mobile robot and the tilt angle of the onboard camera. $(v_f^m, w_f^m)$ and $w_t^m$ are, respectively, the linear and angular velocity of the mobile robot and the tilt velocity of the camera. $(v_f^x, v_f^y, v_f^z)$ are the target velocity in Cartesian coordinates.
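As an illustration of Eq. (1), the following minimal Python sketch integrates the robot and target kinematics with a forward-Euler step. The function names, the step size, and the numerical values are assumptions chosen for illustration only; they do not come from the paper.

```python
import math

def step_robot(state, u, dt):
    """One forward-Euler step of the robot kinematics in Eq. (1).

    state = (z, x, theta, phi): planar position, heading, camera tilt.
    u     = (v, w, w_tilt): linear, angular, and tilt velocity commands.
    """
    z, x, theta, phi = state
    v, w, w_tilt = u
    return (
        z + dt * v * math.cos(theta),   # z_dot = v cos(theta)
        x + dt * v * math.sin(theta),   # x_dot = v sin(theta)
        theta + dt * w,                 # theta_dot = w
        phi + dt * w_tilt,              # phi_dot = tilt velocity
    )

def step_target(pos, vel, dt):
    """Holonomic target: each coordinate integrates its own velocity."""
    return tuple(p + dt * v for p, v in zip(pos, vel))

# Hypothetical run: drive straight along the z-axis for 1 s at 0.5 m/s.
state = (0.0, 0.0, 0.0, 0.0)
for _ in range(100):
    state = step_robot(state, (0.5, 0.0, 0.0), 0.01)
```

Because $\dot y_f^m = 0$, the robot height is omitted from the state; only the target carries a vertical velocity component.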

In order for the mobile robot to interact with the target in the image coordinate frame, a visual interaction model was proposed in the authors' previous work [25]. Fig. 2 shows the definition of the observed system state in the image plane used for the visual interaction model. In Fig. 2, $x_i$ and $y_i$ are, respectively, the horizontal and vertical position of the centroid of the target in the image plane, and $d_x$ is the width of the target in the image plane. Let $X_i = [x_i \; y_i \; d_x]^T$ denote the system state in the image plane, $u = [v_f^m \; w_f^m \; w_t^m]^T$ the control velocity of the mobile robot and on-board tilt camera, $(f_x, f_y)$ the fixed focal lengths along the image x-axis and y-axis, respectively [28], and $W$ the actual width of the target. The visual interaction model between robot and target in the image coordinate frame can be derived as in Box I, where diag() denotes a 3-by-3 diagonal matrix, $k_x = d_x / W$ and $k_y = k_x f_y / f_x$ are two scalars, and $\delta y$ is the distance between the center of the robot tilt platform and the onboard camera. A detailed derivation of the visual interaction model (2) given in Box I is presented in the Appendix.

Fig. 2. Definition of the observed system state in the image plane.

The visual interaction model (2) given in Box I indicates that the elements of the system matrix $A_i$ and the vector $C_i$ are functions of the target velocity. Thus, expression (2) given in Box I can be rewritten as in Box II, where $V_t = [v_f^x \; v_f^y \; v_f^z]^T$ is the vector of target velocities in Cartesian coordinates. Expression (3) given in Box II shows that the visual interaction model consists of two parts: first, the effect of target motion $\dot X_i^t \equiv [\dot x_i^t \; \dot y_i^t \; \dot d_x^t]^T = J_t V_t$, and second, the effect of mobile robot motion $\dot X_i^m \equiv [\dot x_i^m \; \dot y_i^m \; \dot d_x^m]^T = B_i u$. Thus, Eq. (3) given in Box II can be rewritten as a dual-Jacobian equation such that

$$
\dot X_i = \dot X_i^t + \dot X_i^m = J_t V_t + B_i u, \tag{4}
$$

where matrix $J_t$, termed the target image Jacobian, maps the target velocity $V_t$ into the target image velocity $\dot X_i^t$; matrix $B_i$, termed the robot image Jacobian, maps the mobile robot control velocity $u$ into the robot image velocity $\dot X_i^m$. In other words, the image velocity $\dot X_i$ is the combination of the target image velocity $\dot X_i^t$ and the robot image velocity $\dot X_i^m$. Fig. 3 shows the concept of the dual-Jacobian equation (4). Therefore, the visual interaction between robot and target in the image coordinate frame can be modeled as the dual-Jacobian visual interaction model (4).

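$$
\dot X_i = A_i X_i + B_i u + C_i, \tag{2}
$$

where

$$
A_i = \mathrm{diag}(A_1, A_2, A_1),
$$

$$
A_1 = -f_x^{-1} k_x \left( v_f^x \cos\phi \sin\theta_f^m + v_f^y \sin\phi + v_f^z \cos\phi \cos\theta_f^m \right),
$$

$$
A_2 = -f_y^{-1} k_y \left( v_f^x \cos\phi \sin\theta_f^m + v_f^y \sin\phi + v_f^z \cos\phi \cos\theta_f^m \right),
$$

$$
B_i =
\begin{bmatrix}
f_x^{-1} k_x x_i \cos\phi &
f_x^{-1} (x_i^2 + f_x^2) \cos\phi - f_y^{-1} f_x (k_y \delta y + y_i) \sin\phi &
-f_y^{-1} x_i (k_y \delta y + y_i) \\
k_y (\sin\phi + f_y^{-1} y_i \cos\phi) &
f_x^{-1} f_y x_i (\sin\phi + f_y^{-1} y_i \cos\phi) &
-f_y^{-1} (y_i^2 + f_y^2 + k_y y_i \delta y) \\
f_x^{-1} k_x d_x \cos\phi &
f_x^{-1} x_i d_x \cos\phi &
-f_y^{-1} d_x (k_y \delta y + y_i)
\end{bmatrix},
$$

$$
C_i =
\begin{bmatrix}
k_x \left( v_f^z \sin\theta_f^m - v_f^x \cos\theta_f^m \right) \\
k_y \left( v_f^y \cos\phi - v_f^x \sin\phi \sin\theta_f^m - v_f^z \sin\phi \cos\theta_f^m \right) \\
0
\end{bmatrix}.
$$

Box I.

$$
\dot X_i = (A_i X_i + C_i) + B_i u = J_t V_t + B_i u, \tag{3}
$$

where

$$
J_t =
\begin{bmatrix}
-k_x \left( f_x^{-1} x_i \cos\phi \sin\theta_f^m + \cos\theta_f^m \right) &
-k_x f_x^{-1} x_i \sin\phi &
-k_x \left( f_x^{-1} x_i \cos\phi \cos\theta_f^m - \sin\theta_f^m \right) \\
-k_y \left( f_y^{-1} y_i \cos\phi \sin\theta_f^m + \sin\phi \sin\theta_f^m \right) &
-k_y \left( f_y^{-1} y_i \sin\phi - \cos\phi \right) &
-k_y \left( f_y^{-1} y_i \cos\phi \cos\theta_f^m + \sin\phi \cos\theta_f^m \right) \\
-k_x f_x^{-1} d_x \cos\phi \sin\theta_f^m &
-k_x f_x^{-1} d_x \sin\phi &
-k_x f_x^{-1} d_x \cos\phi \cos\theta_f^m
\end{bmatrix}.
$$

Box II.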

Fig. 3. Concept of the dual-Jacobian visual interaction model (4).
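The equivalence between the two forms in Box I and Box II, $A_i X_i + C_i = J_t V_t$, can be checked numerically. The sketch below, in Python with NumPy, evaluates both sides for one set of made-up values. The focal lengths, target width, pose, and velocities are illustrative assumptions, not the paper's experimental parameters, and the function names are hypothetical.

```python
import numpy as np

def target_jacobian(X, theta, phi, fx, fy, W):
    """Target image Jacobian J_t from Box II (columns: v^x, v^y, v^z)."""
    xi, yi, dx = X
    kx = dx / W
    ky = kx * fy / fx
    c, s = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [-kx * (xi * c * st / fx + ct), -kx * xi * s / fx,
         -kx * (xi * c * ct / fx - st)],
        [-ky * (yi * c * st / fy + s * st), -ky * (yi * s / fy - c),
         -ky * (yi * c * ct / fy + s * ct)],
        [-kx * dx * c * st / fx, -kx * dx * s / fx,
         -kx * dx * c * ct / fx],
    ])

def drift_terms(X, Vt, theta, phi, fx, fy, W):
    """A_i X_i + C_i from Box I: the target-induced part of Eq. (2)."""
    xi, yi, dx = X
    vx, vy, vz = Vt
    kx = dx / W
    ky = kx * fy / fx
    c, s = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    g = vx * c * st + vy * s + vz * c * ct          # common factor of A_1, A_2
    A = np.diag([-kx * g / fx, -ky * g / fy, -kx * g / fx])
    C = np.array([
        kx * (vz * st - vx * ct),
        ky * (vy * c - vx * s * st - vz * s * ct),
        0.0,
    ])
    return A @ np.asarray(X, dtype=float) + C

# Hypothetical values: pixels for X, metres for W, radians for angles.
X = (20.0, -15.0, 60.0)
Vt = np.array([0.1, -0.05, 0.2])
theta, phi, fx, fy, W = 0.3, 0.2, 400.0, 400.0, 0.15

lhs = target_jacobian(X, theta, phi, fx, fy, W) @ Vt
rhs = drift_terms(X, Vt, theta, phi, fx, fy, W)
agree = np.allclose(lhs, rhs)
```

The two sides agree because each row of $J_t V_t$ regroups the velocity terms of the corresponding row of $A_i X_i + C_i$; no additional modeling assumption is involved.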

Similar to human visual tracking behavior, visual tracking control design aims to drive the centroid position and width of the target from an initial state to the desired state in the image plane. To achieve this, an error state model is introduced to design the tracking controller. Define the error coordinates in the image plane such that

$$
X_e = [x_e \; y_e \; d_e]^T = \bar X_i - X_i^* = [\bar x_i - x_i^* \;\; \bar y_i - y_i^* \;\; \bar d_x - d_x^*]^T, \tag{5}
$$

where $\bar X_i = [\bar x_i \; \bar y_i \; \bar d_x]^T$ is the desired state in the image plane, and $X_i^* = [x_i^* \; y_i^* \; d_x^*]^T$ is the estimated system state from a visual state estimator (VSE, see Section 4). If the system state converges to the desired state, the visual tracking control problem is solved. Furthermore, the dynamic error state model in the image plane can be derived directly by taking the derivative of (5). The result is given by

$$
\dot X_e = -\dot X_i^t - \dot X_i^m = -J_t V_t - B_i u. \tag{6}
$$

With the new coordinate $X_e$, the visual tracking control problem is transformed into a stability problem. In the rest of this paper, the new coordinate $X_e$ will be used to solve the visual tracking control problem. If $X_e$ converges to zero, then the visual tracking control problem is solved.

3. Visual tracking controller (VTC) design

In this section, a visual tracking control law is derived from the error state model (6) by exploiting feedback linearization and pole placement techniques, so that the robot can interact with a target of interest in the image plane. The robustness of the proposed controller against parametric uncertainties is also analyzed.

3.1. Feedback linearization and pole placement

According to the dynamic error state model (6), a feedback control law can be obtained by feedback linearization such that

$$
u = B_i^{-1} (K_g X_e - J_t V_t) = B_i^{-1} (K_g X_e - \dot X_i^t), \tag{7}
$$

where $K_g$ is a 3-by-3 gain matrix. Substituting (7) into (6) yields

$$
\dot X_e = -K_g X_e. \tag{8}
$$

It is clear that if all eigenvalues of matrix $-K_g$ are constant and lie inside the left-half complex plane, then the system error state $X_e(t)$ will decay exponentially to zero. Thus, we choose the gain matrix by pole placement such that

$$
K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3), \tag{9}
$$

in which $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants. Substituting (9) into (8) yields

$$
\dot X_e = -K_g X_e = -\mathrm{diag}(\alpha_1, \alpha_2, \alpha_3) X_e. \tag{10}
$$

Suppose that the initial error state $X_e(t_0)$ is within the image plane. Then expression (10) indicates that

$$
X_e(t)\big|_{t_0, X_e(t_0)} \equiv X_e(t; t_0, X_e(t_0)) = \mathrm{diag}\left( e^{-\alpha_1 (t - t_0)}, e^{-\alpha_2 (t - t_0)}, e^{-\alpha_3 (t - t_0)} \right) X_e(t_0) \quad \text{for some } t_0 \ge 0. \tag{11}
$$

Because $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants, expression (11) leads to the following inequality:

$$
\| X_e(t; t_0, X_e(t_0)) \| \le e^{-\lambda_{\min}(K_g)(t - t_0)} \| X_e(t_0) \| \quad \text{for all } t \ge t_0, \tag{12}
$$

where $\lambda_{\min}(A)$ denotes the minimum eigenvalue of matrix $A$, and $\|B\|$ denotes the 2-norm of vector $B$. From (12), it is clear that the system error state satisfies $\| X_e(t; t_0, X_e(t_0)) \| \le \| X_e(t_0) \|$ and $\lim_{t \to \infty} \| X_e(t; t_0, X_e(t_0)) \| = 0$, and thus the visual tracking control problem is solved. Summarizing the above discussions, we obtain the following proposition.

Proposition 1. Suppose that the initial system state $X_i$ is within the image plane. Let $(\alpha_1, \alpha_2, \alpha_3) > 0$ be three positive constants. Consider the dual-Jacobian visual interaction system (4). If the matrix $B_i$ is nonsingular, then the closed-loop visual tracking system (6) is asymptotically stable by using the control law

$$
u = B_i^{-1} (K_g X_e - \dot X_i^t), \tag{13}
$$

where $\dot X_i^t = J_t V_t$ is the target image velocity defined in (4), matrices $B_i$ and $J_t$ are the robot and target image Jacobians defined in (2) and (3), respectively, $X_e$ is the system error state defined in (5), and $K_g$ is a 3-by-3 diagonal gain matrix defined in (9).

Proof. Consider the closed-loop visual tracking system (6). We first define a positive-definite Lyapunov function associated with the system error state

$$
V(x_e, y_e, d_e) = \frac{1}{2} \left( x_e^2 + y_e^2 + d_e^2 \right). \tag{14}
$$

Taking the derivative of (14) yields

$$
\dot V = X_e^T \dot X_e = -\left( X_e^T J_t V_t + X_e^T B_i u \right) = -X_e^T \left( \dot X_i^t + B_i u \right) \equiv -f(u), \tag{15}
$$

where $f(u) = X_e^T (\dot X_i^t + B_i u)$. In view of Lyapunov theory [29], expression (15) tells us that if $f(u) > 0$ then the equilibrium point of (6) is asymptotically stable. Substituting the control law (13) into $f(u)$, we then have

$$
f(u) = X_e^T K_g X_e, \tag{16}
$$

where $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3)$, and $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants. Since $K_g$ is a symmetric positive definite (SPD) matrix, the following inequality holds for any nonzero $X_e$:

$$
f(u) \ge \lambda_{\min}(K_g) \| X_e \|^2 = \min(\alpha_1, \alpha_2, \alpha_3) \| X_e \|^2 > 0, \tag{17}
$$

where $\lambda_{\min}(A)$ denotes the minimum eigenvalue of matrix $A$. Expression (17) concludes that the closed-loop visual tracking system (6) is asymptotically stable and hence completes the proof.
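The closed-loop behavior stated in Proposition 1 can be illustrated with a short simulation. The sketch below applies the control law (13) to the error model (6) with forward-Euler integration and compares the final error norm against the exponential bound (12). The matrix `B`, the gains, the target image velocity, and the initial error are hypothetical values chosen only so that `B` is nonsingular; they are not taken from the paper.

```python
import numpy as np

def vtc_control(B, Kg, Xe, Xt_dot):
    """Control law (13): u = B_i^{-1} (K_g X_e - Xdot_i^t)."""
    # Solve B u = rhs instead of forming the inverse explicitly.
    return np.linalg.solve(B, Kg @ Xe - Xt_dot)

# Hypothetical, clearly nonsingular robot image Jacobian (det = 2.83).
B = np.array([[2.0, 0.5, 0.0],
              [0.1, 1.5, 0.3],
              [0.0, 0.2, 1.0]])
Kg = np.diag([1.0, 2.0, 3.0])            # alpha_1, alpha_2, alpha_3 > 0
Xt_dot = np.array([5.0, -2.0, 1.0])      # assumed target image velocity
Xe0 = np.array([40.0, -30.0, 10.0])      # assumed initial error (pixels)

dt, steps = 1e-3, 5000
Xe = Xe0.copy()
for _ in range(steps):
    u = vtc_control(B, Kg, Xe, Xt_dot)
    Xe_dot = -Xt_dot - B @ u             # error model (6)
    Xe = Xe + dt * Xe_dot                # forward-Euler integration

t = dt * steps
# Right-hand side of inequality (12) with t0 = 0.
bound = np.exp(-np.min(np.diag(Kg)) * t) * np.linalg.norm(Xe0)
final_norm = np.linalg.norm(Xe)
```

In this sketch the target image velocity is assumed to be known exactly, so the feedback-linearizing term cancels it and each error component decays at its own rate $\alpha_j$; the slowest rate, $\lambda_{\min}(K_g)$, sets the bound.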

3.2. Singularity analysis

The feedback linearization control law (13) poses a singularity problem for matrix $B_i$. In [25], the singularity condition of matrix $B_i$ is derived such that

$$
f_y = (y_i + S d_x) \tan\phi, \tag{18}
$$

where $S = (f_y \delta y) / (f_x W)$ is a fixed scalar factor. Let $(x_c, y_c, z_c)$ represent the relative position between robot and target in the camera coordinate frame. Because $f_y = k_y z_c$, $y_i = k_y y_c$, and $d_x = k_x W$ (see (A.2) in the Appendix), singularity condition (18) can be rewritten such that

$$
z_c = (y_c + \delta y) \tan\phi, \tag{19}
$$

where $z_c$ is the distance from the camera to the target. Fig. 4 shows the particular configuration of the singularity condition (19). In Fig. 4, the distance $z_\phi$ equals

$$
z_\phi = (y_c + \delta y) \tan\phi. \tag{20}
$$

From (19) and (20), it is clear that the physical meaning of the singularity condition (19) is that the distance between the camera and the target equals the distance $z_\phi$. In general, this situation will not happen unless the target is located directly on the $Y_m$-axis (the Y-axis of the mobile coordinate frame $F_m$), which means that the target is directly above or directly below the robot. Under such circumstances, the robot will be unable to approach the target in any way due to deficient degrees of freedom, and the robot should stop tracking temporarily.
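The singularity condition (18) can be checked numerically against the matrix $B_i$ of Box I. The sketch below evaluates $\det B_i$ at a regular tilt angle and at the tilt angle that satisfies (18). It also compares the determinant against the factorization $\det B_i = -d_x f_x k_y \left( \cos\phi - f_y^{-1} (k_y \delta y + y_i) \sin\phi \right)^2$, which is our own expansion of the Box I entries and is not stated in the paper. All numerical values and function names are illustrative assumptions.

```python
import numpy as np

def robot_jacobian(X, phi, fx, fy, W, dy):
    """Robot image Jacobian B_i from Box I (columns: v, w, w_tilt)."""
    xi, yi, dx = X
    kx = dx / W
    ky = kx * fy / fx
    c, s = np.cos(phi), np.sin(phi)
    p = ky * dy + yi
    q = s + yi * c / fy
    return np.array([
        [kx * xi * c / fx,
         (xi**2 + fx**2) * c / fx - fx * p * s / fy,
         -xi * p / fy],
        [ky * q, fy * xi * q / fx,
         -(yi**2 + fy**2 + ky * yi * dy) / fy],
        [kx * dx * c / fx, xi * dx * c / fx, -dx * p / fy],
    ])

def singular_tilt(X, fx, fy, W, dy):
    """Tilt angle satisfying condition (18): f_y = (y_i + S d_x) tan(phi)."""
    _, yi, dx = X
    S = fy * dy / (fx * W)
    return np.arctan2(fy, yi + S * dx)

def det_closed_form(X, phi, fx, fy, W, dy):
    """Hand-derived factorization of det(B_i); an assumption to be verified."""
    _, yi, dx = X
    ky = (dx / W) * fy / fx
    m = np.cos(phi) - (ky * dy + yi) * np.sin(phi) / fy
    return -dx * fx * ky * m**2

# Hypothetical values: pixels for X and focal lengths, metres for W and dy.
X = (20.0, -15.0, 60.0)
fx = fy = 400.0
W, dy = 0.15, 0.10

det_regular = np.linalg.det(robot_jacobian(X, 0.2, fx, fy, W, dy))
phi_s = singular_tilt(X, fx, fy, W, dy)
det_singular = np.linalg.det(robot_jacobian(X, phi_s, fx, fy, W, dy))
```

A practical controller would compare $|\det B_i|$, or the squared factor above, against a threshold before inverting $B_i$ and pause tracking when the threshold is crossed; the threshold value is a design choice not specified here.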

Fig. 4. Particular configuration of the singularity condition (19).

Remark 1. The proposed control law (13) is associated with the target's velocity in the working space or the target's image velocity in the image plane. If the 3D velocity of the target $V_t$ is known, the target image velocity $\dot X_i^t$ can be calculated by using the target image Jacobian $J_t$ without an estimation process. However, in practical applications, it is difficult to measure the 3D velocity of the target in real time when using only one camera. In this situation, an estimation process is required to estimate the target image velocity $\dot X_i^t$ in the image plane directly. In Section 4, a VSE will be proposed to accomplish this task. This design makes the proposed tracking control scheme more generally applicable in the image plane.

3.3. Robustness analysis

In this subsection, we investigate the robustness of the proposed VTC (13) against model uncertainties on camera parameters $(f_x, f_y)$, robot parameters $(\theta_f^m, \phi)$, and target parameters $(W, \dot x_i^t, \dot y_i^t, \dot d_x^t)$. Consider the following closed-loop visual tracking system with parametric uncertainties:

$$
\dot X_e = -\dot{\bar X}_i^t - \bar B_i u = -\left( \dot X_i^t + \delta \dot X_i^t \right) - \left( B_i + \delta B_i \right) u, \tag{21}
$$

where $\delta \dot X_i^t$ and $\delta B_i$ are unknown bounded disturbances. Recalling the positive-definite Lyapunov function defined in (14), the derivative of (14) with parametric uncertainties becomes

$$
\dot{\bar V} = -X_e^T \left( \dot{\bar X}_i^t + \bar B_i u \right) = -\left[ f(u) + \delta f(u) \right] \equiv -\bar f(u), \tag{22}
$$

where $\delta f(u) = X_e^T (\delta \dot X_i^t + \delta B_i u)$ is unknown. Assume that $\delta f(u)$ is bounded and there exists an SPD matrix $\tilde Q$ such that

$$
\delta f(u) < \| X_e \|_{\tilde Q}^2, \tag{23}
$$

where $\| X \|_A = (X^T A X)^{1/2}$ denotes a weighted vector norm with an SPD matrix $A$. Now, the main result is presented as follows.

Theorem 1. Consider the dual-Jacobian visual interaction system (4) with unknown bounded parametric uncertainties $\delta \dot X_i^t$ and $\delta B_i$ defined in (21). Let $\tilde Q > 0$ be an SPD matrix defined in (23). Choose the controller $u$ as given in expression (13) with a constant SPD matrix $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3) > 0$. Then, the closed-loop visual tracking system (6) is asymptotically stable for all $\alpha_i > \lambda_{\max}(\tilde Q)$, $i = 1, 2, 3$.

Proof. From (22) and (23), it follows that

$$
f(u) = \bar f(u) - \delta f(u) > \bar f(u) - \| X_e \|_{\tilde Q}^2. \tag{24}
$$

Expression (24) implies that $\bar f(u) - \| X_e \|_{\tilde Q}^2$ is a lower bound of $f(u)$. If $\bar f(u) - \| X_e \|_{\tilde Q}^2 > 0$ can be guaranteed, then $f(u) > 0$ is satisfied and thus the system is robust with respect to the parametric uncertainties.

Choose the controller $u$ as in (13) with parametric uncertainties such that

$$
u = \bar B_i^{-1} \left( K_g X_e - \dot{\bar X}_i^t \right), \tag{25}
$$

where $\dot{\bar X}_i^t = \bar J_t V_t$. Substituting (25) into (22) yields

$$
\dot{\bar V} = -\bar f(u) = -\| X_e \|_{K_g}^2, \tag{26}
$$

where $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3) > 0$ is a constant SPD matrix. From (24) and (26), it is clear that

$$
f(u) > \| X_e \|_{K_g}^2 - \| X_e \|_{\tilde Q}^2 \ge \left[ \lambda_{\min}(K_g) - \lambda_{\max}(\tilde Q) \right] \| X_e \|^2, \tag{27}
$$

where $\lambda_{\min}(A)$ is defined in (17), and $\lambda_{\max}(A)$ denotes the maximum eigenvalue of matrix $A$. Expression (27) tells us that if $\lambda_{\min}(K_g) - \lambda_{\max}(\tilde Q)$ is positive, then $f(u) > 0$ is satisfied and thus the equilibrium point of (6) is asymptotically stable. This means that the proposed VTC (13) is robust against the unknown parametric uncertainties and hence completes the proof.

Remark 2. In the realization of the control scheme, it is worth noting that the quantization error in the velocity commands degrades the performance of the controller and might make the system unstable. In order to overcome this problem, the proposed VTC (13) will be combined with a robust control law presented in [25] to improve the robustness of the visual tracking control system. Interested readers are referred to [25] for more technical details.

4. Visual state estimator (VSE) design

As implied by Proposition 1, the VTC (13) requires the information of the target image velocity $\dot X_i^t$. This requirement poses two questions: first, how the target image velocity can be estimated in the image plane directly; second, what estimation methods can be used. In this section, we first formulate the problem of target image velocity estimation in the image space. A VSE is then proposed to compute the optimal estimates of the target status $X_i$ and the target image velocity $\dot X_i^t$ in the weighted least squared error sense for later use by the VTC.

4.1. Problem formulation

Since actual image processing is discrete, the first step of VSE design is to discretize the system model (4) into a corresponding discrete form. By the definition $\dot x(t) = \lim_{T \to 0} [x(t) - x(t - T)] / T$, where $T$ denotes the sampling time of the digital system, one can approximate the system model (4) as

$$
X_i[n] = X_i[n-1] + T \dot X_i^t[n-1] + T B_i u_{n-1}, \quad \text{for } n = 1, 2, \ldots \tag{28}
$$

where $u_n = [v_f^m \; w_f^m \; w_t^m]^T$ is the discrete-time control signal at sample instant $n$. Suppose that the target motion can be approximated by a smooth motion during a sampling period. Then the target image velocity has the following relationship between two consecutive sampling instants:

$$
\dot X_i^t[n] = \dot X_i^t[n-1]. \tag{29}
$$

Based on (28) and (29), the propagation model of the visual state estimator is given by

$$
X_n = \begin{bmatrix} I_3 & T I_3 \\ 0_3 & I_3 \end{bmatrix} X_{n-1} + \begin{bmatrix} T B_i \\ 0_3 \end{bmatrix} u_{n-1} \equiv A_{est} X_{n-1} + B_{est} u_{n-1}, \tag{30}
$$

where $X_n^T = [(X_i[n])^T \;\; (\dot X_i^t[n])^T]$ is the vector of system estimates at instant $n$, $I_3$ is a 3-by-3 identity matrix, and $0_3$ is a 3-by-3 zero matrix. Next, because the observed image only contains the information of the target status $X_i$ at each instant, the observation model of the VSE is given by

$$
Z_n = \begin{bmatrix} I_3 & 0_3 \end{bmatrix} X_n \equiv H_{est} X_n. \tag{31}
$$

Based on the propagation model (30) and the observation model (31), the estimation problem in the image plane becomes that of finding the state estimate $X_n^*$ that minimizes the weighted least square distance:

$$
X_n^* = \arg\min_X \left[ (X_n - X)^T P_n^{-1} (X_n - X) + (Z_n - H_{est} X)^T R_n^{-1} (Z_n - H_{est} X) \right], \tag{32}
$$

where $P_n = A_{est} P_{n-1} A_{est}^T$ is the covariance matrix of the propagation model at instant $n$, and $R_n$ is the covariance matrix of the observation model at instant $n$.

4.2. Self-tuning Kalman filter

Because the propagation model (30) and observation model (31) are both linear equations, a Kalman filter will provide the optimal estimate $X_n^*$ based on the performance criterion (32) when (30) and (31) have Gaussian uncertainties [30]:

$$\text{Propagation:} \quad X_n = A_{est} X_{n-1}^* + B_{est} u_{n-1} + \delta X_{n-1} \tag{33a}$$

$$\text{Propagation covariance matrix:} \quad P_n = A_{est} P_{n-1}^* A_{est}^T + Q_{n-1} \tag{33b}$$

$$\text{Observation:} \quad Z_n = H_{est} X_n + \delta Z_n \tag{33c}$$

where $(X_n, P_n)$ are the propagation state and the corresponding covariance matrix at instant $n$; $(X_{n-1}^*, P_{n-1}^*)$ are the optimal estimate and the corresponding covariance matrix at instant $n-1$; $\delta X_n^T = [(\delta X_i[n])^T \;\; (\delta \dot{X}_i^t[n])^T] \sim N(0, Q_n)$ represents Gaussian propagation uncertainty with zero mean and covariance matrix $Q_n$ at instant $n$; and $\delta Z_n \sim N(0, R_n)$ denotes Gaussian observation uncertainty with zero mean and covariance matrix $R_n$ at instant $n$. Based on Eqs. (33a)–(33c), the local minimum solution of performance criterion (32) and the corresponding covariance matrix at instant $n$ are given by

$$X_n^* = X_n^p + K_n (Z_n - H_{est} X_n^p) \quad \text{and} \quad P_n^* = (I_6 - K_n H_{est}) P_n, \tag{34}$$

where $X_n^p = A_{est} X_{n-1}^* + B_{est} u_{n-1}$ is the ideal propagation state, $K_n = P_n H_{est}^T (H_{est} P_n H_{est}^T + R_n)^{-1}$ is the Kalman gain matrix, and $I_6$ is a 6-by-6 identity matrix.
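The link between the criterion (32) and the update (34) can be checked numerically. Setting the gradient of (32) with respect to $X$ to zero gives $(P_n^{-1} + H_{est}^T R_n^{-1} H_{est}) X = P_n^{-1} X_n^p + H_{est}^T R_n^{-1} Z_n$, and the sketch below compares the solution of that linear system with the gain form of (34). The function names and all numerical values are illustrative assumptions, not data from the paper.

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """Measurement update in the gain form of (34): returns (X_n^*, P_n^*)."""
    S = H @ P_pred @ H.T + R                      # H P H^T + R
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain K_n
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(P_pred.shape[0]) - K @ H) @ P_pred
    return x_new, P_new

def weighted_ls_minimizer(x_pred, P_pred, z, H, R):
    """Stationary point of criterion (32), from setting its gradient to zero."""
    P_inv, R_inv = np.linalg.inv(P_pred), np.linalg.inv(R)
    lhs = P_inv + H.T @ R_inv @ H
    rhs = P_inv @ x_pred + H.T @ R_inv @ z
    return np.linalg.solve(lhs, rhs)

H = np.hstack([np.eye(3), np.zeros((3, 3))])      # H_est of (31)
P = np.diag([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])    # illustrative covariance only
R = np.diag([5.0, 5.0, 5.0])
x_pred = np.array([1.0, -2.0, 35.0, 0.5, 0.0, -0.5])
z = np.array([2.0, -1.0, 34.0])

x_kf, P_kf = kalman_update(x_pred, P, z, H, R)
x_ls = weighted_ls_minimizer(x_pred, P, z, H, R)
print(np.allclose(x_kf, x_ls))
```

Both routes give the same estimate; the gain form only needs the inverse of a 3-by-3 matrix, which is one reason it is usually preferred in an online filter.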

Although expression (34) provides the best linear estimates at each instant, the filter performance still depends on the covariance matrices $Q_n$ and $R_n$. Thus, a difficult problem in Kalman filter applications is to determine the values of $Q_n$ and $R_n$ for computing the Kalman gain matrix $K_n$ [31]. Moreover, the observation uncertainty usually varies with the conditions of target motion (such as orientation and rotation of the human face) and of the working environment (such as light variation and occlusion), so the corresponding covariance matrix $R_n$ is time-varying across operating conditions. These problems motivate us to combine a self-tuning algorithm with a Kalman filter to choose a suitable observation covariance matrix $R_n$ under varying environmental conditions. On the other hand, because the propagation uncertainty and the corresponding covariance matrix $Q_n$ are difficult to estimate online, $Q_n$ is fixed at initialization and is not updated in this design.

The proposed self-tuning algorithm attempts to estimate the minimum variance of a set of observation data recorded over time. To do so, a linear-least-squares regression method is adopted to analyze the observed time series data [32]. The typical linear regression model for a discrete time series is given by

$$y_n = a n + b + \varepsilon_n, \tag{35}$$

where the residual $\varepsilon_n$ is a random variable with zero mean, and $(a, b)$ are the parameters to be determined by minimizing the variance of the residuals. Fig. 5 shows the concept of the linear-least-squares regression, in which the solid line is the observed time series, and the dotted line indicates the best linear fit $\hat{y}_n = a n + b$.

Fig. 5. Concept of time series linear-least-squares regression.

Let $k$ denote the length of the observed time series. Based on the linear regression model (35), the observed time series can be modeled as

$$Y = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ \vdots & \vdots \\ k & 1 \end{bmatrix} \theta + \varepsilon \equiv A_{st}\theta + \varepsilon, \tag{36}$$

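where $Y = [y_1 \;\; y_2 \;\; \cdots \;\; y_k]^T$ is the vector of observed data over time, and $\varepsilon = [\varepsilon_1 \;\; \varepsilon_2 \;\; \cdots \;\; \varepsilon_k]^T$ represents the corresponding residuals. $\theta = [a \;\; b]^T$ is the parameter vector to be determined such that

$$\theta^* = \arg\min_\theta \operatorname{var}(\varepsilon) = \arg\min_\theta \|\varepsilon\| = \arg\min_\theta \|Y - A_{st}\theta\|, \tag{37}$$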

where $\operatorname{var}(x)$ is the variance of vector $x$, and $\|x\|$ is the norm of vector $x$. The optimal solution of (37) is the least-squares solution

$$\theta^* = A_{st}^{+} Y, \tag{38}$$

where $A_{st}^{+} = (A_{st}^T A_{st})^{-1} A_{st}^T$ denotes the pseudo-inverse matrix of $A_{st}$. Substituting (38) into (36), the residual vector with minimum variance is obtained as

$$\varepsilon^* = Y - A_{st} A_{st}^{+} Y = \left( I_k - A_{st} (A_{st}^T A_{st})^{-1} A_{st}^T \right) Y \equiv T_{st} Y, \tag{39}$$

where $T_{st} = I_k - A_{st} (A_{st}^T A_{st})^{-1} A_{st}^T$ is a fixed k-by-k coefficient matrix, and $I_k$ is a k-by-k identity matrix. Expression (39) tells us that the minimum-variance residual vector $\varepsilon^*$ is a linear transformation of the observed data vector $Y$ through a fixed transformation matrix $T_{st}$. This observation provides an efficient method for computing the minimum variance of an observed data sequence in real time. For instance, let $X_1^k$, $Y_1^k$, and $D_1^k$ denote, respectively, the observed data sequences of $x_i$, $y_i$, and $d_x$ over time steps 1 to $k$. Using (39), the minimum variances of $X_1^k$, $Y_1^k$, and $D_1^k$ are given by

$$\sigma_x^2 = \operatorname{var}(T_{st} X_1^k), \quad \sigma_y^2 = \operatorname{var}(T_{st} Y_1^k), \quad \text{and} \quad \sigma_d^2 = \operatorname{var}(T_{st} D_1^k). \tag{40}$$

Based on (40), the covariance matrix $R_n$ is updated as

$$R_n = R_0 + \operatorname{diag}\left( (\sigma_x^2)^2, (\sigma_y^2)^2, (\sigma_d^2)^2 \right), \tag{41}$$

where $R_0$ is the initial covariance matrix of $R_n$. Combining the self-tuning equations (40)–(41) with the Kalman filter equations (33)–(34), the implemented self-tuning Kalman filter is summarized in Fig. 6. The processing steps are listed as follows:

(1) Choose two initial covariance matrices $Q_0$ and $R_0$, usually by a trial-and-error procedure.

(2) Assume that the initial position of the target lies within the field of view of the camera; then initialize the estimated system state $X_0^*$ and the propagation covariance matrix $P_0$ from the first observation such that $X_0^* = [Z_0^T \;\; 0 \;\; 0 \;\; 0]^T$ and $P_0 = I_6$.

(3) Store the current observed measurement in a shift register of length $k$. If the length of the stored data equals $k$, compute the variances of the observed data sequences by (40) and update the covariance matrix $R_n$ by (41); otherwise set $R_n = R_0$. Go to step (4).

(4) Compute the ideal propagated state $X_n^p$ defined in (34) and the corresponding propagation covariance matrix $P_n$ using (33b).

(5) If the target is detected in the observed image, compute the Kalman gain matrix $K_n$ and update the estimated state vector $X_n^*$ and the corresponding covariance matrix $P_n^*$ using (34); otherwise set $X_n^* = X_n^p$ and $P_n^* = P_n$. Go to step (6).

(6) Let $X_{n-1}^* = X_n^*$, $P_{n-1}^* = P_n^*$, and $Q_{n-1} = Q_0$; go to step (3).

5. Simulation and experimental results

Computer simulations and several experiments have been performed to validate the robustness and tracking performance of the proposed control system. In the computer simulation, MATLAB was used to verify the robustness of the proposed visual tracking control system against parametric uncertainties. An experiment was performed on an experimental mobile robot to validate the tracking performance and the robustness against occlusion uncertainty. The experiment uses the proposed control system to control a mobile platform together with the tilt function of the camera platform. Table 1 shows the parameters used in the simulations and the experiment. Note that the processing time of the proposed visual tracking control system is less than 50 ms, including face detection, estimation, and control computations. The overall tracking system therefore has a low computational load and can track the user's face in real time. However, the sampling period of the control system $T$ was set to 100 ms in the experiments because of other image processing computations such as image compression and storage.

Table 1
Parameters used in the simulations and experiments.

| Symbol | Quantity | Description |
| --- | --- | --- |
| $(f_x, f_y)$ | (393.4, 391.8) pixels | Camera focal length in retinal coordinates [28]. |
| $W$ | 12 cm | Width of the target. |
| $D$ | 40 cm | Distance between two drive wheels. |
| $\delta_y$ | 10 cm | Distance between the center of robot tilt platform and the onboard camera. |
| $T$ | 100 ms | Sampling period of the control system. |
| $Q_0$ | diag(5, 5, 5, 20, 20, 20) | Initial propagation covariance matrix. |
| $R_0$ | diag(5, 5, 5) | Initial observation covariance matrix. |
| $(\bar{x}_i, \bar{y}_i, \bar{d}_x)$ | (0, 0, 35) | Desired system state in the image plane. |
| $(\alpha_1, \alpha_2, \alpha_3)_s$ | (1/2, 3/4, 1/3) | Positive control gains of simulation. |
| $(\alpha_1, \alpha_2, \alpha_3)_1$ | (5/4, 3, 1/2) | Positive control gains of experiment. |
| $(z_f^m, x_f^m, \theta_f^m, \phi)\vert_{t=0}$ | (0, 0, 0, 0) | Initial pose of tracking robot. |

Fig. 7. Simulation setup for the robustness and performance evaluation of the visual tracking control system.
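With the Table 1 value of $R_0$, the self-tuning update of Section 4.2 could be prototyped as in the sketch below, which precomputes the fixed matrix $T_{st}$ of (39) and then applies (40) and (41) to a sliding window of observations. The window length `k = 10`, the function names, the use of the population variance for $\operatorname{var}(\cdot)$, and the synthetic observations are assumptions made only for illustration; none of them are stated in the text.

```python
from collections import deque
import numpy as np

def residual_projector(k):
    """T_st = I_k - A_st (A_st^T A_st)^{-1} A_st^T from (39), for window length k >= 2."""
    A_st = np.column_stack([np.arange(1, k + 1), np.ones(k)])
    # pinv(A_st) equals (A_st^T A_st)^{-1} A_st^T because A_st has full column rank.
    return np.eye(k) - A_st @ np.linalg.pinv(A_st)

def tuned_R(window, T_st, R0):
    """Observation covariance R_n from (40)-(41).

    window: (k, 3) array whose columns hold the recent x_i, y_i, d_x observations.
    """
    sigma2 = np.var(T_st @ window, axis=0)   # (40); population variance is an assumption
    return R0 + np.diag(sigma2 ** 2)         # (41), with the variances squared as printed

k = 10                                       # hypothetical window length
T_st = residual_projector(k)                 # computed once and reused at every step
R0 = np.diag([5.0, 5.0, 5.0])                # Table 1
register = deque(maxlen=k)                   # shift register of step (3)

rng = np.random.default_rng(0)
for n in range(1, 2 * k + 1):
    # Synthetic observation: a linear trend plus bounded noise (illustration only).
    z = np.array([2.0 * n, -1.0 * n, 35.0]) + rng.uniform(-3.0, 3.0, size=3)
    register.append(z)
    if len(register) == k:
        R_n = tuned_R(np.array(list(register)), T_st, R0)
    else:
        R_n = R0
```

Because $T_{st}$ removes any linear trend, a target that moves steadily across the image does not inflate $R_n$; only the scatter around the fitted line does.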

5.1. Simulation setup

Fig. 7 shows the simulation setup for the evaluation of system robustness and tracking performance. In Fig. 7, $X_n$ denotes the ideal state to be estimated by the VSE presented in Section 4, $X_i[n]$ is the ideal system state at time instant $n$, and $\dot{X}_i^t$ is the ideal target image velocity at time instant $n$. The observation signal $Z_n$ is obtained by adding random noise (RN) to $X_i[n]$ and rounding the result to an integer. The random noise used in this paper is given by

$$RN = \sigma_n (2\omega - 1), \tag{42}$$

where $\sigma_n \in [0, 10]$ is a constant noise gain, and $\omega \in [0, 1]$ is a random noise signal with uniform distribution. Next, the VSE aims to filter RN and provide the optimal estimates. The performance of the VSE is then evaluated by the mean-squared-error (MSE) criterion between the ideal signal $X_n$ and the estimated signal $X_n^*$.
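A minimal sketch of the observation model (42) and of the MSE criterion is given below. It assumes that an independent $\omega$ is drawn for each state component, which the text does not specify, and the helper names and sample values are hypothetical.

```python
import random

def noisy_observation(x_ideal, sigma_n, rng):
    """Z_n: each component of X_i[n] plus RN from (42), rounded to an integer.

    Note that Python's round() resolves exact .5 ties to the nearest even integer.
    """
    return [round(x + sigma_n * (2.0 * rng.random() - 1.0)) for x in x_ideal]

def mse(ideal, estimate):
    """Mean squared error between two equally long sequences of scalars."""
    return sum((a - b) ** 2 for a, b in zip(ideal, estimate)) / len(ideal)

rng = random.Random(0)                  # fixed seed so that a run is repeatable
x_ideal = [12.3, -4.6, 35.2]            # illustrative state (x_i, y_i, d_x)
z = noisy_observation(x_ideal, sigma_n=10, rng=rng)
print(z, mse(x_ideal, z))
```

Even with $\sigma_n = 0$ the observation differs from the ideal state by the rounding error, which is why the MSE at zero noise gain is small but not exactly zero.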

In order to validate the robustness of the VTC (13) against parametric uncertainty, a random variable is used to vary the system parameters $(f_x, f_y, W, \delta_y)$ such that

$$\bar{f}_x = (1 + \rho) f_x, \quad \bar{f}_y = (1 + \rho) f_y, \quad \bar{W} = (1 + \rho) W, \quad \text{and} \quad \bar{\delta}_y = (1 + \rho)\delta_y, \tag{43}$$

where $\rho \in [-0.1, 0.1]$ is a random variable with uniform distribution introduced into the practical system parameters $(\bar{f}_x, \bar{f}_y, \bar{W}, \bar{\delta}_y)$, which are used to calculate the practical robot image Jacobian $\bar{B}_i$. In the simulation, the motion of the target is set as a dynamic and smooth motion with velocity

$$(v_f^x, v_f^y, v_f^z) = (v_f^t \sin\theta_f^t, \; 0, \; v_f^t \cos\theta_f^t), \tag{44}$$

where $v_f^t(n) = v_f^t(0)\cos(n\pi T/20)$ (cm/s) for $v_f^t(0) = 15$ and $\theta_f^t(n) = \theta_f^t(n-1) + T(5\pi/72)$ rad for $\theta_f^t(0) = 0$. Expression (44) is used to calculate the ideal target image velocity $\dot{X}_i^t$.
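The parameter perturbation (43) and the target motion (44) can be sketched as follows. The sketch assumes that $T$ enters (44) in seconds ($T = 0.1$) and that a single $\rho$ scales all four parameters, as the shared symbol in (43) suggests; the function names are hypothetical.

```python
import math
import random

def perturbed_parameters(fx, fy, W, delta_y, rng):
    """Apply (43): one rho drawn from U[-0.1, 0.1] scales all four parameters."""
    rho = rng.uniform(-0.1, 0.1)
    return tuple((1.0 + rho) * p for p in (fx, fy, W, delta_y))

def target_velocity(n_steps, T=0.1, v0=15.0, theta0=0.0):
    """Yield (v_x, v_y, v_z) in cm/s for n = 0 .. n_steps - 1 following (44)."""
    theta = theta0
    for n in range(n_steps):
        if n > 0:
            theta += T * (5.0 * math.pi / 72.0)       # heading recursion
        v = v0 * math.cos(n * math.pi * T / 20.0)     # speed profile
        yield (v * math.sin(theta), 0.0, v * math.cos(theta))

rng = random.Random(1)
print(perturbed_parameters(393.4, 391.8, 12.0, 10.0, rng))   # Table 1 nominal values
print(list(target_velocity(3)))
```

Under these assumptions the speed profile repeats every 400 samples (40 s), so the target both reverses direction and turns during a run.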

5.2. Simulation results of the proposed self-tuning Kalman filter

Two visual state estimators are compared: the conventional Kalman filter (KF) and the proposed self-tuning Kalman filter (STKF). Fig. 8 shows the evolution of the average MSE measurements as the noise gain $\sigma_n$ increases in the simulations. Note that each average MSE measurement is taken over 20 simulations for each $\sigma_n$, and the value of the random variable $\rho$ is randomly chosen at the beginning of each simulation. In Fig. 8, it is clear that the estimation results of the KF are very sensitive to the intensity of the observation noise. As the noise gain $\sigma_n$ increases from 0 to 10 in steps of 1, the MSE measurements of the KF increase much faster than those of the STKF. Further, when the noise gain $\sigma_n = 10$ (the observation signal has the largest noise intensity), the proposed STKF provides improved estimation results compared with the KF.

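Fig. 8. MSE measurements of the simulation results using Kalman filter and self-tuning Kalman filter. (a) Average MSE measurements of system state $(x_i, y_i, d_x)$. (b) Average MSE measurements of target image velocity $(\dot{x}_i^t, \dot{y}_i^t, \dot{d}_x^t)$.

Table 2
Average MSE measurements of computer simulations.

| MSE | Value | KF | STKF | MSE | Value | KF | STKF |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $x_i$ | $\sigma_n = 10$ | 16.7670 | 8.8669 | $\dot{x}_i^t$ | $\sigma_n = 10$ | 62.6014 | 29.8525 |
| | $\sigma_n = 0$ | 0.0442 | 0.0441 | | $\sigma_n = 0$ | 8.3137 | 8.5533 |
| | MSE gap | 16.7228 | 8.8228 | | MSE gap | 54.2878 | 21.2992 |
| $y_i$ | $\sigma_n = 10$ | 16.6215 | 6.9174 | $\dot{y}_i^t$ | $\sigma_n = 10$ | 60.9062 | 8.0422 |
| | $\sigma_n = 0$ | 0.0441 | 0.0442 | | $\sigma_n = 0$ | 0.4667 | 0.5341 |
| | MSE gap | 16.5774 | 6.8732 | | MSE gap | 60.4395 | 7.5081 |
| $d_x$ | $\sigma_n = 10$ | 16.5441 | 6.3912 | $\dot{d}_x^t$ | $\sigma_n = 10$ | 59.0913 | 6.6815 |
| | $\sigma_n = 0$ | 0.0981 | 0.0964 | | $\sigma_n = 0$ | 0.2977 | 0.3132 |
| | MSE gap | 16.4460 | 6.2948 | | MSE gap | 58.7935 | 6.3683 |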

Table 2 records the MSE gap between $\sigma_n = 10$ and $\sigma_n = 0$. A small MSE gap implies strong robustness against the intensity of the observation noise. In Table 2, the bold font denotes the smallest value of MSE measurement across each row. Table 2 shows that the MSE gaps of the KF are larger than those of the STKF for all estimates. In other words, the proposed STKF provides higher robustness against the observation uncertainty than the KF. Therefore, the simulation results validate the robustness of the proposed STKF.

Remark 3. Because the effect of estimation error is not considered in the current control design, a large estimation error will degrade the tracking performance of the visual tracking control system. In other words, a robust VSE that provides a small estimation error helps to improve the tracking performance. The results in Table 2 show that the proposed STKF provides better estimation results than the conventional KF does. Therefore, the proposed STKF is more helpful to the proposed VTC when the observation is perturbed by random noise.
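The MSE gaps in Table 2 are simple differences, and recomputing them gives a quick consistency check of the tabulated values. The short sketch below copies the KF and STKF entries from Table 2 and also reports the KF-to-STKF gap ratio, which is a derived quantity added here for illustration and not a figure quoted in the text.

```python
# (KF, STKF) average MSE values copied from Table 2.
table2 = {
    "x_i":   {"hi": (16.7670, 8.8669),  "lo": (0.0442, 0.0441)},
    "y_i":   {"hi": (16.6215, 6.9174),  "lo": (0.0441, 0.0442)},
    "d_x":   {"hi": (16.5441, 6.3912),  "lo": (0.0981, 0.0964)},
    "x_dot": {"hi": (62.6014, 29.8525), "lo": (8.3137, 8.5533)},
    "y_dot": {"hi": (60.9062, 8.0422),  "lo": (0.4667, 0.5341)},
    "d_dot": {"hi": (59.0913, 6.6815),  "lo": (0.2977, 0.3132)},
}

def mse_gaps(entry):
    """Return (KF gap, STKF gap): MSE at sigma_n = 10 minus MSE at sigma_n = 0."""
    (kf_hi, st_hi), (kf_lo, st_lo) = entry["hi"], entry["lo"]
    return kf_hi - kf_lo, st_hi - st_lo

gaps = {name: mse_gaps(entry) for name, entry in table2.items()}
for name, (kf_gap, st_gap) in gaps.items():
    print(f"{name:6s} KF gap {kf_gap:8.4f}  STKF gap {st_gap:8.4f}  ratio {kf_gap / st_gap:5.2f}")
```

The recomputed gaps agree with the tabulated ones to within rounding of the last printed digit.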

5.3. Simulation results of robustness to the system parametric uncertainty

Fig. 9 presents the computer simulation results of the visual tracking control system with the random noise defined in (42) and the parametric uncertainty defined in (43). The results shown in Fig. 9 are obtained from the average of 220 simulations (20 simulations for each $\sigma_n$ and one random value $\rho$ for each simulation). Fig. 9(a) and (b) show, respectively, the tracking errors and the estimates of target image velocity. In Fig. 9(a) and (b), the dotted lines illustrate the ideal values while the solid lines show the estimation results. It is clear that all tracking errors converge to zero, and each estimate converges to the corresponding ideal value. Fig. 9(c) shows the control velocities of the mobile robot and the tilt camera.

Fig. 9(d) shows the transition of $f(u)$ defined in (15) in order to clarify the stability of the closed-loop visual tracking control system. In the simulation, the uncertainty function $\delta f(u)$ defined in (22) is computable since $\delta\dot{X}_i^t$ and $\delta B_i$ are known. Thus, according to (24), a lower bound (LB) of $f(u)$ can be obtained from

$$LB = \lambda_{\min}(K_g)\,\|X_e\|^2 - \delta f(u) \le \bar{f}(u) - \delta f(u) = f(u). \tag{45}$$

Expression (45) tells us that if LB is positive, then $f(u)$ is guaranteed to be positive and the system is stable. In Fig. 9(d), the dotted line shows the transition of LB defined in (45) while the solid line indicates the transition of $f(u)$. It is clear that LB is positive during the visual tracking task, and hence the closed-loop visual tracking control system with parametric uncertainty is stable. Therefore, these simulation results validate that the proposed visual tracking control system overcomes not only the random noise in the observation but also the parametric uncertainty in the system model.
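For a single time step, the lower bound in (45) reduces to one eigenvalue computation and one norm. The sketch below assumes that $K_g$ is a symmetric gain matrix; the diagonal $K_g$, the error vector, and the value of $\delta f(u)$ used in the example are hypothetical numbers chosen for illustration, not values from the simulation.

```python
import numpy as np

def lower_bound(K_g, X_e, delta_f):
    """LB of (45): lambda_min(K_g) * ||X_e||^2 - delta_f(u).

    K_g is assumed symmetric so that eigvalsh applies.
    """
    lam_min = np.linalg.eigvalsh(np.asarray(K_g, dtype=float)).min()
    X_e = np.asarray(X_e, dtype=float)
    return lam_min * float(X_e @ X_e) - delta_f

# Hypothetical inputs for illustration only.
K_g = np.diag([0.5, 0.75, 1.0 / 3.0])
X_e = [3.0, 4.0, 0.0]
LB = lower_bound(K_g, X_e, delta_f=2.0)
print(LB > 0.0)
```

Checking the sign of this quantity at every sample is what the dotted curve in Fig. 9(d) represents.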

5.4. Experiment setup

Fig. 9. The computer simulation results of the visual tracking control system with random noise and parametric uncertainty. (a) Tracking errors. (b) Estimated target image velocity. (c) Control velocities of the mobile robot and the tilt camera. (d) Transition of $f(u)$ defined in (15) and its corresponding lower bound defined in (45).

Fig. 10. Experimental mobile robot interacts with a user using a real-time face tracking algorithm and the proposed robust visual tracking control system.

Fig. 10 shows the experimental mobile robot equipped with an on-board 1.6G industrial personal computer (IPC), a USB 2.0 camera, and a pan-tilt camera platform for the study of human–robot interaction through visual tracking control. Fig. 11 illustrates the complete robust visual tracking control system constructed using the proposed VSE and VTC. The function of each block shown in Fig. 11 is listed below:

(1) Feature detection and tracking: perform the face detection and tracking algorithms proposed in [33] to extract the state of the user's face $[x_i \;\; y_i \;\; d_x]^T$ in the image captured by the camera.

(2) Visual state estimator: estimate the optimal system state $[x_i^* \;\; y_i^* \;\; d_x^*]^T$ and the target image velocity $[\dot{x}_i^t \;\; \dot{y}_i^t \;\; \dot{d}_x^t]^T$ using the proposed self-tuning Kalman filter described in Section 4.

(3) Visual tracking control law: compute the desired robot control velocity $[v_f^m \;\; w_f^m \;\; w_t^m]^T$ using (13).

(4) Velocity transformation: transform the desired linear and angular control velocities into desired left- and right-wheel control velocities using $v_l = v_f^m - (D \cdot w_f^m)/2$ and $v_r = v_f^m + (D \cdot w_f^m)/2$, where $D$ represents the distance between the two drive wheels.

(5) Scaling processing: scale the desired control velocity $[v_l \;\; v_r \;\; w_t^m]^T$ to satisfy the maximum velocity and acceleration limitations of the specific robot system. Note that this processing degrades the tracking performance but increases the smoothness and safety of the practical robot system.

(6) Quantization processing: quantize the scaled control velocity $[\tilde{v}_l \;\; \tilde{v}_r \;\; \tilde{w}_t^m]^T$ according to the resolution of the motion control module. The resolution of the motion control card used in the experiments is 8-bit, which means it can command the linear wheel velocity from −128 to 127 cm/s in integer steps. For example, suppose that the scaled left-wheel velocity command $\tilde{v}_l$ is 2.9925 cm/s. After quantization, the quantized velocity command $\bar{v}_l$ is $\lfloor 2.9925 \rfloor = 2$ cm/s, where $\lfloor x \rfloor$ is the largest integer not exceeding $x$, and the corresponding quantization error $\delta v_l$ is $\bar{v}_l - \tilde{v}_l = -0.9925$ cm/s.

(7) Robust control law: compute the robust control velocity $[\bar{v}_l^* \;\; \bar{v}_r^* \;\; \bar{w}_t^{m*}]^T$ to overcome the velocity quantization problem using the robust control law presented in [25]. The interested reader is referred to [25] for more technical details.

(8) Velocity inverse transformation: transform the robot's current left- and right-wheel velocities into linear and angular velocities using $v^c = (v_l^c + v_r^c)/2$ and $w^c = (v_r^c - v_l^c)/D$. Let $u_{n-1} = [v^c \;\; w^c \;\; w_t^c]^T$ denote the previous robot control velocity used by the visual state estimator to calculate the current propagation system state $X_n^p$.
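The velocity transformation of block (4), the quantization of block (6), and the inverse transformation of block (8) can be sketched in a few lines. The value $D = 40$ cm is taken from Table 1; the clamp to the range −128 to 127 and the function names are assumptions made for illustration, and the scaling of block (5) and the robust control law of block (7) are not modeled here.

```python
import math

D = 40.0  # cm, distance between the two drive wheels (Table 1)

def to_wheel_velocities(v, w, D=D):
    """Block (4): linear velocity v (cm/s) and angular velocity w (rad/s) to (v_l, v_r)."""
    return v - D * w / 2.0, v + D * w / 2.0

def quantize(v_scaled, lo=-128, hi=127):
    """Block (6): floor to an integer command, then clamp to the 8-bit range (assumed)."""
    return max(lo, min(hi, math.floor(v_scaled)))

def from_wheel_velocities(v_l, v_r, D=D):
    """Block (8): wheel velocities back to linear and angular velocities."""
    return (v_l + v_r) / 2.0, (v_r - v_l) / D

v_tilde = 2.9925                       # scaled left-wheel command from the text, cm/s
v_bar = quantize(v_tilde)              # quantized command
error = v_bar - v_tilde                # quantization error delta v_l
print(v_bar, round(error, 4))

v_l, v_r = to_wheel_velocities(10.0, 0.5)
print(from_wheel_velocities(v_l, v_r))
```

Because flooring always rounds toward negative infinity, the quantization error lies in the interval (−1, 0] cm/s, which is the bounded disturbance that the robust control law of block (7) is meant to handle.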

Fig. 11. Implemented robust visual tracking control system of a wheeled mobile robot with on-board tilt camera.

Fig. 12. Experimental results. (a1–a7): Image sequence recorded from a DV camera. (b1–b7): Corresponding image sequence recorded from on-board USB camera. (c–e): Recorded tracking errors in the image plane. (f–h): Target image velocity estimates. (i–j): Command linear and angular velocities of the mobile robot. (k): Command velocity of the tilt camera.

5.5. Experimental results of robustness in visual tracking

Fig. 12 presents the recorded images and responses of the mobile robot and tilt camera in an experiment that includes occlusions to validate the robustness of the proposed visual tracking control system. Fig. 12(a1–a7) show the pictures recorded by a digital video (DV) camera, and Fig. 12(b1–b7) are the corresponding pictures recorded by the on-board USB camera. Fig. 12(c–e) and Fig. 12(f–h) depict the responses of the tracking errors $(x_e, y_e, d_e)$ and the target image velocity estimates $(\dot{x}_i^t, \dot{y}_i^t, \dot{d}_x^t)$, respectively. Fig. 12(i–k) illustrate the responses of the robot and tilt camera control velocities $(v_f^m, w_f^m, w_t^m)$.

In the beginning, the user sat still on a stool, and the robot started to track his face using the proposed visual tracking control system. From Fig. 12(f–h), one can see that the target image velocity estimates all approach zero about 5 s after the robot started working. Next, the user stood up (Fig. 12(a2)) and the tilt camera moved to keep tracking his face. From Fig. 12(g),