Robotics and Autonomous Systems
journal homepage: www.elsevier.com/locate/robot
Robust visual tracking control system of a mobile robot based on a dual-Jacobian
visual interaction model
Chi-Yi Tsai a, Kai-Tai Song a,∗, Xavier Dutoit b, Hendrik Van Brussel b, Marnix Nuttin b
a Department of Electrical and Control Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsinchu 300, Taiwan, ROC
b Department of Mechanical Engineering, Division PMA, K. U. Leuven, Celestijnenlaan 300B, B-3001 Leuven, Belgium
Article info
Article history: Received 26 March 2007; received in revised form 6 January 2009; accepted 7 January 2009; available online 21 January 2009
Keywords: Visual tracking control; Visual estimation; Visual interaction model; Kalman filter
Abstract
This paper presents a novel design of a robust visual tracking control system, which consists of a visual tracking controller and a visual state estimator. This system facilitates human–robot interaction of a unicycle-modeled mobile robot equipped with a tilt camera. Based on a novel dual-Jacobian visual interaction model, a robust visual tracking controller is proposed to track a dynamic moving target. The proposed controller not only possesses some degree of robustness against system model uncertainties, but also tracks the target without its 3D velocity information. The visual state estimator aims to estimate the optimal system state and target image velocity, which are used by the visual tracking controller. To achieve this, a self-tuning Kalman filter is proposed to estimate the parameters of interest and to overcome the temporary occlusion problem. Furthermore, because the proposed method works fully in the image space, the computational complexity and the sensor/camera modeling errors can be reduced. Experimental results validate the effectiveness of the proposed method in terms of tracking performance, system convergence, and robustness.
© 2009 Elsevier B.V. All rights reserved.
1. Introduction
An intelligent robot uses its on-board sensors to collect information from its surroundings and to react to changes in its immediate environment. In recent years, vision systems have been widely used in various intelligent robots, and research on visual tracking control has gained increasing attention in robotics [1–25]. In robotics, visual tracking control means vision-based robot motion control to track a target of interest. Based on the motion constraints of the robot, visual tracking control can be classified into two categories: visual servoing for holonomic manipulators and visual tracking for nonholonomic mobile robots. Although visual servoing of holonomic manipulators has been discussed extensively and many results can be found in the literature [1–3], mobile robots are commonly nonholonomic, and the visual servoing results for holonomic manipulators are unsuitable for a mobile platform [4].
∗ Corresponding author. Tel.: +886 3 5731865; fax: +886 3 5715998.
E-mail addresses: [email protected] (C.-Y. Tsai), [email protected] (K.-T. Song), [email protected] (X. Dutoit), [email protected] (M. Nuttin).
This paper addresses the problem of visual tracking control of unicycle-modeled mobile robots (commonly termed wheeled mobile robots) equipped with an on-board monocular vision system. Owing to the large number of mobile robot visual tracking control methods, we
classify the reported methods into four groups based on the type of the target to be tracked. Many efforts focus on the first group which aims to track a static target, such as a ground line, landmark, or reference image, for the purpose of mobile robot navigation or regulation (so-called homing) [5–17]. For a mobile robot to track a ground line, Ma et al. formulated the visual tracking control problem as controlling the shape of a ground curve in the image plane and proposed a closed-loop vision-guided control system for a nonholonomic mobile robot [5]. Coulaud et al. proposed a simple and stable feedback controller design, which avoids sophisticated image processing and control algorithms, for a mobile robot equipped with a fixed camera to track a line on the ground [6]. In the case of tracking the landmark, the reported controllers usually modify the visual servoing technique to satisfy the nonholonomic constraint for the motion control of the mobile robot [7–10]. In [11], Zhang and Ostrowski utilized an optimal control method to solve the visual motion-planning problem by generating a virtual trajectory in the image plane and the corresponding optimal control signals for the robot to follow. Nierobisch et al. proposed a visual tracking control method for a mobile robot with a pan-tilt camera to track visual reference landmarks in the acquired views during autonomous navigation [12]. Recently, the homography-based [13,14] and epipolar-based [4,15–17] visual tracking control approaches were proposed for a mobile robot equipped with a pinhole or an omni-directional (so-called central catadioptric) camera to track a reference image toward a desired configuration. These two approaches consider the mobile robot visual tracking
control problem as a visual servoing regulation or visual homing problem. In [13], Chen et al. developed a visual tracking controller based on the Euclidean homography to track a desired time-varying trajectory defined by a prerecorded image sequence of a stationary target viewed by the on-board camera as the mobile robot moves. However, the stability of their result is restricted by the non-zero reference velocity condition of the desired trajectory. To overcome this drawback, Fang et al. exploited Lyapunov-based techniques to construct a homography-Lyapunov-based visual servoing regulation controller for proving asymptotic regulation of the mobile robot [14]. In [15], Mariottini et al. exploited the epipolar geometry defined by the current and desired camera views to develop a two-step visual servoing regulation controller. They also extended this design to the visual servoing regulation control of a mobile robot with a central catadioptric camera [16]. In [17], Goedemé et al. developed a vision-only navigation and homing system for mobile robots with an omni-directional camera. Their method divides the visual homing operation into two phases and computes a visual homing vector based on epipolar geometry estimation. Although the approaches of the first group provide appropriate solutions to the static target visual tracking control problem, they are not guaranteed to solve the moving (non-static) target visual tracking control problem.
The second group aims to track other robot teammates in a robot team for formation control purposes [18,19]. The approaches proposed in this group are usually based on a central catadioptric camera model in order to detect all robot teammates at the same instant. The subject of the third group is to track a predictable moving target, such as a projectile or a ball moving in a straight line, for mobile robot interception purposes [20,21]. In [20], Borgstadt et al. utilized a human vision-based strategy to guide a mobile robot to intercept a projectile ball. Similarly, Capparella et al. extended the concept of a human-like strategy to develop a vision-based two-level interception approach, which contains a lower-level controller to control the on-board pan-tilt camera and a higher-level controller to operate the mobile robot platform, for intercepting a ball moving in a straight line [21]. A common point of the second and third groups is that the motion of the target of interest is known and predictable. However, in many robotic applications, a mobile robot is required to track a dynamic target with unpredictable motion, such as a human’s face, for the purpose of pursuit or interaction. Thus, the existing methods of the aforementioned two groups are not suitable for solving the dynamic moving target visual tracking control problem.
The purpose of the fourth group is to solve the problem of tracking a dynamic moving target [22–25]. In [22], Wang et al. proposed an adaptive backstepping control law based on an image-based camera-target visual interaction model to track a dynamic moving target with an unknown height parameter. Although the approach in [22] guarantees the asymptotic stability of a closed-loop visual tracking control system in tracking a dynamic moving target, the case of tracking a static target cannot be guaranteed due to the non-zero restrictions on the reference velocity of the mobile robot. In [23], Malis et al. integrated template-based visual tracking algorithms and model-free vision-based control techniques to build a flexible and robust visual tracking control system for various robotic applications. Because their visual tracking result is based on homography estimation, which requires two images of the target pattern to estimate the optimal homography, the reported system overcomes only the partial occlusion problem and fails under full occlusion. In [24], Han et al. proposed an image-based visual tracking control scheme for a mobile robot to estimate the position of the target in the next image and track the target to the central area of the image. Since their method utilized the differential approximation method to estimate the velocity of the target in the image plane, the estimation result is very sensitive to image noise. Recently, a visual interaction controller was proposed for a unicycle-modeled mobile robot to track a dynamic moving target such as a human’s face [25]. The drawback of this method is that the controller requires the target’s 3D motion velocity, which is difficult to estimate when only a monocular camera is used.
From the literature survey, we note that the challenge in mobile robot visual tracking control design is to develop a robust tracking control system that estimates the motion of the moving target and tracks the target based on a stability criterion. This problem motivates us to derive a new model for developing a robust visual tracking control system that solves the tracking problem of a dynamic moving target in the image plane directly and efficiently. To do so, the visual interaction model described in [25] is extended to derive a novel dual-Jacobian visual interaction model for designing a robust mobile robot visual tracking control system, which encompasses a visual state estimator and a visual tracking controller. The visual state estimator is constructed from a real-time self-tuning Kalman filter and aims to estimate the optimal system state and target motion in the image plane directly for later use by the visual tracking controller. The visual tracking controller then calculates the robot’s control velocities in the image plane directly. The main differences between the proposed method and other existing approaches are summarized as follows:
(1) The proposed dual-Jacobian visual interaction model considers not only the effect of mobile robot motion, but also the effect of target motion. Thus, based on the proposed model, the visual tracking control problem of a unicycle-modeled mobile robot for tracking a dynamic moving target can be solved with asymptotic convergence using a single controller. Moreover, the proposed model also considers the kinematics of a tilt camera platform mounted on the mobile robot. Therefore, the applicability of the proposed method is greatly increased.
(2) The proposed visual tracking control system not only possesses some degree of robustness against the system model uncertainties, but also overcomes the unmodeled quantization effect in the velocity commands and the occlusion effect during the visual tracking process. This advantage enhances the reliability of the proposed method in practical applications.
(3) The proposed visual tracking control system works fully in the image space. Therefore, compared with position-based [23], homography-based [13,14], and epipole-based [4,15–17] visual tracking control approaches, the computational complexity and the sensor/camera modeling errors can be much reduced due to the advantages of image-based visual servo control [2].
(4) The proposed self-tuning Kalman filter can automatically choose a suitable observation covariance matrix in varying environmental conditions. This helps to improve the estimation performance when there is noise in the system observation.
To validate the performance and robustness of the proposed control system, computer simulation and experimental studies of tracking a moving target have been conducted. Simulation and experimental results will be presented and discussed to verify the effectiveness of the proposed control system in terms of tracking performance, system convergence, and robustness. Note that a brief version of the research results has been published in [26]. This paper presents the complete design of the proposed tracking control system, including robustness analysis, computer simulation, and experimental validation.
The rest of this paper is organized as follows. Section 2 describes the proposed dual-Jacobian visual interaction model. Section 3 presents the results of the visual tracking controller design. Section 4 develops the visual state estimator using a Kalman filter with a self-tuning algorithm. Simulation and experimental results are reported in Section 5, together with an extended discussion of several interesting observations. Section 6 concludes the contributions of this paper.
Fig. 1. (a) A model of the unicycle-modeled mobile robot and target in the world coordinate frame. (b) Side view of the mobile robot with a tilt camera mounted on top of its head to track a dynamic target.
2. Dual-Jacobian visual interaction model
As shown in Fig. 1, the considered system is a unicycle-modeled mobile robot equipped with a tilt camera mounted on top of it to track a moving target, such as a human face, in the image plane. The robot can keep the target in the camera’s field of view such that the optical axis of the camera faces the target of interest. Fig. 1(a) illustrates the model of the unicycle-modeled mobile robot and the target in the world coordinate frame $F_f$, in which the motion of the target is supposed to be holonomic. Fig. 1(b) is the side view of the scenario under consideration, in which the tilt angle $\phi$ gives the relationship between the camera coordinate frame $F_c$ and the mobile coordinate frame $F_m$. The kinematics of the unicycle-modeled mobile robot and the target can be described [27], respectively, by

$$
\begin{bmatrix} \dot z_f^m \\ \dot x_f^m \\ \dot\theta_f^m \\ \dot\phi \end{bmatrix}
=
\begin{bmatrix} \cos\theta_f^m & 0 & 0 \\ \sin\theta_f^m & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} v_f^m \\ w_f^m \\ w_t^m \end{bmatrix},
\quad \dot y_f^m = 0,
\quad \text{and} \quad
\begin{bmatrix} \dot z_f^t \\ \dot x_f^t \\ \dot y_f^t \end{bmatrix}
=
\begin{bmatrix} v_f^z \\ v_f^x \\ v_f^y \end{bmatrix},
\tag{1}
$$

where $(x_f^m, y_f^m, z_f^m)$ and $(x_f^t, y_f^t, z_f^t)$ are, respectively, the positions of the mobile robot and the target in Cartesian coordinates. $(\theta_f^m, \phi)$ are the orientation angle of the mobile robot and the tilt angle of the onboard camera. $(v_f^m, w_f^m)$ and $w_t^m$ are, respectively, the linear and angular velocity of the mobile robot and the tilt velocity of the camera. $(v_f^x, v_f^y, v_f^z)$ are the target velocity in Cartesian coordinates.

In order for the mobile robot to interact with the target in the image coordinate frame, a visual interaction model was proposed in the authors’ previous work [25]. Fig. 2 shows the definition of the observed system state in the image plane used for the visual interaction model. In Fig. 2, $x_i$ and $y_i$ are, respectively, the horizontal and vertical position of the centroid of the target in the image plane, and $d_x$ is the width of the target in the image plane. Let $X_i = [x_i \;\; y_i \;\; d_x]^T$ denote the system state in the image plane, let $u = [v_f^m \;\; w_f^m \;\; w_t^m]^T$ be the control velocity of the mobile robot and on-board tilt camera, let $(f_x, f_y)$ represent the fixed focal lengths along the image x-axis and y-axis, respectively [28], and let $W$ stand for the actual width of the target. The visual interaction model between robot and target in the image coordinate frame can be derived as in Box I, where diag() denotes a 3-by-3 diagonal matrix, $k_x = d_x/W$ and $k_y = k_x f_y/f_x$ are two scalars, and $\delta_y$ is the distance between the center of the robot tilt platform and the onboard camera. A detailed derivation of the visual interaction model (2) given in Box I is presented in the Appendix.

Fig. 2. Definition of the observed system state in the image plane.
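As a minimal sketch of how the robot part of the kinematics (1) can be stepped forward in time, the following Python function applies one forward-Euler update. The function name `step_robot`, the tuple layout `(z, x, theta, phi)`, and the choice of an Euler scheme are illustrative assumptions and are not taken from the paper.

```python
import math

def step_robot(state, u, T):
    """One forward-Euler step of the robot kinematics in (1).

    state = (z, x, theta, phi): planar position, heading, camera tilt.
    u     = (v, w, w_tilt):     linear, angular, and tilt velocities.
    T     = step size in seconds (assumed small).
    """
    z, x, theta, phi = state
    v, w, w_tilt = u
    return (
        z + T * v * math.cos(theta),  # z_dot = v cos(theta)
        x + T * v * math.sin(theta),  # x_dot = v sin(theta)
        theta + T * w,                # theta_dot = w
        phi + T * w_tilt,             # phi_dot = w_tilt
    )

# Drive straight ahead along z for one second (ten steps of 0.1 s).
state = (0.0, 0.0, 0.0, 0.0)
for _ in range(10):
    state = step_robot(state, (1.0, 0.0, 0.0), 0.1)
```

With zero heading and zero angular velocity, only the `z` component changes, which reflects the nonholonomic constraint: the robot cannot move sideways along `x` without first turning.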
The visual interaction model (2) given in Box I indicates that the elements of the system matrix $A_i$ and the vector $C_i$ are functions of the target velocity. Thus, expression (2) given in Box I can be rewritten as in Box II, where $V_t = [v_f^x \;\; v_f^y \;\; v_f^z]^T$ is the vector of target velocities in Cartesian coordinates. Expression (3) given in Box II shows that the visual interaction model consists of two parts: first, the effect of target motion $\dot X_i^t \equiv [\dot x_i^t \;\; \dot y_i^t \;\; \dot d_x^t]^T = J_t V_t$, and second, the effect of mobile robot motion $\dot X_i^m \equiv [\dot x_i^m \;\; \dot y_i^m \;\; \dot d_x^m]^T = B_i u$. Thus, Eq. (3) given in Box II can be rewritten as a dual-Jacobian equation such that

$$
\dot X_i = \dot X_i^t + \dot X_i^m = J_t V_t + B_i u, \tag{4}
$$

where the matrix $J_t$, termed the target image Jacobian, transfers the target velocity $V_t$ into the target image velocity $\dot X_i^t$; the matrix $B_i$, which denotes the robot image Jacobian, transfers the mobile robot control velocity $u$ into the robot image velocity $\dot X_i^m$. In other words, the image velocity $\dot X_i$ is caused by the combination of the target image velocity $\dot X_i^t$ and the robot image velocity $\dot X_i^m$. Fig. 3 shows the concept of the dual-Jacobian equation (4). Therefore, the visual interaction between robot and target in the image coordinate frame can be modeled as a dual-Jacobian visual interaction model (4).
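$$
\dot X_i = A_i X_i + B_i u + C_i \tag{2}
$$

where $A_i = \mathrm{diag}(A_1, A_2, A_1)$,

$$
A_1 = -f_x^{-1} k_x \left( v_f^x \cos\phi \sin\theta_f^m + v_f^y \sin\phi + v_f^z \cos\phi \cos\theta_f^m \right),
$$

$$
A_2 = -f_y^{-1} k_y \left( v_f^x \cos\phi \sin\theta_f^m + v_f^y \sin\phi + v_f^z \cos\phi \cos\theta_f^m \right),
$$

$$
B_i =
\begin{bmatrix}
f_x^{-1} k_x x_i \cos\phi & f_x^{-1}(x_i^2 + f_x^2)\cos\phi - f_y^{-1} f_x (k_y \delta_y + y_i)\sin\phi & -f_y^{-1} x_i (k_y \delta_y + y_i) \\
k_y (\sin\phi + f_y^{-1} y_i \cos\phi) & f_x^{-1} f_y x_i (\sin\phi + f_y^{-1} y_i \cos\phi) & -f_y^{-1}(y_i^2 + f_y^2 + k_y y_i \delta_y) \\
f_x^{-1} k_x d_x \cos\phi & f_x^{-1} x_i d_x \cos\phi & -f_y^{-1} d_x (k_y \delta_y + y_i)
\end{bmatrix},
$$

$$
C_i =
\begin{bmatrix}
k_x \left( v_f^z \sin\theta_f^m - v_f^x \cos\theta_f^m \right) \\
k_y \left( v_f^y \cos\phi - v_f^x \sin\phi \sin\theta_f^m - v_f^z \sin\phi \cos\theta_f^m \right) \\
0
\end{bmatrix}.
$$

Box I.

$$
\dot X_i = (A_i X_i + C_i) + B_i u = J_t V_t + B_i u, \tag{3}
$$

where

$$
J_t =
\begin{bmatrix}
-k_x (f_x^{-1} x_i \cos\phi \sin\theta_f^m + \cos\theta_f^m) & -k_x f_x^{-1} x_i \sin\phi & -k_x (f_x^{-1} x_i \cos\phi \cos\theta_f^m - \sin\theta_f^m) \\
-k_y (f_y^{-1} y_i \cos\phi \sin\theta_f^m + \sin\phi \sin\theta_f^m) & -k_y (f_y^{-1} y_i \sin\phi - \cos\phi) & -k_y (f_y^{-1} y_i \cos\phi \cos\theta_f^m + \sin\phi \cos\theta_f^m) \\
-k_x f_x^{-1} d_x \cos\phi \sin\theta_f^m & -k_x f_x^{-1} d_x \sin\phi & -k_x f_x^{-1} d_x \cos\phi \cos\theta_f^m
\end{bmatrix}.
$$

Box II.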
Fig. 3. Concept of the dual-Jacobian visual interaction model (4).
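The two Jacobians of Boxes I and II can be written down directly as small NumPy functions. The sketch below is a transcription of those matrices under the paper's symbol definitions; the function and argument names (`robot_image_jacobian`, `target_image_jacobian`, `image_velocity`, `delta_y`) are hypothetical, and the numerical values in the usage lines are placeholders rather than parameters from the paper.

```python
import numpy as np

def robot_image_jacobian(Xi, phi, fx, fy, W, delta_y):
    """Robot image Jacobian B_i (Box I) for image state Xi = [x_i, y_i, d_x]."""
    xi, yi, dx = Xi
    kx = dx / W
    ky = kx * fy / fx
    c, s = np.cos(phi), np.sin(phi)
    g = ky * delta_y + yi  # recurring term (k_y * delta_y + y_i)
    return np.array([
        [kx * xi * c / fx, (xi**2 + fx**2) * c / fx - fx * g * s / fy, -xi * g / fy],
        [ky * (s + yi * c / fy), fy * xi * (s + yi * c / fy) / fx,
         -(yi**2 + fy**2 + ky * yi * delta_y) / fy],
        [kx * dx * c / fx, xi * dx * c / fx, -dx * g / fy],
    ])

def target_image_jacobian(Xi, theta, phi, fx, fy, W):
    """Target image Jacobian J_t (Box II); columns act on [v_x, v_y, v_z]."""
    xi, yi, dx = Xi
    kx = dx / W
    ky = kx * fy / fx
    c, s = np.cos(phi), np.sin(phi)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([
        [-kx * (xi * c * st / fx + ct), -kx * xi * s / fx, -kx * (xi * c * ct / fx - st)],
        [-ky * (yi * c * st / fy + s * st), -ky * (yi * s / fy - c),
         -ky * (yi * c * ct / fy + s * ct)],
        [-kx * dx * c * st / fx, -kx * dx * s / fx, -kx * dx * c * ct / fx],
    ])

def image_velocity(Xi, Vt, u, theta, phi, fx, fy, W, delta_y):
    """Dual-Jacobian model (4): Xi_dot = J_t V_t + B_i u."""
    Jt = target_image_jacobian(Xi, theta, phi, fx, fy, W)
    Bi = robot_image_jacobian(Xi, phi, fx, fy, W, delta_y)
    return Jt @ Vt + Bi @ u

# Placeholder numbers, chosen only to exercise the functions.
Xi = np.array([20.0, -10.0, 40.0])   # pixels
Vt = np.array([0.1, -0.2, 0.3])      # target velocity [v_x, v_y, v_z]
u = np.array([0.2, 0.05, -0.02])     # [v, w, w_tilt]
Xi_dot = image_velocity(Xi, Vt, u, theta=0.3, phi=0.2,
                        fx=400.0, fy=410.0, W=0.15, delta_y=0.05)
```

Keeping the two Jacobians as separate functions mirrors the decomposition in (4): the target term can later be replaced by an estimated image velocity without touching the robot term.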
Similar to human visual tracking behavior, the visual tracking control design aims to drive the centroid position and width of the target from an initial state to the desired state in the image plane. To achieve this, an error state model is introduced to design the tracking controller. Define the error coordinates in the image plane such that

$$
X_e = [x_e \;\; y_e \;\; d_e]^T = \bar X_i - X_i^* = [\bar x_i - x_i^* \;\;\; \bar y_i - y_i^* \;\;\; \bar d_x - d_x^*]^T, \tag{5}
$$

where $\bar X_i = [\bar x_i \;\; \bar y_i \;\; \bar d_x]^T$ is the desired state in the image plane, and $X_i^* = [x_i^* \;\; y_i^* \;\; d_x^*]^T$ is the estimated system state from a visual state estimator (VSE, see Section 4). If the system state converges to the desired state, the visual tracking control problem is solved. Furthermore, the dynamic error state model in the image plane can be derived directly by taking the derivative of (5). The result is given by

$$
\dot X_e = -\dot X_i^t - \dot X_i^m = -J_t V_t - B_i u. \tag{6}
$$

With the new coordinate $X_e$, the visual tracking control problem is transformed into a stability problem. In the rest of this paper, the new coordinate $X_e$ will be used to solve the visual tracking control problem. If $X_e$ converges to zero, then the visual tracking control problem is solved.
3. Visual tracking controller (VTC) design
In this section, a visual tracking control law is derived from the error state model (6) by exploiting feedback linearization and pole placement techniques, so that the robot can interact with a target of interest in the image plane. The robustness analysis of the proposed controller against parametric uncertainties is also presented.
3.1. Feedback linearization and pole placement
According to the dynamic error state model (6), a feedback control law can be obtained by feedback linearization such that

$$
u = B_i^{-1}(K_g X_e - J_t V_t) = B_i^{-1}(K_g X_e - \dot X_i^t), \tag{7}
$$

where $K_g$ is a 3-by-3 gain matrix. Substituting (7) into (6) yields

$$
\dot X_e = -K_g X_e. \tag{8}
$$

It is clear that if all eigenvalues of the matrix $-K_g$ are constant and lie inside the left-half complex plane, then the system error state $X_e(t)$ will decay exponentially to zero. Thus, we choose the gain matrix by pole placement such that

$$
K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3), \tag{9}
$$

in which $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants. Substituting (9) into (8) yields

$$
\dot X_e = -K_g X_e = -\mathrm{diag}(\alpha_1, \alpha_2, \alpha_3)\, X_e. \tag{10}
$$

Suppose that the initial error state $X_e(t_0)$ is within the image plane. Then expression (10) indicates that

$$
X_e(t)\big|_{t_0, X_e(t_0)} \equiv X_e(t; t_0, X_e(t_0)) = \mathrm{diag}\!\left(e^{-\alpha_1 (t - t_0)}, e^{-\alpha_2 (t - t_0)}, e^{-\alpha_3 (t - t_0)}\right) X_e(t_0) \quad \text{for some } t_0 \ge 0. \tag{11}
$$

Because $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants, expression (11) leads to the following inequality:

$$
\| X_e(t; t_0, X_e(t_0)) \| \le e^{-\lambda_{\min}(K_g)(t - t_0)} \| X_e(t_0) \| \quad \text{for all } t \ge t_0, \tag{12}
$$

where $\lambda_{\min}(A)$ denotes the minimum eigenvalue of matrix $A$, and $\|B\|$ denotes the 2-norm of vector $B$. From (12), it is clear that the system error state satisfies $\| X_e(t; t_0, X_e(t_0)) \| \le \| X_e(t_0) \|$ and $\lim_{t \to \infty} \| X_e(t; t_0, X_e(t_0)) \| = 0$, and thus the visual tracking control problem is solved. Summarizing the above discussions, we obtain the following proposition.
Proposition 1. Suppose that the initial system state $X_i$ is within the image plane. Let $(\alpha_1, \alpha_2, \alpha_3) > 0$ be three positive constants. Consider the dual-Jacobian visual interaction system (4). If the matrix $B_i$ is nonsingular, then the closed-loop visual tracking system (6) is asymptotically stable by using the control law

$$
u = B_i^{-1}(K_g X_e - \dot X_i^t), \tag{13}
$$

where $\dot X_i^t = J_t V_t$ is the target image velocity defined in (4), matrices $B_i$ and $J_t$ are the robot and target image Jacobians defined in (2) and (3), respectively, $X_e$ is the system error state defined in (5), and $K_g$ is a 3-by-3 diagonal gain matrix defined in (9).
Proof. Consider the closed-loop visual tracking system (6). We first define a positive-definite Lyapunov function associated with the system error state

$$
V(x_e, y_e, d_e) = \tfrac{1}{2}\left(x_e^2 + y_e^2 + d_e^2\right). \tag{14}
$$

Taking the derivative of (14) yields

$$
\dot V = X_e^T \dot X_e = -(X_e^T J_t V_t + X_e^T B_i u) = -X_e^T(\dot X_i^t + B_i u) \equiv -f(u), \tag{15}
$$

where $f(u) = X_e^T(\dot X_i^t + B_i u)$. In view of Lyapunov theory [29], expression (15) tells us that if $f(u) > 0$ then the equilibrium point of (6) is asymptotically stable. Substituting the control law (13) into $f(u)$, we then have

$$
f(u) = X_e^T K_g X_e, \tag{16}
$$

where $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3)$, and $(\alpha_1, \alpha_2, \alpha_3)$ are three positive constants. Since $K_g$ is a symmetric positive definite (SPD) matrix, the following inequality holds for any nonzero error state $X_e$:

$$
f(u) \ge \lambda_{\min}(K_g) \|X_e\|^2 = \min(\alpha_1, \alpha_2, \alpha_3) \|X_e\|^2 > 0, \tag{17}
$$

where $\lambda_{\min}(A)$ denotes the minimum eigenvalue of matrix $A$. Expression (17) shows that the closed-loop visual tracking system (6) is asymptotically stable, which completes the proof.
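To make the closed-loop behavior of (6) under the control law (13) concrete, the following sketch integrates the error model with a forward-Euler scheme. It is only an illustration under simplifying assumptions: the matrix `Bi` is a fixed, hand-picked nonsingular placeholder (in the paper $B_i$ depends on the image state and camera tilt), the target image velocity `Xt_dot` is held constant, and all names and numbers are hypothetical.

```python
import numpy as np

def vtc_control(Bi, Kg, Xe, Xt_dot):
    """Control law (13): u = Bi^{-1} (Kg Xe - Xt_dot).

    Solves the linear system instead of forming the inverse explicitly.
    """
    return np.linalg.solve(Bi, Kg @ Xe - Xt_dot)

def simulate_error(Bi, Kg, Xe0, Xt_dot, T, steps):
    """Forward-Euler integration of the error model (6): Xe_dot = -Xt_dot - Bi u."""
    Xe = np.array(Xe0, dtype=float)
    history = [Xe.copy()]
    for _ in range(steps):
        u = vtc_control(Bi, Kg, Xe, Xt_dot)
        Xe = Xe + T * (-Xt_dot - Bi @ u)
        history.append(Xe.copy())
    return np.array(history)

# Placeholder values for illustration only (not from the paper).
Bi = np.array([[2.0, 300.0, -5.0],
               [10.0, 4.0, -400.0],
               [3.0, 1.0, -8.0]])
Kg = np.diag([2.0, 3.0, 4.0])          # alpha_1, alpha_2, alpha_3 > 0
Xe0 = np.array([50.0, -30.0, 10.0])    # initial image error
Xt_dot = np.array([5.0, -2.0, 1.0])    # assumed-known target image velocity
T, steps = 0.01, 500
history = simulate_error(Bi, Kg, Xe0, Xt_dot, T, steps)
error_norms = np.linalg.norm(history, axis=1)
```

Because the control law cancels both `Bi` and `Xt_dot`, each error component in this discrete sketch shrinks by the factor `1 - T * alpha` per step, which is the Euler counterpart of the exponential decay in (11).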
3.2. Singularity analysis
The feedback linearization control law (13) poses a singularity problem of the matrix $B_i$. In [25], the singularity condition of the matrix $B_i$ is derived such that

$$
f_y = (y_i + S d_x) \tan\phi, \tag{18}
$$

where $S = (f_y \delta_y)/(f_x W)$ is a fixed scalar factor. Let $(x_c, y_c, z_c)$ represent the relative position between robot and target in the camera coordinate frame. Because $f_y = k_y z_c$, $y_i = k_y y_c$, and $d_x = k_x W$ (see (A.2) in the Appendix), the singularity condition (18) can be rewritten such that

$$
z_c = (y_c + \delta_y) \tan\phi, \tag{19}
$$

where $z_c$ is the distance from the camera to the target. Fig. 4 shows the particular configuration of the singular condition (19). In Fig. 4, the distance $z_\phi$ equals

$$
z_\phi = (y_c + \delta_y) \tan\phi. \tag{20}
$$

From (19) and (20), it is clear that the physical meaning of the singularity condition (19) is that the distance between the camera and the target equals the distance $z_\phi$. In general, this situation will not happen unless the target is located directly on the $Y_m$-axis (the Y-axis of the mobile coordinate frame $F_m$), which means that the target is directly above or directly below the robot. Under such circumstances, the robot will be unable to approach the target in any way due to deficient degrees of freedom, and the robot should stop tracking temporarily.
Fig. 4. Particular configuration of the singularity condition (19).
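The singularity condition can be cross-checked against the robot image Jacobian of Box I. The expression below is our own cofactor expansion of $B_i$ as given there; it does not appear in the paper and should be re-verified against [25] before being relied upon. Subtracting $(d_x/x_i)$ times the first row from the third row leaves a single nonzero entry, and with $k_x/f_x = k_y/f_y$ the remaining $2 \times 2$ minor factors in the same way, which suggests

```latex
\det B_i
  = -\,\frac{f_x\, k_y\, d_x}{f_y^{2}}
    \Bigl[\, f_y \cos\phi - \bigl(y_i + k_y \delta_y\bigr)\sin\phi \,\Bigr]^{2},
\qquad
k_y \delta_y = \frac{d_x}{W}\cdot\frac{f_y\,\delta_y}{f_x} = S\, d_x .
```

Under this expansion, and assuming $f_x$, $k_y$, and $d_x$ are nonzero, $\det B_i$ vanishes exactly when $f_y = (y_i + S d_x)\tan\phi$, which agrees with (18). The squared factor also indicates that $\det B_i$ does not change sign across the singular configuration, so a magnitude threshold on $|\det B_i|$ would be one possible way to detect proximity to the singularity in practice.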
Remark 1. The proposed control law (13) is associated with the target’s velocity in the working space or the target’s image velocity in the image plane. If the 3D velocity of the target $V_t$ is known, the target image velocity $\dot X_i^t$ can be calculated by using the target image Jacobian $J_t$ without the estimation process. However, in practical applications, it is difficult to measure the 3D velocity of the target when using only one camera in real-time operations. In this situation, an estimation process is required to estimate the target image velocity $\dot X_i^t$ in the image plane directly. In Section 4, a VSE will be proposed to accomplish this task. This design will facilitate more general performance of the proposed tracking control scheme in the image plane.
3.3. Robustness analysis
In this subsection, we investigate the robustness of the proposed VTC (13) against model uncertainties on the camera parameters $(f_x, f_y)$, robot parameters $(\theta_f^m, \phi)$, and target parameters $(W, \dot x_i^t, \dot y_i^t, \dot d_x^t)$. Consider the following closed-loop visual tracking system with parametric uncertainties:

$$
\dot X_e = -\dot{\bar X}_i^t - \bar B_i u = -(\dot X_i^t + \delta \dot X_i^t) - (B_i + \delta B_i) u, \tag{21}
$$

where $\delta \dot X_i^t$ and $\delta B_i$ are unknown bounded disturbances. Recalling the positive-definite Lyapunov function defined in (14), the derivative of (14) with parametric uncertainties becomes

$$
\dot{\bar V} = -X_e^T(\dot{\bar X}_i^t + \bar B_i u) = -[f(u) + \delta f(u)] \equiv -\bar f(u), \tag{22}
$$

where $\delta f(u) = X_e^T(\delta \dot X_i^t + \delta B_i u)$ is unknown. Assume that $\delta f(u)$ is bounded and there exists an SPD matrix $\tilde Q$ such that

$$
\delta f(u) < \|X_e\|_{\tilde Q}^2, \tag{23}
$$

where $\|X\|_A = (X^T A X)^{1/2}$ denotes a weighted vector norm with an SPD matrix $A$. Now, the main result is presented as follows.

Theorem 1. Consider the dual-Jacobian visual interaction system (4) with unknown bounded parametric uncertainties $\delta \dot X_i^t$ and $\delta B_i$ defined in (21). Let $\tilde Q > 0$ be an SPD matrix defined in (23). Choose the controller $u$ as given in expression (13) with a constant SPD matrix $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3) > 0$. Then, the closed-loop visual tracking system (6) is asymptotically stable for all $\alpha_i > \lambda_{\max}(\tilde Q)$, $i = 1, 2, 3$.

Proof. From (22) and (23), it follows that

$$
f(u) = \bar f(u) - \delta f(u) > \bar f(u) - \|X_e\|_{\tilde Q}^2. \tag{24}
$$

Expression (24) implies that $\bar f(u) - \|X_e\|_{\tilde Q}^2$ is a lower bound of $f(u)$. If $\bar f(u) - \|X_e\|_{\tilde Q}^2 > 0$ can be guaranteed, then $f(u) > 0$ is satisfied and thus the system is robust with respect to the parametric uncertainties. Choose the controller $u$ as in (13) with parametric uncertainties such that

$$
u = \bar B_i^{-1}(K_g X_e - \dot{\bar X}_i^t), \tag{25}
$$

where $\dot{\bar X}_i^t = \bar J_t V_t$. Substituting (25) into (22) yields

$$
\dot{\bar V} = -\bar f(u) = -\|X_e\|_{K_g}^2, \tag{26}
$$

where $K_g = \mathrm{diag}(\alpha_1, \alpha_2, \alpha_3) > 0$ is a constant SPD matrix. From (24) and (26), it is clear that

$$
f(u) > \|X_e\|_{K_g}^2 - \|X_e\|_{\tilde Q}^2 \ge [\lambda_{\min}(K_g) - \lambda_{\max}(\tilde Q)] \|X_e\|^2, \tag{27}
$$

where $\lambda_{\min}(A)$ is defined in (17), and $\lambda_{\max}(A)$ denotes the maximum eigenvalue of matrix $A$. Expression (27) tells us that if $\lambda_{\min}(K_g) - \lambda_{\max}(\tilde Q)$ is positive, then $f(u) > 0$ is satisfied and thus the equilibrium point of (6) is asymptotically stable. This means that the proposed VTC (13) is robust against the unknown parametric uncertainties, which completes the proof.

Remark 2. In realizing the control schemes, it is worth noting that the quantization error in velocity commands degrades the performance of the controller and might make the system unstable. In order to overcome this problem, the proposed VTC (13) will be combined with a robust control law presented in [25] to improve the robustness of the visual tracking control system. Interested readers are referred to [25] for more technical details.

4. Visual state estimator (VSE) design
As implied by Proposition 1, the VTC (13) requires the information of the target image velocity $\dot X_i^t$. This requirement poses two questions: first, how the target image velocity can be estimated in the image plane directly; second, what estimation methods can be used. In this section, we first formulate the problem of target image velocity estimation in the image space. A VSE will then be proposed in order to compute the optimal estimates of the target status $X_i$ and the target image velocity $\dot X_i^t$ in the weighted least squared error sense for later use by the VTC.
4.1. Problem formulation
Since actual image processing is discrete, the first step of the VSE design is to discretize the system model (4) into a corresponding discrete form. By the definition $\dot x(t) = \lim_{T \to 0} [x(t) - x(t - T)]/T$, where $T$ denotes the sampling time of the digital system, one can approximate the system model (4) as

$$
X_i[n] = X_i[n-1] + T \dot X_i^t[n-1] + T B_i u_{n-1}, \quad \text{for } n = 1, 2, \ldots \tag{28}
$$

where $u_n = [v_f^m \;\; w_f^m \;\; w_t^m]^T$ is the discrete-time control signal at sample instant $n$. Suppose that the target motion can be approximated by a smooth motion during a sampling period. Then the target image velocity has the following relationship between two consecutive sampling instants:

$$
\dot X_i^t[n] = \dot X_i^t[n-1]. \tag{29}
$$

Based on (28) and (29), the propagation model of the visual state estimator is given by

$$
X_n = \begin{bmatrix} I_3 & T I_3 \\ 0_3 & I_3 \end{bmatrix} X_{n-1} + \begin{bmatrix} T B_i \\ 0_3 \end{bmatrix} u_{n-1} \equiv A_{est} X_{n-1} + B_{est} u_{n-1}, \tag{30}
$$

where $X_n^T = [(X_i[n])^T \;\; (\dot X_i^t[n])^T]$ is the vector of system estimates at instant $n$, $I_3$ is a 3-by-3 identity matrix, and $0_3$ is a 3-by-3 zero matrix. Next, because the observed image only contains the information of the target status $X_i$ at each instant, the observation model of the VSE is given by

$$
Z_n = \begin{bmatrix} I_3 & 0_3 \end{bmatrix} X_n \equiv H_{est} X_n. \tag{31}
$$

Based on the propagation model (30) and observation model (31), the estimation problem in the image plane becomes finding the state estimate $X_n^*$ that minimizes the weighted least square distance:

$$
X_n^* = \arg\min_X \left[ (X_n - X)^T P_n^{-1} (X_n - X) + (Z_n - H_{est} X)^T R_n^{-1} (Z_n - H_{est} X) \right], \tag{32}
$$

where $P_n = A_{est} P_{n-1} A_{est}^T$ is the covariance matrix of the propagation model at instant $n$, and $R_n$ is the covariance matrix of the observation model at instant $n$.
4.2. Self-tuning Kalman filter
Because the propagation model (30) and observation model (31) are both linear equations, a Kalman filter will provide the optimal estimate $X_n^*$ based on the performance criterion (32) when (30) and (31) have Gaussian uncertainties [30]:

$$
\text{Propagation:} \quad X_n = A_{est} X_{n-1}^* + B_{est} u_{n-1} + \delta X_{n-1} \tag{33a}
$$

$$
\text{Propagation covariance matrix:} \quad P_n = A_{est} P_{n-1}^* A_{est}^T + Q_{n-1} \tag{33b}
$$

$$
\text{Observation:} \quad Z_n = H_{est} X_n + \delta Z_n \tag{33c}
$$

where $(X_n, P_n)$ are the propagation state and the corresponding covariance matrix at instant $n$; $(X_{n-1}^*, P_{n-1}^*)$ are the optimal estimate and the corresponding covariance matrix at instant $n-1$; $\delta X_n^T = [(\delta X_i[n])^T \;\; (\delta \dot X_i^t[n])^T] \sim N(0, Q_n)$ represents Gaussian propagation uncertainty with zero mean and covariance matrix $Q_n$ at instant $n$; and $\delta Z_n \sim N(0, R_n)$ denotes Gaussian observation uncertainty with zero mean and covariance matrix $R_n$ at instant $n$. Based on Eqs. (33a)–(33c), the local minimum solution of the performance criterion (32) and the corresponding covariance matrix at instant $n$ are given by

$$
X_n^* = X_n^p + K_n (Z_n - H_{est} X_n^p) \quad \text{and} \quad P_n^* = (I_6 - K_n H_{est}) P_n, \tag{34}
$$

where $X_n^p = A_{est} X_{n-1}^* + B_{est} u_{n-1}$ is the ideal propagation state, $K_n = P_n H_{est}^T (H_{est} P_n H_{est}^T + R_n)^{-1}$ is the Kalman gain matrix, and $I_6$ is a 6-by-6 identity matrix.
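The propagation and update equations (30), (31), (33b) and (34) map onto a few lines of NumPy. The sketch below is a plain fixed-covariance Kalman step, not the self-tuning variant developed in the paper: `Q` and `R` are constant, `Bi` is a fixed hand-picked placeholder (the paper's $B_i$ varies with the image state), and the function names and all numbers are hypothetical.

```python
import numpy as np

def make_models(T, Bi):
    """Build A_est, B_est (30) and H_est (31) for sampling time T."""
    I3, Z3 = np.eye(3), np.zeros((3, 3))
    A_est = np.block([[I3, T * I3], [Z3, I3]])
    B_est = np.vstack([T * Bi, Z3])
    H_est = np.hstack([I3, Z3])
    return A_est, B_est, H_est

def kalman_step(X_prev, P_prev, u_prev, Z, A_est, B_est, H_est, Q, R):
    """One propagate-and-update cycle following (33b) and (34)."""
    X_pred = A_est @ X_prev + B_est @ u_prev         # ideal propagation state X_n^p
    P_pred = A_est @ P_prev @ A_est.T + Q            # propagation covariance (33b)
    S = H_est @ P_pred @ H_est.T + R                 # innovation covariance
    K = P_pred @ H_est.T @ np.linalg.inv(S)          # Kalman gain K_n
    X_new = X_pred + K @ (Z - H_est @ X_pred)        # state update (34)
    P_new = (np.eye(6) - K @ H_est) @ P_pred         # covariance update (34)
    return X_new, P_new

# Noise-free toy run with placeholder values (not from the paper).
T = 0.1
Bi = np.array([[2.0, 300.0, -5.0],
               [10.0, 4.0, -400.0],
               [3.0, 1.0, -8.0]])
A_est, B_est, H_est = make_models(T, Bi)
Q, R = 1e-4 * np.eye(6), 1e-2 * np.eye(3)
v_true = np.array([3.0, -1.0, 0.5])      # constant target image velocity
u = np.array([0.01, 0.002, -0.001])      # constant robot command
Xi_true = np.array([10.0, 5.0, 40.0])    # true image state
X_est, P_est = np.zeros(6), 1e3 * np.eye(6)
for _ in range(300):
    Xi_true = Xi_true + T * v_true + T * (Bi @ u)   # true motion per (28)
    X_est, P_est = kalman_step(X_est, P_est, u, Xi_true,
                               A_est, B_est, H_est, Q, R)
```

Only the image state is observed, yet the last three entries of `X_est` approach `v_true` because the constant-velocity model (29) couples the two through `A_est`. This is the quantity the controller (13) needs in place of $J_t V_t$.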
Although expression (34) provides the best linear estimates at each instant, the filter performance still depends on the covariance matrices $Q_n$ and $R_n$. Thus, a difficult problem in Kalman filter applications is to determine the values of the matrices $Q_n$ and $R_n$ for computing the Kalman gain matrix $K_n$ [31]. Moreover, the observation uncertainty usually varies with the conditions of target motion (such as orientation and rotation of the human face) and working environment (such as light variation and occlusion); the corresponding covariance matrix $R_n$ would be time-varying for various operating conditions. These problems motivate us to combine a self-tuning algorithm with a Kalman filter to choose a suitable observation covariance matrix $R_n$ in varying environmental conditions. On the other hand, because the propagation uncertainty and the corresponding covariance matrix $Q_n$ are difficult to estimate online, the propagation covariance matrix $Q_n$ will be fixed at initialization without updating in this design.
The proposed self-tuning algorithm attempts to estimate the minimum variance of a set of observation data recorded over time. To do so, a linear-least-squares regression method is adopted to analyze the observed time series data [32]. The typical linear regression model for a discrete time series is given by
$$y_n = an + b + \varepsilon_n, \tag{35}$$

where the residual $\varepsilon_n$ is a random variable with zero mean, and $(a, b)$ are the parameters to be determined by minimizing the variance of the residuals. Fig. 5 shows the concept of the linear-least-squares regression, in which the solid line is the observed time series, and the dotted line indicates the best linear fit $\hat y_n = an + b$.

Fig. 5. Concept of time series linear-least-squares regression.
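The regression model (35) can be illustrated with a short closed-form fit. The sketch below is only an example under the assumption that the samples are indexed $n = 1, \ldots, k$; the function name `fit_line` is hypothetical and not part of the proposed system.

```python
def fit_line(y):
    """Least-squares fit of y_n = a*n + b for n = 1..k.

    Returns (a, b, residuals), where residuals[n-1] = y_n - (a*n + b).
    Requires at least two samples.
    """
    k = len(y)
    n_mean = (k + 1) / 2.0
    y_mean = sum(y) / k
    # Slope from the centred cross- and auto-products of (n, y_n).
    sxy = sum((n - n_mean) * (yn - y_mean) for n, yn in enumerate(y, start=1))
    sxx = sum((n - n_mean) ** 2 for n in range(1, k + 1))
    a = sxy / sxx
    b = y_mean - a * n_mean
    residuals = [yn - (a * n + b) for n, yn in enumerate(y, start=1)]
    return a, b, residuals


# A noiseless ramp is reproduced exactly, so all residuals vanish.
a, b, residuals = fit_line([5.0, 7.0, 9.0, 11.0])
```

Because the intercept $b$ is part of the model, the residuals of the fitted line always sum to zero, consistent with the zero-mean assumption on $\varepsilon_n$.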
Let $k$ denote the length of the observed time series. Based on the linear regression model (35), the observed time series can be modeled as

$$Y = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ \vdots & \vdots \\ k & 1 \end{bmatrix}\theta + \varepsilon \equiv A_{st}\theta + \varepsilon, \tag{36}$$

where $Y = [y_1\ y_2\ \cdots\ y_k]^T$ is the vector of observed data over time, and $\varepsilon = [\varepsilon_1\ \varepsilon_2\ \cdots\ \varepsilon_k]^T$ represents the corresponding residuals. $\theta = [a\ b]^T$ is the parameter vector to be determined such that

$$\theta^* = \arg\min_\theta \operatorname{var}(\varepsilon) = \arg\min_\theta \|\varepsilon\| = \arg\min_\theta \|Y - A_{st}\theta\|, \tag{37}$$

where $\operatorname{var}(x)$ is the variance of vector $x$, and $\|x\|$ is the norm of vector $x$. The optimal solution of (37) is the least-squares solution

$$\theta^* = A_{st}^+Y, \tag{38}$$

where $A_{st}^+ = (A_{st}^TA_{st})^{-1}A_{st}^T$ denotes the pseudo-inverse matrix of $A_{st}$. Substituting (38) into (36), the residual vector with minimum variance can be obtained by
$$\varepsilon^* = Y - A_{st}A_{st}^+Y = (I_k - A_{st}(A_{st}^TA_{st})^{-1}A_{st}^T)Y \equiv T_{st}Y, \tag{39}$$

where $T_{st} = I_k - A_{st}(A_{st}^TA_{st})^{-1}A_{st}^T$ is a fixed $k$-by-$k$ coefficient matrix, and $I_k$ is a $k$-by-$k$ identity matrix. Expression (39) tells us that the minimum-variance residual vector $\varepsilon^*$ is a linear transformation of the observed data vector $Y$ through a fixed transformation matrix $T_{st}$. This observation provides an efficient method for detecting the minimum variance of an observed data sequence in real time. For instance, let $X_1^k$, $Y_1^k$, and $D_1^k$ denote, respectively, the observed data sequences of $x_i$, $y_i$, and $d_x$ over time steps 1 to $k$. Using (39), the minimum variances of $X_1^k$, $Y_1^k$, and $D_1^k$ are given by
$$\sigma_x^2 = \operatorname{var}(T_{st}X_1^k), \quad \sigma_y^2 = \operatorname{var}(T_{st}Y_1^k), \quad \text{and} \quad \sigma_d^2 = \operatorname{var}(T_{st}D_1^k). \tag{40}$$

Based on (40), the covariance matrix $R_n$ is updated as

$$R_n = R_0 + \operatorname{diag}((\sigma_x^2)^2, (\sigma_y^2)^2, (\sigma_d^2)^2), \tag{41}$$

where $R_0$ is the initial covariance matrix of $R_n$. Combining the self-tuning equations (40)–(41) with Kalman filter equations (33)–(34), the implemented self-tuning Kalman filter is summarized in Fig. 6. The processing steps are listed as follows:
(1) Choose two initial covariance matrices $Q_0$ and $R_0$, usually by a trial-and-error procedure.
(2) Assume that the initial position of the target lies in the field of view of the camera, then initialize the estimated system state $X_0^*$ and propagation covariance matrix $P_0$ by the first observation such that $X_0^* = [Z_0^T\ 0\ 0\ 0]^T$ and $P_0 = I_6$.
(3) Store the current observed measurement in a shift register with length $k$. If the length of the stored data is equal to $k$, then compute the variance of the observed data sequences by (40) and update the covariance matrix $R_n$ by (41); else set $R_n = R_0$; go to step (4).
(4) Compute the ideal propagated state $X_n^p$ defined in (32) and the corresponding propagation covariance matrix $P_n$ using (33b).
(5) If the target is detected in the observed image, then compute the Kalman gain matrix $K_n$ and update the estimated state vector $X_n^*$ with the corresponding covariance matrix $P_n^*$ using (34); else set $X_n^* = X_n^p$ and $P_n^* = P_n$; go to step (6).
(6) Let $X_{n-1}^* = X_n^*$, $P_{n-1}^* = P_n^*$ and $Q_{n-1} = Q_0$; go to step (3).

5. Simulation and experimental results

Computer simulations and several interesting experiments have been performed to validate the robustness and tracking performance of the proposed control system. In the computer simulation, MATLAB was used to verify the robustness of the proposed visual tracking control system against the parametric uncertainties. An experiment was performed on an experimental mobile robot to validate the tracking performance and robustness against the occlusion uncertainty. The experiment adopts the proposed control system to control a mobile platform with the tilt function of the camera platform. Table 1 shows the parameters used in the simulations and the experiment. Note that the processing time of the proposed visual tracking control system is less than 50 ms, including face detection, estimation and control computations. This means that the overall tracking system has a low computational load and can track the user's face in real time. However, the sampling period of the control system $T$ was set to 100 ms in the experiments due to other image processing computations such as image compression and storage.

Table 1
Parameters used in the simulations and experiments.
Symbol | Quantity | Description
$(f_x, f_y)$ | (393.4, 391.8) pixels | Camera focal length in retinal coordinates [28].
$W$ | 12 cm | Width of the target.
$D$ | 40 cm | Distance between two drive wheels.
$\delta_y$ | 10 cm | Distance between the center of robot tilt platform and the onboard camera.
$T$ | 100 ms | Sampling period of the control system.
$Q_0$ | diag(5, 5, 5, 20, 20, 20) | Initial propagation covariance matrix.
$R_0$ | diag(5, 5, 5) | Initial observation covariance matrix.
$(\bar x_i, \bar y_i, \bar d_x)$ | (0, 0, 35) | Desired system state in the image plane.
$(\alpha_1, \alpha_2, \alpha_3)_s$ | (1/2, 3/4, 1/3) | Positive control gains of simulation.
$(\alpha_1, \alpha_2, \alpha_3)_1$ | (5/4, 3, 1/2) | Positive control gains of experiment.
$(z_f^m, x_f^m, \theta_f^m, \phi)|_{t=0}$ | (0, 0, 0, 0) | Initial pose of tracking robot.

Fig. 7. Simulation setup for the robustness and performance evaluation of the visual tracking control system.
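To make the role of $R_0$ from Table 1 concrete, the self-tuning update of step (3) in Section 4.2 can be sketched for a single image coordinate as follows. This is a minimal sketch rather than the implemented filter: the class name `SelfTuningR`, the helper names and the window length `k = 5` are hypothetical (the value of $k$ used in the experiments is not stated here), and the population variance is assumed for $\operatorname{var}(\cdot)$.

```python
import statistics
from collections import deque


def residual_projector(k):
    """Fixed matrix T_st = I_k - A (A^T A)^-1 A^T of (39), A = [[1,1],...,[k,1]]."""
    s1 = k * (k + 1) / 2.0                 # sum of n for n = 1..k
    s2 = k * (k + 1) * (2 * k + 1) / 6.0   # sum of n^2 for n = 1..k
    det = k * s2 - s1 * s1
    # Closed-form inverse of A^T A = [[s2, s1], [s1, k]].
    inv = [[k / det, -s1 / det], [-s1 / det, s2 / det]]
    T = []
    for i in range(1, k + 1):
        row = []
        for j in range(1, k + 1):
            hat = i * (inv[0][0] * j + inv[0][1]) + (inv[1][0] * j + inv[1][1])
            row.append((1.0 if i == j else 0.0) - hat)
        T.append(row)
    return T


class SelfTuningR:
    """Shift register plus the update (40)-(41) for one diagonal entry of R_n."""

    def __init__(self, r0, k):
        self.r0 = r0
        self.k = k
        self.T = residual_projector(k)   # computed once, reused at every step
        self.window = deque(maxlen=k)    # oldest sample drops out automatically

    def update(self, z):
        self.window.append(z)
        if len(self.window) < self.k:
            return self.r0               # register not yet full: R_n = R_0
        residuals = [sum(t * y for t, y in zip(row, self.window))
                     for row in self.T]
        sigma2 = statistics.pvariance(residuals)   # eq. (40)
        return self.r0 + sigma2 ** 2               # eq. (41)


tuner = SelfTuningR(r0=5.0, k=5)         # r0 = 5 as in Table 1
r_values = [tuner.update(z) for z in [0.0, 1.0, 0.0, 1.0, 0.0]]
```

Since $T_{st}$ depends only on $k$, it is built once in the constructor, and each update costs one $k$-by-$k$ matrix-vector product per coordinate.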
5.1. Simulation setup
Fig. 7 shows the simulation setup for the evaluation of system robustness and tracking performance. In Fig. 7, $X_n$ denotes the ideal state to be estimated by the VSE presented in Section 4. $X_i[n]$ is the ideal system state at time instant $n$, and $\dot X_i^t$ is the ideal target image velocity at time instant $n$. The observation signal $Z_n$ is obtained by adding random noise (RN) to the value of $X_i[n]$ and rounding the result off to an integer. The random noise used in this paper is given by

$$RN = \sigma_n(2\omega - 1), \tag{42}$$

where $\sigma_n \in [0, 10]$ is a constant noise gain, and $\omega \in [0, 1]$ is a random noise signal with uniform distribution. Next, the VSE aims to filter RN and provide the optimal estimates. The performance of the VSE is then validated by the mean-squared-error (MSE) criterion between the ideal signal $X_n$ and the estimated signal $X_n^*$.
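The observation model of Fig. 7 and the MSE criterion can be sketched as below. This is an illustration rather than the MATLAB code used for the simulations: the function names `observe` and `mse` are hypothetical, and Python's `random.random()` draws from $[0, 1)$ instead of the closed interval $[0, 1]$.

```python
import random


def observe(x_ideal, sigma_n, rng):
    """Noisy integer observation of one ideal state component."""
    omega = rng.random()                  # uniform sample in [0, 1)
    rn = sigma_n * (2.0 * omega - 1.0)    # eq. (42)
    return round(x_ideal + rn)            # round off to an integer


def mse(ideal, estimated):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(ideal, estimated)) / len(ideal)


rng = random.Random(0)                    # fixed seed for repeatability
ideal = [35.0] * 100
observed = [observe(x, 10.0, rng) for x in ideal]
raw_mse = mse(ideal, observed)            # error of the unfiltered observations
```

Here `raw_mse` measures the unfiltered observation error, which gives a reference level against which a filtered estimate can be compared.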
In order to validate the robustness of the VTC (13) against the parametric uncertainty, a random variable is utilized to control the variation of the system parameters $(f_x, f_y, W, \delta_y)$ such that

$$\bar f_x = (1+\rho)f_x, \quad \bar f_y = (1+\rho)f_y, \quad \bar W = (1+\rho)W, \quad \text{and} \quad \bar\delta_y = (1+\rho)\delta_y, \tag{43}$$

where $\rho \in [-0.1, 0.1]$ is a random variable with uniform distribution introduced in the practical system parameters $(\bar f_x, \bar f_y, \bar W, \bar\delta_y)$ that will be used to calculate the practical robot image Jacobian $\bar B_i$. In the simulation, the motion of the target is set as a dynamic and smooth motion with velocity

$$(v_f^x, v_f^y, v_f^z) = (v_f^t\sin\theta_f^t,\ 0,\ v_f^t\cos\theta_f^t), \tag{44}$$

where $v_f^t(n) = v_f^t(0)\cos(n\pi T/20)$ (cm/s) for $v_f^t(0) = 15$ and $\theta_f^t(n) = \theta_f^t(n-1) + T(5\pi/72)$ rad for $\theta_f^t(0) = 0$. Expression (44) will be used to calculate the ideal target image velocity $\dot X_i^t$.

5.2. Simulation results of the proposed self-tuning Kalman filter
Two visual state estimators are compared: the conventional Kalman filter (KF) and the proposed self-tuning Kalman filter (STKF). Fig. 8 shows the evolution of the average MSE measurements as the noise gain $\sigma_n$ increases in the simulations. Note that each average MSE measurement is taken over 20 simulations for each $\sigma_n$, and the value of the random variable $\rho$ is randomly chosen at the beginning of each simulation. In Fig. 8, it is clear that the estimation results of the KF are very sensitive to the intensity of the observation noise. As the noise gain $\sigma_n$ increases from 0 to 10 in steps of 1, the MSE measurements of the KF increase much faster than those of the STKF. Further, when the noise gain $\sigma_n = 10$ (the observation signal has the largest noise intensity), the proposed STKF provides improved estimation results compared with the KF.
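The inputs behind these measurements, namely one draw of $\rho$ per run in (43) and the target motion (44), can be sketched as follows. The sketch assumes that the same $\rho$ scales all four parameters, as (43) is written, and that $T$ is expressed in seconds; the names `NOMINAL`, `perturb` and `target_velocity` are hypothetical.

```python
import math
import random

# Nominal parameters from Table 1 (pixels, pixels, cm, cm).
NOMINAL = {"fx": 393.4, "fy": 391.8, "W": 12.0, "delta_y": 10.0}


def perturb(params, rho):
    """Eq. (43): scale every nominal parameter by (1 + rho)."""
    return {name: (1.0 + rho) * value for name, value in params.items()}


def target_velocity(steps, T=0.1, v0=15.0):
    """Eq. (44): target velocity (v_x, v_y, v_z) in cm/s for n = 0..steps-1."""
    theta = 0.0                                    # theta(0) = 0
    out = []
    for n in range(steps):
        if n > 0:
            theta += T * (5.0 * math.pi / 72.0)    # heading recursion
        speed = v0 * math.cos(n * math.pi * T / 20.0)
        out.append((speed * math.sin(theta), 0.0, speed * math.cos(theta)))
    return out


# 11 noise gains (0..10) times 20 runs, each with its own rho.
rng = random.Random(1)
runs = [(sigma_n, perturb(NOMINAL, rng.uniform(-0.1, 0.1)))
        for sigma_n in range(11) for _ in range(20)]
```

With 11 noise gains and 20 runs per gain, `runs` holds 220 entries, each pairing a gain with its own perturbed parameter set.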
Fig. 8. MSE measurements of the simulation results using Kalman filter and self-tuning Kalman filter. (a) Average MSE measurements of system state $(x_i, y_i, d_x)$. (b) Average MSE measurements of target image velocity $(\dot x_i^t, \dot y_i^t, \dot d_x^t)$.
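Table 2
Average MSE measurements of computer simulations.
MSE | Value | KF | STKF | MSE | Value | KF | STKF
$x_i$ | $\sigma_n = 10$ | 16.7670 | 8.8669 | $\dot x_i^t$ | $\sigma_n = 10$ | 62.6014 | 29.8525
$x_i$ | $\sigma_n = 0$ | 0.0442 | 0.0441 | $\dot x_i^t$ | $\sigma_n = 0$ | 8.3137 | 8.5533
$x_i$ | MSE gap | 16.7228 | 8.8228 | $\dot x_i^t$ | MSE gap | 54.2878 | 21.2992
$y_i$ | $\sigma_n = 10$ | 16.6215 | 6.9174 | $\dot y_i^t$ | $\sigma_n = 10$ | 60.9062 | 8.0422
$y_i$ | $\sigma_n = 0$ | 0.0441 | 0.0442 | $\dot y_i^t$ | $\sigma_n = 0$ | 0.4667 | 0.5341
$y_i$ | MSE gap | 16.5774 | 6.8732 | $\dot y_i^t$ | MSE gap | 60.4395 | 7.5081
$d_x$ | $\sigma_n = 10$ | 16.5441 | 6.3912 | $\dot d_x^t$ | $\sigma_n = 10$ | 59.0913 | 6.6815
$d_x$ | $\sigma_n = 0$ | 0.0981 | 0.0964 | $\dot d_x^t$ | $\sigma_n = 0$ | 0.2977 | 0.3132
$d_x$ | MSE gap | 16.4460 | 6.2948 | $\dot d_x^t$ | MSE gap | 58.7935 | 6.3683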
Table 2 records the MSE gap between $\sigma_n = 10$ and $\sigma_n = 0$. A small MSE gap implies strong robustness against the intensity of observation noise. In Table 2, the bold font denotes the smallest value of MSE measurement across each row. Table 2 shows that the MSE gaps of the KF for all estimates are larger than those of the STKF (for example, 16.7228 versus 8.8228 for $x_i$). In other words, the proposed STKF provides higher robustness against the observation uncertainty than the KF. Therefore, the simulation results validate the robustness of the proposed STKF.

Remark 3. Because the effect of estimation error is not considered in the current control design, a large estimation error will degrade the tracking performance of the visual tracking control system. In other words, a robust VSE that provides a small estimation error is helpful in improving the tracking performance. The results in Table 2 show that the proposed STKF provides better estimation results than the conventional KF does. Therefore, the proposed STKF is more helpful to the proposed VTC when the observation is perturbed with random noise.
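The MSE gaps of Table 2 can be recomputed directly from the tabulated values at $\sigma_n = 10$ and $\sigma_n = 0$. The short check below copies those values; the dictionary layout and the names `TABLE_2`, `mse_gap`, `gaps` and `ratios` are merely one hypothetical way to organize them.

```python
# (MSE at sigma_n = 10, MSE at sigma_n = 0), copied from Table 2.
TABLE_2 = {
    "x_i":     {"KF": (16.7670, 0.0442), "STKF": (8.8669, 0.0441)},
    "y_i":     {"KF": (16.6215, 0.0441), "STKF": (6.9174, 0.0442)},
    "d_x":     {"KF": (16.5441, 0.0981), "STKF": (6.3912, 0.0964)},
    "x_i_dot": {"KF": (62.6014, 8.3137), "STKF": (29.8525, 8.5533)},
    "y_i_dot": {"KF": (60.9062, 0.4667), "STKF": (8.0422, 0.5341)},
    "d_x_dot": {"KF": (59.0913, 0.2977), "STKF": (6.6815, 0.3132)},
}


def mse_gap(entry):
    """Gap between the MSE at the largest noise gain and at zero noise."""
    high, low = entry
    return high - low


gaps = {name: {flt: mse_gap(vals) for flt, vals in row.items()}
        for name, row in TABLE_2.items()}

# Ratio of KF gap to STKF gap for each estimated quantity.
ratios = {name: g["KF"] / g["STKF"] for name, g in gaps.items()}
```

The recomputed gaps agree with the tabulated ones to within 0.0001, and every ratio exceeds one, which is the comparison the text draws between the KF and the STKF.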
5.3. Simulation results of robustness to the system parametric uncertainty
Fig. 9 presents the computer simulation results of the visual tracking control system with random noise defined in (42) and parametric uncertainty defined in (43). The results shown in Fig. 9 are obtained from the average of 220 simulations (20 simulations for each of the 11 values of $\sigma_n$, with one random value $\rho$ for each simulation). Fig. 9(a) and (b) show, respectively, the tracking errors and the estimates of target image velocity. In Fig. 9(a) and (b), the dotted lines illustrate the ideal values while the solid lines show the estimation results. It is clear that all tracking errors converge to zero, and each estimate converges to the corresponding ideal value. Fig. 9(c) shows the control velocities of the mobile robot and the tilt camera.
Fig. 9(d) shows the transition of $f(u)$ defined in (15) in order to clarify the stability of the closed-loop visual tracking control system. In the simulation, the uncertainty function $\delta f(u)$ defined in (22) is computable since $\delta\dot X_i^t$ and $\delta B_i$ are known. Thus, according to (24), a lower bound (LB) of $f(u)$ can be obtained by the following equation:

$$LB = \lambda_{\min}(K_g)\|X_e\|^2 - \delta f(u) \le \bar f(u) - \delta f(u) = f(u). \tag{45}$$

Expression (45) tells us that if LB is positive, then $f(u)$ is guaranteed to be positive and the system is stable. In Fig. 9(d), the dotted line shows the transition of LB defined in (45) while the solid line indicates the transition of $f(u)$. It is clear that LB is positive during the visual tracking task, and hence the closed-loop visual tracking control system with parametric uncertainty is stable. Therefore, these simulation results validate that the proposed visual tracking control system not only overcomes the random noise in the observation, but also overcomes the parametric uncertainty in the system model.

5.4. Experiment setup
Fig. 10 shows the experimental mobile robot equipped with an on-board 1.6G industrial personal computer (IPC), a USB 2.0 camera and a pan-tilt camera platform for the study of human–robot interaction through visual tracking control. Fig. 11 illustrates the complete robust visual tracking control system constructed using the proposed VSE and VTC. The function of each block shown in Fig. 11 is listed below:
(1) Feature detection and tracking: perform the face detection and tracking algorithms proposed in [33] to extract the state of the user's face $[x_i\ y_i\ d_x]^T$ in the image captured from the camera.
Fig. 9. The computer simulation results of the visual tracking control system with random noise and parametric uncertainty. (a) Tracking errors. (b) Estimated target image velocity. (c) Control velocities of the mobile robot and the tilt camera. (d) Transition of $f(u)$ defined in (15) and its corresponding lower bound defined in (45).
Fig. 10. Experimental mobile robot interacts with a user using a real-time face
tracking algorithm and the proposed robust visual tracking control system.
(2) Visual state estimator: estimate the optimal system state $[x_i^*\ y_i^*\ d_x^*]^T$ and the target image velocity $[\dot x_i^t\ \dot y_i^t\ \dot d_x^t]^T$ by using the proposed self-tuning Kalman filter described in Section 4.
(3) Visual tracking control law: compute the desired robot control velocity $[v_f^m\ w_f^m\ w_t^m]^T$ using (13).
(4) Velocity transformation: transform the desired linear and angular control velocities into desired left- and right-wheel control velocities using $v_l = v_f^m - (D \cdot w_f^m)/2$ and $v_r = v_f^m + (D \cdot w_f^m)/2$, where $D$ represents the distance between the two drive wheels.
(5) Scaling processing: scale the desired control velocity $[v_l\ v_r\ w_t^m]^T$ to satisfy the maximum velocity and acceleration limitations of the specific robot system. Note that this processing will degrade the tracking performance, but increase the smoothness and safety, of the practical robot system.
(6) Quantization processing: quantize the scaled control velocity $[\tilde v_l\ \tilde v_r\ \tilde w_t^m]^T$ depending on the resolution of the motion control module. The resolution of the motion control card used in the experiments is 8-bit, which means it can command the linear wheel velocity from −128 to 127 cm/s in integer steps. For example, suppose that the scaled left-wheel velocity command $\tilde v_l$ is 2.9925 cm/s. After quantization processing, the quantized velocity command $\bar v_l$ is $\lfloor 2.9925 \rfloor = 2$ cm/s, where $\lfloor x \rfloor$ is the largest integer not greater than $x$, and the corresponding quantization error $\delta v_l$ is $\bar v_l - \tilde v_l = -0.9925$ cm/s.
(7) Robust control law: compute the robust control velocity $[\bar v_l^*\ \bar v_r^*\ \bar w_t^{m*}]^T$ to overcome the velocity quantization problem by using the robust control law presented in [25]. The interested reader is referred to [25] for more technical details.
(8) Velocity inverse transformation: transform the robot's current left- and right-wheel velocities into linear and angular velocities using $v^c = (v_l^c + v_r^c)/2$ and $w^c = (v_r^c - v_l^c)/D$. Let $u_{n-1} = [v^c\ w^c\ w_t^c]^T$ denote the previous robot control velocity for the visual state estimator to calculate the current propagation system state $X_n^p$.
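The velocity transformation of block (4), the quantization of block (6) and the inverse transformation of block (8) can be sketched as follows. This is a minimal illustration with hypothetical function names; clamping the quantized command to the 8-bit range is an assumption about how out-of-range values are handled, and the scaling of block (5) and the robust control law of block (7) are not modeled.

```python
import math

D = 40.0  # cm, distance between the two drive wheels (Table 1)


def to_wheels(v, w, d=D):
    """Block (4): linear/angular velocity -> (left, right) wheel velocities."""
    return v - d * w / 2.0, v + d * w / 2.0


def quantize(v_scaled, lo=-128, hi=127):
    """Block (6): floor to an integer command in cm/s.

    Clamping to [lo, hi] is an assumed way to respect the 8-bit range.
    """
    return max(lo, min(hi, math.floor(v_scaled)))


def from_wheels(v_l, v_r, d=D):
    """Block (8): (left, right) wheel velocities -> linear/angular velocity."""
    return (v_l + v_r) / 2.0, (v_r - v_l) / d


# Worked example from block (6): 2.9925 cm/s is commanded as 2 cm/s.
v_bar = quantize(2.9925)
delta_v = v_bar - 2.9925      # quantization error, about -0.9925 cm/s

# Blocks (4) and (8) are inverses of each other when nothing is quantized.
v_l, v_r = to_wheels(10.0, 0.5)
v_c, w_c = from_wheels(v_l, v_r)
```

Because the floor operation always rounds downward, the quantization error lies in $(-1, 0]$ cm/s for commands inside the representable range.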
Fig. 11. Implemented robust visual tracking control system of a wheeled mobile robot with on-board tilt camera.
Fig. 12. Experimental results. (a1–a7): Image sequence recorded from a DV camera. (b1–b7): Corresponding image sequence recorded from on-board USB camera.
(c–e): Recorded tracking errors in the image plane. (f–h): Target image velocity estimates. (i–j): Command linear and angular velocities of the mobile robot. (k): Command velocity of the tilt camera.
5.5. Experimental results of robustness in visual tracking
Fig. 12 presents the recorded images and responses of the mobile robot and tilt camera in the experiment that includes occlusions to validate the robustness of the proposed visual tracking control system. Fig. 12(a1–a7) show the pictures recorded by a digital video (DV) camera, and Fig. 12(b1–b7) are the corresponding pictures recorded by the on-board USB camera. Fig. 12(c–e) and Fig. 12(f–h) depict the responses of the tracking errors $(x_e, y_e, d_e)$ and target image velocity estimates $(\dot x_i^t, \dot y_i^t, \dot d_x^t)$, respectively. Fig. 12(i–k) illustrate the responses of the robot and tilt camera control velocities $(v_f^m, w_f^m, w_t^m)$. In the beginning, the user sat still on a stool, and the robot started to track his face using the proposed visual tracking control system. From Fig. 12(f–h), one can see that the target image velocity estimates all approach zero after the robot has been working for about 5 s. Next, the user stood up (Fig. 12(a2)) and the tilt camera worked to keep tracking his face. From Fig. 12(g),