2. Dynamic Task Assignment with Path Control for Multi-Agent System
2.7 Conclusions
In this chapter, a SOM-based FNN controller is adopted in the MAS to choose the best-matching pairs between agents and targets and to perform path planning with an intelligent adaptive methodology. Unlike the simple incremental path planning used in the traditional SOM to move the agents toward the chosen targets, the approach in this chapter takes the high nonlinearities and uncertainties of the agents into account. The intelligent adaptive SOM-based FNN controller operates in conjunction with the traditional SOM to find the best paths that lead all agents to their final targets. The main controller is the FNN controller, in which fuzzy rules are embedded in the neural network, and a new monitoring controller is designed to work with the FNN controller. Together they drive the agents to their corresponding targets under the constraints of the agents' nonlinear dynamics and uncertainties. The weighting factors are updated according to Lyapunov stability constraints, which is very different from the simple update method used by the traditional SOM. The simulation results show that the intelligent adaptive SOM-based FNN controller achieves excellent path planning for all agents.
Chapter 3
Toward a New Task Assignment and Path Evolution (TAPE) for Missile Defense System (MDS) using Intelligent Adaptive SOM with Fuzzy Neural Networks
3.1 Background and Motivation
In this thesis, we assume that the MDS contains a limited number N of defending missiles and D incoming missiles. The D incoming missiles are launched to attack a limited set of assets, each with its own significance. Once an asset is destroyed by incoming missiles, its asset value (or damaging cost) is lost. Because the number of assets under attack is unknown, the assignment of the N defending missiles is important for minimizing the total damaging cost (or maximizing the total surviving assets). In the first part of this chapter, a one-to-one agent-target missile guidance law using a fuzzy neural network is proposed and compared with the cerebellar model articulation controller (CMAC) [46]. The CMAC structure is too complex to be implemented in a real-time environment, and its enormous weight space and limited modeling capability can be improved on by the proposed FNN controller, which uses fewer mappings and layers. In the second part of this chapter, an adaptive SOM with an FNN controller is proposed for multi-agent-multi-target task assignment and missile guidance.
3.2 Problem Formulation
In a multi-agent system (MAS) with a group of N agents in a three-dimensional workspace, we assume that the positions and angles of the agents A = {a1, a2, ..., aN} initially lie in a user-defined region, and the positions and angles of the targets T = {t1, t2, ..., tD} are initially distributed randomly in the same workspace. The MAS can complete an assigned task rapidly and efficiently via the control inputs U = {u1, u2, ..., uN} of all the agents. In the MAS, task assignment is first performed by the self-organizing map (SOM), after which path evolution is activated so that all the agents can reach their corresponding targets under the agent dynamics constraints. The MAS architecture can be extended to a missile defense system (MDS), in which the defending interceptors and the incoming missiles are regarded as agents and targets, respectively. Furthermore, the MDS contains a set of assets S with different asset values that will be attacked by the targets. Each asset $s_l$ in S has its own asset value $V(s_l)$, denoted by $V_l$, which can be regarded as the damaging cost incurred when the asset is attacked and destroyed. Thus the damaging costs caused by all the targets can be denoted by V = {V1, V2, ..., VD}. The main control objective is to use the SOM to find the minimal total damaging cost to all the assets,
$$\sum_{d=1}^{D} V_d,$$
which will be discussed in Section 3.5. After task assignment for all the agents, the fuzzy neural network (FNN) controller is adopted for the agents to intercept the targets. The overall concept proposed in this chapter is illustrated in Fig. 3-1.
[Figure: block diagram with Missile Defense System (MDS), Self-Organizing Map (SOM), and Adaptive Fuzzy Neural Network (FNN) Controller (Fuzzy Neural Network Controller, Monitoring Controller, Adaptive algorithm, Tracking Error); signals A, T, V, and U]
Fig. 3-1. The overall concept of adaptive SOM with FNN controller for MDS.
Before multi-agent-multi-target scenarios are considered, a new intelligent algorithm for the single agent-target command line-of-sight (CLOS) guidance law is proposed in the following sections.
3.3 The Three-Dimensional CLOS Guidance Model
The three-dimensional CLOS guidance problem shown in Fig. 3-2 is a well-known guidance model [44, 46] that can be formulated as a tracking problem for a time-varying nonlinear system. The model of [44, 46] is repeated here for convenience. The origin of the inertial frame is located at the ground tracker; the ZI axis points vertically upward, and the XI-YI plane is horizontal. The origin of the agent body frame is fixed at the agent's center of mass, with the XA axis pointing forward along the agent centerline. The dynamics of each agent in the inertial frame can be represented [44] as
A tracking error is defined in order to convert the CLOS guidance problem into a tracking problem. The CLOS guidance involves guiding the agent along the line-of-sight (LOS) to the target. The LOS frame is shown in Fig. 3-3 in which the origin is located at the ground tracker.
The XL axis points forward along the LOS to the target, and the YL axis is horizontal and points to the left of the XL-ZL plane. Then, the coordinates (Rp, e1, e2) indicated in Fig. 3-3 represent the agent position in the LOS frame and are related to (xa, ya, za) through the following rotations:
[Figure: ground tracker and agent body axis XA]
Fig. 3-2. Three-dimensional agent-target pursuit diagram [44, 46].
[Figure: ground tracker, LOS axis XL, and range Ra]
Fig. 3-3. Definition of tracking error [44, 46].
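To make the rotation from (xa, ya, za) to the LOS-frame coordinates (Rp, e1, e2) concrete, the following Python sketch projects the agent position onto an orthonormal LOS frame built from an LOS azimuth and elevation. The function name `los_frame_coordinates` and the axis sign conventions are assumptions for illustration, not the exact rotation sequence of [44, 46]. What it shows is that e1 and e2 are the components orthogonal to the LOS, so ||e|| is the distance from the agent to the LOS.

```python
import math


def los_frame_coordinates(p_agent, los_azimuth, los_elevation):
    """Return (R_p, e1, e2): the agent position expressed in an LOS frame.

    Assumed (hypothetical) convention: X_L points along the LOS, Y_L is
    horizontal and to the left, Z_L completes a right-handed frame.
    Angles are in radians; the origin is the ground tracker.
    """
    ca, sa = math.cos(los_azimuth), math.sin(los_azimuth)
    ce, se = math.cos(los_elevation), math.sin(los_elevation)
    x_l = (ce * ca, ce * sa, se)      # unit vector along the LOS
    y_l = (-sa, ca, 0.0)              # horizontal, to the left of X_L
    z_l = (-se * ca, -se * sa, ce)    # x_l cross y_l

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return dot(p_agent, x_l), dot(p_agent, y_l), dot(p_agent, z_l)


# An agent 1000 units out along the LOS has zero tracking error.
az, el = 0.3, 0.2
on_los = tuple(1000.0 * c for c in (math.cos(el) * math.cos(az),
                                    math.cos(el) * math.sin(az),
                                    math.sin(el)))
print(los_frame_coordinates(on_los, az, el))
```

Because the frame is orthonormal, $e_1^2 + e_2^2 = \lVert p \rVert^2 - R_p^2$ for any agent position p, which is the point-to-line distance squared.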
The tracking error is defined as e = [e1, e2]^T. Since e1 and e2 cannot be measured directly, they must be computed indirectly from the agent's polar position data available from the ground tracker as
Note that ||e|| is the distance from the agent to the LOS. Therefore, the agent will eventually hit the target if the tracking error is driven to zero before the target crosses the agent. The three-dimensional CLOS guidance problem has thus been formulated as a tracking problem. Define
With these notations, (3-1), (3-2), and (3-4) can be put into the following state-space form:
The objective of CLOS guidance control is to find a control law to drive the tracking error e(t) to zero. For the system shown in (3-5), define the vector fields Xj, j = 0, 1, 2 by
Direct computation yields
After some manipulation, the tracking error in (3-5) can be written concisely as follows:
3.4 One-To-One Agent-Target Path Evolution using FNN
A new intelligent FNN controller that realizes the single agent-target CLOS guidance law is discussed in this section. Compared with the CMAC structure in [46], the proposed FNN controller has fewer mappings and layers, and it overcomes the enormous weight space and limited modeling capability of the CMAC. The tracking error obtained in (3-8) is arranged into a tracking error vector that is fed to the input layer of the FNN.
The output of the FNN output layer serves as the main controller of the MAS, evolving the positions of the winner agents toward their corresponding targets. Assuming that all the system dynamics are well known and that an ideal controller exists for a single agent based on feedback linearization control design, we obtain from (3-8):
Substituting (3-9) into (3-8) gives the following error dynamics for a single agent:
where K1 and K2 are Hurwitz matrices for a proper choice of k11, k21, k12, and k22. However, the ideal controller $u_{id}$ is difficult to implement in practice, since the system dynamics are highly nonlinear and sometimes unavailable. Therefore, to control the output state efficiently, the control law is assumed to take the following form:
$$u = u_{fnn} + u_m \tag{3-11}$$
where $u_{fnn}$ is an FNN controller and $u_m$ is a monitoring controller. The FNN controller $u_{fnn}$ is the main tracking controller, used to imitate the ideal controller in (3-9), and the monitoring controller $u_m$ is designed to recover the residual approximation error. The monitoring controller, which is similar to the hitting controller in a traditional sliding-mode controller, is derived from Lyapunov theory to cope with all system uncertainties and guarantee the stability of the system. The control input u in (3-11) is applied as the agent input in (3-8).
Figure 3-4 illustrates the concept of (3-11) in the new approach. The tracking error vector $e_S$ and the neural network output $y_o$ in Fig. 3-4 are defined later as the input and output of the FNN controller, respectively. The limiter in Fig. 3-4 is the agent's maneuvering limiter, included so that the simulated behavior is practical.
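A minimal sketch of the control law (3-11) followed by the maneuvering limiter of Fig. 3-4, assuming the limiter is a symmetric per-channel saturation on the yaw and pitch acceleration commands. The helper names `saturate` and `control_command` and the limit value are hypothetical; the thesis does not specify the limiter's form.

```python
def saturate(value, limit):
    """Clip one command to [-limit, limit] (assumed symmetric limiter)."""
    return max(-limit, min(limit, value))


def control_command(u_fnn, u_m, accel_limit):
    """u = u_fnn + u_m from (3-11), per channel (yaw, pitch), then limited."""
    return [saturate(f + m, accel_limit) for f, m in zip(u_fnn, u_m)]


# Hypothetical numbers: yaw/pitch command pairs and a symmetric limit of 25.
print(control_command([30.0, -5.0], [2.0, -1.0], 25.0))
```

Here the yaw channel saturates at 25.0 while the pitch channel passes through unchanged as -6.0.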
[Figure: FNN controller ($u_{fnn} = y_o$) and Monitoring controller ($u_m$) driven by $e_S$, summing junction, Adaptive algorithm, and Limiter producing u]
Fig. 3-4. The configuration of ufnn and um for single agent.
The fully linked FNN architecture shown in Fig. 2-3 is also adopted in this section. Repeating (2-19) and (2-20), the FNN output can be written as
$$y_o = y_o^4 = \sum_{h=1}^{H} w_{e,h}\, \zeta_h(x, m, v), \qquad \zeta_h(x, m, v) = \prod_{k=1}^{K} \exp\!\left(-\frac{(x_k - m_{kh})^2}{(v_{kh})^2}\right) \tag{3-22}$$
where $x_k$ denotes the kth input to the input-layer nodes, used to distinguish the state variables of agent x in the MAS. The term $\zeta_h$ in (3-22) represents the firing weight of the hth neuron in the rule layer. For simplicity, the following vectors m and v are defined to collect all parameters in the hidden layer of Fig. 2-3:
$$m = [m_{11} \cdots m_{K1}\; m_{12} \cdots m_{K2}\; \cdots\; m_{1H} \cdots m_{KH}]^T \tag{3-23}$$
$$v = [v_{11} \cdots v_{K1}\; v_{12} \cdots v_{K2}\; \cdots\; v_{1H} \cdots v_{KH}]^T \tag{3-24}$$
Then, the output of the FNN can be represented in vector form as
$$y_o = w_e^T \zeta(x, m, v) \tag{3-25}$$
where $y_o = y_o^4$, $w_e = [w_{e,1}\ w_{e,2}\ \cdots\ w_{e,H}]^T$, and $\zeta = [\zeta_1\ \zeta_2\ \cdots\ \zeta_H]^T$. By the universal approximation theorem, there exists an ideal approximator $y_o^* = y_o^*(x, w_e^*, m^*, v^*)$ such that [40, 41]
$$y_o = y_o^* + E = w_e^{*T} \zeta(x, m^*, v^*) + E \tag{3-26}$$
where E denotes the approximation error and $w_e^*$, $m^*$, and $v^*$ are the optimal parameter vectors of $w_e$, m, and v, respectively. In practice, the optimal parameter vectors that best approximate a given nonlinear function are difficult to determine. Thus, an estimate function is defined as
$$\hat{y}_o = \hat{y}_o(x, \hat{w}_e, \hat{m}, \hat{v}) = \hat{w}_e^T \zeta(x, \hat{m}, \hat{v}) \tag{3-27}$$
where $\hat{w}_e$, $\hat{m}$, and $\hat{v}$ are the estimates of $w_e^*$, $m^*$, and $v^*$, respectively. For notational convenience, we denote $\zeta^* = \zeta(x, m^*, v^*)$ and $\hat{\zeta} = \zeta(x, \hat{m}, \hat{v})$. Then, we define the estimation error as
The parameters of the FNN are tuned online to achieve a favorable estimation. To this end, a linearization technique transforms the nonlinear Gaussian functions into a partially linear form so that the Lyapunov theorem extension can be applied [40], as follows:
where H is the higher-order term, and $\partial \zeta_h / \partial m$ and $\partial \zeta_h / \partial v$ are defined, respectively, as
Substituting (3-29) into (3-28) gives
that is, $d \le \Delta$. The proposed control system consists of an FNN identifier and the optimal controller defined in (3-11), in which $u_{fnn}$ mimics the ideal controller $u_{id}$ and the compensation tangent controller $u_m$ compensates for the difference between the FNN controller and the ideal controller. For a single agent in state-space form, the tracking error vector is defined as
$$e_S = [e_1\ \dot{e}_1\ e_2\ \dot{e}_2]^T \tag{3-33}$$
which is the input vector fed into the input nodes of the FNN controller. Substituting (3-11) into (3-8) and using (3-33), the error dynamic equation becomes
$$\dot{e}_S = K e_S + G_a(u_{id} - u_{fnn} - u_m) = K e_S + G_a\left(\tilde{w}_e^T \hat{\zeta} + \tilde{m}^T \zeta_m \hat{w}_e + \tilde{v}^T \zeta_v \hat{w}_e + d - u_m\right) \tag{3-34}$$
where
$$K = \begin{bmatrix} K_1 & 0_{2\times 2} \\ 0_{2\times 2} & K_2 \end{bmatrix}, \qquad G_a = \begin{bmatrix} 0 & G_{11} & 0 & G_{21} \\ 0 & G_{12} & 0 & G_{22} \end{bmatrix}^T.$$
Since K is also a Hurwitz matrix, given a symmetric positive-definite matrix $Q \in \Re^{4\times 4}$, there exists a symmetric positive-definite matrix $P \in \Re^{4\times 4}$ such that the following Lyapunov equation [39, 42] is satisfied:
$$K^T P + P K = -Q \tag{3-35}$$
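The existence of P in (3-35) can be checked numerically. The sketch below solves $K^T P + P K = -Q$ by vectorizing the equation into a linear system, assuming each 2×2 block of K has the companion form [[0, 1], [−a, −b]], which is Hurwitz exactly when a > 0 and b > 0. How k11, k21, k12, and k22 map onto a and b is an assumption, and all function names are hypothetical. Only the Python standard library is used, so the solver is plain Gaussian elimination rather than a library routine.

```python
def companion(a, b):
    """2x2 block [[0, 1], [-a, -b]] with characteristic polynomial s^2 + b*s + a."""
    return [[0.0, 1.0], [-a, -b]]


def is_hurwitz_companion(a, b):
    """Both roots of s^2 + b*s + a have negative real parts iff a > 0 and b > 0."""
    return a > 0 and b > 0


def block_diag(A, B):
    """Assemble [[A, 0], [0, B]] from two square matrices."""
    n, m = len(A), len(B)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        out[i][:n] = A[i]
    for i in range(m):
        out[n + i][n:] = B[i]
    return out


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def solve_lyapunov(K, Q):
    """Solve K^T P + P K = -Q, with unknown P[r][c] stored at index r*n + c."""
    n = len(K)
    A = [[0.0] * (n * n) for _ in range(n * n)]
    b = [0.0] * (n * n)
    for r in range(n):
        for c in range(n):
            row = r * n + c
            b[row] = -Q[r][c]
            for k in range(n):
                A[row][k * n + c] += K[k][r]   # (K^T P)[r][c] term
                A[row][r * n + k] += K[k][c]   # (P K)[r][c] term
    x = solve_linear(A, b)
    return [x[r * n:(r + 1) * n] for r in range(n)]


def is_positive_definite(P):
    """Cholesky test on the lower triangle of a symmetric matrix."""
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = P[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


K = block_diag(companion(2.0, 3.0), companion(4.0, 4.0))
Q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
P = solve_lyapunov(K, Q)
print(is_positive_definite(P))
```

For Q = I and blocks with (a, b) = (2, 3) and (4, 4), the solution is block diagonal with blocks [[1.25, 0.25], [0.25, 0.25]] and [[1.125, 0.125], [0.125, 0.15625]], both positive definite.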
Theorem 3-1: Consider the nonlinear dynamic system represented by (3-1) with the control law in (3-11), where the FNN identifier is designed as in (3-27). Then the weighting vectors $\hat{w}_e$, $\hat{m}$, and $\hat{v}$ remain bounded and the performance errors approach zero if the parameters are updated by the following learning rules:
$$\dot{\hat{w}}_e = -\dot{\tilde{w}}_e = \eta_w\, e_S^T P G_a \hat{\zeta} \tag{3-36}$$
$$\dot{\hat{m}} = -\dot{\tilde{m}} = \eta_m\, e_S^T P G_a\, \zeta_m \hat{w}_e \tag{3-37}$$
$$\dot{\hat{v}} = -\dot{\tilde{v}} = \eta_v\, e_S^T P G_a\, \zeta_v \hat{w}_e \tag{3-38}$$
$$u_m = \Delta \tanh\!\left(e_S^T P G_a\right) \tag{3-39}$$
where $\eta_w$, $\eta_m$, and $\eta_v$ are positive real constants. Under these rules, the stability of the FNN control system is guaranteed.
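The forward pass (3-22)/(3-25), one integration step of learning rules shaped like (3-36)–(3-38), and the monitoring term (3-39) can be sketched as follows. This is a hypothetical single-output channel. It assumes the scalar signal s stands for one component of $e_S^T P G_a$ (one FNN per acceleration channel) and uses explicit Euler integration with step dt. It computes $\zeta_m \hat{w}_e$ and $\zeta_v \hat{w}_e$ element-wise from the Gaussian derivatives $\partial\zeta_h/\partial m_{kh} = 2\zeta_h(x_k - m_{kh})/v_{kh}^2$ and $\partial\zeta_h/\partial v_{kh} = 2\zeta_h(x_k - m_{kh})^2/v_{kh}^3$. The class name `ScalarFNN` and its parameters are illustrative only.

```python
import math


class ScalarFNN:
    """Hypothetical single-output FNN channel.

    centers[k][h] and widths[k][h] hold m_kh and v_kh (K inputs, H rules);
    weights[h] holds w_e,h. Widths must stay nonzero.
    """

    def __init__(self, centers, widths, weights):
        self.m = [row[:] for row in centers]
        self.v = [row[:] for row in widths]
        self.w = list(weights)

    def firing(self, x):
        """zeta_h = prod_k exp(-(x_k - m_kh)^2 / v_kh^2), as in (3-22)."""
        K, H = len(self.m), len(self.w)
        return [math.exp(-sum((x[k] - self.m[k][h]) ** 2 / self.v[k][h] ** 2
                              for k in range(K)))
                for h in range(H)]

    def output(self, x):
        """y_o = w_e^T zeta, as in (3-25)."""
        return sum(wh * zh for wh, zh in zip(self.w, self.firing(x)))

    def update(self, x, s, eta_w, eta_m, eta_v, dt):
        """One explicit-Euler step of updates shaped like (3-36)-(3-38).

        s stands for one component of e_S^T P G_a (assumed scalar channel).
        All increments are computed from the parameters before the step.
        """
        zeta = self.firing(x)
        K, H = len(self.m), len(self.w)
        w_old = list(self.w)
        for h in range(H):
            self.w[h] += dt * eta_w * s * zeta[h]
            for k in range(K):
                diff = x[k] - self.m[k][h]
                v_kh = self.v[k][h]
                dz_dm = 2.0 * zeta[h] * diff / v_kh ** 2
                dz_dv = 2.0 * zeta[h] * diff ** 2 / v_kh ** 3
                self.m[k][h] += dt * eta_m * s * w_old[h] * dz_dm
                self.v[k][h] += dt * eta_v * s * w_old[h] * dz_dv


def monitoring_control(s, delta):
    """u_m = Delta * tanh(s), the form of (3-39) for one channel."""
    return delta * math.tanh(s)


net = ScalarFNN(centers=[[0.0]], widths=[[1.0]], weights=[2.0])
print(net.output([0.0]), monitoring_control(0.0, 5.0))
```

The tanh term behaves like a smoothed sign function bounded by Δ, which is why it plays the role of the hitting controller without the chattering of a discontinuous switch.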
Proof:
Let the Lyapunov-like function candidate be
Taking the time derivative of V in (3-40) and using (3-34) and (3-35) yields
Substituting the learning rules (3-36)–(3-39) into (3-41), (3-41) becomes
Hence, $e_S \in L_2$ can be proven [39]. In addition, the right-hand side of (3-43) is bounded, that is, $\dot{e}_S \in L_\infty$. Using Barbalat's lemma [39], we can prove that $\lim_{t \to \infty} e_S = 0$ when $\int_0^t \Delta^2\, dt < \infty$. The stability of the overall approximation scheme is guaranteed by the above results and the Lyapunov stability theorem. Based on (3-36)–(3-38), the adaptive laws of the weighting factors can be obtained in element form. Thus, stability in the sense of Lyapunov is guaranteed under the optimal approximation model with no modeling error. Q.E.D.
3.5 The Design of SOM for Task Assignment of MDS
After the single agent-target control system is constructed, this section discusses the overall MDS (or MAS), in which N agents must be matched with D targets. Suppose that D ≥ N; the traditional exhaustive method then takes P(D,N) = D!/(D−N)! computation steps to find the total distances or damaging cost. In a real-time MDS environment, this pre-computation before the targets are launched is time-consuming. The SOM is therefore attractive: its principal goal is to transform an input pattern of arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion [19, 38]. The SOM is suitable for task assignment because the dimension of the targets can be reduced and the targets mapped to the corresponding agents. The overall MAS can be considered a self-organizing system that adjusts its basic structure when its environment changes. The SOM algorithm first initializes the synaptic weights of the network, which can be done by assigning them randomly indexed patterns. For a multi-agent-multi-target scenario, the positions and angles of the ith agent and the dth target are further defined as
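$a_i = [x_{a,i}\ y_{a,i}\ z_{a,i}\ \dot{x}_{a,i}\ \dot{y}_{a,i}\ \dot{z}_{a,i}\ \psi_{a,i}\ \theta_{a,i}]^T \in A$ and
$t_d = [x_{t,d}\ y_{t,d}\ z_{t,d}\ \dot{x}_{t,d}\ \dot{y}_{t,d}\ \dot{z}_{t,d}\ \psi_{t,d}\ \theta_{t,d}]^T \in T$, respectively. The proposed control input of the ith agent is defined as
$$u_i = [a_{yc,i}\ a_{zc,i}]^T = [u_{1,i}\ u_{2,i}]^T \in U$$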
where ayc,i and azc,i are the yaw and pitch acceleration commands of the ith agent, respectively.
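The P(D, N) = D!/(D − N)! cost of the exhaustive method mentioned above can be illustrated with a brute-force search. The sketch enumerates every ordered choice of N distinct targets for the N agents. It assumes the objective is the remaining damaging cost, with total distance as a tie-breaker, and a 100% hit probability; `exhaustive_assignment` is a hypothetical helper, not part of the thesis.

```python
import itertools
import math


def exhaustive_assignment(dist, values):
    """Brute-force search over all P(D, N) ordered choices of N distinct targets.

    dist[i][d] is the distance from agent i to target d and values[d] is the
    damaging cost V_d. Assumed objective: minimize the remaining damaging cost
    (every assigned target is hit), then the total distance.
    """
    n, d = len(dist), len(values)
    best, best_key, steps = None, None, 0
    for targets in itertools.permutations(range(d), n):
        steps += 1
        remaining = sum(values) - sum(values[t] for t in targets)
        total_dist = sum(dist[i][t] for i, t in enumerate(targets))
        key = (remaining, total_dist)
        if best_key is None or key < best_key:
            best, best_key = targets, key
    return best, best_key, steps


V = [1, 3, 1, 2, 3, 2]                     # damaging costs from Example 3-1
dist = [[1.0] * len(V) for _ in range(4)]  # N = 4 agents, unit distances
best, key, steps = exhaustive_assignment(dist, V)
print(key, steps, math.perm(len(V), 4), 4 * len(V))
```

With these inputs the search visits P(6, 4) = 360 candidates to reach a remaining damaging cost of 2, compared with N × D = 24 distance evaluations for the SOM competition.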
Define the positions of the agents A = {a1, a2, ..., aN}, where $a_i = [x_{a,i}\ y_{a,i}\ z_{a,i}]^T$ is the position of the ith agent, and denote the randomly indexed input vectors chosen from the target positions as
$$R = \{r_1, r_2, \ldots, r_d, \ldots, r_D\}$$
where $r_d = [x_{t,d}\ y_{t,d}\ z_{t,d}]^T \in T$ is the position of the dth target. Once the positions of the agents and targets are initialized, the competitive process of the SOM can start to find the winner neurons. In the traditional SOM, the winning neuron locates the center of a topological neighborhood of excited neurons. In this thesis, the neighborhood of the winner is neglected, since the agents move toward their corresponding targets without any cooperative process. In the competitive process, both the total Euclidean distance and the total damaging cost have to be considered. We therefore first neglect all the asset values, so that the competitive mechanism uses the Euclidean distance between the ith agent and the dth target, defined as
$$D_{i,d} = \lVert r_d - a_i \rVert \tag{3-44}$$
In the traditional SOM, this Euclidean distance is the parameter for the competitive process. However, other parameters, such as the total Euclidean distance in [23], should also be considered along with the distance expression. In an MDS, the asset values are more important than the corresponding Euclidean distances, and the motivation of the SOM in this chapter is to minimize the total damaging cost. Therefore, a new distance expression based on (3-44), with an equitable distribution of workload, is defined as
$$D_{i,d} = \frac{\lVert r_d - a_i \rVert + \delta}{V_d / V_{total} + \delta} \tag{3-45}$$
where $V_{total} = \sum_{d=1}^{D} V_d$ is the total value of all the assets and δ is a user-defined adjustment parameter that determines the importance of the asset value: the smaller δ is, the more important the asset value becomes. As shown in Fig. 3-5, the new distance expression, computed from δ, $a_i$, and $r_d$, forms a new N-by-D distance matrix
$$\mathbf{D} = \begin{bmatrix} D_{1,1} & \cdots & D_{1,D} \\ \vdots & \ddots & \vdots \\ D_{N,1} & \cdots & D_{N,D} \end{bmatrix} \tag{3-46}$$
For a given δ, agent, and target as input, the output neurons compete to be the winner according to the criterion
$$[i_w, i_t] = \min\{D_{i,d};\ i = 1, 2, \ldots, N;\ d = 1, 2, \ldots, D;\ \{i, r_d\} \notin \Omega\} \tag{3-47}$$
where $[i_w, i_t]$ denotes the match in which the $i_t$-th target and the $i_w$-th agent form the winner, and Ω is the set of neurons for which a winner has already been chosen in an iteration. From (3-47), the N winners are found to obtain a new set W = {w1, w2, ..., wN}, the re-allocated indices of agents A corresponding to the randomly indexed targets R, to be used in the adaptive process.
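Reading (3-47) row by row, as Algorithm 3-1 and Example 3-1 do, each agent in turn takes the not-yet-chosen target with the smallest entry of the distance matrix. The following sketch implements that reading. Breaking ties toward the lowest target index is an assumption (it reproduces the choice of t2 over t5 in Example 3-1), and `competitive_assignment` is a hypothetical name.

```python
def competitive_assignment(dist):
    """Row-by-row winner selection over an N-by-D distance matrix.

    For each agent (row) in order, pick the not-yet-chosen target (column)
    with the smallest entry; ties go to the lowest column index. Returns the
    (agent, target) pairs and the number of entries examined (N * D here).
    """
    chosen = set()            # targets already taken, playing the role of Omega
    pairs, steps = [], 0
    for i, row in enumerate(dist):
        best_d = None
        for d, value in enumerate(row):
            steps += 1
            if d in chosen:
                continue
            if best_d is None or value < row[best_d]:
                best_d = d
        if best_d is None:
            break             # no targets left (only possible when N > D)
        chosen.add(best_d)
        pairs.append((i, best_d))
    return pairs, steps


print(competitive_assignment([[3.0, 1.0, 2.0], [1.0, 5.0, 4.0]]))
```

Each row is scanned once, so the work grows as N × D rather than as the P(D, N) of exhaustive search, at the price of a greedy (not necessarily globally optimal) matching.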
[Figure: N-by-D distance matrix with the winner neurons highlighted]
Fig. 3-5. The structure of self-organizing map (SOM).
Algorithm 3-1:
Step 1  N agents are created in A
        D randomly indexed targets are created in R
        D damaging costs are created in V
        calculate $V_{total} = \sum_{d=1}^{D} V_d$
        define the adjustment parameter δ
Step 2  for agent $a_i$, i = 1, 2, ..., N in A
            find the winner neuron with index $i_w$; $[i_w, i_t]$ is obtained in the $i_w$-th row and d-th column, that is, $i_t$ is d
            append the index $i_w$ to the interception list L
        end
Step 4  agents are dispatched to hit the targets in the order given by L
From the above two processes, the number of computation steps for finding the minimal total damaging cost is N × D. Compared with the traditional SOM method, the new adaptive SOM method eliminates the time-consuming tuning of the neighborhood function and reduces the computational load of task assignment in the MAS. Note that in this thesis the hit probability of an agent is assumed to be 100%. However, the SOM mechanism can also find the minimal total damaging cost even if agent leakage is considered. By arranging the winner agents W, the interception list L, ordered by agent index, can be obtained. The list serves as a command or decision for the MDS that determines which target should be intercepted by which agent in the future. The last mechanism of the SOM is the adaptive process, which enables the winner agents W to update the positions of the winners.
Example 3-1: SOM-based dispatching agents
Step 1
Figure 3-6 shows four steps of the SOM example in the MDS, with the agent positions A = {a1, a2, a3, a4} (N = 4), the randomly indexed target positions R = {r1, r2, r3, r4, r5, r6} (D = 6), and three surviving assets S = {s1, s2, s3} with values V(s1) = 1, V(s2) = 2, and V(s3) = 3. The damaging costs caused by the attacking targets in the MDS can be written as V = {V1, V2, V3, V4, V5, V6} = {1, 3, 1, 2, 3, 2}, which means that t1 will attack s1, t2 will attack s3, t3 will attack s1, t4 will attack s2, and so on. Before the targets begin their attack, the total damaging cost $V_{total} = \sum_{d=1}^{6} V_d = 12$ can be obtained.
Step 2
In the SOM mechanism, the positions of the agents A and the randomly indexed targets R are first used to calculate the Euclidean distances as
If δ is chosen as 0.1, the new distance matrix from (3-45) can be obtained for all the agents as:
Because we are focusing on the task assignment to minimize the damaging cost, we can assume that the distances between the agents and targets are the same and are normalized to one. This implies that
Therefore, the minimum for each agent can be found from the following new distance matrix:
In the first row, the minimum gives the matching pair {a1, t2}. Repeating from the first row to the fourth row, the winner-target pairs {a1, t2}, {a2, t5}, {a3, t4}, and {a4, t6} are obtained by the competitive process in (3-47).
Step 4
Picking the indices of the targets in the matching pairs, the interception list L can be constructed as an MDS command, which shows that targets t2, t5, t4, and t6 should be intercepted by agents a1, a2, a3, and a4, respectively. With the agents defending, the remaining damaging costs of all the assets after this attack wave are $V' = \{V'_1, \ldots, V'_6\} = \{1, 0, 1, 0, 0, 0\}$, and the final total damaging cost becomes $V'_{total} = \sum_{d=1}^{6} V'_d = 2$, which is the minimal value. In this example the Euclidean distances are almost neglected; however, when two or more assets have the same value and some agent is relatively close to its corresponding target, the Euclidean distance should be taken into consideration.
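Example 3-1 can be reproduced numerically. The sketch assumes the weighted distance $(D_{i,d} + \delta)/(V_d/V_{total} + \delta)$ for (3-45). This is a hypothetical form: with all distances normalized to one, any weighting strictly decreasing in $V_d$ gives the same pairs. The sketch uses δ = 0.1 and applies the row-by-row competition with ties broken toward the lower target index.

```python
def weighted_distance(dist, value, v_total, delta):
    """Assumed form of (3-45): (D + delta) / (V_d / V_total + delta).

    Hypothetical; a smaller delta gives the asset value more influence.
    """
    return (dist + delta) / (value / v_total + delta)


V = [1, 3, 1, 2, 3, 2]                     # damaging costs V_1..V_6 (Example 3-1)
N, delta = 4, 0.1
v_total = sum(V)                           # 12
raw = [[1.0] * len(V) for _ in range(N)]   # distances normalized to one
new = [[weighted_distance(raw[i][d], V[d], v_total, delta) for d in range(len(V))]
       for i in range(N)]

chosen, pairs = set(), []
for i in range(N):
    # min() returns the first minimal item, so ties go to the lower target index.
    d_best = min((d for d in range(len(V)) if d not in chosen),
                 key=lambda d: new[i][d])
    chosen.add(d_best)
    pairs.append((i + 1, d_best + 1))      # 1-based labels (a_i, t_d)

remaining = sum(V[d] for d in range(len(V)) if d not in chosen)
print(pairs, remaining)
```

The printed pairs (1, 2), (2, 5), (3, 4), (4, 6) correspond to {a1, t2}, {a2, t5}, {a3, t4}, and {a4, t6}, and the remaining damaging cost of 2 matches Step 4.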