**Foundations**


**Figure 2.4.** The tail-to-tail relationship.

**The Tail-to-Tail Relationship**

For local navigation or obstacle avoidance, it is more straightforward to use the locations of moving objects in the robot frame than the locations with respect to the world coordinate system. In the example of Figure 2.4, given the location of the robot $x_{ij}$ and of a moving object $x_{ik}$ in the world frame, we want to know the location of this moving object, $x_{jk}$, and its distribution, $\Sigma_{x_{jk}}$, in the robot frame, which can be calculated by:

$$x_{jk} \triangleq \oplus(\ominus(x_{ij}), x_{ik}) = \oplus(x_{ji}, x_{ik}) \tag{2.8}$$
This relationship is called the tail-to-tail relationship in (Smith et al., 1990). The first-order estimate of the mean of this tail-to-tail operation is:

$$\mu_{x_{jk}} \approx \oplus(\ominus(\mu_{x_{ij}}), \mu_{x_{ik}}) \tag{2.9}$$

and the first-order covariance estimate can be computed in a similar way:

$$\Sigma_{x_{jk}} \approx \nabla_{\oplus}
\begin{bmatrix}
\Sigma_{x_{ji}} & \Sigma_{x_{ji}x_{ik}} \\
\Sigma_{x_{ik}x_{ji}} & \Sigma_{x_{ik}}
\end{bmatrix}
\nabla_{\oplus}^{T}
\approx \nabla_{\oplus}
\begin{bmatrix}
\nabla_{\ominus}\Sigma_{x_{ij}}\nabla_{\ominus}^{T} & \nabla_{\ominus}\Sigma_{x_{ij}x_{ik}} \\
\Sigma_{x_{ik}x_{ij}}\nabla_{\ominus}^{T} & \Sigma_{x_{ik}}
\end{bmatrix}
\nabla_{\oplus}^{T} \tag{2.10}$$
Note that this tail-to-tail operation is often used in data association and moving object
tracking.
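The tail-to-tail relation of equation (2.8) can be made concrete for planar $(x, y, \theta)$ poses. The following sketch is illustrative rather than from the text: it implements the compounding operator $\oplus$ and the inverse $\ominus$ for 2-D poses and recovers a moving object's location in the robot frame; the particular poses are invented for the example.

```python
import math

# A pose (or object location) is a tuple (x, y, theta). oplus compounds a
# relative pose into its parent frame; ominus inverts a pose. Together they
# give the tail-to-tail relation of Eq. (2.8): x_jk = oplus(ominus(x_ij), x_ik).

def oplus(a, b):
    """Compound pose b (expressed in the frame of a) into a's parent frame."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + bx * math.cos(ath) - by * math.sin(ath),
            ay + bx * math.sin(ath) + by * math.cos(ath),
            ath + bth)

def ominus(a):
    """Inverse pose: the parent frame expressed in the frame of a."""
    ax, ay, ath = a
    return (-ax * math.cos(ath) - ay * math.sin(ath),
             ax * math.sin(ath) - ay * math.cos(ath),
            -ath)

# Robot pose x_ij and moving-object location x_ik, both in the world frame i.
x_ij = (2.0, 1.0, math.pi / 2)   # robot, facing the +y axis
x_ik = (2.0, 4.0, 0.0)           # moving object, 3 m ahead of the robot

# Tail-to-tail: the object's location x_jk in the robot frame j.
x_jk = oplus(ominus(x_ij), x_ik)
print(x_jk)   # approximately (3, 0, -pi/2): 3 m straight ahead
```

Compounding the result back, `oplus(x_ij, x_jk)`, recovers `x_ik`, which is a quick consistency check on the two operators.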

**Unscented Transform**

As addressed above, these uncertain spatial relationships are non-linear functions and are approximated by their first-order Taylor expansions for estimating the means and covariances of their outputs. In cases where the function is not approximately linear in the likely region of its inputs, or the Jacobian of the function is unavailable, the *unscented transform* (Julier, 1999) can be used to improve the estimation accuracy. (Wan and van der Merwe, 2000) shows an example of using the unscented transform technique.
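As a rough illustration of the idea, the sketch below propagates a range-bearing uncertainty through a nonlinear polar-to-Cartesian function using sigma points; the scaling parameter $\kappa$ and the diagonal-covariance simplification (which makes the matrix square root trivial) are assumptions made for brevity, not details from the text.

```python
import math

# Minimal unscented transform for a 2-D input with a *diagonal* covariance.
# The nonlinear function converts a range-bearing measurement to Cartesian
# coordinates; sigma points capture the output mean and covariance with no
# Jacobian at all.

def f(p):                      # nonlinear function under study
    r, th = p
    return (r * math.cos(th), r * math.sin(th))

def unscented_transform(mean, var, kappa=1.0):
    n = len(mean)
    # 2n+1 sigma points, scaled as in Julier's original formulation.
    pts = [list(mean)]
    for i in range(n):
        d = math.sqrt((n + kappa) * var[i])
        for s in (+d, -d):
            p = list(mean)
            p[i] += s
            pts.append(p)
    weights = [kappa / (n + kappa)] + [1.0 / (2 * (n + kappa))] * (2 * n)
    ys = [f(p) for p in pts]
    m = [sum(w * y[k] for w, y in zip(weights, ys)) for k in range(2)]
    cov = [[sum(w * (y[a] - m[a]) * (y[b] - m[b]) for w, y in zip(weights, ys))
            for b in range(2)] for a in range(2)]
    return m, cov

# Range 10 m with small range variance and larger bearing variance.
mean, cov = unscented_transform([10.0, 0.0], [0.01, 0.04])
print(mean)   # x-mean is pulled inside r = 10 by the bearing uncertainty
```

Note how the transformed mean lies slightly inside the circle of radius 10: the sigma points capture the curvature of the function, an effect a first-order Taylor expansion misses entirely.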

**2.2. Simultaneous Localization and Mapping**

In this section, we address the formulation, calculation procedures, computational complexity and practical issues of the SLAM problem.

**Formulation of SLAM**

The general formula for the SLAM problem can be formalized in the probabilistic form as:

$$p(x_k, M \mid u_1, u_2, \ldots, u_k, z_0, z_1, \ldots, z_k) \tag{2.11}$$

where $x_k$ is the true pose of the robot at time $k$, $u_k$ is the measurement from motion sensors such as odometry and inertial sensors at time $k$, $z_k$ is the measurement from perception sensors such as laser scanners and cameras at time $k$, and $M$ is the stochastic stationary object map, which contains $l$ landmarks, $m^1, m^2, \ldots, m^l$. In addition, we define the following sets to refer to the data leading up to time $k$:

$$Z_k \triangleq \{z_0, z_1, \ldots, z_k\} \tag{2.12}$$

$$U_k \triangleq \{u_1, u_2, \ldots, u_k\} \tag{2.13}$$

Therefore, equation (2.11) can be rewritten as:

$$p(x_k, M \mid U_k, Z_k) \tag{2.14}$$

Using Bayes’ rule and the assumptions that the vehicle motion model is Markov and the environment is static, the general recursive Bayesian formula for SLAM can be derived and expressed as (see (Thrun, 2002; Majumder et al., 2002) for more details):

$$\underbrace{p(x_k, M \mid Z_k, U_k)}_{\text{Posterior at } k} \;\propto\; \underbrace{p(z_k \mid x_k, M)}_{\text{Update}} \; \underbrace{\int p(x_k \mid x_{k-1}, u_k)\, \underbrace{p(x_{k-1}, M \mid Z_{k-1}, U_{k-1})}_{\text{Posterior at } k-1} \, dx_{k-1}}_{\text{Prediction}} \tag{2.15}$$

where $p(x_{k-1}, M \mid Z_{k-1}, U_{k-1})$ is the posterior probability at time $k-1$, $p(x_k, M \mid Z_k, U_k)$ is the posterior probability at time $k$, $p(x_k \mid x_{k-1}, u_k)$ is the motion model, and $p(z_k \mid x_k, M)$ is the update stage, which can be interpreted as the perception model.
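The recursion in equation (2.15) can be illustrated with a toy one-dimensional grid ("histogram") filter; the ring world, cell count, landmark position, and sensor/motion probabilities below are all invented for illustration only and do not appear in the text.

```python
# A toy grid-based instance of the recursion in Eq. (2.15) with a known,
# one-landmark map: the robot lives on a ring of 10 cells, u_k shifts it one
# cell forward (with occasional slip), and z_k reports whether it currently
# sees the landmark located in cell 3.

CELLS = 10
LANDMARK = 3
belief = [1.0 / CELLS] * CELLS            # uniform prior over x_0

def predict(bel, slip=0.1):
    # Motion model p(x_k | x_{k-1}, u_k): move +1 cell, or slip and stay.
    return [(1 - slip) * bel[(i - 1) % CELLS] + slip * bel[i]
            for i in range(CELLS)]

def update(bel, saw_landmark, hit=0.9):
    # Perception model p(z_k | x_k, M), then renormalize (the ∝ in Eq. 2.15).
    like = [hit if (i == LANDMARK) == saw_landmark else 1 - hit
            for i in range(CELLS)]
    post = [l * b for l, b in zip(like, bel)]
    s = sum(post)
    return [p / s for p in post]

# Drive forward for five steps; the landmark is seen only at step 3.
for k in range(1, 6):
    belief = predict(belief)
    belief = update(belief, saw_landmark=(k == 3))

print(max(range(CELLS), key=lambda i: belief[i]))   # → 5
```

The sighting at step 3 concentrates the posterior on the landmark cell, and two further prediction steps carry that mass forward to cell 5: exactly the predict/update alternation that equation (2.15) formalizes.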

**Calculation Procedures**

Equation 2.15 only explains the computation procedure at each time step but does not address the dependency structure of the SLAM problem. Figure 2.5 shows a Dynamic Bayesian Network of the SLAM problem of duration three, which can be used to visualize the dependencies between the robot and the stationary objects in the SLAM problem. In this section, we describe the Kalman filter-based solution of Equation 2.15 with visualization aid from Dynamic Bayesian Networks (Paskin, 2003). The EKF-based framework described in this section is identical to that used in (Smith and Cheeseman, 1986; Smith et al., 1990; Leonard and Durrant-Whyte, 1991).

**Figure 2.5.** A Dynamic Bayesian Network (DBN) of the SLAM problem of duration three. It shows the dependencies among the motion measurements, the robot, the perception measurements and the stationary objects. In this example, there are two stationary objects, $m^1$ and $m^2$. Clear circles denote hidden continuous nodes and shaded circles denote observed continuous nodes. The edges from stationary objects to measurements are determined by data association. We will walk through this in the following pages.

**Stage 1: Initialization.** Figure 2.6 shows the initialization stage, or adding-new-stationary-objects stage. Although the distributions are shown by ellipses in these figures, the Bayesian formula does not assume that the estimates are Gaussian distributions. In this example, two new stationary objects are detected and added to the map. The state $x_k^S$ of the whole system is now:

$$x_k^S = \begin{bmatrix} x_k \\ m^1 \\ m^2 \end{bmatrix} \tag{2.16}$$

Let the perception model, $p(z_k \mid x_k, M)$, be described as:

$$z_k = h(x_k, M) + w_k \tag{2.17}$$

where $h$ is the vector-valued perception model and $w_k \sim N(0, R_k)$ is the perception error, an uncorrelated zero-mean Gaussian noise sequence with covariance $R_k$. Because the $z_k$ are the locations of the stationary objects $M$ with respect to the robot coordinate system, the perception model $h$ is simply the tail-to-tail relationship of the robot and the map. Let the perception sensor return the mean location, $\hat{z}_0^1$, and variance, $R_0^1$, of the stationary object $m^1$, and $\hat{z}_0^2$ and $R_0^2$ of $m^2$. To add these measurements to the map, they are compounded with the robot state estimate and its distribution because these measurements

**Figure 2.6.** The initialization stage of SLAM. Solid squares denote stationary objects and the black solid circle denotes the robot. Distributions are shown by ellipses.

**Figure 2.7.** A DBN representing the initialization stage of SLAM. After this stage, the undirected graphical model is produced in which the two stationary objects and the robot state are directly dependent.

are with respect to the robot coordinate system. Therefore, the mean and covariance of the whole system can be computed as:

$$\mu_{x_0^S} = \begin{bmatrix} \mu_{x_0} \\ \oplus(\mu_{x_0}, \hat{z}_0^1) \\ \oplus(\mu_{x_0}, \hat{z}_0^2) \end{bmatrix} \tag{2.18}$$

$$\Sigma_{x_0^S} =
\begin{bmatrix}
\Sigma_{x_0 x_0} & \Sigma_{x_0 m^1} & \Sigma_{x_0 m^2} \\
\Sigma_{x_0 m^1}^{T} & \Sigma_{m^1 m^1} & \Sigma_{m^1 m^2} \\
\Sigma_{x_0 m^2}^{T} & \Sigma_{m^1 m^2}^{T} & \Sigma_{m^2 m^2}
\end{bmatrix}
=
\begin{bmatrix}
\Sigma_{x_0 x_0} & \Sigma_{x_0 x_0}\nabla_{1\oplus}^{T} & \Sigma_{x_0 x_0}\nabla_{1\oplus}^{T} \\
\nabla_{1\oplus}\Sigma_{x_0 x_0} & \nabla_{1\oplus}\Sigma_{x_0 x_0}\nabla_{1\oplus}^{T} + \nabla_{2\oplus}R_0^1\nabla_{2\oplus}^{T} & \mathbf{0} \\
\nabla_{1\oplus}\Sigma_{x_0 x_0} & \mathbf{0} & \nabla_{1\oplus}\Sigma_{x_0 x_0}\nabla_{1\oplus}^{T} + \nabla_{2\oplus}R_0^2\nabla_{2\oplus}^{T}
\end{bmatrix} \tag{2.19}$$

This stage is shown as $p(x_{k-1}, M \mid Z_{k-1}, U_{k-1})$ in equation (2.15). Figure 2.7 shows a DBN representing the initialization stage, or the adding-new-stationary-objects stage, in which the undirected graphical model is produced by moralizing^2 the directed graphical model. The observed nodes are eliminated to produce the final graphical model, which shows that the two stationary objects and the robot state are directly dependent.
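To make equations (2.18) and (2.19) concrete, consider a scalar (1-D) sketch in which compounding is plain addition, so both Jacobians $\nabla_{1\oplus}$ and $\nabla_{2\oplus}$ equal 1 and the block formulas reduce to sums of variances. All numbers below are illustrative, not from the text.

```python
# Scalar sketch of the initialization stage, Eqs. (2.18)-(2.19).
mu_x0, var_x0 = 5.0, 0.25        # robot pose estimate and its variance
z1, R1 = 2.0, 0.04               # landmark 1: location measured from the robot
z2, R2 = -3.0, 0.09              # landmark 2

# Eq. (2.18): each landmark mean is the robot mean compounded with z.
mu = [mu_x0, mu_x0 + z1, mu_x0 + z2]

# Eq. (2.19): robot-landmark cross terms equal the robot variance (times
# Jacobians of 1), and each landmark variance is the robot variance plus
# its own measurement noise. The off-diagonal landmark-landmark entry is 0,
# mirroring the structure of Eq. (2.19).
Sigma = [
    [var_x0, var_x0,      var_x0     ],
    [var_x0, var_x0 + R1, 0.0        ],
    [var_x0, 0.0,         var_x0 + R2],
]

print(mu)           # [5.0, 7.0, 2.0]
print(Sigma[1][1])  # 0.29: robot uncertainty plus measurement noise
```

The point of the sketch is the structure: every newly added landmark inherits the robot's uncertainty, which is why the map and the robot become correlated from the very first step.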

**Stage 2: Prediction.** In Figure 2.8, the robot moves and gets a motion measurement $u_1$ from odometry or inertial sensors. Let the robot motion model, $p(x_k \mid x_{k-1}, u_k)$, be

^2 In the graphical model literature, moralizing means adding links between unmarried parents who share a common child.


described as:

$$x_k = f(x_{k-1}, u_k) + v_k \tag{2.20}$$

where $f(\cdot)$ is the vector of non-linear state transition functions and $v_k$ is the motion noise, an uncorrelated zero-mean Gaussian noise sequence with covariance $Q_k$. Assuming that the relative motion in the robot frame is given by $u_k$, the new location of the robot is clearly the compounding relationship of the robot pose $x_{k-1}$ and $u_k$. Because only the robot moves, only the elements of the mean and the covariance matrix corresponding to $x_k$ must be computed. In this example, the mean and the covariance matrix of the whole system can be computed as:

$$\mu_{x_1^S} = \begin{bmatrix} \oplus(\mu_{x_0}, u_1) \\ \mu_{m^1} \\ \mu_{m^2} \end{bmatrix} \tag{2.21}$$

and

$$\Sigma_{x_1^S} =
\begin{bmatrix}
\Sigma_{x_1 x_1} & \Sigma_{x_1 m^1} & \Sigma_{x_1 m^2} \\
\Sigma_{x_1 m^1}^{T} & \Sigma_{m^1 m^1} & \Sigma_{m^1 m^2} \\
\Sigma_{x_1 m^2}^{T} & \Sigma_{m^1 m^2}^{T} & \Sigma_{m^2 m^2}
\end{bmatrix}
=
\begin{bmatrix}
\nabla_{1\oplus}\Sigma_{x_0 x_0}\nabla_{1\oplus}^{T} + \nabla_{2\oplus}Q_1\nabla_{2\oplus}^{T} & \nabla_{1\oplus}\Sigma_{x_0 m^1} & \nabla_{1\oplus}\Sigma_{x_0 m^2} \\
\Sigma_{x_0 m^1}^{T}\nabla_{1\oplus}^{T} & \Sigma_{m^1 m^1} & \Sigma_{m^1 m^2} \\
\Sigma_{x_0 m^2}^{T}\nabla_{1\oplus}^{T} & \Sigma_{m^1 m^2}^{T} & \Sigma_{m^2 m^2}
\end{bmatrix} \tag{2.22}$$

**Figure 2.8.** The prediction stage
of SLAM.

**Figure 2.9.** A DBN representing
the prediction stage of SLAM.

This is the prediction stage of the SLAM problem, which is shown as $\int p(x_k \mid x_{k-1}, u_k)\, p(x_{k-1}, M \mid Z_{k-1}, U_{k-1})\, dx_{k-1}$ in equation (2.15). Figure 2.9 shows a DBN representing the prediction stage of the SLAM problem. The new nodes, $x_1$ and $u_1$, are added to the graphical model from the initialization stage. After moralizing the directed graphical model, eliminating the odometry node $u_1$ and eliminating the node $x_0$, the resulting undirected graphical model is produced, in which the two stationary objects and the robot state are still directly dependent.
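A scalar (1-D) sketch of equations (2.21) and (2.22) shows the essential pattern. With compounding reduced to addition, $\nabla_{1\oplus} = \nabla_{2\oplus} = 1$: only the robot row and column of the covariance change, and the robot variance grows by the motion noise. The numbers are illustrative.

```python
# Scalar sketch of the prediction stage, Eqs. (2.21)-(2.22).
u1, Q1 = 1.5, 0.02                       # odometry step and its noise

mu    = [5.0, 7.0, 2.0]                  # state [robot, m1, m2]
Sigma = [[0.25, 0.25, 0.25],             # covariance after initialization
         [0.25, 0.29, 0.00],
         [0.25, 0.00, 0.34]]

# Eq. (2.21): only the robot mean moves.
mu[0] = mu[0] + u1

# Eq. (2.22): the robot variance inflates by Q; with Jacobians of 1 the
# robot-landmark cross terms and the landmark blocks are untouched.
Sigma[0][0] = Sigma[0][0] + Q1

print(mu[0], Sigma[0][0])   # 6.5 0.27
```

Motion makes the robot *less* certain while leaving the landmark estimates alone, which is exactly why a later measurement of a well-localized landmark can pull the robot estimate back.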

**Stage 3: Data Association.** Figure 2.10 shows that the robot gets new measurements, $z_1^1$ and $z_1^2$, at the new location $x_1$ and associates $z_1^1$ and $z_1^2$ with the stationary object map. This is the data association stage of the SLAM problem. Gating is one of the data association techniques for determining whether a measurement $z$ originates from some landmark $m$. More details about data association will be addressed in Chapter 5.
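Gating can be sketched as a chi-square test on the normalized innovation (the Mahalanobis distance between a measurement and its prediction). The scalar form, the 95% threshold, and the numbers below are illustrative assumptions, not details from the text.

```python
# Minimal gating sketch: accept the pairing of measurement z with a landmark
# when the normalized innovation squared falls below a chi-square threshold.

GATE = 3.84                      # 95% chi-square gate, 1 degree of freedom

def gated(z, z_pred, innov_var):
    nu = z - z_pred              # innovation
    return nu * nu / innov_var <= GATE

# Predicted relative location 7.0 with innovation variance 0.33.
print(gated(7.2, 7.0, 0.33))     # True: consistent with this landmark
print(gated(9.0, 7.0, 0.33))     # False: outside the gate, not associated
```

Note that the gate scales with the innovation variance: the same 2 m error that is rejected here would pass if the robot or landmark were sufficiently uncertain.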

**Figure 2.10.** The data association stage of SLAM. Irregular stars denote new measurements.

**Figure 2.11.** A DBN representing the data association stage of SLAM.

Figure 2.11 shows a DBN representing the data association stage. The new perception measurement nodes, $z_1^1$ and $z_1^2$, are added to the graphical model from the prediction stage.

After data association, two directed edges are added to connect new measurements with the stationary object map.

**Stage 4: Update.** Figure 2.12 shows the update stage of the SLAM problem. Let the perception sensor return the mean location, $\hat{z}_1^1$, and variance, $R_1^1$, of the stationary object $m^1$, and $\hat{z}_1^2$ and $R_1^2$ of $m^2$. These constraints are used to update the estimate and
the corresponding distribution of the whole system with Kalman filtering or other filtering
techniques.

An innovation and its corresponding innovation covariance matrix are calculated by:

$$\nu_1 = z_1 - \hat{z}_1 \tag{2.23}$$

$$\Sigma_{\nu_1} = \nabla_h \Sigma_{x_1^S} \nabla_h^{T} + \Sigma_{R_1} \tag{2.24}$$


**Figure 2.12.** The update stage of SLAM.

**Figure 2.13.** A DBN representing the update stage of SLAM.

where $z_1$ and $\hat{z}_1$ are computed by the compounding operation:

$$z_1 = \begin{bmatrix} \oplus(\ominus(\mu_{x_1}), \mu_{m^1}) \\ \oplus(\ominus(\mu_{x_1}), \mu_{m^2}) \end{bmatrix} \tag{2.25}$$

$$\hat{z}_1 = \begin{bmatrix} \hat{z}_1^1 \\ \hat{z}_1^2 \end{bmatrix} \tag{2.26}$$
and $\nabla_h$ is the Jacobian of $h$ taken at $\mu_{x_1}$. Then the state estimate and its corresponding state estimate covariance are updated according to:

$$x_1^S = x_1^S + K_1 \nu_1 \tag{2.27}$$

$$\Sigma_{x_1^S} = \Sigma_{x_1^S} - K_1 \nabla_h \Sigma_{x_1^S} \tag{2.28}$$

where the gain matrix is given by:

$$K_1 = \Sigma_{x_1^S} \nabla_h^{T} \Sigma_{\nu_1}^{-1} \tag{2.29}$$

This is the update stage of the SLAM problem, which is shown as $p(z_k \mid x_k, M)$ in equation (2.15). Figure 2.13 shows a DBN representing the update stage of the SLAM problem. After the update stage, the robot and the two stationary objects are fully correlated.
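The update equations (2.23)–(2.29) can be traced numerically in a scalar sketch with a single landmark. The measurement model $h(x, m) = m - x$ (the 1-D analogue of the tail-to-tail relation, with Jacobian $H = [-1, 1]$) and all numbers are assumptions made for illustration.

```python
# Scalar sketch of the update stage, Eqs. (2.23)-(2.29), for one landmark.
# State [robot, m1] after prediction, with its covariance.
x     = [6.5, 7.0]
Sigma = [[0.27, 0.25],
         [0.25, 0.29]]
H     = [-1.0, 1.0]              # Jacobian of h(x, m) = m - x
R     = 0.04                     # measurement noise
z     = 0.8                      # measured relative location of m1

z_pred = x[1] - x[0]                       # h(x): predicted measurement
nu = z - z_pred                            # Eq. (2.23): innovation
# Eq. (2.24): innovation variance  H Σ Hᵀ + R
HS = [sum(H[i] * Sigma[i][j] for i in range(2)) for j in range(2)]
S  = sum(HS[j] * H[j] for j in range(2)) + R
# Eq. (2.29): Kalman gain  Σ Hᵀ / S
K  = [sum(Sigma[i][j] * H[j] for j in range(2)) / S for i in range(2)]
# Eqs. (2.27)-(2.28): corrected state and covariance
x     = [x[i] + K[i] * nu for i in range(2)]
Sigma = [[Sigma[i][j] - K[i] * HS[j] for j in range(2)] for i in range(2)]

print(round(x[0], 3), round(x[1], 3))   # → 6.44 7.12
```

The innovation of 0.3 is split between the robot and the landmark according to the gain, and every covariance entry shrinks, reflecting the growing correlation between robot and map that the text describes.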

**Computational Complexity**

The Kalman filter solution of the SLAM problem is elegant, but a key bottleneck is its computational complexity. Because it explicitly represents the correlations of all pairs among the robot and stationary objects, the size of the covariance matrix of the whole system grows as $O(l^2)$, given that the number of stationary objects is $l$. The time complexity of the standard EKF operation in the update stage is also $O(l^2)$. This computational burden restricts applications to those in which the map can have no more than a few hundred stationary objects. The only way to avoid this quadratically increasing computational requirement is to develop suboptimal and approximate techniques. Recently, this problem has been the subject of intense research. Approaches using approximate inference, using exact inference on tractable approximations of the true model, and using approximate inference on an approximate model have been proposed. These approaches include:

*•* Thin junction tree filters (Paskin, 2003).

*•* Sparse extended information filters (Thrun et al., 2002; Thrun and Liu, 2003).

*•* Submap-based approaches: the Atlas framework (Bosse et al., 2003), the compressed filter (Guivant and Nebot, 2001) and Decoupled Stochastic Mapping (Leonard and Feder, 1999).

*•* Rao-Blackwellised particle filters (Montemerlo, 2003).

This topic is beyond the scope of this dissertation. (Paskin, 2003) includes an excellent comparison of these techniques.

**Perception Modelling and Data Association**

Besides the computational complexity issue, the problems of perception modelling and data association have to be solved in order to accomplish city-sized SLAM. For instance, the described feature-based formulas may not be feasible because extracting features robustly is very difficult in outdoor, urban environments. Data association is difficult in practice because of featureless areas, occlusion, etc. We will address perception modelling in Chapter 3 and data association in Chapter 5.