CHAPTER 2 FUNDAMENTAL CONCEPTS
2.1 Simultaneous Localization and Map Building (SLAM)
2.1.2 Map Models
Obviously, the SLAM techniques found in literature, strongly depends on the environment description adopted. Nonetheless, all the proposed SLAM solutions can be roughly categorized into three main approaches: grid-based, topological, and feature-based. The reader interested in the most common environment representations is referred to [Thrun et al., 2002b] for a comprehensive overview of the state of the art in robotic mapping.
2.1.2a Grid-Based Aapproach
The grid-based techniques represent the simplest approach to map building. They adopt probably the most intuitive environment representation, i.e. an occupancy grid.
Instead of using a set of lines or points to represent objects in the mapping environment.
The space is discretized into cells, labelled with a value denoting the probability for that portion of space to be occupied. From the sensor data, the robot creates a local grid, related to its current pose. A search through a set of previously acquired certainty grids is performed to find the best match. The configuration of the robot giving the highest
correlation is chosen as the robot pose estimate. Once the vehicle is localized, the local grid is merged into the global one, reducing its uncertainty. In [Thrun et al., 1998] a probabilistic grid of the environment is build, supposing that the robot occasionally observes a small number of landmarks (such as hallway intersections, corners, dead ends). The SLAM problem is posed as a maximum likelihood estimation problem, where both the landmark locations and the robot pose have to be estimated. By employing suitable probabilistic models for the robot motion and the sensory system, the likelihood of the robot and landmark positions is maximized using the Baum-Welch algorithm. This techniques are particularly appealing for their simplicity and easiness of implementation. A successful application of this approach to the autonomous navigation of a robot acting as a museum tour-guide, has been presented in [Thrun et al., 2000b].
The main drawback of the grid-based approach relies on its intrinsically high computational burden and storage requirement. Moreover, the effectiveness of those methods strongly depends on the accuracy of the sensory system’s model, since misinterpreted noisy data may lead to inconsistent map management.
2.1.2b Topological Approach
Driven by the observation that human beings are able to successfully navigate, in spite of the poor accuracy of their sensory systems and in presence of frequently incomplete knowledge, topological techniques adopt a qualitative description of the surrounding environment, rather than building a metrically accurate map. The environment is represented by a set of distinctive landmarks and places. In [Kuipers and Byun, 1991], a hierarchical description of the spatial environment, is proposed.
Distinctive places and paths are defined by their properties at the control level, and serve as the nodes and arcs of the topological model. During the exploration, place matching is done using global topological constraints as well as local metrical comparison, in order to discriminate new places from already visited ones. [Choset and Nagatani, 2001] presented a method that exploits the topology of the robot’s free space, embodied into a generalized Voronoi graph (GVG), which encodes some metric information about the robot’s environment as well. [Mataric, 1992] proposes a behavior-based control scheme, which constructs a topological map of the environment from sonar and compass data
In general, those methods are appropriate for navigation in simple environments and for creating models of human and animal cognition. However, for many operations, metrically accurate maps, that are globally referenced, are necessary in order to accomplish the goal of the mission.
2.1.2c Feature-Based Approach
In the feature-based approach, the environment is mapped in terms of a set of remarkable features (landmarks), whose positions have to be estimated, together with the robot pose. Feature-based maps can explicitly store a set of points, lines, features or etc. that may be perceived by a mobile robot, along with their position. The first attempts to phrase the SLAM problem in an estimation-theoretic framework can be found in the pioneering words by [Smith and Cheeseman, 1986] and [Durrant-Whyte, 1988], who posed the statistical basis of environment representation. At the same time, [Chatila and Laumond, 1985], [Moutarlier and Chatila, 1989] and [Ayache and Faugeras,
1989] started to address the problem of visual navigation via Kalman filtering. As noticed later [Smith et al., 1990], those trends of research had much in common, all showing that the estimation errors of landmark locations are necessarily correlated, since they are based on relative measurements taken from the same uncertain positions (the robot pose). As recently proved by [Dissanayake et al., 2001] for the linear case, in the limit, the landmark estimates are fully correlated. Therefore, maintaining the full state covariance matrix is of paramount importance, as far as map consistency is concerned. As a matter of fact, simply neglecting landmark correlation leads to overoptimistic estimates, as shown in [Uhlmann et al., 1997] and [Castellanos et al., 1997]. Unfortunately, when exploring large areas, densely populated of features, the computational burden required by the update of the full covariance matrix can rapidly become too cumbersome for real-time operations. For instance, if n landmarks have to be estimated, the storage requirements of the EKF grows as n2 and the computational complexity of the measurement update is O(n3), although it can be reduced to O(n2) exploiting the specific structure of the SLAM problem. The cost involved in the management of the full state covariance matrix circumscribes the application of such techniques to middle-sized environments (map-scaling problem).
Most recent works are aimed at finding suboptimal solutions, trade off between computational complexity and estimation accuracy. The Covariance Intersection method, proposed by [Uhlmann et al, 1997], is able to perform the map update in constant time, making worst-case assumptions about the correlation (i.e. full correlation) among the state variables, while preserving the consistency of the estimates. Unfortunately, this approach may lead to unnecessary conservative estimates, with slow rate of
convergence. A simple strategy for reducing the cost of a SLAM algorithm is to fix the maximum number of features to be mapped, as suggested in [Dissanayake et al., 2000a], and “forgetting” far landmarks. This choice is clearly efficient, but a remarkable amount of information is discarded. In fact, when a previously removed landmark is sensed again, its estimate must be reinitialized from scratch in order to preserve map consistency. Another active research topic involves the development of hierarchical techniques, which decompose large maps into smaller submaps.
2.2 Coordinate Frame Transformations
Coordinate frame transformation between 3-D orthogonal Cartesian coordinate frames can be represented by three components: rotations, translations, and scaling. The relationship between the inertial coordinate frame {i} and the body coordinate frame {b}, as shown in Figure 1 is described by the Helmert transformation given in equation 2.1 [Hofmann-Wellenhof et al., 1997].
Figure 1 Relationship between two coordinate frames
presents the coordinate of origin of expressed in
b i
[ ]
( ) ( )
1and are three-element vectors consisting of the 3-D point coordinates in the form x y z . The ratation matrix has the property of orthogonality
A 4-by-4 homogeneous matrix may be written:
Z
3 1 3 3
where the 4-by-4 matrix is defined as follow
(2.5)
The rotation matrix R is a 3x3 matrix, consisting of nine unique parameters.Each of these parameters is the cosine of the angle between axes of two coordinate’s rotation.
For attitude determination, the rotation matrix R is often referred to as the direction matrix. Direction cosine matrices can completely describe the orientation of one coordinate frame to another without singularities, unlike the Euler angle parameterization. Because the direction cosine matrix by itself does not contain trigonometric functions. Its computation speed and accuracy can be favorable. The main disadvantage of the direction cosine parameterization is that it consists of nine parameters with comparison to three parameters of Euler angle representation.
2.4 Euler Angles
Mathematical formulation show that a maximum of three successive rotations along the coordinate axes were necessary to rotate one coordinate frame into another
( )
coordinate frame. The rotation angles are referred as the Euler angles. Let (α,β,γ) be the given set of Euler angles. The rotation about the x, y, and z axes with anglesα, β, andγ can be described by the Rx(α), Ry(β), and Rz(γ), respectively. They are formally defined in Equations (2.6), (2.7), and (2.8).
Equations (2.6), (2.7), and (2.8) give the completely form of the direction cosine matrices for right-handed rotations about the x, y, and z axes, respectively.
The product of direction cosine matrices is an orthogonal direction cosine matrix. A single direction cosine matrix through a series of three matrix multiplications can represent the Euler angles. Euler angles are useful for describing the attitude of one coordinate frame with respect to another.
2.5 3-D to 2-D Perspective Projection
A perspective projection model, commonly referred as the pinhole model shown in Figure 2, is used when the lens center coincides with the origin of camera frame. The process of 3-D to 2-D projection relates the image points in image plane to those corresponding points in Cartesian coordinate. In order to realize the aforementioned
process, there are two steps that we have to accomplish. The first step is to transfer the landmark’s position to the camera coordinate. Next, the image coordinate of the projected landmark is found. If the camera coordinate of a landmark is (XC,YC,ZC) and the associated image coordinates is ( , ), then u v
C
By referring to the equations (2.9) and (2.10), the transformation is nonlinear and noninvertible. That is, the relations are unique for 3-D to 2-D, but the 3-D coordinates cannot be determined uniquely from the corresponding 2-D image points.
Focal Length
Figure 2 The perspective projection for imaging geometry
2.6 Transformation between Coordinate Frames and Camera Coordinate
The relationship between two orthogonal Cartesian coordinates may be formulated by homogeneous transformation matrix. The coordinate transformation from the world /inertial, i.e., {w}, to the camera coordinate (located at the optical center of the lens){C}
can be represented by two consecutive matrix multiplications, that is, wTbbTC , where
wTb is the transformation matrix between {w} and body frame {b}. Herein, the Laser-targeted wall point Pw = [Xw Yw Zw]T in {w} can be mapped to camera coordinate transformation between body and camera coordinate, respectively. Basically, each homogeneous transformation matrix T consists of two portions, namely, rotation and translation. The matrices wTb and bTC can be written as
11 12 13 0
Note that the d0 may be obtained from the prior knowledge of robot’s initial pose, or maybe identified if it not given.
2.7 The SLAM System Description
The schematics of the proposed SLAM system is illustrated in Figure 3 and Figure 4.
The information for l is provided by LRF, and is defined as the distance between LRF and its targeted wall point. The 3-D wall point with respect to world frame is
C
The key idea is to transform the imaged wall point expressed in {C} into the body coordinate and ultimately to the world coordinate. A wall point expressed in {C} can be formulated as
( , ) ( , 90 ) ( , ) ( ,90 )
Figure 3 The schematics of robot and its environment
Figure 4 The relationship between camera and LRF
CHAPTER 3
ATTITUDE AND LOCATION ESTIMATION METHOD
A number of approaches have been proposed to address the SLAM problem. One solution is Bayesian estimate approach that does not assume Gaussian probability distributions. Another solution is the Kalman filter that is the most common approach [Dissanayake et al., 2001]. Kalman filter gives a robust, optimal, and recursive state estimation to fuse redundant information from different sensors, assuming Gaussian noise on both system and measurement processes. The Extended Kalman Filter (EKF) was proposed by Smith, Self, and Cheesman [smith et al., 1990] who developed an EKF approach for building a stochastic map of spatial relationships. It is the most popular algorithm of linear Kalman filter and the linearization techniques for nonlinear system.
However, In recently years, the Unscented Kalman Filter (UKF) has been proposed and adopted as an alternative to the EKF. Both EKF and UKF algorithms deal with nonlinear dynamic equations, but the linearization technique for UKF algorithm is unnecessary.
3.1 Extended Kalman Filter
The extended Kalma filter (EKF) has been widely used for state estimation of nonlinear stochastic systems. The technique of the EKF uses the linearization of the nonlinear functions by truncating the second and higher order of derivatives in the Taylor series expansion. In the EKF, the state transition and observation models need
( )
not be linear functions of the state, but may instead be (differentiable) functions. The EKF consists of two parts which are state equation and measurement equation.
( ) ( ( ) ) ( ) ( ) ( )
where the function f is a nonlinear function of the state and w is a zero mean Gaussian noise having spectral density matrix Q. h is a nonlinear function of the measurement and v is a zero Gaussian noise having spectral density matrix R.
To use the Kalman filter to a nonlinear system, it must be linearized [Brown and Hwang, 1992]. At each time step the Jacobian is evaluated with current predicted state.
These matrices can be used in the Kalman filter equations. This process essentially linearizes the nonlinear function around the current estimate. The EKF algorithm consists of two updates which are time update and measurement update.
The and are prior state estimation and posterior state estimation in time index k+1, respectively. The
ˆ ( )k 1
x + − xˆ ( )k+1 +
1( )
Pk+ − and Pk+1( )+ are prior error covariance
matrix and posterior error covariance matrix in time index k+1, respectively.
The linearizations of system and measurement equations are
ˆ ( )
The Unscented Transformation (UT) is relatively a new method for predicting means and covariances in nonlinear system [Julier et al., 1995]. It represents random variables with a set of sigma points are deterministically chosen using the prior means and covariances, and propagates through system dynamics.
Consider a n-dimensional random variable x with mean x− and covariance Px, propagating though an arbitrary nonlinear functiony= f x( ). The unscented transform creates 2n+1 sigma vectorsχand weights W, given by
( )
points around x and is usually set to a small positive value. κ is a secondly scaling parameter which is usually set to 0, and β is used to incorporate priori knowledge of the distribution of x. ( (n+λ) )Px i is the th row (or column) of the matrix square root.
These sigma vectors are propagated through the nonlinear function to yield a set of transformed sigma points,
i
ζi = f( ), χi i=0,..., 2 (3.11) n And the mean and covariance for y are approximated using a weighted sample mean and covariance of the transformed sigma points as follows:
2 ( )
Figure 5 illustrates the steps in performing the UT.
Figure 5 Bloke diagram of the UT [Haykin, 2001]
A simple comparative example of the sigma-point approach is shown in Figure 6
[Seyoon et al., 2002] for a 2-dimensional system: the plot (a) shows the true mean and covariance propagation using Monte Carlo sampling; the plot (b) show the performance of the UT (note that only 5 sigma-points are needed for the 2D case); and the plot (c) show the results using a linearization approach as would be done in the EKF. The superior performance of the sigma-point approach is clearly evident.
(a) Actual sampling (b) UT (c) First-order linearization Figure 6 Mean and covariance propagations: (a) The true mean and covariance, (b) UT (5 sigma points for 2D cases), and (c) EKF’s linearization approach [Haykin, 2001]
3.3 The Unscented Kalman Filter
The unscented Kalman filter (UKF) was first proposed by Juiler to address nonlinear state estimation in the context of control theory [Juiler et al., 1995]. The
algorithm uses a set of carefully chosen sample points to capture mean and covariance of the system. The UKF bypasses the linearization issues of the EKF. Instead of approximating a random process by linearizing the nonlinear functions in the system, UKF approximates the distribution of the random state variable by the deterministically picked sigma points to more accurately capture the mean and covariance of the nonlinear transformation of the random variables. This scheme can also be treated as a statistical linear regression. The UKF is an extension of UT to the Kalman filter framework [Van Der Merwe and Wan, 2001; Tak et al., 2002], and it uses the UT to implement the transformations for both nonlinear process (time update or prediction) and sensor models (measurement update or correction).
The standard UKF algorithm is shown as follows:
1. Initialize the state mean and covariance matrix:
0 0 0 0 0 0 0
4. Measurement update equations (Correction):
3.4 Pose Estimation Using EKF and UKF
The pose estimation method is based on the EKF and UKF algorithms. The EKF is derived from a stochastic state-space representation of the mobile robot model and of the measured process. This technique uses some statistical relation to link the detected features positions from the uncertain sensor information and a simple mobile robot motion model in order to form an optimal estimate of the mobile robot’s pose. The EKF is applicable to this problem since the dynamics is highly nonlinear as well as its measurement function.
Furthermore, instead of truncating nonlinear dynamics to first order as the EKF’s,
Rear wheel drive car model:
the UKF approximates the distribution of the state with a finite set of points. Since the nonlinear dynamics are used without approximation, it is much simpler to implement and has better results; its performance has been analytically shown to be similar to a truncated second order EKF without the need to calculate Jacobians or Hessians of the dynamics [Juiler et al., 1995]. Both states and system parameters can be estimated in the UKF and EKF algorithm. Then the EKF and UKF equations for kinematics equations of attitude estimation are derived.
3.4.1 State Assignment
The typical kinematics model of a mobile robot can be obtained from Figure 7. The steering control θ and speed νare used with the kinematics model to predict the position of the mobile robot. And lL is the distance between wheel axels.
(3.17)
Figure 7 Mobile robot coordinate system
The state assignment consists of mobile robot position and orientation (θ)
[ ]T
x = x y z θ (3.19)
3.4.2 System (Dynamic) Model
The discrete time dynamic model can be written as
xk+1= f x( )k +Wk (3.20) Wk is the process noise and assumed to be additive Gaussian noise with zero mean and covariance Q. the system model is following
1
where τ is the sampling period; ν is the velocity of robot; ω is the angular velocity of
robot. The linearization of equation (3.21) leads to The measurement noise vk is a zero mean, Gaussian noise with covariance matrix R . k The nonlinear function comprises the 3-D transformation and perspective projection functions. Consequently, the measurement model follows as
(
k vector Pw. The linearization of the nonlinear function h results in( )
11 13 1411
The required elements of the EKF are given above. An initial estimate x is 0 required based on prior knowledge. An initial error covariance matrix is dependent on the prior knowledge, and must be specified. Process and measurement noise covariance matrices
P0
Q and K R must be also specified. As iteration proceeds, the k derived partial derivatives with respect to each state variable are calculated to determine the linearization equations for the EKF.
CHAPTER 4
ENVIRONMENT MAP REPRESENTATION
Environment maps, especially structured indoor one, are extensively used in mobile robots navigation for the tasks like map-based positioning or global path planning.
Building an accurate environmental map for a mobile robot is important and essential.
There are basically two types of maps suitable for mobile robot.
The occupancy grids represent the environment as a two-dimensional array of cells, where each cell holds a value which indicates the probability of being occupied.
Occupancy grid map is easy to build and useful for environment modeling even with imprecise range sensor like ultrasonic rangefinders. But, it is hard to be used directly due to its inaccuracy.
The geometric primitive based map has been explored by many researchers [Gonzalea et al., 1994]. Line segment is the most widely used feature. Since not every object provides line segments, these maps cannot give a closed and connected region like the occupancy grid maps. A map represented by a set of geometric primitives is enough to describe most of structured indoor environments. It extracts geometric primitives from sampling points and represents them with detail geometric primitives. If line segments in environmental map are arranged in order, navigation algorithm based on map will be more efficient and more time-saving.
4.1 Extraction of Line Segments
To represent environment with line segments, some linear fitting technologies are used to extract line segment from a set of sampling points that have been transformed into Cartesian coordinate system. To extract line segments from range points, we use recursive line splitting method [Matsuo et al., 1999; Xu et al., 2002], which is reliable and fast.
4.1.1 Clustering
All the points in each scan can be classified into different clustered regions by selecting threshold value appropriately. Distance d between two successively points are calculated. Thus each region consists of a set of points from start point to end point. Because the density of sampling points is different at different position, adaptive
All the points in each scan can be classified into different clustered regions by selecting threshold value appropriately. Distance d between two successively points are calculated. Thus each region consists of a set of points from start point to end point. Because the density of sampling points is different at different position, adaptive