
Chapter 1 Introduction

1.4  Thesis Organization

The remainder of this thesis is organized as follows. Chapter 2 describes related


works on IPM and obstacle detection. Chapter 3 introduces the system overview and some fundamental techniques such as IPM and optical flow. Chapter 4 presents the obstacle detection algorithm, including feature point extraction, ground movement estimation, obstacle localization, obstacle verification, and distance measurement. Chapter 5 shows the experimental results and evaluations. Finally, the conclusions of this system and future work are presented in Chapter 6.


Chapter 2

Related Works

2.1 Related Works of Inverse Perspective Mapping (IPM)

The objective of the inverse perspective mapping method is to remove the perspective effect caused by the camera. This effect condenses the far scene and often confuses subsequent processing.

The main group researching applications of inverse perspective mapping is Alberto Broggi's team at the University of Parma in Italy. They first proposed an inverse perspective mapping theory and established its formulas [5].

They then combined these theories with image processing algorithms, a stereo vision system, and the PArallel PRocessor for Image Checking and Analysis (PAPRICA) system, which works on a Single Instruction Multiple Data (SIMD) computer architecture, to form a complete general obstacle and lane detection system called the GOLD system [6]. The GOLD system was installed on the ARGO experimental vehicle in order to achieve the goal of automatic driving.

Other researchers have used the inverse perspective mapping method [5], or similar mapping methods combined with other image processing algorithms, to detect lanes or obstacles. For example, W. L. Ji [7] utilized inverse perspective mapping to obtain 3D information such as the front vehicle's height, its distance, and the lane curvature.

Cerri et al. [8] utilized stabilized sub-pixel precision IPM images and temporal correlation to estimate the free driving space on highways. Muad et al. [9] used the inverse perspective mapping method to implement lane tracking and discussed the factors that influence IPM. Tan et al. [10] combined inverse perspective mapping and


optical flow to detect vehicles in the lateral blind spot. Jiang et al. [11] proposed a fast inverse perspective mapping algorithm and used it to detect lanes and obstacles.

Nieto et al. [12] proposed a method that stabilizes the inverse perspective mapping image by using vanishing point estimation. Yang [13] adjusted a characteristic of the inverse perspective mapping proposed by Broggi [5], namely the property that a horizontal straight line in the image is projected to an arc on the world surface.

In the adjusted formulation, however, a horizontal straight line in the image is projected to a straight line on the world surface, and a vertical straight line in the image is also projected to a straight line whose prolongation passes through the camera's vertical projection point on the world surface.

2.2 Related Works of Obstacle Detection

Obstacle detection is a primary task for an intelligent vehicle on the road. Obstacles on the road can be roughly classified into pedestrians, vehicles, and other general obstacles such as trees, street lights, and so on. A general obstacle can be defined as any object that obstructs the path, or anything located on the road surface with significant height.

Depending on the number of sensors used, there are two common approaches to obstacle detection by means of image processing: those that use a single camera (monocular vision-based detection) and those that use two or more cameras (stereo vision-based detection).

The stereo vision-based approach utilizes well-known techniques for directly obtaining 3D depth information for objects seen by two or more cameras from different viewpoints. Koller et al. [14] utilized the disparities corresponding to the


obstacles to detect them and used a Kalman filter to track the obstacles. A method for pedestrian (obstacle) detection using a system of two stationary cameras is presented in [15]. Obstacles are detected by eliminating the ground surface through a transformation and by matching the ground pixels in the images obtained from both cameras. Stereo vision-based approaches have the advantage of directly estimating the 3D coordinates of an image feature, where the feature can be anything from a single point to a complex structure. The difference in viewpoint position causes a relative displacement, called disparity, of the corresponding features in the stereo images. The search for correspondences is a difficult, time-consuming task that is not free from errors. Therefore, the performance of stereo methods depends on the accuracy of correspondence identification in the two images. In other words, searching for homogeneous point pairs within some area is the prime task of stereo methods.

Monocular vision-based approaches utilize techniques such as optical flow.

Optical flow based methods indirectly compute the velocity field and detect obstacles by analyzing the difference between the expected and actual velocity fields; for example, Kruger et al. [16] combined optical flow with odometry data to detect obstacles.

However, optical flow based methods have the drawback of high computational complexity, and they fail when the relative velocity between obstacles and the detector is too small. Inverse perspective mapping (IPM), which is based on the assumption of moving on a flat road, has also been applied to obstacle detection in many works.

Ma et al. [17][18][19] present an automatic pedestrian detection algorithm based on IPM for self-guided vehicles. It removes the perspective effect by applying the acquisition parameters (camera position, orientation, focal length) under the assumption of a flat road geometry, and predicts new frames assuming that all image points lie on the road, so that the distorted zones of the image correspond to obstacles. Bertozzi et


al. [20] developed a temporal IPM approach that uses inertial sensors to obtain the speed and yaw of the vision system under the flat road assumption. A temporal stereo matching technique was developed to detect obstacles while the vehicle is moving. Although these methods can exploit the properties of IPM to obtain effective results, all of them rely on external sensors such as an odometer or inertial sensors to acquire the ego-vehicle's displacement on the ground, which enables them to compensate for the ground plane's movement over time. Yang et al. [21] proposed a monocular vision-based approach that compensates for the ground movement between consecutive top-view images using estimated ground-movement information and computes the difference between the previous compensated top-view image and the current top-view image to find the obstacle regions. Therefore, we propose a purely vision-based algorithm that uses only a single camera and needs no additional sensor to detect target objects in general environments.


Chapter 3

System Overview and Fundamental Techniques

3.1 System Overview

The overall system flowchart is shown in Fig. 3-1 and Fig. 3-2. At the beginning, we will utilize the road detection technique to support the feature point extraction.

Due to the characteristics of the color space, we first transform the RGB images to the Lab color space, and the road detection algorithm is processed in the Lab color space. Then we analyze which features are suitable to track under our conditions; the road boundary and the features whose gradients satisfy a restriction within the road region are selected. The detailed contents will be described in Section 4.1.
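To make this step concrete, the following sketch shows one possible way to perform the color conversion and the masked feature selection with OpenCV. It is only an illustration, not the thesis's exact implementation: the function name extract_road_features, the road_mask input, and the threshold values are assumptions.

import cv2

def extract_road_features(frame_bgr, road_mask, max_corners=200):
    # Convert to the Lab color space used by the road-detection stage.
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2Lab)

    # Select corner-like points (strong local gradients) only inside the road mask.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    corners = cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                      qualityLevel=0.01, minDistance=5,
                                      mask=road_mask)
    # lab is passed on to the road-detection / feature-analysis stage;
    # corners is an N x 1 x 2 float32 array of (x, y) points, or None.
    return lab, corners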

After obtaining the feature points, the optical flow is calculated for all of them. Due to the perspective effect, the directions and lengths of the optical flows on the road in the original image are not the same. Therefore, we transform the information of the feature points, including their image coordinates and optical flow, from the image coordinate system to the world coordinate system by inverse perspective mapping (IPM) [13] and build the bird's view image. Once we obtain the feature point information in the world coordinate system, the principal distribution of the optical flow and temporal coherence are used to estimate and verify the ground movement, respectively.

By transforming between the image coordinate system and the world coordinate system and compensating for the ground movement, we can build a compensated image which is shifted by the ground movement. The ground movement compensation procedure will be described in detail in Section 4.2.
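As a rough sketch of what such a compensation step can look like, assume the ground movement between two consecutive frames has already been estimated as a translation (dx, dy) in top-view pixels; the function below is illustrative, not the exact procedure of Section 4.2.

import cv2
import numpy as np

def compensate_ground_movement(prev_topview, dx, dy):
    # Shift the previous bird's-view image by the estimated ground movement,
    # expressed in top-view (world-grid) pixels.
    h, w = prev_topview.shape[:2]
    # 2x3 affine matrix encoding a pure translation.
    M = np.float32([[1, 0, dx],
                    [0, 1, dy]])
    compensated = cv2.warpAffine(prev_topview, M, (w, h))
    return compensated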


As depicted in Fig. 3-2, the obstacle localization procedure is based on image differencing. By thresholding the difference image, we obtain an obstacle candidate image in which planar objects are eliminated and non-planar objects, which correspond to obstacle regions, are marked. For each obstacle region, the position closest to our ego-vehicle is of interest: the objective of obstacle localization is to find this closest position so that the driver can be warned by showing the target position. The details of the obstacle localization will be described in Section 4.3. To verify whether the detected objects are obstacles or not, we utilize the road detection result to validate the initial results. When objects are detected by the above flow, the distance between the target and our ego-vehicle is estimated by the distance measurement procedure. The obstacle validation method and the distance measurement method will be described in Section 4.4 and Section 4.5, respectively.
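The localization idea can be sketched as follows, assuming the compensated previous top-view and the current top-view are already aligned grayscale images and that the ego-vehicle projects to the bottom edge of the top-view image; the threshold value and the connected-component grouping below are illustrative choices rather than the thesis's exact procedure.

import cv2
import numpy as np

def localize_obstacles(compensated_prev, current_topview, diff_thresh=30):
    # Planar (road) pixels cancel out after compensation; obstacle pixels remain.
    diff = cv2.absdiff(compensated_prev, current_topview)
    _, candidate = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Group candidate pixels into obstacle regions.
    num_labels, labels = cv2.connectedComponents(candidate)

    closest_points = []
    for label in range(1, num_labels):            # label 0 is the background
        ys, xs = np.nonzero(labels == label)
        i = np.argmax(ys)                         # largest row = nearest to the vehicle
        closest_points.append((int(xs[i]), int(ys[i])))
    return candidate, closest_points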

Fig. 3-1 System Flowchart


Fig. 3-2 System Flowchart

As mentioned above, ground movement information is estimated from the optical flow in the world coordinate system and used to compensate for the difference between consecutive frames in the world coordinate system. Therefore, the inverse perspective mapping (IPM) and optical flow techniques adopted in our system are introduced in Sections 3.2 and 3.3, respectively.

3.2 Inverse Perspective Mapping (IPM)

The perspective effect associates different meanings with different image pixels, depending on their position in the image. Conversely, after removal of the perspective effect, each pixel represents the same portion of the information content, allowing a homogeneous distribution of the information among all image pixels. In other words, in the original image a pixel in the lower part represents less information, while a pixel in the middle of the same image carries more information.

To cope with this non-homogeneous distribution of information content among the pixels, an inverse perspective mapping transformation is introduced to remove the perspective effect. To remove the perspective effect, it is


necessary to know the specific acquisition conditions (camera position, orientation, optics, etc.) and the scene represented in the image (the road, which is assumed to be flat); this constitutes the a priori knowledge. The procedure removes the perspective effect by resampling the incoming image, remapping each pixel of the original camera image to a different position and producing a new two-dimensional (2-D) array of pixels. The resulting image represents a top view of the scene, as if it were observed from a significant height. Hence we obtain a new image whose pixels carry homogeneous information content.

In our research, the inverse perspective mapping of [13] is used to remove the perspective effect by transforming image coordinates to world coordinates; processing then takes place in the world coordinate system to estimate the ground movement.

With some prior knowledge, such as the flat road assumption and the intrinsic and extrinsic camera parameters, we are able to reconstruct a two-dimensional image without the perspective effect. The expected results are illustrated in Fig. 3-3. The transformation equation pair [13] has two expected properties: (1) a vertical straight line in the image is projected to a straight line whose prolongation passes through the camera's vertical projection point on the world surface, and (2) a horizontal straight line in the image is projected to a straight line instead of an arc on the world surface.

This result can be verified by the similar-triangles theorem.



Fig. 3-3: Expected results: (a) removal of the perspective effect, (b) the property of a vertical straight line, (c) the property of a horizontal straight line.

The spatial relationship between the world coordinate system and the image coordinate system is shown in Fig. 3-4, and the derivation process is illustrated in Fig. 3-5.



Fig. 3-4: (a) Bird's view and (b) side view of the geometric relation between the world coordinate system and the image coordinate system.

From Fig. 3-5, the following equations can be derived:

θ1 = β − u · (2β / (m − 1)),    θ2 = α − v · (2α / (n − 1))    (3-1)


Fig. 3-5: Geometric relation between the image coordinate system and the world coordinate system.

The forward transformation equations are derived first, and the backward transformation equations are then easily obtained by some mathematical manipulation. The


forward transformation and backward transformation equations are shown below:

Forward transformation equations:

The notations in the above equations and figures are introduced as follows:

(u,v) : The image coordinate system.

(X,Y,Z) : The world coordinate system where (X,Y,0) represent the road surface.

(L,D,H) : The coordinates of the camera in the world coordinate system.

θ: The camera’s tilt angle.

γ: The camera’s pan angle.

α, β : The horizontal and vertical aperture angles.

m, n : The dimensions of the image (an m-by-n image).

O: The optic axis vector.

ηx, ηy : The vector representing the optical axis vector O projected onto the road surface, and its perpendicular vector.

To implement the inverse perspective mapping, we use equations (3-3) instead of equations (3-2) and scan the remapped image row by row to compute


the mapping points on the original image, because we do not want an image full of holes. However, the objective of inverse perspective mapping in our research is not to build the bird's view image itself, but to use the transformation to remove the perspective effect and to convert coordinates between the original image coordinate system and the world coordinate system. By utilizing inverse perspective mapping, the ground movement estimation procedure is carried out in the world coordinate system.
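A minimal sketch of this backward, row-by-row scan is given below. The callable world_to_image stands in for the backward transformation equations (3-3), which are not reproduced here; the grid extents and cell size are arbitrary illustrative parameters, and u is assumed to index image columns and v image rows.

import numpy as np

def build_birds_eye(image, world_to_image, x_range, y_range, cell_size):
    # For every cell (X, Y) of the ground-plane grid we look up its source
    # pixel in the original image (backward mapping), instead of forward-
    # projecting original pixels, so the top view has no holes.
    xs = np.arange(x_range[0], x_range[1], cell_size)
    ys = np.arange(y_range[0], y_range[1], cell_size)
    top = np.zeros((len(ys), len(xs)) + image.shape[2:], dtype=image.dtype)

    h, w = image.shape[:2]
    for row, Y in enumerate(ys):            # scan the remapped image row by row
        for col, X in enumerate(xs):
            u, v = world_to_image(X, Y)     # backward equations (3-3)
            u, v = int(round(u)), int(round(v))
            if 0 <= v < h and 0 <= u < w:   # keep only points visible in the image
                top[row, col] = image[v, u]
    return top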

3.3 Optical Flow

When we are dealing with a video source, as opposed to individual still images, we often want to assess motion between two frames (or a sequence of frames) without any other prior knowledge about the content of those frames. The optical flow is a displacement that represents the distance a pixel has moved between the previous frame and the current frame. Such a construction is usually referred to as dense optical flow, which associates a velocity with every pixel in an image. The Horn-Schunck method [22] attempts to compute just such a velocity field. Another seemingly straightforward method simply attempts to match windows around each pixel from one frame to the next; this is known as block matching. Both of these methods are often used in dense tracking techniques.

In practice, calculating dense optical flow is not easy. Consider the motion of a white sheet of paper. Many of the white pixels in the previous frame will simply remain white in the next. Only the edges may change, and even then only those perpendicular to the direction of motion. The result is that dense methods must have some method of interpolating between points that are more easily tracked so as to solve for those points that are more ambiguous. These difficulties manifest themselves


most clearly in the high computational costs of dense optical flow.

This leads us to the alternative, sparse optical flow. Algorithms of this nature rely on some means of specifying beforehand the subset of points that are to be tracked. If these points have certain desirable properties, such as being "corners", then the tracking will be relatively robust and reliable. The computational cost of sparse tracking is so much lower than that of dense tracking that many practical applications adopt it. Therefore, we consider the most popular sparse optical flow technique, Lucas-Kanade (LK) optical flow [23][24]; this method also has an implementation that works with image pyramids, allowing us to track faster motions.

The Lucas-Kanade (LK) algorithm, applied to a subset of the points in the input image, has become an important sparse technique. The LK algorithm can be applied in a sparse context because it relies only on local information derived from a small window surrounding each point of interest. The disadvantage of using small local windows in Lucas-Kanade is that large motions can move points outside of the local window, making them impossible for the algorithm to find.

This problem led to the development of the "pyramidal" LK algorithm, which starts tracking from the highest level of an image pyramid (lowest detail) and works down to lower levels (finer detail). Tracking over image pyramids allows large motions to be caught by local windows.
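OpenCV provides an implementation of the pyramidal LK tracker; the following sketch shows one way this kind of sparse tracking could be invoked for our extracted feature points. The window size and pyramid depth are illustrative values, not the settings used in this thesis.

import cv2
import numpy as np

def track_features(prev_gray, curr_gray, prev_pts):
    # prev_pts: N x 1 x 2 float32 array of points found in the previous frame.
    curr_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None,
        winSize=(21, 21),   # local window around each point
        maxLevel=3)         # number of pyramid levels, for larger motions

    # Keep only the points that were successfully tracked.
    good = status.ravel() == 1
    return prev_pts[good], curr_pts[good]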

The basic idea of the LK algorithm rests on three assumptions:

1. Brightness constancy. A pixel from the image of an object in the scene does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.

2. Temporal persistence. The image motion of a surface patch changes slowly in time.

In practice, this means the temporal increments are fast enough relative to the scale


of motion in the image that the object does not move much from frame to frame.

3. Spatial coherence. Neighboring points in a scene belong to the same surface, have similar motion, and project to nearby points on the image plane.

By using these assumptions, the following equation can be derived, where u is the x component of the velocity and v is the y component:

I_x u + I_y v + I_t = 0    (3-4)

For this single equation there are two unknowns for any given pixel. This means that measurements at the single-pixel level cannot be used to obtain a unique solution for the two-dimensional motion at that point. Instead, we can only solve for the motion component that is perpendicular to the line described by the flow equation. Fig. 3-6 presents the mathematical and geometric details.


Fig. 3-6: Two-dimensional optical flow at a single pixel: optical flow at one pixel is underdetermined, and so can yield at most the component of motion that is perpendicular to the line described by the flow equation.

Normal optical flow results from the aperture problem, which arises when you


have a small aperture or window in which to measure motion. When motion is detected through a small aperture, you often see only an edge, not a corner; an edge alone is insufficient to determine exactly how (i.e., in what direction) the entire object is moving. To resolve the aperture problem, we turn to the last optical flow assumption for help. If a local patch of pixels moves coherently, then we can easily solve for the motion of the central pixel by using the surrounding pixels to set up a system of equations. For example, if we use a 5-by-5 window of brightness values around the current pixel to compute its motion, we can set up 25 such equations. To solve this over-determined system, we set up a least-squares minimization, min ||Ad − b||², which is solved in standard form as (AᵀA)d = Aᵀb. From this relation we obtain our u and v motion components; writing this out in more detail, the solution is d = (AᵀA)⁻¹Aᵀb.
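A small numerical sketch of this least-squares step is shown below, assuming the spatial derivatives Ix, Iy and the temporal derivative It have already been computed over the 5-by-5 window; the function name and interface are illustrative.

import numpy as np

def lk_window_flow(Ix, Iy, It):
    # Ix, Iy, It: 5x5 arrays of spatial and temporal image derivatives.
    # Stack the 25 flow-constraint equations [Ix Iy] d = -It into A d = b.
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # 25 x 2
    b = -It.ravel()                                  # 25-vector
    # Least-squares solution of the normal equations (A^T A) d = A^T b.
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d   # d[0] = u, d[1] = v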


Chapter 4

Obstacle Detection Algorithm

4.1 Feature Point Extraction

4.1.1 Feature Analysis

There are many kinds of local features that one can track. The features used here are ones whose motion can be used to estimate the ground movement, that is, features that can be found again in a subsequent frame of the video stream. Obviously, if we pick a point on a large blank wall, it won't be easy to find that same point in the next frame of the video. If all points on the wall are identical or even very similar, then we won't have much luck tracking that point in subsequent frames. On the other hand, if

