
Chapter 3 System Overview and Fundamental Techniques

3.3 Optical Flow

When we are dealing with a video source, as opposed to individual still images, we often want to assess the motion between two frames (or across a sequence of frames) without any other prior knowledge about the content of those frames. The optical flow is the displacement field that describes how far each pixel has moved between the previous frame and the current frame. When a velocity is associated with every pixel in the image, the construction is referred to as dense optical flow. The Horn-Schunck method [22] attempts to compute just such a velocity field. Another seemingly straightforward approach simply tries to match a window around each pixel from one frame to the next; this is known as block matching. Both of these methods are commonly used in dense tracking techniques.
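As a rough illustration of the block-matching idea only (not a method used later in this thesis), the following minimal sketch exhaustively searches a small neighborhood in the next frame for the window with the lowest sum of squared differences; the function name, block size, and search radius are our own illustrative choices.

```python
import numpy as np

def block_match(prev, curr, block=8, search=7):
    """Brute-force block matching: for each block in `prev`, find the
    displacement (within +/- `search` pixels) that minimizes the SSD
    against `curr`. Returns a coarse flow field, one vector per block."""
    h, w = prev.shape
    flow = np.zeros((h // block, w // block, 2), np.float32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = prev[by:by + block, bx:bx + block].astype(np.float32)
            best, best_dx, best_dy = np.inf, 0, 0
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    cand = curr[y:y + block, x:x + block].astype(np.float32)
                    ssd = np.sum((ref - cand) ** 2)
                    if ssd < best:
                        best, best_dx, best_dy = ssd, dx, dy
            flow[by // block, bx // block] = (best_dx, best_dy)
    return flow
```

The cubic cost of this exhaustive search over every block is one concrete reason dense methods are expensive, as discussed next.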

In practice, calculating dense optical flow is not easy. Consider the motion of a white sheet of paper. Many of the white pixels in the previous frame will simply remain white in the next. Only the edges may change, and even then only those perpendicular to the direction of motion. The result is that dense methods must have some way of interpolating between points that are easily tracked in order to solve for those points whose motion is ambiguous. These difficulties manifest themselves most clearly in the high computational cost of dense optical flow.

This leads us to the alternative, sparse optical flow. Algorithms of this kind rely on some means of specifying beforehand the subset of points that are to be tracked. If these points have certain desirable properties, such as being "corners", then the tracking will be relatively robust and reliable. The computational cost of sparse tracking is so much lower than that of dense tracking that many practical applications adopt it. We therefore use the most popular sparse optical flow technique, Lucas-Kanade (LK) optical flow [23][24]. This method also has an implementation that works with image pyramids, allowing us to track faster motions.

The Lucas-Kanade (LK) algorithm is easily applied to a subset of the points in the input image, and it has therefore become an important sparse technique. The LK algorithm can be applied in a sparse context because it relies only on local information derived from a small window surrounding each point of interest. The disadvantage of using small local windows is that large motions can move points outside of the local window, making them impossible for the algorithm to find.

This problem led to the development of the "pyramidal" LK algorithm, which starts tracking from the highest level of an image pyramid (lowest detail) and works down to the lower levels (finer detail). Tracking over image pyramids allows large motions to be caught by local windows.
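As a hedged illustration (not the exact configuration used in this work), pyramidal LK tracking is available in OpenCV as cv2.calcOpticalFlowPyrLK; the window size, pyramid depth, corner-detector parameters, and file names below are illustrative choices only.

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (paths are placeholders).
prev_gray = cv2.imread("frame_t0.png", cv2.IMREAD_GRAYSCALE)
curr_gray = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

# Pick corner-like points to track (Shi-Tomasi "good features").
pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                   qualityLevel=0.01, minDistance=10)

# Pyramidal Lucas-Kanade: a 21x21 window tracked over 3 pyramid levels,
# so motions larger than the window can still be recovered.
pts_curr, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, curr_gray, pts_prev, None,
    winSize=(21, 21), maxLevel=3,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

# Keep only the points that were successfully tracked.
good_prev = pts_prev[status.flatten() == 1]
good_curr = pts_curr[status.flatten() == 1]
flow_vectors = good_curr - good_prev  # per-point (du, dv) displacements
```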

The basic idea of the LK algorithm rests on three assumptions:

1. Brightness constancy. A pixel from the image of an object in the scene does not change in appearance as it (possibly) moves from frame to frame. For grayscale images, this means we assume that the brightness of a pixel does not change as it is tracked from frame to frame.

2. Temporal persistence. The image motion of a surface patch changes slowly in time. In practice, this means the temporal increments are fast enough relative to the scale of motion in the image that the object does not move much from frame to frame.

3. Spatial coherence. Neighboring points in a scene belong to the same surface, have similar motion, and project to nearby points on the image plane.

By using these assumptions, the following equation is obtained, where u is the x component of the velocity and v is the y component:

$I_x u + I_y v + I_t = 0$   (3-4)

For this single equation there are two unknowns for any given pixel. This means that measurements at the single-pixel level cannot be used to obtain a unique solution for the two-dimensional motion at that point. Instead, we can only solve for the motion component that is perpendicular to the line described by the flow equation. Fig. 3-6 presents the mathematical and geometric details.

Fig. 3-6: Two-dimensional optical flow at a single pixel: the flow at one pixel is underdetermined, so at most the motion component perpendicular to the line described by the flow equation can be recovered.

Normal optical flow results from the aperture problem, which arises when you have only a small aperture or window in which to measure motion. When motion is detected with a small aperture, you often see only an edge, not a corner. But an edge alone is insufficient to determine exactly how (i.e., in what direction) the entire object is moving. To resolve the aperture problem, we turn to the last optical flow assumption for help. If a local patch of pixels moves coherently, then we can solve for the motion of the central pixel by using the surrounding pixels to set up a system of equations. For example, if we use a 5-by-5 window of brightness values around the current pixel to compute its motion, we can set up 25 equations as follows:

$\begin{bmatrix} I_x(p_1) & I_y(p_1) \\ \vdots & \vdots \\ I_x(p_{25}) & I_y(p_{25}) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = - \begin{bmatrix} I_t(p_1) \\ \vdots \\ I_t(p_{25}) \end{bmatrix}, \quad \text{i.e. } A\,d = b$

To solve this system, we set up a least-squares minimization of the equation, whereby $\min_d \lVert A d - b \rVert^2$ is solved in standard form as $(A^T A)\,d = A^T b$. From this relation we obtain our u and v motion components. Writing this out in more detail, the solution to this equation is:

$d = \begin{bmatrix} u \\ v \end{bmatrix} = (A^T A)^{-1} A^T b$
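A minimal NumPy sketch of this least-squares step for a single 5-by-5 window follows; the finite-difference derivative estimates and the window location are our own illustrative simplifications.

```python
import numpy as np

def lk_single_window(prev, curr, x, y, half=2):
    """Solve (A^T A) d = A^T b for the flow d = (u, v) of pixel (x, y),
    using the 5x5 window of spatial/temporal derivatives around it."""
    prev = prev.astype(np.float32)
    curr = curr.astype(np.float32)
    # Simple central-difference spatial gradients and temporal difference.
    Ix = (np.roll(prev, -1, axis=1) - np.roll(prev, 1, axis=1)) / 2.0
    Iy = (np.roll(prev, -1, axis=0) - np.roll(prev, 1, axis=0)) / 2.0
    It = curr - prev

    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # 25 x 2
    b = -It[win].ravel()                                      # 25

    # d = (A^T A)^{-1} A^T b, i.e. the standard least-squares solution.
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d  # (u, v)
```

In practice the matrix $A^T A$ must be well conditioned, meaning the window contains strong gradients in two distinct directions; this is exactly the corner criterion discussed in Section 4.1.1.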


Chapter 4

Obstacle Detection Algorithm

4.1 Feature Point Extraction

4.1.1 Feature Analysis

There are many kinds of local features that one can track. Here we use features to estimate the ground movement from their motion, that is, by finding these features from one frame in a subsequent frame of the video stream. Obviously, if we pick a point on a large blank wall, it will not be easy to find that same point in the next frame of the video. If all points on the wall are identical or even very similar, then we will not have much luck tracking that point in subsequent frames. On the other hand, if we choose a point that is unique, then we have a pretty good chance of finding that point again. In practice, the point or feature we select should be unique, or nearly unique, and should be parameterizable in such a way that it can be compared to other points in another image. Therefore, we might be tempted to look for points that have some significant change within their neighboring local area, that is, good features that have a strong derivative in the spatial domain. Another characteristic of a feature is its position in the image. Because the objective of the following procedure is to estimate the ground movement information, features lying on the ground region are the useful ones for the ground movement estimation algorithm. According to the above analysis, a good feature to track should have two characteristics. First, a feature should have a strong derivative, which helps us track it and obtain a precise motion. Second, the position of the feature should be restricted to the road region (the non-obstacle region). The features used to estimate the ground movement information should conform to these two characteristics; such features are suitable for estimating the ground movement information.

Considering the first characteristic, a strong derivative, a point with a strong derivative may lie on an edge of some kind. Exploiting this property of edges, we employ the Sobel operator to find the edges of the image. The points extracted by edge detection are used as feature points, and the optical flow is then calculated for all of them. These edge points and their optical flow are shown in Fig. 4-1. However, a problem arises, as depicted in Fig. 4-1(b): an edge point can look like all of the other points along the same edge. An ambiguous optical flow occurs when the edge runs parallel to the direction of motion. It turns out that a strong derivative in a single direction is not enough. However, if strong derivatives are observed in two orthogonal directions, then we can hope that the point is more likely to be unique. For this reason, many trackable features in an image are corners. Intuitively, corners are the points that contain enough information to be picked out from one frame to the next. We use the most commonly used definition of a corner, provided by Harris [25]. This definition relies on the matrix of the second-order derivatives (∂²x, ∂²y, ∂x∂y) of the image intensities. We can think of the second-order derivatives of the image, taken at all points in the image, as forming new second-derivative images or, when combined together, a new Hessian image.

This terminology comes from the Hessian matrix around a point, which is defined in two dimensions by:

$H(p) = \begin{bmatrix} \dfrac{\partial^2 I}{\partial x^2} & \dfrac{\partial^2 I}{\partial x \partial y} \\ \dfrac{\partial^2 I}{\partial x \partial y} & \dfrac{\partial^2 I}{\partial y^2} \end{bmatrix}$

By using the Harris corner definition, the result of corner detection and its corresponding optical flow is shown in Fig. 4-2. It is obvious that the motion of a corner is relatively accurate compared with an edge point. Although corners have more precise optical flow, another problem arises: the positions of the corners almost always lie on obstacle regions, such as vehicle components. Fig. 4-2(b) shows the result of corner detection in a common driving condition; we can see that the feature points are nearly all located on obstacle regions. Such feature points do not allow us to estimate the ground movement.


Fig. 4-1 Results of edge detection and its corresponding optical flow


Fig. 4-2 Results of corner detection and its corresponding optical flow
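To make the comparison above concrete, the following is a minimal sketch of how Sobel edge points and Harris corner points can be obtained with OpenCV; the thresholds and the image path are illustrative assumptions, not the values used in this thesis.

```python
import cv2
import numpy as np

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Edge points: Sobel gradient magnitude above a threshold.
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
edge_mask = cv2.magnitude(gx, gy) > 100.0              # illustrative threshold
edge_points = np.column_stack(np.nonzero(edge_mask))   # (row, col) pairs

# Corner points: Harris response above a fraction of its maximum.
harris = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corner_mask = harris > 0.01 * harris.max()
corner_points = np.column_stack(np.nonzero(corner_mask))
```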

Considering the above analysis of features, we propose a feature point extraction method that employs a road detection procedure to assist in obtaining ground features. The flowchart of the proposed feature point extraction is shown in Fig. 4-3. The objective is to roughly distinguish the major road region from the non-road region, and to use the result of the road detection to extract the boundary of the major road and some good features within the road region. By integrating road detection, more useful ground features can be extracted, which effectively improves the ground movement results. Section 4.1.2 introduces the details of road detection and describes which feature points will be selected.


Fig. 4-3 Flowchart of feature point extraction

4.1.2 Road Detection

The proposed feature point extraction technique integrates a road detection procedure [26] that uses an on-line color model, so that an adaptive color model can be trained to fit the road color. The main objective of the road detection is to roughly discriminate the road region from non-road regions, because its result is used to support feature extraction rather than to extract obstacle regions. We adopt an on-line learning model that is continuously updated while driving; this training method enhances plasticity and helps ensure that the features lie on the road region.

Due to the color appearance in the driving environment, we have to select color features and use them to build a color model of the road. Therefore, we have to choose a color space with uniform, weakly correlated, and concentrated properties in order to increase the accuracy of the model. In computer color vision, all visible colors are represented by vectors in a three-dimensional color space. Among all the common color spaces, RGB is the one most commonly selected, because it is the initial format of the captured image and involves no distortion. However, the RGB color features are highly correlated, and similar colors spread extensively in the color space. As a result, it is difficult to evaluate the similarity of two colors from their 1-norm or Euclidean distance in RGB space.

The other standard color space, HSV, is supposed to be closer to the way humans perceive color. Both HSV and L*a*b* resist the interference of illumination variation, such as shadows, when modeling the road area. However, the performance of the HSV model is not as good as the L*a*b* model, because road colors make the HSV model less uniform than the L*a*b* model. There are several reasons for this. First, HSV is very sensitive and unstable when the lightness is low. Furthermore, the Hue is computed by dividing by (Imax - Imin), where Imax = max(R,G,B) and Imin = min(R,G,B); therefore, when a pixel has similar Red, Green, and Blue components, its Hue may be undetermined. Unfortunately, most of the road surface consists of similar gray colors with very close R, G, and B values. If the HSV color space were used to build the road color model, the sensitive variation and fluctuation of Hue would generate inconsistent road colors and decrease the accuracy and effectiveness. The L*a*b* color space is based on data-driven human perception research, which assumes that the human visual system is ideally developed for processing natural scenes; owing to its uniform, weakly correlated, and concentrated characteristics, it is popular for color-processed rendering.

The L*a*b* color space possesses these characteristics and satisfies our requirements. It maps similar colors to a reference color with about the same differences under the Euclidean distance measure, and it exhibits a more concentrated color distribution than the others. Considering these advantageous properties of L*a*b* for general road environments, the L*a*b* color space is adopted for road detection.


The RGB-L*a*b* conversion is described by the following equations:

1. RGB-XYZ conversion:

$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$

2. XYZ-L*a*b* conversion:

$L^{*} = \begin{cases} 116\,(Y/Y_n)^{1/3} - 16, & Y/Y_n > 0.008856 \\ 903.3\,(Y/Y_n), & \text{otherwise} \end{cases}$

$a^{*} = 500\,\big(f(X/X_n) - f(Y/Y_n)\big), \qquad b^{*} = 200\,\big(f(Y/Y_n) - f(Z/Z_n)\big)$

$f(t) = \begin{cases} t^{1/3}, & t > 0.008856 \\ 7.787\,t + 16/116, & \text{otherwise} \end{cases}$

where $X_n$, $Y_n$, $Z_n$ are the tristimulus values of the reference white point.
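In practice this conversion does not need to be implemented by hand; a short sketch using OpenCV (which performs the same RGB→XYZ→L*a*b* chain internally, with the channels rescaled for 8-bit images) might look like the following, where the image path is a placeholder.

```python
import cv2

bgr = cv2.imread("frame.png")               # OpenCV loads images as BGR
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)  # per-pixel L*a*b* values
L, a, b = cv2.split(lab)                    # channels for the road color model
```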

By modeling and updating the L*a*b* color model, the built road color model can be used to extract the road region. The L*a*b* model consists of K color balls; each color ball $m_i$ is formed by a center at $(L^{*}_{m_i}, a^{*}_{m_i}, b^{*}_{m_i})$ and a fixed radius $\lambda_{max} = 5$, as seen in Fig. 4-4. In order to train the color model, we set a fixed area in the lower part of the image and assume that the pixels in this area are road samples. The pixels from the first 30 frames are used to initialize the color model, and the model is then updated every ten frames to increase processing speed while still maintaining high accuracy.


Fig. 4-4 A color ball $m_i$ in the L*a*b* color model, whose center is at $(L^{*}_{m_i}, a^{*}_{m_i}, b^{*}_{m_i})$ and whose radius is $\lambda_{max}$

The sampling area is modeled by a group of K weighted color balls. We denote the weight and the counter of the $i$-th color ball $m_i$ at time instant $t$ by $W_{m_i,t}$ and $Counter_{m_i,t}$; the weight of each color ball represents the stability of that color. A color ball to which more on-line samples have belonged over time accumulates a larger weight value, as shown in Fig. 4-5. Adopting this weight module increases the robustness of the model.

Fig. 4-5 Sampling area and a color ball with a weight that represents its similarity to the current road color


The weight of each color ball is updated from its counter whenever a new set of samples arrives; this is called one iteration. The counter is therefore initialized to zero at the beginning of each iteration, and it records the number of pixels from the on-line samples added to that ball during the iteration. The first question is which color ball a sample should be added to. We measure the similarity between a new pixel $x_t$ and the existing K color balls using the Euclidean distance measure (4-1). The maximum value of K is 50, meaning that each on-line model contains at most 50 color balls. When a new pixel matches a ball, one count is added to the counter of the best-matching color ball at this iteration, as in equation (4-2).

After all of the new sample pixels at this iteration have undergone the matching procedure described above, the weight of every color ball is updated according to its current counter and its weight at the last iteration. The updating method is as follows:

$W_{m_i,t} = (1 - \alpha)\, W_{m_i,t-1} + \alpha\, \dfrac{Counter_{m_i,t}}{N_{sample}}$

where $\alpha$ is a user-defined learning rate and $N_{sample}$ is the size of the sampling area.

The weights are then used to decide which color balls of the model best adapt to and resemble the current road. The color balls are sorted in decreasing order of their weights, so the most probable road color features are at the top of the list. The first B color balls are selected and enabled as the standard colors for road detection, and color balls with higher weights have more importance in the detection step.
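A simplified sketch of the on-line color-ball model described above is given below. The data structures, constants other than $\lambda_{max}$ and K, and the weight-update rule are our own plain reading of the procedure (match each sample to its nearest ball within $\lambda_{max}$, count matches, then blend the normalized counters into the weights with the learning rate $\alpha$), not the exact implementation of [26].

```python
import numpy as np

LAMBDA_MAX = 5.0   # fixed ball radius in L*a*b* space
K_MAX = 50         # at most 50 color balls per model
ALPHA = 0.05       # illustrative learning rate

centers = np.empty((0, 3), np.float32)   # ball centers (L*, a*, b*)
weights = np.empty((0,), np.float32)     # one weight per ball

def update_model(samples):
    """One iteration: match road samples to color balls, count matches,
    then update the weights from the normalized counters."""
    global centers, weights
    counters = np.zeros(len(centers), np.float32)
    for p in samples.astype(np.float32):
        if len(centers):
            d = np.linalg.norm(centers - p, axis=1)   # Euclidean distance (4-1)
            i = int(np.argmin(d))
            if d[i] <= LAMBDA_MAX:
                counters[i] += 1.0                    # count for best match (4-2)
                continue
        if len(centers) < K_MAX:                      # start a new ball
            centers = np.vstack([centers, p])
            weights = np.append(weights, 0.0)
            counters = np.append(counters, 1.0)
    # Blend normalized counters into the weights (assumed update rule).
    weights = (1.0 - ALPHA) * weights + ALPHA * counters / max(len(samples), 1)

def road_balls(num_enabled):
    """Return the centers of the first B balls after sorting by weight."""
    order = np.argsort(weights)[::-1]
    return centers[order[:num_enabled]]
```

A pixel can then be treated as road if its L*a*b* value lies within $\lambda_{max}$ of one of the enabled balls.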

Fig. 4-8 Results of feature point extraction. The upper image is the result of road detection, and the lower image shows the positions of the feature points.

 

4.2 Ground Movement Estimation

In this section we introduce the proposed ground movement estimation procedure. Ground movement information is estimated from the optical flow in the world coordinate system. Analyzing the principal distribution of the optical flow gives us the most representative ground movement, which is used to compensate the previous frame and difference it with the current frame. In addition, the ground movement is verified via temporal coherence. The flow of ground movement estimation is illustrated in Fig. 4-9.
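To make the compensate-and-difference step concrete, the following is a minimal sketch under the assumption that the estimated ground movement reduces to a single translation (dx, dy) in the bird's-eye view; the function and variable names are ours.

```python
import cv2
import numpy as np

def compensate_and_diff(prev_bev, curr_bev, dx, dy):
    """Shift the previous bird's-eye image by the estimated ground motion
    (dx, dy) and difference it with the current one; large residuals then
    correspond to regions that did not move with the ground plane."""
    h, w = curr_bev.shape[:2]
    M = np.float32([[1, 0, dx], [0, 1, dy]])        # pure-translation warp
    compensated = cv2.warpAffine(prev_bev, M, (w, h))
    return cv2.absdiff(curr_bev, compensated)
```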


Fig. 4-9 Flow of ground movement estimation (feature point info., optical flow calculation, map the info. to world coordinates, ground movement info. estimation, build compensated image, image compensation, verification)

4.2.1 Ground Motion Estimation

Through the feature point extraction procedure described in Section 4.1, the features useful for ground movement estimation are obtained. These feature points are then used to estimate the ground motion. Once the feature points are acquired, the main tasks of the ground movement estimation procedure are high-accuracy optical flow computation and ground movement information estimation. The first step is to calculate the optical flow for all of these feature points. The pyramidal Lucas-Kanade algorithm introduced in Section 3.3, which copes efficiently with large movements, is used to calculate the optical flow of these feature points in the original image. Fig. 4-10(a) and Fig. 4-11(a) show the feature points and their corresponding optical flow in the original image. Due to the perspective effect, the directions and lengths of the optical flow vectors on the road in the original image are not the same even when the vehicle is moving straight, as shown in Fig. 4-10(a). The case in which the vehicle is turning is shown in Fig. 4-11; here a complicated optical flow distribution appears in the original image. This inconsistent optical flow of the road in the original image makes it difficult to estimate the ground movement. Therefore, we take advantage of inverse perspective mapping (IPM) to remove the perspective effect, and the optical flow information of the original image is mapped into world coordinates. The objective of IPM is to remove the perspective effect by transforming image coordinates into world coordinates; scaling the world coordinates yields a bird's-eye-view image. The world coordinate information is therefore equivalent to the bird's-eye-view image, in that both are free of perspective. In our work, IPM is used to remove the perspective effect and transform the image coordinate information into world coordinates; the ground movement procedure is processed in the world coordinate system, while the bird's-eye-view image is used only to display and examine some results and is not processed further.
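A hedged sketch of this mapping step follows: given a homography H from image coordinates to the ground plane (obtained off-line from four calibrated reference points; the coordinates below are placeholders, not our calibration), the tracked points and their optical flow can be transferred to world coordinates with cv2.perspectiveTransform.

```python
import cv2
import numpy as np

# Four image points and their known ground-plane positions (placeholders).
img_pts = np.float32([[420, 480], [860, 480], [980, 680], [300, 680]])
wld_pts = np.float32([[-1.5, 20.0], [1.5, 20.0], [1.5, 5.0], [-1.5, 5.0]])
H = cv2.getPerspectiveTransform(img_pts, wld_pts)   # image -> world homography

def flow_to_world(prev_pts, curr_pts):
    """Map tracked point pairs into world coordinates and return the
    perspective-free ground-plane displacement of each feature."""
    p0 = cv2.perspectiveTransform(prev_pts.reshape(-1, 1, 2).astype(np.float32), H)
    p1 = cv2.perspectiveTransform(curr_pts.reshape(-1, 1, 2).astype(np.float32), H)
    return (p1 - p0).reshape(-1, 2)   # per-feature displacement on the ground plane
```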

