A robust in-car digital image stabilization technique

Abstract—Machine vision is a key technology used in an intelligent transportation system (ITS) to augment human drivers’ visual capabilities. For in-car applications, additional motion components are usually induced by disturbances such as the bumpy ride of the vehicle or the steering effect, and they affect the image interpretation processes, which require detection of the motion field (motion vector) in the image. In this paper, a novel robust in-car digital image stabilization (DIS) technique is proposed to stably remove the unwanted shaking phenomena in the image sequences captured by in-car video cameras without being influenced by moving objects (front vehicles) in the image or by intentional motion of the car. In the motion estimation process, the representative point matching (RPM) module combined with the inverse triangle method is used to determine and extract reliable motion vectors in plain images that lack features or contain a large low-contrast area. This increases the robustness under different imaging conditions, since most of the images captured by in-car video cameras include large low-contrast sky areas. An adaptive background evaluation model is developed to deal with irregular images that contain large moving objects (front vehicles) or a low-contrast area above the skyline. In the motion compensation process, a compensating motion vector (CMV) estimation method with an inner feedback-loop integrator is proposed to stably remove the unwanted shaking phenomena in the images without losing the effective area of the images under a constant motion condition. The proposed DIS technique was applied to on-road captured video sequences with various irregular conditions for performance demonstrations.

Index Terms—Adaptive background-based evaluation function, in-car digital image stabilizer (ICDIS), intelligent transportation system (ITS), inverse triangle method, representative point matching (RPM), smoothness index (SI).

Manuscript received August 24, 2005; revised January 18, 2006 and May 5, 2006. This work was supported in part by the Ministry of Education, Taiwan, under Grant EX-91-E-FA06-4-4 and in part by the Ministry of Economic Affairs, Taiwan, under Grant 93-EC-17-A-02-S1-032. This paper was recommended by Guest Editors F.-Y. Wang, D. Liu, and S. X. Yang.

S.-C. Hsu is with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C., and also with the Department of Electrical Engineering, Ta-Hwa Institute of Technology, Hsinchu 307, Taiwan, R.O.C.

S.-F. Liang is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 701, Taiwan, R.O.C., and also with the Brain Research Center, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]).

K.-W. Fan is with the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C.

C.-T. Lin is with the Department of Computer Science and the Department of Electrical and Control Engineering, National Chiao-Tung University, Hsinchu 300, Taiwan, R.O.C. He is also with the Brain Research Center, University System of Taiwan, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TSMCC.2006.887009

I. INTRODUCTION

MACHINE vision is a key technology used in any intelligent transportation system (ITS) to augment or replace human drivers’ visual capabilities. ITS research involves four major issues: increasing the capacity of highways, improving safety, reducing fuel consumption, and reducing pollution. ITS can use intelligent control strategies, such as agent-based control concepts [22], [23], to manage transportation and traffic problems. Machine vision can be used to detect lane markings, vehicles, pedestrians, road signs, traffic conditions, traffic incidents, and even driver drowsiness, or to help the driver obtain more information and reduce driving accidents. Most of these applications fall under the first two issues of ITS research. Four typical applications that involve machine vision are: 1) cruise assistance; 2) urban driving assistance; 3) driver monitoring; and 4) traffic and road monitoring. According to the site of image acquisition, they can be divided into in-car and off-car applications. The first three items are in-car applications, and the common concern in most applications is reliability. The reliability is related to the image-acquisition process and the image interpretation process, i.e., the contrast and the resolution of the images, the stability of the image sequence, and the reliability of image interpretation. A better image-acquisition process increases the feasibility and reliability of the subsequent processing and analysis. Increasing the contrast and the resolution of images is a pure hardware issue. For example, a wide-dynamic-range approach has been designed to improve the success rate of lane detection under high-intensity contrast [1]. Most image interpretation processes need to detect the motion field (motion vector) in the image. In an ideal environment, the motion field is easy to interpret. However, practical motion fields deviate from the simple description. Additional motion components are induced by disturbances such as the bumpy ride of the vehicle or the steering effect. To enable an efficient image interpretation process, these disturbances have to be compensated in advance. In this paper, a method is proposed to acquire a stable image sequence from in-car cameras, which can be used for driver assistance or subsequent processes.

Digital image sequences acquired by in-car video cameras are usually affected by undesired motions produced by a bumpy ride or by steering. The unwanted positional fluctuations of the image sequence affect the visual quality and impede the subsequent processes for various applications. Although undesired motions are usually irregular and uneven compared to intentional global motions such as car movement or camera panning, the challenge for image stabilization systems is to compensate for the unwanted shaking of the camera without being influenced by moving objects in the image or by the intentional motion of the car.

Fig. 1. Motion compensation schematics.

stabilizers. The electronic image stabilizer (EIS) stabilizes the image sequence by employing motion sensors to detect the camera movement for compensation. The optical image stabilizer (OIS) employs a prism assembly that moves opposite to the shaking of the camera for stabilization [2], [3]. Because both EIS and OIS are hardware dependent, their applications are restricted to device built-in online processes. Digital image stabilization (DIS) is the process of removing the undesired motion effects to generate a compensated image sequence by using digital image processing techniques without any mechanical devices such as gyro sensors or a fluid prism [4]. The major advantages of DIS are: 1) machine independence and 2) suitability for miniature hardware implementation (since no mechanical device is required for compensation) [5].

The DIS system is generally composed of two processing units: the motion estimation unit and the motion compensation unit. The purpose of the motion estimation unit is to estimate the reliable global camera movement through three processing steps on the acquired image sequence: 1) evaluation of local motion vectors (LMVs); 2) detection of unreliable motion vector components; and 3) determination of the global motion vector (GMV). Following the motion estimation, the motion compensation unit generates the compensating motion vector (CMV) and shifts the current picking window according to the CMV to obtain a smoother image sequence. Fig. 1 shows the motion compensation schematics. The window of frame(t − 1) is the previous compensated image. The compensating motion vector v is generated by the DIS according to the GMV between two consecutive images. The window of frame(t) is the picking window shifted according to the compensating motion vector v to minimize the shaking effect.

Various algorithms have been developed to estimate the LMVs in DIS applications, such as representative point matching (RPM) [5], [6], edge pattern matching (EPM) [7], [8], bit-plane matching (BPM) [4], [9], and others [10]–[14]. It has also been demonstrated that DIS can reduce the bit rate for video communication [15]. The major objective of these algorithms is to reduce the computational complexity, in comparison with a full-search block-matching method, without losing too much accuracy. In general, RPM can greatly reduce the complexity of computation in comparison with the other methods. However, it is sensitive to irregular conditions such as moving objects and

and scene changing to determine the GMV. However, these two signals cannot cover the wide variety of irregular conditions, such as images that lack features or contain large moving objects, and it is also hard to determine an optimum threshold for discrimination in various conditions. Some researchers estimate the LMV using feature-based techniques that track a small number of image features (points, lines, contours, or certain objects) to evaluate the motion vector. This makes the estimation efficient and suitable for real-time implementation. The difficulty is that, especially for outdoor applications, such techniques cannot stably and accurately find usable features in the image [16]. Based on the optical flow technique, a fundamental approach in computer vision, many methods have been proposed in the literature to solve different types of problems. The estimation of optical flow is based on the assumption that the intensity of the object (or specified pixel) in the image sequence is constant. The difficulty is that most consumer video camcorders have an autoshutter function that adjusts the average intensity dynamically, so that maintaining constant intensity of the object becomes impossible in real applications. In this paper, a reliable LMV extraction method is proposed to determine the GMVs for practical applications.

In the motion compensation of DIS, accumulated motion vector estimation [7] and frame position smoothing (FPS) [17]–[19] are the two most popular approaches. The accumulated motion vector estimation needs to compromise between stabilization and the preservation of intentional panning (constant motion), since the panning condition causes a steady-state lag in the motion trajectory [17]. The FPS accomplishes the smooth reconstruction of an actual long-term camera motion by filtering out jitter components, based on the concept of designing a filter with an appropriate cutoff frequency. The disadvantage of FPS is that it does not guarantee the availability of the determined CMV when the specified bound is restricted to preserve the effective image area in DIS applications.

In this paper, a novel robust in-car DIS technique is proposed. The minimum projection and inverse triangle method are employed to estimate the reliability of the motion vector and the coarse skyline. Then an adaptive background evaluation model for deriving the GMV is developed to deal with irregular images that contain large moving objects or a low-contrast area above the skyline. The accumulated motion vector estimation combined with an integrator in the inner feedback loop is also applied to remove the shaking effect without losing the effective area of the images with constant motion. Video sequences with various irregular conditions, such as the lack of features, a large low-contrast area, moving objects, or repeated patterns, were used for testing, and the experimental results demonstrate that the proposed algorithm can perform very well in such conditions. A smoothness index (SI) is also proposed in this paper to quantitatively evaluate the performances of different image stabilization methods.

Fig. 2. System architecture of the proposed digital image stabilization technique.

Fig. 3. Division of image for LMV estimation.

This paper is organized as follows. Section II describes the system architecture of the DIS and the proposed motion estimation method. Section III proposes a motion compensation method and quantitative evaluation. Section IV presents the experimental results for demonstrations. Section V gives conclusions of this paper.

II. SYSTEM ARCHITECTURE OF THE DIS AND MOTION ESTIMATION

The system architecture of the proposed DIS technique is shown in Fig. 2, which includes two processing units: the motion estimation unit and the motion compensation unit. The motion estimation unit consists of three estimators: the LMVs, the refined motion vector (RMV), and the GMV estimators. The motion compensation unit consists of the CMV estimation and image compensation. The two incoming consecutive images frame(t − 1) and frame(t) are first divided into four regions as shown in Fig. 3. An LMV is derived in each region by the RPM algorithm [5], [6]. The motion estimation unit also contains a reliability detection function that flags an ill-conditioned motion vector for irregular image conditions such as the lack of features or a large low-contrast area. The GMV estimation determines a GMV among the LMVs, the RMV, and other preselected motion vectors through the adaptive background-based evaluation function. Finally, the CMV is generated according to the resultant GMV, and the image sequences are compensated based on the CMV in the motion compensation unit. The rest of this section focuses on the details of the motion estimation unit of the proposed DIS technique. The details of the proposed motion compensation unit are presented in Section III.

A. Motion Estimation

The motion estimation unit shown in Fig. 2 contains the LMV, RMV, and GMV estimators. As shown in Fig. 4, the LMV and RMV estimators generate the LMVs and the RMV for GMV estimation. The LMVs are obtained from the correlation between two consecutive images by the RPM algorithm. The RMV is obtained from the LMVs by evaluating the corresponding confidence indices through the irregular condition detection and the proposed RMV generation algorithm.

1) RPM and Local Motion Estimation: It has been demonstrated that a local approach using a regional matching process is more robust and stable than a direct global matching process [20]. That means using the LMVs estimated from the divided regions to determine the GMV is more robust and stable than a direct approach. There is also a tradeoff in the size of the divided region. Reducing the size of the divided region increases the robustness, but the divided region should be sufficiently large to hold the average distribution [20]. If we want to divide the image such that the horizontal and vertical components have the same number of partitions, it should be divided into $n^2$ regions. More divided regions increase the computational cost of estimating the LMV for each region. Therefore, we only divide the image into four regions as shown in Fig. 3 for the RPM method, and this can cover various situations in in-car DIS applications when combined with the proposed inverse triangle method and the adaptive background evaluation model.

Each region is further divided into 30 subregions (5 rows × 6 columns), and the central pixel of each subregion is selected as the representative point to represent the pattern of this subregion. This layout is based on the size of images captured by regular imaging devices, such as 640 × 480 or 320 × 240. To distribute the representative points evenly in space, the ratio of rows to columns should be kept as close to 0.75 as possible. Fig. 5 shows the experimental result of calculating the cost level [an index of reliability defined in (8)] using different numbers of representative points. A higher cost level indicates lower reliability, and the threshold is set to 18 according to our experimental results. The curve is the averaged testing result for the four experimental video sequences VS#1–4 used in Section IV. It can be seen that if the number of representative points is larger than 30, the cost level goes down to the threshold and almost all the motion vectors calculated by the RPM method are reliable. To keep the computation time low, 30 representative points are used in our system.

Then the correlation calculation of RPM with respect to the representative point $(X_r, Y_r)$ is performed as

$$R_i(p, q) = \sum_{r=1}^{N} \left| I(t-1, X_r, Y_r) - I(t, X_r + p, Y_r + q) \right| \tag{1}$$

Fig. 4. Block diagram of LMVs and RMV estimation.

Fig. 5. Experimental result of calculating the cost level [an index of reliability defined in (8)] using different numbers of representative points.

where $N$ is the number of representative points in one region, $I(t-1, X_r, Y_r)$ is the intensity of the representative point $(X_r, Y_r)$ at frame(t − 1), and $R_i(p, q)$ is the correlation measure for a shift $(p, q)$ between the representative points in region $i$ at frame(t − 1) and the relative shifting points at frame(t). Assuming $R_{i,\min}$ is the minimum correlation value in region $i$, i.e., $R_{i,\min} = \min_{p,q} R_i(p, q)$, the shift vector $v_i$ that produces the minimum correlation value for region $i$ represents the LMV of this region, i.e.,

$$v_i = (p, q), \quad \text{for } R_i(p, q) = R_{i,\min}. \tag{2}$$
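As an illustration of (1) and (2), the following is a minimal pure-Python sketch, not the authors' implementation. The function names, the use of nested lists for frames, the `(row, column)` storage of representative points, and the square search range `max_shift` are all assumptions made for this example. The caller is assumed to keep every shifted point inside the frame.

```python
def representative_points(top, left, height, width, rows=5, cols=6):
    """Central pixel (row, col) of each subregion of one region.

    The 5 x 6 default follows the layout described in the text; the
    argument names are illustrative assumptions.
    """
    sub_h, sub_w = height // rows, width // cols
    return [(top + r * sub_h + sub_h // 2, left + c * sub_w + sub_w // 2)
            for r in range(rows) for c in range(cols)]


def rpm_correlation(prev, curr, points, max_shift):
    """R_i(p, q) of (1) for every shift with |p|, |q| <= max_shift.

    p shifts along x (columns) and q along y (rows). Returns a dict
    keyed by (p, q).
    """
    corr = {}
    for p in range(-max_shift, max_shift + 1):
        for q in range(-max_shift, max_shift + 1):
            corr[(p, q)] = sum(abs(prev[y][x] - curr[y + q][x + p])
                               for (y, x) in points)
    return corr


def local_motion_vector(corr):
    """v_i of (2): the shift (p, q) with the minimum correlation value."""
    return min(corr, key=corr.get)
```

With 30 representative points and a search range of ±s pixels, this evaluates 30·(2s + 1)² absolute differences per region, which is the source of the low cost of RPM compared with full-search block matching.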

2) Irregular Condition Detection: Analyzing the curves of correlation values corresponding to image sequences with various conditions, we find that the curve of correlation values is related to the reliability of motion detection. Figs. 6 and 7 show the correlation curves corresponding to different sample image sequences with different conditions. Fig. 6(a) shows a normal condition in which the peak is obvious in each region. In Fig. 6(b), the curve looks like a valley; this means that only one dimension of the correlation data (x direction) is reliable, and features are lacking in the y (horizontal) direction. Fig. 6(c) shows an example with repeated patterns, a brick wall with a fence in the bottom area, which causes multiple peaks in the correlation curves, especially within region 1, where pure bricks are repeated. Fig. 7(a) represents moving-object conditions. A motorcycle moves from the right side to the left in the image sequence. It causes double peaks within region 1 of the curve, and the value of $R_{i,\min}$ is larger than in areas without the moving object, such as region 3. The example shown in Fig. 7(b) contains a large low-contrast area in the top right corner of the image. It is harder to distinguish the peak within region 1 from the correlation curve.

Although the curve of correlation values is related to the reliability of motion detection, it is still too complex to directly use these curves to evaluate the reliability of motion detection. In this paper, we propose a strategy that combines the minimum projections of the correlation curve in the x and y directions (minimum projections) and the inverse triangle method to detect the irregular conditions from each region. The mathematical expression of minimum projections can be written as

$$x_{i,\min}(p) = \min_{q} R_i(p, q), \qquad y_{i,\min}(q) = \min_{p} R_i(p, q) \tag{3}$$

where $x_{i,\min}(p)$ and $y_{i,\min}(q)$ are the minimum projections of the correlation curve in the x and y directions in region $i$, respectively. Fig. 8 shows examples of minimum projections of the correlation curve in the x and y directions from the regular and the ill-conditioned image sequences. Fig. 8(a) is the minimum projection of Fig. 6(a), which is regular, and the determination of the motion vector in each region is clear and consistent. Fig. 8(b) is the minimum projection of Fig. 6(b), which lacks features in the y direction (horizontal). The values of the minimum projection of the correlation curve in the y direction lie within a small range and are erratic with multiple peaks, so that the determination of the minimum value is very hard.

In order to determine the reliability of the motion vector easily, the feature extraction of reliability is performed by applying the proposed inverse triangle method to the minimum projections in the x and y directions to obtain the reliability indices. Fig. 9 illustrates the inverse triangle method. In the first step, we find $T_{i,\min}$, the global minimum of the minimum projection curve in region $i$, which is calculated by (4). In the second step, we calculate $S_{x_i}$ and $S_{y_i}$ by (5), where offset is the altitude of the inverse triangle; $n_{x_i}$ and $n_{y_i}$ are defined as the numbers of elements of $S_{x_i}$ and $S_{y_i}$, respectively [see (6)]; and $d_{x_i}$ and $d_{y_i}$ are defined as the distances between the two vertexes of the base of the inverse triangle, obtained by (7). The cost levels of the x and y directions are calculated by (8). A higher cost level means a lower confidence level. Since multiple peaks seriously degrade the determination of reliability, a penalty for multiple peaks is included in (8) to improve the discrimination of reliability. The example shown in Fig. 9 is a curve with twin peaks, which receives the penalty $d_{x_i} - n_{x_i}$. In the third step, we determine the confidence indices of $x_i$ and $y_i$ in region $i$ through a threshold denoted as TH. The

Fig. 6. Various correlation curves corresponding to image sequences with different conditions (I). (a) A normal condition. (b) Lacks feature in horizontal direction (gate). (c) Repeated patterns (brick).

summing up the counts of reliable motion components of x and y in the four regions as in (10), we get $\mathrm{Num}(x_i)$ and $\mathrm{Num}(y_i)$, $i = 1$–$4$.

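Step 1: Find the global minimum $T_{i,\min}$ from $x_{i,\min}(p)$ or $y_{i,\min}(q)$

$$T_{i,\min} = \min_{p} x_{i,\min}(p) \quad \text{or} \quad T_{i,\min} = \min_{q} y_{i,\min}(q) \tag{4}$$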

Fig. 7. Various correlation curves corresponding to image sequences with different conditions (II). (a) Moving object (motorcycle). (b) Large low-contrast area (sky).

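Step 2: Calculate the cost levels $x_{i,\mathrm{cost}}$ and $y_{i,\mathrm{cost}}$

$$S_{x_i} = \{\, p \mid x_{i,\min}(p) < T_{i,\min} + \text{offset} \,\}, \qquad S_{y_i} = \{\, q \mid y_{i,\min}(q) < T_{i,\min} + \text{offset} \,\} \tag{5}$$

$$n_{x_i} = \text{number of } S_{x_i}, \qquad n_{y_i} = \text{number of } S_{y_i} \tag{6}$$

$$d_{x_i} = \max_{p \in S_{x_i}} p - \min_{p \in S_{x_i}} p, \qquad d_{y_i} = \max_{q \in S_{y_i}} q - \min_{q \in S_{y_i}} q \tag{7}$$

$$x_{i,\mathrm{cost}} = 2 d_{x_i} - n_{x_i}, \qquad y_{i,\mathrm{cost}} = 2 d_{y_i} - n_{y_i}. \tag{8}$$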

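Step 3: Set the threshold TH for determining the reliability indices

If $x_{i,\mathrm{cost}} < \mathrm{TH}$ then $x_i$ is reliable, else $x_i$ is unreliable.
If $y_{i,\mathrm{cost}} < \mathrm{TH}$ then $y_i$ is reliable, else $y_i$ is unreliable. (9)

Step 4: Calculate the numbers of reliable $x_i$ and $y_i$ in the four regions

$$\mathrm{Num}(x_i) = \text{sum of } (x_i \text{ is reliable}), \qquad \mathrm{Num}(y_i) = \text{sum of } (y_i \text{ is reliable}), \qquad i = 1\text{–}4. \tag{10}$$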
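The steps above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than the authors' code: the correlation surface is assumed to be a nested list `corr[p][q]`, each projection is a plain list indexed by shift position, and the function names and the toy values of `offset` and `threshold` are chosen only for this example.

```python
def min_projections(corr):
    """Minimum projections of (3) for a correlation surface corr[p][q]."""
    x_min = [min(row) for row in corr]        # min over q for each p
    y_min = [min(col) for col in zip(*corr)]  # min over p for each q
    return x_min, y_min


def inverse_triangle_cost(projection, offset):
    """Cost level of (8) for one minimum-projection curve."""
    t_min = min(projection)                                           # (4)
    s = [k for k, v in enumerate(projection) if v < t_min + offset]   # (5)
    n = len(s)                                                        # (6)
    d = max(s) - min(s)                                               # (7)
    return 2 * d - n                                                  # (8)


def is_reliable(projection, offset, threshold):
    """Decision of (9): a component is reliable if its cost is below TH."""
    return inverse_triangle_cost(projection, offset) < threshold
```

For a single narrow valley the selected positions are contiguous, so d = n − 1 and the cost stays small. Two separated valleys give a large d with a small n, which is how the penalty term d − n raises the cost for multiple peaks.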

3) Generation of RMV From Ill Condition: Irregular motion vectors can be detected and excluded by using minimum projection and the inverse triangle method; however, an image

Fig. 8. Examples of minimum projections of correlation curve from the x and y directions in four regions. (a) Regular image sequence. (b) Ill-conditioned image sequence.

sequence with an ill condition, such as a lack of features, a large low-contrast area, a moving object, or a repeated pattern, may contain few available motion vectors in the four regions (most of the motion vectors are irregular). Therefore, the available components of the regular motion vectors must be recombined to form an RMV. To do this, a median function is used to extract a motion vector with respect to each direction for an ill condition. The calculation to determine the RMV is described in detail as follows.

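Case 1: If $\mathrm{Num}(x_i(t)) = 4$ then

$$V_{\mathrm{refined},x}(t) = \mathrm{Med}\big(V_{a,x}(t), V_{b,x}(t), V_{c,x}(t), V_{d,x}(t), \mathrm{GMV}_x(t-1)\big)$$

Case 2: If $\mathrm{Num}(x_i(t)) = 3$ then

$$V_{\mathrm{refined},x}(t) = \mathrm{Med}\big(V_{a,x}(t), V_{b,x}(t), V_{c,x}(t)\big)$$

Case 3: If $\mathrm{Num}(x_i(t)) = 2$ then

$$V_{\mathrm{refined},x}(t) = \mathrm{Med}\big(V_{a,x}(t), V_{b,x}(t), \mathrm{GMV}_x(t-1)\big) \tag{11}$$

Case 4: If $\mathrm{Num}(x_i(t)) = 1$ then

$$V_{\mathrm{refined},x}(t) = V_{a,x}(t)$$

Case 5: If $\mathrm{Num}(x_i(t)) = 0$ then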

Fig. 9. Illustration of the proposed inverse triangle method.

where $\mathrm{Num}(x_i(t))$ is the number of x components of reliable LMVs, $V_{\mathrm{refined},x}(t)$ is the x component of the RMV, $V_{a,x}(t)$, $V_{b,x}(t)$, $V_{c,x}(t)$, and $V_{d,x}(t)$ represent the x components of reliable LMVs in different regions, Med is the median operation, $\mathrm{GMV}_x(t-1)$ is the x component of the GMV of the preceding frame, $t$ is the frame number, and $\gamma$ is the attenuation coefficient, $0 < \gamma < 1$. The $\mathrm{GMV}_{\mathrm{avg},x}(t)$ can be calculated by

$$\mathrm{GMV}_{\mathrm{avg},x}(t) = \zeta\,\mathrm{GMV}_{\mathrm{avg},x}(t-1) + (1-\zeta)\,\mathrm{GMV}_x(t), \quad 0 < \zeta < 1. \tag{12}$$
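The median-based recombination and the running average of (12) can be illustrated with a short Python sketch. It is only a sketch under assumptions: the function names are hypothetical, the branches for three and one reliable components follow the median of three values and the single value that appear in (11), and the expression used when no component is reliable is not recoverable from the text, so that case simply returns a caller-supplied `fallback` placeholder.

```python
from statistics import median


def refined_component(reliable, prev_gmv, fallback):
    """One component (x or y) of the RMV, following the cases of (11).

    reliable: components of the reliable LMVs (0 to 4 values).
    prev_gmv: the same component of GMV(t - 1).
    fallback: placeholder for the zero-reliable case, whose formula is
              not given here (an assumption of this sketch).
    """
    n = len(reliable)
    if n == 4:
        return median(reliable + [prev_gmv])
    if n == 3:
        return median(reliable)
    if n == 2:
        return median(reliable + [prev_gmv])
    if n == 1:
        return reliable[0]
    return fallback


def update_gmv_avg(prev_avg, gmv, zeta):
    """Running average of (12), with 0 < zeta < 1."""
    return zeta * prev_avg + (1 - zeta) * gmv
```

Adding GMV(t − 1) to the four-value and two-value cases keeps the number of median inputs odd, so the median is always one of the supplied values rather than an average of two.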

Then we apply a similar process to obtain $V_{\mathrm{refined},y}(t)$. The resultant RMV is represented by

$$V_{\mathrm{refined}}(t) = \begin{bmatrix} V_{\mathrm{refined},x}(t) \\ V_{\mathrm{refined},y}(t) \end{bmatrix}. \tag{13}$$

B. GMV Estimation

The objective of GMV estimation is to determine a motion vector from the existing data that we have evaluated in the motion estimation process. A practical in-car video sequence always suffers from moving objects, repeated patterns, motion effects of cars, etc. The LMV in each region may represent the GMV, a moving-object motion vector, or even an error vector. The error vector may be caused by an ill condition, a repeated pattern, or a mixture of global and moving-object motion. Although the reliable GMV is essentially selected from the LMVs and the RMV, in the worst case, when the LMVs and the RMV are all faulty, compensation produces a worse result than the original images. Including the zero motion vector (ZMV) in the evaluation prevents this case. Similarly, for an image sequence with constant motion in the scene, compensation by the ZMV or an error motion vector produces a worse result than compensation by the average motion vector (AMV). In the proposed DIS technique, seven motion vectors, namely the four LMVs, the RMV, the ZMV, and the AMV, referred to as preselected motion vectors (pre MV), are employed to estimate the GMV of the current frame. In general, one of the LMVs is the highly probable GMV for a regular image; the RMV is the highly probable GMV for an ill-conditioned image; the ZMV can prevent a worse compensation result caused by unreliable MVs; and the AMV is useful for constant motion of the car. In addition, if the image sequence contains a large moving object, the determination of global motion is troublesome because the determined motion vector probably switches between the background and the large moving object or is totally dominated by the large moving object. This leads to artificial shaking and poses a major challenge in DIS.

Fig. 10. Areas for the background-based evaluation adapted by the detected skyline.

1) Skyline Detection: To improve the robustness of the GMV estimation, the adaptive background-based evaluation function is proposed to overcome this problem. First, skyline detection is performed. Then, five regions based on the estimated skyline are selected to evaluate the result. These regions $(X_i, Y_j)$ are located on the surroundings of the image under the skyline. In most outdoor applications based on in-car cameras, the pixels of the area above the skyline are low contrast. The skyline detection prevents the invalid result that would occur if some of the five regions were located in the low-contrast area. Selecting regions surrounding the boundary of the image to evaluate the obtained motion vector avoids the disturbance of moving-object effects in GMV estimation. Fig. 10 shows the areas adopted for the adaptive background-based evaluation according to the detected skyline. The proposed skyline detection combines RPM correlation evaluation, minimum projection, and the inverse triangle method. First, we calculate the absolute differences between the representative point at frame(t − 1) and the corresponding neighborhood pixels in the same subregion at frame(t) by (14), which is an intermediate term of (1):

$$C_{i,j}(p, q) = \left| I(t-1, X_i, Y_j) - I(t, X_i + p, Y_j + q) \right| \tag{14}$$

where $(i, j)$ denotes the position of one subregion with respect to the row and column as shown in Fig. 11, and there are 120 subregions (10 rows × 12 columns in this paper), $(X_i, Y_j)$ is the coordinate of the representative point in the $(i, j)$

Fig. 11. Skyline detection algorithm is to combine RPM correlation evaluation, minimum projection, and the inverse triangle method.

subregion, $I(t-1, X_i, Y_j)$ is the intensity of the representative point $(X_i, Y_j)$ at frame(t − 1), and $(p, q)$ is a shifting vector within the subregion. Then we can derive the correlation curve for detecting the skyline by calculating

$$C_l(p, q) = \sum_{i=1}^{l} \sum_{j=1}^{M} C_{i,j}(p, q) \tag{15}$$

where $M$ is the total number of subregions along the horizontal axis ($M = 12$), and $l$ represents the $l$th row of the subregions. Initially, $l = 1$, and the minimum projection and inverse triangle method presented in (4)–(8) are applied to $C_l(p, q)$ to get the confidence index in the horizontal direction. The cost level is relatively high when the corresponding area is a low-contrast area such as the sky. If the level is lower than the preset threshold, we stop the evaluation process, and the horizontal position of the representative points of the subregions located in the last row of $C_l(p, q)$ is defined as the coarse skyline. Otherwise, we set $l = l + 1$ and continue the evaluation of $C_l(p, q)$ until the level is lower than the preset threshold. Fig. 12 shows the results of skyline detection in a video sequence taken on a highway. The coarse skyline is used to adaptively lay out the background-based evaluation blocks in the higher contrast area. This improves the robustness of GMV estimation in in-car image stabilization applications.
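The row-by-row search for the coarse skyline can be sketched as follows. This is an illustrative sketch, not the authors' implementation. It assumes that the per-row surfaces (the inner sum over j in (15)) have already been computed and are passed in as nested lists `row_surfaces[l][p][q]`, that the projection taken for the "horizontal direction" is the minimum over p for each q, and that rows are indexed from 0. The function names and toy parameters are hypothetical.

```python
def inverse_triangle_cost(projection, offset):
    """Cost level 2d - n of the inverse triangle method, as in (4)-(8)."""
    t_min = min(projection)
    s = [k for k, v in enumerate(projection) if v < t_min + offset]
    return 2 * (max(s) - min(s)) - len(s)


def coarse_skyline(row_surfaces, offset, threshold):
    """Index of the first subregion row whose accumulated cost is below
    the threshold, or None if no row qualifies.

    row_surfaces[l][p][q] holds the sum over j of C_{l,j}(p, q) for row l.
    """
    acc = None
    for l, surface in enumerate(row_surfaces):
        if acc is None:
            acc = [row[:] for row in surface]
        else:
            # Accumulate rows 0..l, as in the outer sum of (15).
            acc = [[a + b for a, b in zip(ra, rb)]
                   for ra, rb in zip(acc, surface)]
        y_min = [min(col) for col in zip(*acc)]  # min over p for each q
        if inverse_triangle_cost(y_min, offset) < threshold:
            return l
    return None
```

Flat, low-contrast rows such as sky produce a nearly constant projection and therefore a high cost, so the loop keeps accumulating until a textured row introduces a distinct minimum.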

2) Peer-to-Peer Evaluation: The estimation of the GMV is calculated by the summation of absolute difference (SAD)

$$\mathrm{SAD}_{B_i,c} = \sum_{(X,Y) \in B_i} \left| I(t-1, X, Y) - I(t, X + X_c, Y + Y_c) \right|, \quad 1 \le i \le 5,\ 1 \le c \le 7 \tag{16}$$

where $I(t-1, X, Y)$ is the intensity of the point $(X, Y)$ at frame(t − 1), $B_i$ is the $i$th background region in the image, and $X_c$, $Y_c$ are the components of the seven preselected motion vectors ($\mathrm{pre\_MV}_c$) in the x and y directions.

Different $\mathrm{pre\_MV}_c$ will have their own $\mathrm{SAD}_{B_i,c}$ in each region. The smaller $\mathrm{SAD}_{B_i,c}$ represents the higher probability of the

$$S_c = \sum_{i=1}^{5} S_{i,c}. \tag{17}$$

Five-region peer-to-peer evaluation can prevent the situation in which some partial high-contrast image regions dominate the evaluation result. In this algorithm, each region has an equal priority in determining the result. In (17), $S_c$ is the index used to determine the GMV. The $\mathrm{pre\_MV}_c$ with the smallest $S_c$ is the desired GMV, and it can be expressed as

$$\mathrm{GMV} = \mathrm{pre\_MV}_i, \quad i = \arg\min_{c} S_c. \tag{18}$$

With these carefully chosen evaluation areas, the evaluation function can precisely detect the motion vector attributed to the background in most circumstances.
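A minimal Python sketch of the SAD of (16) and the selection rule of (17) and (18) is given below. It is an illustration under assumptions, not the authors' code: frames are nested lists, a background region is a hypothetical `(top, left, height, width)` tuple, and the per-region scores S_{i,c} are taken as an input matrix because their definition is not given in this section.

```python
def sad(prev, curr, region, mv):
    """SAD of (16) for one background region and one candidate vector.

    region = (top, left, height, width); mv = (xc, yc). The caller is
    assumed to keep the shifted region inside the frame.
    """
    top, left, h, w = region
    xc, yc = mv
    return sum(abs(prev[y][x] - curr[y + yc][x + xc])
               for y in range(top, top + h)
               for x in range(left, left + w))


def select_gmv(candidates, scores):
    """GMV of (18): the candidate c with the smallest S_c of (17).

    scores[i][c] holds S_{i,c} for region i and candidate c.
    """
    totals = [sum(row[c] for row in scores) for c in range(len(candidates))]
    best = min(range(len(candidates)), key=totals.__getitem__)
    return candidates[best]
```

As a stand-in for experimentation only, the raw SAD values can be passed as `scores`. Note that summing raw SADs lets high-contrast regions weigh more, which is the effect the equal-priority evaluation described above is meant to avoid.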

III. MOTION COMPENSATION AND EVALUATION

A. CMV Estimation

The first step of motion compensation is to generate the CMVs that remove the undesired shaking motion while keeping the steady motion in the image sequence. The conventional CMV estimation is given in [7] as

$$\mathrm{CMV}(t) = k\,\mathrm{CMV}(t-1) + \big(\alpha\,\mathrm{GMV}(t) + (1-\alpha)\,\mathrm{GMV}(t-1)\big), \quad 0 < k < 1 \text{ and } 0 \le \alpha \le 1 \tag{19}$$

where $t$ represents the frame number. Increasing $k$ reduces the unwanted shaking effect but increases the value of the CMV, which means that the effective area of the images is reduced if we want to maintain a consistent image size for the whole image sequence. To illustrate this phenomenon, the motion trajectories can be calculated. They are obtained by

$$\mathrm{MTraj}_o(t) = \sum_{i=1}^{t} \mathrm{GMV}(i) \tag{20}$$

$$\mathrm{MTraj}_c(t) = \sum_{i=1}^{t} \mathrm{GMV}(i) - \mathrm{CMV}(t) \tag{21}$$

where $\mathrm{MTraj}_o(t)$ and $\mathrm{MTraj}_c(t)$ are the original and the compensated motion trajectories of the image sequence at frame(t). Fig. 13 shows the performance comparison of three different CMV generation methods applied to a video sequence with constant motion and jitter in the image. There are two trajectories in each subfigure: one is the original trajectory calculated by (20), and the other is the compensated trajectory calculated by (21). The CMVs in Fig. 13(a) are generated by the conventional method shown in (19). Obviously, $\mathrm{MTraj}_c(t)$ has tremendous


Fig. 12. Skyline detection applied to an in-car video sequence taken on a highway.

The CMV may exceed the window shifting allowance, so that the effective image area available during the compensation process is reduced. The CMVs in Fig. 13(b) are generated by (19) with a clipper function defined as

$$\mathrm{CMV}(t) = \mathrm{clipper}(\mathrm{CMV}(t)) = \tfrac{1}{2}\bigl(|\mathrm{CMV}(t) + l| - |\mathrm{CMV}(t) - l|\bigr) \tag{22}$$

where l is the boundary limitation, i.e., the maximum window shift allowance. In this case, the lag can be reduced to a certain range. However, clipping also degrades the shaking compensation because the picking window then operates near the boundary area.
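The recursion in (19), the clipper in (22), and the trajectories in (20) and (21) can be illustrated with a short Python sketch for a single axis. The function names, the zero initial conditions CMV(0) = GMV(0) = 0, and the test values (k = 0.9, α = 1, l = 16) are assumptions chosen for the illustration; they are not the parameter settings used in the experiments.

```python
import numpy as np

def clipper(cmv, limit):
    """Eq. (22): saturate cmv to the range [-limit, +limit]."""
    return 0.5 * (np.abs(cmv + limit) - np.abs(cmv - limit))

def conventional_cmv(gmv, k, alpha, limit=None):
    """Eq. (19) for one axis, optionally followed by the clipper of Eq. (22).

    Assumes CMV(0) = 0 and GMV(0) = 0 (not stated in the text).
    """
    gmv = np.asarray(gmv, dtype=float)
    cmv = np.zeros_like(gmv)
    prev_cmv, prev_gmv = 0.0, 0.0
    for t, g in enumerate(gmv):
        c = k * prev_cmv + (alpha * g + (1.0 - alpha) * prev_gmv)
        if limit is not None:
            c = clipper(c, limit)
        cmv[t] = c
        prev_cmv, prev_gmv = c, g
    return cmv

def trajectories(gmv, cmv):
    """Eqs. (20) and (21): original and compensated motion trajectories."""
    original = np.cumsum(np.asarray(gmv, dtype=float))
    return original, original - np.asarray(cmv, dtype=float)

# Constant motion of 2 pixels per frame (hypothetical test input).
gmv = np.full(500, 2.0)
cmv_free = conventional_cmv(gmv, k=0.9, alpha=1.0)
cmv_clip = conventional_cmv(gmv, k=0.9, alpha=1.0, limit=16.0)
traj_o, traj_c = trajectories(gmv, cmv_free)
```

For a constant GMV g, the unclipped recursion with α = 1 settles at g/(1 − k), so the compensated trajectory trails the original by that amount; with the clipper the CMV saturates at l instead, which bounds the lag but leaves no headroom to absorb jitter in the direction of motion.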

To deal with the above problem, Vella et al. used the passive approach of ceasing correction in this condition [10]. This implies that the undesired shaking effect cannot be eliminated in the constant-motion condition. To overcome this drawback, we combine an inner feedback-loop integrator with a clipper function to reduce the steady-state lag for steady motion while keeping the CMV within the appropriate range. Fig. 14 shows the block diagram of the proposed CMV generation method. The integrator in the inner feedback loop eliminates the steady-state lag of the CMV in the constant-motion condition. By employing the integrator, the shaking components of images with constant motion, as well as those of regular images, can therefore be stabilized. Note that the CMV computation procedure is applied to the x and y components separately; that is, the parameters corresponding to the x and y directions can be set to different values. In general, the constant-motion condition usually occurs in the horizontal direction, so the shaking patterns differ between the two directions. The proposed CMV computation procedure is given by

$$\begin{aligned}
\mathrm{CMV}(t) &= k \bullet \mathrm{CMV}(t-1) + \mathrm{GMV}(t) - \beta \bullet \mathrm{CMV\_I}(t-1) \\
\mathrm{CMV\_I}(t) &= \mathrm{CMV\_I}(t-1) + \mathrm{CMV}(t) \\
\mathrm{CMV}(t) &= \mathrm{clipper}(\mathrm{CMV}(t))
\end{aligned} \tag{23}$$

where k ≥ [0 0]^T and β ≤ [1 1]^T.

The symbol • denotes array multiplication, and clipper( ) is defined in (22).
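A minimal Python sketch of the recursion in (23) follows. It assumes zero initial conditions, updates the integrator state CMV_I with the CMV value before clipping (following the order in which the three lines of (23) are listed), and uses illustrative gains k = 0.9 and β = (0.1, 0.05) with l = 16; none of these choices is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def clipper(cmv, limit):
    """Eq. (22): saturate cmv to the range [-limit, +limit]."""
    return 0.5 * (np.abs(cmv + limit) - np.abs(cmv - limit))

def cmv_with_integrator(gmv, k, beta, limit):
    """Eq. (23) applied to the x and y components separately.

    gmv:  array of shape (T, 2) holding GMV(t) as (x, y).
    k:    per-axis leak factors, shape (2,).
    beta: per-axis integral gains, shape (2,).
    Assumes CMV(0) = 0 and CMV_I(0) = 0, and that the integrator
    accumulates the CMV before clipping (an interpretation of Eq. (23)).
    """
    gmv = np.asarray(gmv, dtype=float)
    k = np.asarray(k, dtype=float)
    beta = np.asarray(beta, dtype=float)
    cmv = np.zeros_like(gmv)
    prev_cmv = np.zeros(2)
    integ = np.zeros(2)
    for t in range(gmv.shape[0]):
        raw = k * prev_cmv + gmv[t] - beta * integ  # element-wise products
        integ = integ + raw                          # inner-loop integrator
        prev_cmv = clipper(raw, limit)               # keep CMV in range
        cmv[t] = prev_cmv
    return cmv

# Constant horizontal motion of 2 pixels per frame, no vertical motion.
gmv = np.tile([2.0, 0.0], (500, 1))
cmv = cmv_with_integrator(gmv, k=[0.9, 0.9], beta=[0.1, 0.05], limit=16.0)
```

Under constant motion the integrator state grows until β • CMV_I cancels the constant GMV term, so the CMV itself decays toward zero instead of settling at a nonzero offset; this is the steady-state lag removal described above. The complex-valued poles of this second-order loop also explain why a small overshoot can appear during the transient.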

Fig. 13(c) shows the compensated motion trajectory generated by the proposed method. Compared with Figs. 13(a) and (b), the proposed method can reduce the steady-state lag of the compensated motion trajectory in the constant-motion condition and keep the CMVs in an appropriate range.

B. Quantitative Evaluation

The shaking effect of images can be evaluated by summing the absolute differences of momentum between every two consecutive frames. The mass of an image can be set to a constant such as one for simplicity, or to a value from zero to one according to the degree of shaking in the images as measured by human visual perception. The SI is proposed to quantitatively evaluate the performance of different DIS algorithms, and it is defined as

$$\mathrm{SI} = \frac{1}{N-1}\sum_{t=2}^{N} \Delta m(t) = \frac{1}{N-1}\sum_{t=2}^{N} m \times |\mathrm{GMV}(t) - \mathrm{GMV}(t-1)| \tag{24}$$

where t is the frame number, N is the total number of frames, m is the mass of the image, and Δm(t) is the absolute change in momentum between frames t − 1 and t. A lower SI means fewer shaking components in the image sequence and therefore better smoothness.
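The SI in (24) reduces to a mean of absolute frame-to-frame differences, as the following Python sketch shows. It treats one motion component at a time and uses a constant mass m; the function name and the sample sequences are assumptions made for illustration.

```python
import numpy as np

def smoothness_index(gmv, mass=1.0):
    """Eq. (24): mean absolute change of momentum between consecutive frames.

    gmv:  sequence of motion values for one axis, one entry per frame.
    mass: the image mass m (assumed constant over the sequence).
    """
    gmv = np.asarray(gmv, dtype=float)
    n_frames = gmv.shape[0]
    if n_frames < 2:
        raise ValueError("at least two frames are required")
    delta_m = mass * np.abs(np.diff(gmv, axis=0))  # m * |GMV(t) - GMV(t-1)|
    return delta_m.sum(axis=0) / (n_frames - 1)

# Hypothetical sequences: alternating jitter versus perfectly steady motion.
si_jitter = smoothness_index([0.0, 1.0, 0.0, 1.0])
si_steady = smoothness_index([3.0, 3.0, 3.0])
```

A sequence with steady motion gives an SI of zero regardless of how large the motion is, so the index measures shaking rather than displacement.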

IV. EXPERIMENTAL RESULTS

In this section, the performance of the proposed DIS technique is evaluated and compared with other existing DIS methods in terms of motion estimation and motion smoothing. To do this, four real video sequences captured by an in-car camera under various irregular conditions are used for testing. Each video sequence has a resolution


Fig. 13. Performance comparison of three different CMV generation methods applied to a video sequence with panning and hand shaking. (a) CMV generation method in (19). (b) CMV generation method in (19) with clipper in (22). (c) Proposed method in (23).

of 640 × 480. VS#1 is a video of a door gate taken with constant camera motion and jitter. It lacks features in the horizontal direction. VS#2 is a video of a community road taken under bumpy conditions. VS#3 is a video of a highway taken with jitter. VS#4 is a video of a parking lot taken while the car is turning. The motion estimation performance is

Fig. 14. Block diagram of the proposed CMV generation method.

TABLE I

RMSE COMPARISONS OF RPM FUZZY AND THE PROPOSED METHOD WITH RESPECT TO FOUR REAL VIDEO SEQUENCES

evaluated based on the root mean square error (RMSE) between the algorithmically estimated motion vectors and the desired motion vectors, which are determined frame by frame by human visual perception while taking the background factor into account. The RMSE is given by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left[(x_n - x_{dn})^2 + (y_n - y_{dn})^2\right]} \tag{25}$$

where (x_dn, y_dn) is the desired motion vector and (x_n, y_n) is the motion vector generated by the evaluated DIS algorithm. The proposed method is compared with an RPM approach based on fuzzy set theory (RPM FUZZY) [6]. The motion estimation results of these two methods are summarized in Table I. VS#1 lacks features in the horizontal direction, so only one component of the motion vector is reliable [see Fig. 6(b)]. The proposed method applies the minimum projection approach and the inverse triangle method to detect the irregular components of the LMVs and then recombines the available motion vectors to form an RMV. This approach makes full use of the existing information to estimate the GMV. The test result for VS#1 shows that the RMSE is reduced from 5.8348 to 2.5269 by our method, because RPM FUZZY does not consider the lack-of-feature condition. The results for VS#2–4 also show that the RMSEs of our method are lower than those of RPM FUZZY, since the GMV obtained through the adaptive background-based evaluation avoids the influence of large moving objects and the irregular components of the motion vectors are also taken into account.
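The RMSE in (25) can be computed as in the following Python sketch. The function name and the two-frame sample data are hypothetical; in the experiments the desired vectors come from human visual assessment, which this sketch simply takes as an input array.

```python
import numpy as np

def motion_rmse(estimated, desired):
    """Eq. (25): RMSE between estimated and desired motion vectors.

    Both inputs have shape (N, 2), one (x, y) vector per frame.
    """
    est = np.asarray(estimated, dtype=float)
    des = np.asarray(desired, dtype=float)
    if est.shape != des.shape or est.ndim != 2 or est.shape[1] != 2:
        raise ValueError("inputs must both have shape (N, 2)")
    squared_error = ((est - des) ** 2).sum(axis=1)  # (x - xd)^2 + (y - yd)^2
    return float(np.sqrt(squared_error.mean()))

# Hypothetical two-frame example: the first frame matches exactly and the
# second is off by (3, 4), giving a mean squared error of 12.5.
rmse = motion_rmse([(1, 2), (3, 4)], [(1, 2), (0, 0)])
```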

The motion smoothing performance is evaluated by the SI proposed in Section III. Fig. 13(c) shows the original motion trajectory versus the compensated motion trajectory generated by the proposed method. Compared with Fig. 13(a) and (b), the proposed method can reduce the steady-state lag of the compensated motion trajectory in the constant-motion condition and keep the CMVs in an appropriate range. Table II shows the SI comparisons of the three CMV generation methods presented in


Fig. 15. Comparisons of original and compensated motion trajectories by two different CMV generation methods (with and without integrator) with respect to (a) GMV set #1, (b) GMV set #2, (c) GMV set #3, and (d) GMV set #4.

Fig. 13. Generating the CMV without the clipper is impractical because it loses too much effective image area; i.e., the maximum CMV is not guaranteed to fit within the practical compensation range. The proposed CMV generation method dramatically reduces the SI value from 5.6482 to 0.9346 compared with CMV generation without the integrator. The reason is that the inner feedback-loop integrator greatly reduces the steady-state lag in the image sequence with constant motion.

We also evaluate the CMV generation methods by four GMV sets generated from real video sequences (GMV sets #1–4).


TABLE III

PARAMETERS APPLIED TO CMV GENERATION WITH DIFFERENT EQUATIONS

Fig. 15 shows the comparison of the original and compensated motion trajectories obtained with two different CMV generation methods, (19) with clipper and (23), with respect to these four GMV sets. The parameter settings for (19) with clipper and for (23) are listed in Table III. Setting k to the same value for both the horizontal and vertical directions gives them the same shaking absorption effect. The parameter β is the inner feedback-loop integral gain, and it determines how quickly the steady-state lag is eliminated during constant motion. The gain should not be too high, in order to avoid resonance. In in-car DIS applications, constant motion occurs more frequently in the horizontal direction than in the vertical direction. Therefore, we set a higher gain β in the horizontal direction to obtain better visual quality.

In each subfigure, the dotted line, solid line, and dashed line indicate the original trajectory and the compensated trajectories obtained by (23) and by (19) with clipper, respectively. The GMV sets #1 and #2 [Fig. 15(a) and (b)] are estimated from video sequences with constant motion in the images. The GMV set #3 [Fig. 15(c)] is estimated from VS#3. The GMV set #4 [Fig. 15(d)] is estimated from VS#4. According to the results, the compensated horizontal motion trajectories of GMV sets #1, #2, and #4, which contain more constant motion, are closer to the original horizontal motion trajectories when generated by the proposed CMV generation method than when generated by the other method. This means that the proposed method can reduce the steady-state lag and provide more room to absorb the shaking effect of image sequences without violating the physical range limitation. The GMV set #3 is estimated from the video captured on the highway. The result of the method with the integrator shows a slight overshoot compared with the method without the integrator. This is an intrinsic property of adding an integrator to the process loop, but it is a good tradeoff since the integrator greatly reduces the steady-state lag of the motion trajectory. Table IV shows the SI comparisons corresponding to Fig. 15. The original SIs can be regarded as the SIs of the original sequences with constant motion and undesired shaking components. In general, the proposed CMV generation method has better motion smoothing performance than the approach without the integrator when compensating most real video sequences with constant motion. The experimental results show that the proposed method can deal with various circumstances and performs better in quantitative evaluations (such as RMSE and SI) and in human visual evaluation. Some original and compensated video sequences for visual assessment are available online at our web pages [21].

V. CONCLUSION

How to derive reliable GMVs from the video sequence captured by in-car video cameras and how to derive appropriate CMVs that smooth the shaking effect without reducing the effective image area are two challenges for an in-car DIS system. In this paper, a robust in-car DIS technique is proposed to address these two challenges. The adaptive background evaluation scheme based on the detected skyline can generate more reliable GMVs for image sequences with irregular conditions, such as those containing a large low-contrast area (sky). For the motion compensation process, the proposed CMV estimation method with an inner feedback-loop integrator can reduce the steady-state lag of the motion trajectory without affecting the effective image size for image sequences with constant motion. According to the experimental results, the proposed technique demonstrates remarkable performance in both quantitative and qualitative (human vision) evaluations compared with the existing approaches. It can be implemented as software or hardware solutions for both online and offline video stabilization applications.

ACKNOWLEDGMENT

The authors would like to thank W.-H. Tsai for providing valuable opinions and resources.

REFERENCES

[1] I. Masaki, “Machine-vision systems for intelligent transportation systems,” IEEE Intell. Syst., vol. 13, no. 6, pp. 24–31, Nov.–Dec. 1998.

[2] M. Oshima et al., “VHS camcorder with electronic image stabilizer,” IEEE Trans. Consum. Electron., vol. 35, no. 4, pp. 749–758, Nov. 1989.

[3] K. Sato et al., “Control techniques for optical image stabilizing system,” IEEE Trans. Consum. Electron., vol. 39, no. 3, pp. 461–466, Aug. 1993.

[4] S. J. Ko, S. H. Lee, and K. H. Lee, “Digital image stabilizing algorithms based on bit-plane matching,” IEEE Trans. Consum. Electron., vol. 44, no. 3, pp. 617–622, Aug. 1998.

[5] K. Uomori et al., “Automatic image stabilizing system by full-digital signal processing,” IEEE Trans. Consum. Electron., vol. 36, no. 3, pp. 510–519, Aug. 1990.


IEEE Trans. Consum. Electron., vol. 37, no. 3, pp. 521–530, Aug. 1991.

[9] S. W. Jeon et al., “Fast digital image stabilizer based on Gray-coded bit-plane matching,” IEEE Trans. Consum. Electron., vol. 45, no. 3, pp. 598–603, Aug. 1999.

[10] F. Vella et al., “Digital image stabilization by adaptive block motion vectors filtering,” IEEE Trans. Consum. Electron., vol. 48, no. 3, pp. 796–801, Aug. 2002.

[11] S. Erturk, “Digital image stabilization with sub-image phase correlation based global motion estimation,” IEEE Trans. Consum. Electron., vol. 49, no. 4, pp. 1320–1325, Nov. 2003.

[12] J. Y. Chang et al., “Digital image translational and rotational motion stabilization using optical flow technique,” IEEE Trans. Consum. Electron., vol. 48, no. 1, pp. 108–115, Feb. 2002.

[13] J. S. Jin, Z. Zhu, and G. Xu, “A stable vision system for moving vehicles,” IEEE Trans. Intell. Transport. Syst., vol. 1, no. 1, pp. 32–39, Mar. 2000.

[14] G. R. Chen et al., “A novel structure for digital image stabilizer,” in Proc. 2000 IEEE Asia-Pac. Conf. Circuits Syst., Tianjin, China, Dec. 2000, pp. 101–104.

[15] Engelsberg and G. Schmidt, “A comparative review of digital image stabilising algorithms for mobile video communications,” IEEE Trans. Consum. Electron., vol. 45, no. 3, pp. 591–597, Aug. 1999.

[16] M. B. van Leeuwen, “Motion estimation and interpretation for in-car systems,” Ph.D. dissertation, Informatics Inst., Univ. Amsterdam, Amsterdam, The Netherlands, May 2002.

[17] S. Erturk, “Image sequence stabilisation: Motion vector integration (MVI) versus frame position smoothing (FPS),” in Proc. 2nd Int. Symp. Image Signal Process. Anal., 2001, pp. 266–271.

[18] M. K. Gullu and S. Erturk, “Fuzzy image sequence stabilization,” Electron. Lett., vol. 39, no. 16, pp. 1170–1172, Aug. 7, 2003.

[19] M. K. Gullu, E. Yaman, and S. Erturk, “Image sequence stabilization using fuzzy adaptive Kalman filtering,” Electron. Lett., vol. 39, no. 5, pp. 429–431, Mar. 6, 2003.

[20] L. Chen and N. Tokuda, “A general stability analysis on regional and national voting schemes against noise —-Why is an electoral college more stable than a direct popular election?,” Artif. Intell., 163, no. 1, pp. 47–66, 2005.

[21] A robust in-car digital image stabilization technique [Online]. Available: http://falcon3.cn.nctu.edu.tw/~liang/its_dis/its_dis.htm or http://www.ee.thit.edu.tw/~kenhsu/its_dis/its_dis.htm

[22] F.-Y. Wang, “Agent-based control for networked traffic management systems,” IEEE Intell. Syst., vol. 20, no. 5, pp. 92–96, Sep./Oct. 2005.

[23] F.-Y. Wang, “Agent-based control for fuzzy behavior programming in robotic excavation,” IEEE Trans. Fuzzy Syst., vol. 12, no. 4, pp. 540–548, Aug. 2004.

[24] S. Erturk, “Translation, rotation and scale stabilisation of image sequences,” Electron. Lett., vol. 39, no. 17, pp. 1245–1246, Aug. 21, 2003.

[25] M. K. Gullu and S. Erturk, “Membership function adaptive fuzzy filter for image sequence stabilization,” IEEE Trans. Consum. Electron., vol. 50, no. 1, pp. 1–7, Feb. 2004.

Sheng-Che Hsu received the B.S. degree in electrical engineering from Chung-Yuan Christian University, Chung-Li, Taiwan, R.O.C., in 1980, and the M.S. degree in electrical engineering from the New Jersey Institute of Technology, Newark, in 1989. He is currently working toward the Ph.D. degree at Chiao-Tung University, Hsinchu, Taiwan.

From 1982 to 1987, he was an Associate Researcher at the Industry Technology Research Institute, and from 1989 to 1992, he was a Senior Engineer with Flow Asia Cooperation. Since 1992, he has been a Faculty Member in the Department of Electrical Engineering, Ta-Hwa Institute of Technology, Hsinchu. His current research interests include block matching, pattern recognition, and image processing.

an Assistant Professor. Currently, he is an Assistant Professor in the Department of Computer Science and Information Engineering, National Cheng-Kung University, Tainan. His current research interests include biomedical engineering and biomedical signal/image processing.

Kang-Wei Fan was born in Hsinchu, Taiwan, R.O.C., on June 1, 1976. He received the M.S. degree in computer science and information engineering from Chung Hua University, Hsinchu, in 2002. Currently, he is working toward the Ph.D. degree at the National Chiao-Tung University, Hsinchu. His research interests include digital imaging, video processing, and color science.

Chin-Teng Lin (F’05) received the B.S. degree from the National Chiao-Tung University (NCTU), Hsinchu, Taiwan, R.O.C., in 1986, and the Ph.D. degree in electrical engineering from Purdue University, West Lafayette, IN, in 1992.

He is currently the Chair Professor in the Department of Electrical and Computer Engineering, NCTU, the Dean of the Computer Science College, NCTU, and the Director of the Brain Research Center, NCTU. From 1998 to 2000, he was the Director of the Research and Development Office, NCTU. From 2000 to 2003, he was the Chairman of the Electrical and Control Engineering Department, NCTU. From 2003 to 2005, he was the Associate Dean of the College of Electrical Engineering and Computer Science. His current research interests include fuzzy neural networks, neural networks, fuzzy systems, cellular neural networks, neural engineering, algorithms, and very large scale integration design for pattern recognition, intelligent control, multimedia (including image/video and speech/audio) signal processing, and intelligent transportation systems. He is the coauthor of the book, Neural Fuzzy Systems—A Neuro-Fuzzy Synergism to Intelligent Systems (Prentice-Hall) and the author of Neural Fuzzy Control Systems with Structure and Parameter Learning (World Scientific). He has published over 90 journal papers in the areas of neural networks, fuzzy systems, multimedia hardware/software, and soft computing, including about 60 IEEE journal papers.

Dr. Lin served on the Board of Governors for the IEEE Circuits and Systems (CAS) Society in 2005 and the IEEE Systems, Man, Cybernetics (SMC) Society during 2003–2005. He was the Distinguished Lecturer of the IEEE CAS Society from 2003 to 2005. He is the International Liaison for the International Symposium of Circuits and Systems (ISCAS) 2005, Japan, the Special Session Co-Chair for the ISCAS 2006, Greece, and the Program Co-Chair for the IEEE International Conference on SMC 2006, Taiwan. Since 2004, he has been the President of Asia Pacific Neural Network Assembly. He is the recipient of the Outstanding Research Award granted by National Science Council, Taiwan. Since 1997, he has been the recipient of the Outstanding Electrical Engineering Professor Award of the Chinese Institute of Electrical Engineering (CIEE). In 2000, he was the recipient of the Outstanding Engineering Professor Award of the Chinese Institute of Engineering (CIE) and the 2002 Taiwan Outstanding Information-Technology Expert Award. In 2000, he was also elected as one of the 38 Ten Outstanding Rising Stars in Taiwan. Currently, he serves as Associate Editor of IEEE Transactions on Circuits and Systems, Part I & Part II, IEEE Transactions on Systems, Man, Cybernetics, IEEE Transactions on Fuzzy Systems, and International Journal of Speech Technology. He is a member of Tau Beta Pi, Eta Kappa Nu, and Phi Kappa Phi honorary societies.