
CHAPTER 1 INTRODUCTION

1.3 Thesis Organization

This thesis is organized as follows. Chapter 2 describes the models of the proposed image stabilization algorithm and gives a computational analysis of the digital image stabilization (DIS) system. Chapter 3 introduces the CNN algorithm, the hardware design of the CNN core, and the template design for the CNN-based local motion estimator. Chapter 4 describes the circuit design of the application-specific CNN (ASCNN) chip. Simulation results from HSPICE, the CNN universal machine (CNNUM), and ModelSim are presented in Chapter 5. Conclusions and future work are given in Chapter 6.

CHAPTER 2 DIGITAL IMAGE STABILIZATION

The architecture of the proposed image stabilization technique, shown in Fig. 1, is divided into two processing blocks: motion estimation and motion compensation. The motion estimation block consists of three estimators: the local motion vector (LMV), ill-conditioned motion vector (IMV), and global motion vector (GMV) estimators. The motion compensation block consists of compensating motion vector (CMV) estimation and image compensation. Two consecutive incoming images, at time (t-1) and time (t), are first divided into four regions.

An LMV is derived in each region by the representative point matching (RPM) algorithm [12], [19]. The motion estimation block also contains a reliability detection function that generates an ill-conditioned motion vector for irregular image conditions, such as a lack of features or large low-contrast areas. The GMV estimator determines the global motion vector among the LMVs, the IMV, and other pre-selected motion vectors through a background-based evaluation function. Finally, the CMV is generated from the resulting GMV, and the image sequence is compensated according to the CMV in the motion compensation unit.

The proposed digital image stabilization system thus consists of a motion estimation step and a motion compensation step, whose design blocks are described as follows.

Fig. 1 The architecture of the proposed image stabilization technique.

2.1 Motion Estimation

The motion estimation unit shown in Fig. 1 contains the LMVs, IMV, and GMV estimators.

As shown in Fig. 2, the LMV and IMV estimation stage generates the LMVs and the IMV used for global motion vector estimation. The LMVs are obtained from the correlation between two consecutive images by the RPM algorithm [12], [19]. The IMV is obtained from the LMVs by evaluating their confidence indices through irregular condition detection and the proposed IMV generation algorithm.

Fig. 2 The block diagram of LMVs and IMV estimation.

2.1.1 Local Motion Vector (LMV) and Irregular Condition Detection

First, we obtain the local motion vectors by using the representative point matching method.

A 19×25-pixel macro-block is the basic processing unit of the algorithm. For a given video sequence, a boundary band of each incoming frame is first discarded and reserved as the compensating area. The remaining frame is then divided into four regions, each of which yields one local motion vector, and each region is further cut into 19×25-pixel sub-regions. The value of the center pixel of each sub-region is taken as its representative point value, so every sub-region has its own representative point. The representative point value of the previous frame is subtracted from every pixel of the corresponding sub-region in the present frame, and the absolute value is taken. The position of the minimum in this 19×25 absolute difference matrix is regarded as the new position of the previous representative point after it has been displaced by vibration. The absolute difference matrices of all sub-regions are summed into a sum of absolute differences (SAD) matrix, which is then analyzed statistically. The vector from the center point to the position of the minimum SAD value is taken as the local motion vector of the region.

The test sequence is a tennis-player video clip with 312×200-pixel frames. The steps and results of the algorithm are listed below.

Representative Point Matching

• Segment the prescribed region of frames (t-1) and (t) into sub-images of 19×25 pixels each, as shown in Fig. 3(a).

• Map all the pixels of each sub-image (t-1) to its central point value. The resulting array is called the representative point macro-block (RPM), as shown in Fig. 3(b).

• Subtraction: The operation is defined by $|M_{sub}(t)| := |\text{Sub-image}(t) - \text{Representative\_point\_image}(t-1)|$, which gives the 19×25 absolute difference matrix $|M_{sub}(t)|$.

• Addition: Add all $|M_{sub}(t)|$ in the prescribed region to form a 19×25 difference value matrix. Fig. 4 shows this SAD matrix as a 3D plot whose z-coordinate is the accumulated absolute difference. The lower the SAD value, the more likely the position corresponds to the LMV.

• Minimum: Find the position of the minimum accumulated absolute difference in the prescribed region and calculate the vector from the center point to it. This vector is called the local motion vector of the prescribed region. Fig. 4 shows that the minimum lies toward the upper left of the array; this is discussed further in Chapter 5.
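The RPM steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation: frames are 8-bit grayscale 2-D arrays, "19×25" is taken to mean 19 rows by 25 columns, each sub-image's own area serves as the search window, and the names `rpm_sad` and `local_motion_vector` are hypothetical.

```python
import numpy as np

# Assumption: a 19x25 sub-image means 19 rows by 25 columns.
BLOCK_H, BLOCK_W = 19, 25


def rpm_sad(prev_region: np.ndarray, curr_region: np.ndarray) -> np.ndarray:
    """Accumulate |sub-image(t) - representative point(t-1)| over one region."""
    n_rows = prev_region.shape[0] // BLOCK_H
    n_cols = prev_region.shape[1] // BLOCK_W
    sad = np.zeros((BLOCK_H, BLOCK_W), dtype=np.int64)
    for r in range(n_rows):
        for c in range(n_cols):
            y0, x0 = r * BLOCK_H, c * BLOCK_W
            # Representative point: center pixel of the previous sub-image.
            rp = int(prev_region[y0 + BLOCK_H // 2, x0 + BLOCK_W // 2])
            sub = curr_region[y0:y0 + BLOCK_H, x0:x0 + BLOCK_W].astype(np.int64)
            # Subtraction and Addition steps: add this |M_sub(t)| to the SAD matrix.
            sad += np.abs(sub - rp)
    return sad


def local_motion_vector(sad: np.ndarray) -> tuple[int, int]:
    """Minimum step: vector (dx, dy) from the block center to the minimum SAD."""
    dy, dx = np.unravel_index(np.argmin(sad), sad.shape)
    return int(dx) - BLOCK_W // 2, int(dy) - BLOCK_H // 2
```

For a region whose content moves 3 pixels right and 2 pixels down between frames (for example, `curr = np.roll(prev, shift=(2, 3), axis=(0, 1))`), the SAD matrix reaches zero at the displaced representative points and `local_motion_vector` returns `(3, 2)`.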

Fig. 3 (a) The original image is divided into four regions, and each region is further cut into an array of 19×25-pixel sub-images. (b) All the pixels are mapped to the central point value of each sub-image (t-1).

[Fig. 4 panels: Region 1, Region 2, Region 3, Region 4]

Fig. 4 The accumulated results of all $|M_{sub}(t)|$ in the prescribed region form a 19×25 difference value array.

Analysis of the correlation curves of image sequences under various conditions shows that the shape of the correlation curve is related to the reliability of motion detection. Fig. 5 shows correlation curves for image sequences under different conditions. Fig. 5(a) shows a normal case in which each region of the incoming frame has an obvious minimum that can be located in both x and y. Fig. 5(b) shows a valley-shaped distribution along y, and each region still has an obvious minimum in y. However, no x coordinate is reliable because the curve lacks a clear critical point along x.


Fig. 5 Various correlation curves corresponding to image sequences with different conditions.

(a) Normal case in which each region of the incoming frame has an obvious minimum position.

(b) Case in which only the y coordinate has a valley-shaped distribution.

Because the shape of the correlation curve reflects the reliability of motion detection, we propose a strategy that combines the minimum projections of the correlation curve in the x and y directions with the inverse triangle method to detect irregular conditions in each region at reduced computational complexity. The minimum projections are defined as

$$x_{i\_\min}(p) = \min_{q} R_i(p, q), \qquad y_{i\_\min}(q) = \min_{p} R_i(p, q) \qquad (2.1)$$

where $x_{i\_\min}(p)$ and $y_{i\_\min}(q)$ are the minimum projections of the correlation curve in the x and y directions in region $i$, respectively, and $R_i(p, q)$ is the correlation value of region $i$ at displacement $(p, q)$. The concept derives from the intuition that a highly reliable curve for determining the LMV has a sharp, obvious peak, with no other equivalent peak in the same curve.
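Eq. (2.1) reduces each region's 2-D correlation surface to two 1-D curves. A minimal NumPy sketch, assuming the surface is stored as a 2-D array whose rows index the vertical displacement q and whose columns index the horizontal displacement p (the function name is illustrative):

```python
import numpy as np


def minimum_projections(r: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Minimum projections of one region's correlation surface, Eq. (2.1).

    Assumed layout: r[q, p] holds R_i(p, q).
    """
    x_min = r.min(axis=0)  # x_i_min(p): minimum over q for each column p
    y_min = r.min(axis=1)  # y_i_min(q): minimum over p for each row q
    return x_min, y_min
```

For the surface `[[5, 4, 6], [3, 7, 2], [9, 1, 8]]` the projections are x = [3, 1, 2] and y = [4, 2, 1]. Both curves share the global minimum of the surface (here 1), which is why $T_{i\_\min}$ in Eq. (2.2) can be read from either projection.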

Fig. 6 shows the x and y minimum projections of the correlation curves of each region in Fig. 5. In Fig. 6(a), each region has an obvious minimum x and y position. In Fig. 6(b), the x projection contains many local minima, and only the y coordinate identifies the minimum position.


Fig. 6 Examples of minimum projections of correlation curve from x and y directions in four regions: (a) regular image sequence and (b) ill-conditioned image sequence.

To judge the reliability of the motion vectors in Fig. 6, we combine the inverse triangle method with the minimum projections of the correlation curve to obtain reliability indices.

When the distance between local minima is larger than a specified value, the minimum position is not reliable.

Fig. 7 Illustration of the proposed inverse triangle method.

Based on this criterion, the algorithm is designed as follows. In the first step, we find $T_{i\_\min}$, the global minimum of the minimum projection curves in region $i$, by Eq. (2.2). In the second step, we calculate $S_{xi}$ and $S_{yi}$ by Eq. (2.3), where $offset$ is the altitude of the inverse triangle; $n_{xi}$ and $n_{yi}$ are the numbers of elements in $S_{xi}$ and $S_{yi}$, respectively, given by Eq. (2.4); and $d_{xi}$ and $d_{yi}$ are the distances between the two vertices of the base of the inverse triangle, obtained by Eq. (2.5). The confidence levels in the x and y directions are calculated by Eq. (2.6). Since multiple peaks seriously degrade the determination of reliability, Eq. (2.6) includes a penalty for multiple peaks to improve the discrimination of reliability. The example in Fig. 7 is a curve with twin peaks, which receives a penalty of $d_{xi} - n_{xi}$. In the third step, we determine the confidence indices of $x_i$ and $y_i$ in region $i$ through a threshold denoted as $TH$; a smaller confidence level represents higher reliability. In the final step, we sum the counts of reliable x and y motion components over the four regions by Eq. (2.7) to obtain $Num(x_i)$ and $Num(y_i)$, $i = 1, \ldots, 4$.

The procedure is as follows:

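Step 1. Find the global minimum $T_{i\_\min}$ from $x_{i\_\min}(p)$ or $y_{i\_\min}(q)$:

$$T_{i\_\min} = \min_{p} x_{i\_\min}(p) = \min_{q} y_{i\_\min}(q) \qquad (2.2)$$

Step 2. Calculate $S_{xi}$, $S_{yi}$, $n_{xi}$, $n_{yi}$, $d_{xi}$, $d_{yi}$, and the confidence levels $x_{i\_conf}$ and $y_{i\_conf}$ by Eqs. (2.3) to (2.6).

Step 3. Set the threshold $TH$ for determining the reliability indices.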

If $x_{i\_conf} < TH$ then $x_i$ is reliable, else $x_i$ is unreliable.
End if.

If $y_{i\_conf} < TH$ then $y_i$ is reliable, else $y_i$ is unreliable.
End if.

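Step 4. Calculate the numbers of reliable $x_i$ and $y_i$ in the four regions:

$$Num(x_i) = \sum_{i=1}^{4} \mathbf{1}(x_i \text{ is reliable}), \qquad Num(y_i) = \sum_{i=1}^{4} \mathbf{1}(y_i \text{ is reliable}) \qquad (2.7)$$

where $\mathbf{1}(\cdot)$ equals 1 if the condition holds and 0 otherwise.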
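Eqs. (2.3) to (2.6) are not reproduced in this section, so the following Python sketch is only a hypothetical reading of the inverse triangle test described above, not the thesis's exact formulas. It assumes that $S$ collects the indices where a projection lies below $T_{i\_\min} + offset$, that $d$ is the inclusive base width from the first to the last member of $S$, that the multiple-peak penalty is $d - n$ (zero for a single contiguous valley), and that the confidence level is $d$ plus that penalty. The names `inverse_triangle_conf` and `count_reliable` are illustrative.

```python
from typing import Sequence


def inverse_triangle_conf(curve: Sequence[float], offset: float) -> int:
    """Hypothetical confidence level of one projection curve (smaller = more reliable)."""
    if offset <= 0:
        raise ValueError("offset (triangle altitude) must be positive")
    t_min = min(curve)  # T_i_min; the other projection has the same minimum
    # S: positions that fall inside the inverse triangle of altitude `offset`
    s = [k for k, v in enumerate(curve) if v < t_min + offset]
    n = len(s)
    d = s[-1] - s[0] + 1  # assumed inclusive distance between the base vertices
    penalty = d - n       # gaps inside the base: nonzero for twin or multiple valleys
    return d + penalty


def count_reliable(x_curves, y_curves, offset: float, th: float) -> tuple[int, int]:
    """Steps 3 and 4: threshold each region's confidence and count reliable x_i, y_i."""
    num_x = sum(inverse_triangle_conf(c, offset) < th for c in x_curves)
    num_y = sum(inverse_triangle_conf(c, offset) < th for c in y_curves)
    return num_x, num_y
```

For example, with offset = 10 and TH = 4, a single sharp valley such as [50, 40, 30, 5, 30, 40, 50] scores 1 (reliable), a twin-valley curve [5, 30, 40, 50, 40, 30, 6] scores 7 + 5 = 12, and a flat curve [6, 5, 6, 5, 6, 5, 6] scores 7; the last two are flagged unreliable.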

2.1.2 Ill-Conditioned Motion Vector (IMV)

Irregular motion vectors can be detected and excluded by the minimum projection and inverse triangle methods. However, an ill-conditioned image sequence, for example one that lacks features or contains large low-contrast areas, moving objects, or repeated patterns, may yield few usable MVs in the four regions, because most of the MVs are irregular. The remaining regular MVs therefore need to be recombined to form an ill-conditioned motion vector (IMV). For this purpose, a median function is used to extract a motion vector component in each direction under ill conditions.
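The case analysis of the IMV calculation is only partially preserved in this section, so the following Python sketch shows just the core idea stated here: taking the median of the reliable components in each direction over the four regions. It is an illustration under assumptions, not the thesis's exact case rules: `median_low` is chosen so the result stays an integer pixel shift, and returning `None` when a direction has no reliable component is a hypothetical choice (the ZMV and the previous GMV in the GMV stage could then cover that case). The function names are illustrative.

```python
from statistics import median_low
from typing import Optional, Sequence


def imv_component(values: Sequence[int], reliable: Sequence[bool]) -> Optional[int]:
    """Median of the reliable components of one direction over the four regions."""
    chosen = [v for v, ok in zip(values, reliable) if ok]
    if not chosen:
        return None  # assumption: no reliable component in this direction
    return median_low(chosen)  # median_low keeps an integer pixel shift


def ill_conditioned_mv(lmvs, rel_x, rel_y):
    """Hypothetical IMV: per-direction median of the reliable LMV components."""
    xs = [v[0] for v in lmvs]
    ys = [v[1] for v in lmvs]
    return imv_component(xs, rel_x), imv_component(ys, rel_y)
```

With LMVs (2, 1), (3, -1), (10, 0), (2, 5), where the third x component is unreliable and all y components are reliable, the sketch returns (2, 0): the outlier x = 10 is excluded, and the y median is the lower middle value of [-1, 0, 1, 5].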

The IMV is calculated as follows.

Case 1. If $Num(x_i(t)) = 4$, then

We then apply a similar process to obtain $V_{ill\_y}(t)$. The resulting IMV is represented by

2.1.3 Global Motion Vector

The LMV of each region may represent the global motion vector, a moving-object motion vector, or even an erroneous vector. An erroneous vector may be caused by an ill condition or by a mixture of global motion and moving-object motion. Although the reliable global motion vector is normally selected from the LMVs and the IMV, in the worst case, i.e., when the LMV and IMV estimates are all faulty because the image sequence is highly noisy, adopting an erroneous GMV induces artificial shaking. Including the zero motion vector (ZMV) in the evaluation prevents this case. Similarly, for a highly noisy image sequence with panning, the last previous GMV is the best choice when the LMV and IMV estimates are all faulty. In the proposed IS technique, seven motion vectors, namely the four LMVs, the IMV, the ZMV, and the last previous GMV, referred to as pre-selected motion vectors (pre_MV), are used to estimate the GMV of the current frame. In general, one of the LMVs is the most probable GMV for a regular image; the IMV is the most probable GMV for an ill-conditioned image; the ZMV prevents worse compensation caused by faulty MVs; and the last previous GMV is useful under panning. In this thesis, a background-based evaluation function is proposed to select among these candidates. Fig. 8 shows the areas used for background-based evaluation. Five regions located around the periphery of the image are selected to evaluate the result, because in most cases the foreground object is located at the center of the image, which makes the surroundings the best candidates for background detection.

Fig. 8 Areas for background detection and evaluation

The GMV is estimated by the sum of absolute differences (SAD) [33]: the smaller the SAD, the higher the probability that the corresponding vector is the desired motion vector among these pre-selected motion vectors.

Five-region peer-to-peer evaluation prevents a few partially high-contrast image regions from dominating the evaluation result; in this algorithm, each region has equal priority in determining the result. The $pre\_MV_c$ with the smallest $S_c$ is taken as the GMV, so $S_c$ serves as the key index for determining the GMV.
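The exact definition of $S_c$ is not reproduced in this section, so the following Python sketch is a hypothetical version of the background-based evaluation. Assumptions: frames are 2-D grayscale arrays; a candidate (dx, dy) means the content moved by (dx, dy) from frame t-1 to frame t; each background box in frame t is compared with the correspondingly displaced box in frame t-1; and "equal priority" is modeled by normalizing each region's SAD values across candidates so that no single high-contrast region dominates. The box coordinates and the names `region_sad` and `select_gmv` are illustrative.

```python
import numpy as np


def region_sad(prev, curr, box, mv) -> int:
    """SAD between a background box in frame t and the box displaced by mv in frame t-1."""
    y0, x0, h, w = box  # hypothetical (row, col, height, width) of one background region
    dx, dy = mv
    ys, xs = y0 - dy, x0 - dx
    if ys < 0 or xs < 0 or ys + h > prev.shape[0] or xs + w > prev.shape[1]:
        raise ValueError("displaced box leaves the frame")
    cur = curr[y0:y0 + h, x0:x0 + w].astype(np.int64)
    ref = prev[ys:ys + h, xs:xs + w].astype(np.int64)
    return int(np.abs(cur - ref).sum())


def select_gmv(prev, curr, candidates, boxes):
    """Return the pre_MV_c with the smallest score S_c."""
    # sad[c, j]: SAD of candidate c in background region j
    sad = np.array([[region_sad(prev, curr, b, mv) for b in boxes]
                    for mv in candidates], dtype=float)
    # Assumed equal-priority rule: each region's column is normalized to sum to 1.
    col_sum = sad.sum(axis=0)
    col_sum[col_sum == 0] = 1.0
    scores = (sad / col_sum).sum(axis=1)  # S_c for every candidate
    return candidates[int(np.argmin(scores))]
```

With the seven candidates (four LMVs, the IMV, the ZMV, and the last GMV) passed as a list of (dx, dy) tuples, `select_gmv(prev, curr, candidates, boxes)` returns the candidate with the smallest normalized score; if several candidates tie, `np.argmin` returns the first one in the list.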

2.2 Motion Compensation

Compensating motion vectors (CMVs) must be generated to remove undesired shaking while preserving the steady motion of the image sequence. The conventional CMV estimation is given by [18]

$$CMV(t) = k \, CMV(t-1) + \big(\alpha \, GMV(t) + (1-\alpha) \, GMV(t-1)\big) \qquad (2.13)$$

where $t$ is the frame number, $0 < k < 1$, and $0 \le \alpha \le 1$. In this scheme, steady panning causes a large lag, which reduces the effective image area. The CMVs are therefore generated by Eq. (2.13) together with the clipper function [34] as

$$CMV(t) = \begin{cases} l, & CMV(t) > l \\ CMV(t), & -l \le CMV(t) \le l \\ -l, & CMV(t) < -l \end{cases}$$

where $l$ is the boundary limitation, i.e., the maximum allowed window shift. The clipper limits the lag to a certain range; however, it also degrades shaking compensation because the picking window then operates near the boundary area. To overcome this drawback, we combine an inner feedback-loop integrator with the clipper function, which reduces the steady-state lag for steady motion while keeping the CMV within an appropriate range. Fig. 9 shows the block diagram of the proposed CMV generation method. The integrator in the inner feedback loop eliminates the steady-state lag of the CMV under panning, so the shaking components of images with constant panning, as well as those of regular images, can be stabilized. The proposed CMV computation procedure is presented by


Fig. 9 The block diagram of the proposed CMV generation method.
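The conventional scheme of Eq. (2.13) with a clipper can be sketched for one axis in Python. The zero initial values of CMV and GMV, the default k and α, applying the clipper to each CMV(t) before it is fed back, and the function name are assumptions; the proposed integrator-based method of Fig. 9 is not modeled here. Under constant panning $GMV(t) = g$, the recursion settles where $CMV = k \, CMV + g$, i.e. at $g/(1-k)$, which is the steady-state lag described above; the clipper caps it at $l$.

```python
def conventional_cmv(gmvs, k=0.9, alpha=0.5, limit=None):
    """One axis of Eq. (2.13), optionally followed by a clipper with bound `limit` (l)."""
    if not 0 < k < 1 or not 0 <= alpha <= 1:
        raise ValueError("Eq. (2.13) requires 0 < k < 1 and 0 <= alpha <= 1")
    cmv_prev, gmv_prev = 0.0, 0.0  # assumed initial conditions
    out = []
    for gmv in gmvs:
        cmv = k * cmv_prev + (alpha * gmv + (1 - alpha) * gmv_prev)
        if limit is not None:
            cmv = max(-limit, min(limit, cmv))  # clipper: keep |CMV(t)| <= l
        out.append(cmv)
        cmv_prev, gmv_prev = cmv, gmv
    return out
```

For g = 1 pixel per frame and k = 0.9, `conventional_cmv([1.0] * 200)` approaches a lag of 10 pixels, while `conventional_cmv([1.0] * 200, limit=5.0)` saturates at 5, the situation in which the picking window runs at the boundary.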

2.3 Implementation of Local Motion Estimation

After implementing the DIS system in software, we found that it performs very well off-line compared with the RPM fuzzy set approach [32]. However, the algorithm is hard to implement in practical consumer camcorders. Although several DIS systems can already operate on-line on a PC [45], dedicated DIS hardware design still has a long way to go.

Therefore, the DIS computation load must be analyzed before the system is implemented on practical camcorders. We recorded and compared the computation time of each DIS step. The measurement may lack accuracy because it depends on the test videos, but it still shows that most of the computation load belongs to the motion estimation step. Fig. 10 shows the percentage of the computation load of each block in the DIS system. Motion estimation accounts for about 80% of the processing time and motion compensation for the remaining 20%, and more than half of the total computation time belongs to the local motion estimation step. This is because the RPM method in the LMV step must first load the image SAD values into memory and then search for the minimum value pixel by pixel. To find the global minimum position, a DSP processor repeatedly compares the current value with the stored minimum, stores the new minimum, and moves to the next pixel; the minimum position must also be stored in memory. The larger the processing unit, the longer the processing time. Our strategy is to design an application-specific chip that greatly reduces the computation time of LMV estimation: the less time LMV estimation takes, the better image stabilization can meet real-time requirements.
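To make the cost argument concrete, the following hypothetical Python model mimics the sequential compare-and-store search a DSP loop would perform. It counts one comparison per element after the first, so a 19×25 SAD matrix needs 474 comparisons per region, and the count grows linearly with the size of the processing unit. The function name and the use of comparisons as a cost proxy are illustrative assumptions; real DSP cycle counts also include memory loads and stores.

```python
def dsp_style_argmin(values):
    """Sequential global-minimum search over a 2-D list; returns (position, comparisons)."""
    best = values[0][0]
    best_pos = (0, 0)
    comparisons = 0
    for r, row in enumerate(values):
        for c, v in enumerate(row):
            if (r, c) == (0, 0):
                continue  # the first element initializes the stored minimum
            comparisons += 1
            if v < best:  # compare with the stored minimum
                best, best_pos = v, (r, c)  # store the new minimum and its position
    return best_pos, comparisons
```

A CNN array, by contrast, processes all cells in parallel, which is the motivation for the ASCNN chip described next.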

The LMV estimation chip is designed with CNN technology to solve the heavy computation time problem. Compared with conventional digital technology, CNN-based computing can perform image processing tasks requiring TeraOPS-range throughput in a cost-effective implementation. The design concept of the ASCNN chip is shown in Fig. 11. By using an 8-bit D/A converter, the absolute image difference, which ranges from 0 to 255, can be stored in the CNN local analog memories (LAM). With the CNN, the LMV position can then be found more easily than with a DSP processor. The CNN theory and chip design are introduced in Chapter 3 and Chapter 4, respectively.

Fig. 10 Computational complexity of the DIS flow.

Fig. 11 The design concept of ASCNN chip architecture.

[Fig. 10 content: Original Images → Store in Buffer → Motion Estimation (~80%): LMVs Estimation (~63%), Global Motion Vector Estimation (~17%) → Motion Compensation (~20%): Compensation Motion Vector Estimation, Image Compensation → Compensated Images]

CHAPTER 3 CELLULAR NEURAL NETWORK

The original Cellular Neural/Nonlinear Network (CNN) paradigm was first introduced by Chua and Yang [20]. CNN technology is a new computing paradigm that has been proven experimentally. Its two most fundamental ingredients are the use of analog processing cells with continuous signal values and local interaction within a finite radius. CNNs possess some of the key features of neural networks and have important potential applications in areas such as image processing and pattern recognition. This chapter first introduces CNN theory and architecture, then the CNN circuit, and finally the inversion and adaptive threshold properties of the CNN that are used to calculate the LMV.

3.1 CNN Theory

The CNN can be considered an implementable alternative to fully connected neural networks and a significant improvement for the hardware implementation of artificial neural networks. Local interconnection and simple synaptic operators are the most attractive features of the CNN for VLSI implementation in high-speed, real-time applications [35], and CNNs are widely used in fields such as image processing and pattern recognition. Several hardware implementations of the CNN have been reported in the literature [37], [38], [39].

The state equation of the CNN can be represented by

point in the neighborhood within a radius r of cell (i, j). A and B are the nonlinear cloning templates [40]. Fig. 12 shows the dynamic route of the state in the CNN, and Eq. (3.2) is plotted in Fig. 13.

In many applications, the templates (A, B) and the threshold I are translation invariant. When A and B are single-variable functions, the linear (space-invariant) template is represented by the additive terms of Eq. (3.1). When the template is space invariant, every cell is described by an identical cloning template defined by two (2r + 1) × (2r + 1) real matrices A and B and the constant term I. As a very special case, if the input and initial state values are sufficiently small and f is piecewise linear, the dynamics of the CNN array are linear.
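To show how space-invariant templates drive the array, the following Python sketch simulates a small CNN. Because the state equation is not reproduced above, it assumes the standard Chua and Yang form with linear 3×3 templates (r = 1) and normalized C = R_x = 1, i.e. dx_ij/dt = -x_ij + Σ A(k,l) y_(i+k)(j+l) + Σ B(k,l) u_(i+k)(j+l) + I with output y = f(x) = 0.5(|x + 1| - |x - 1|). Forward-Euler integration, the step size and step count, zero-valued boundary cells, and the function names are all assumptions of this sketch.

```python
import numpy as np


def f(x: np.ndarray) -> np.ndarray:
    """Standard piecewise-linear CNN output: 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))


def template_sum(t: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Sum over the 3x3 neighborhood: sum_{k,l} t[k+1, l+1] * v[i+k, j+l]."""
    h, w = v.shape
    p = np.pad(v, 1)  # assumed zero-valued (fixed) boundary cells
    out = np.zeros((h, w))
    for dk in range(3):
        for dl in range(3):
            out += t[dk, dl] * p[dk:dk + h, dl:dl + w]
    return out


def simulate_cnn(u, x0, A, B, I, dt=0.05, steps=400):
    """Forward-Euler integration of dx/dt = -x + A*y + B*u + I; returns y = f(x)."""
    x = np.asarray(x0, dtype=float).copy()
    bu = template_sum(B, np.asarray(u, dtype=float))  # input term is constant in time
    for _ in range(steps):
        x += dt * (-x + template_sum(A, f(x)) + bu + I)
    return f(x)
```

With A containing only a self-feedback weight of 2 at its center, B = 0, and I = 0, each cell is bistable, so `simulate_cnn` drives every output to +1 or -1 according to the sign of the cell's initial state.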

Unlike other standard analog processing arrays or neural networks, the CNN relies crucially on a one-to-one geometric (topographic) correspondence between the processing elements and the processed signal-array elements (e.g., pixels). Moreover, the templates have geometric meanings that can be exploited to provide geometric insight and simpler design methods.

Fig. 12 The dynamic route of state in CNN.

3.2 CNN Architecture

The basic circuit unit of a CNN is called a cell. It contains linear and nonlinear circuit elements, typically linear capacitors, linear resistors, linear and nonlinear controlled sources, and independent sources. The structure of a CNN is similar to that of cellular automata: each cell is connected only to its neighbor cells, and adjacent cells interact directly with each other. Cells that are not directly connected may still affect each other indirectly through the propagation effects of the continuous-time dynamics of the network. A typical example of a cell C(i, j) is shown in Fig. 14, where the suffixes u, x, and y denote the input, state, and output, respectively.

Fig. 14 The circuit of a CNN cell.

The differential equation governing a CNN in Eq. (3.1) is rewritten as follows:

cloning template. The synaptic weights of the shift-invariant CNN can be described by the feedback and feedforward cloning templates:

The matrix B can be defined in a similar way. Then, the maximum value of x in the steady state is the sum of the absolute values of all inputs from the neighborhood cells,


3.3 CNN Circuit Design

The current-mode approach [40], [42] is used in the CNN circuit design because of its superior addition properties: weighted currents are summed simply by joining them at a common node, with the weights set by appropriate transistor sizing. The piecewise-linear function is realized by cascading two current limiters, as shown in Fig. 15.
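The limiter cascade can be modeled behaviorally in Python. This is an idealized sketch assuming normalized currents and a symmetric saturation level `i_sat`; it ignores the transistor-level non-idealities that the HSPICE simulations capture, and the function names are illustrative. With `i_sat = 1`, the cascade reproduces the standard CNN output function 0.5(|x + 1| - |x - 1|).

```python
def lower_limiter(i_in: float, i_sat: float) -> float:
    """First stage: clamps the current at the negative level -i_sat."""
    return max(i_in, -i_sat)


def upper_limiter(i_in: float, i_sat: float) -> float:
    """Second stage: clamps the current at the positive level +i_sat."""
    return min(i_in, i_sat)


def pwl_output(i_x: float, i_sat: float = 1.0) -> float:
    """Cascade of the two limiters: the piecewise-linear CNN output function."""
    return upper_limiter(lower_limiter(i_x, i_sat), i_sat)
```

Ordering the stages so that the negative limit acts first mirrors the description of the circuit that follows; for an ideal model, the order does not change the result.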


Fig. 15 Piecewise linear function. (a) Schematic view and (b) Transfer characteristics of two current limiters in the cascade [36].

The limiting operation of the input current denoted by Ix first takes place at a negative value

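Source thesis: A Vibration Vector Estimation Chip for Image Stabilization Using a Cellular Neural Network Architecture (使用細胞神經網路架構實現影像穩定處理之震動向量估測晶片)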