National Chiao Tung University
Department of Electrical and Control Engineering
Master's Thesis
Chip Design of CNN-Based Local Motion Estimation for
Image Stabilization Processing
Student: Ying-chang Cheng
Advisor: Dr. Chin-Teng Lin
A Thesis
Submitted to Department of Electrical and Control Engineering
College of Engineering and Computer Science
National Chiao Tung University
in Partial Fulfillment of the Requirements
for the Degree of Master
in
Electrical and Control Engineering
June 2005
Hsinchu, Taiwan, Republic of China
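Chip Design of CNN-Based Local Motion Estimation
for Image Stabilization Processing
Student: Ying-chang Cheng
Advisor: Dr. Chin-Teng Lin
Institute of Electrical and Control Engineering, National Chiao Tung University
Chinese Abstract
The main contribution of this thesis is a hardware estimator of the unintended background motion vectors caused by hand shake or by vibration of the camera mount during shooting, which are supplied to an image stabilization (IS) system for compensation. An image stabilization system consists of a background motion vector estimation unit and a motion vector compensation unit. Because computing the background motion vectors takes most of the processing time of the whole system, this thesis proposes an application-specific integrated circuit (ASIC) that estimates the background motion vectors of the captured images. The chip adopts a cellular neural network (CNN) architecture to detect the features of the background motion vectors. The basic CNN architecture is a regularly arranged two-dimensional array in which each cell is connected to its neighboring cells, and it is capable of real-time parallel processing. This thesis proposes a CNN design with an adjustable bias that quickly computes the background motion vectors required by the image stabilization system, together with global output connected chains that locate the address of the local background motion vector. In addition, a local analog memory (LAM) array is designed to store the image difference information. The proposed background motion vector estimation chip uses the TSMC 0.35um mixed-signal process, occupies 8.1 mm², and contains 19×25 pixel processing units. Circuit simulation and CNNUM analysis confirm that estimating the background motion vectors of an IS system with a CNN is much faster than with a DSP processor, which gives the overall IS system real-time capability.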
Chip Design of CNN-Based Local Motion Estimation
for Image Stabilization Processing
Student: Ying-chang Cheng
Advisor:Dr. Chin-Teng Lin
Department of Electrical and Control Engineering
National Chiao Tung University
Abstract
The objective of this thesis is to investigate hardware design for the estimation of local motion vectors (LMVs) in image stabilization (IS) of image sequences. The IS technique removes unwanted shaking from image sequences captured by hand-held camcorders without affecting moving objects in the sequence or intentional motion such as panning. It consists of motion estimation and motion compensation. Because most of the complex and time-consuming computation occurs in motion estimation, an application-specific IC is designed to solve this problem. Cellular Neural Network (CNN) technology is used to implement the local motion estimation chip. A CNN is a regular two-dimensional array in which each cell connects locally with its neighborhood, and its architecture contains real-time, parallel analog computing elements. A CNN adaptive threshold template is proposed to extract reliable motion vectors from a given region. The global output connected chains make it easy to decode the LMV address. The local analog memory (LAM) is designed to store image difference information. The size of the CNN array is 19×25 pixels. The chip is integrated in a total area of 8.1 mm² using the TSMC 0.35um mixed-signal process. Results from HSPICE simulation and CNNUM analysis show that the proposed CNN-based local motion estimation outperforms a digital signal processor, so that the IS system is capable of real-time operation.
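Acknowledgements
During my two years of study at the Institute of Electrical and Control Engineering, I would first like to thank my advisor, Professor Chin-Teng Lin, for his careful guidance in my studies, from which I gained much valuable knowledge and benefited greatly in both coursework and research methods. He also guided me in personal conduct and in my attitude toward learning, which allowed this thesis to be completed smoothly. I also thank the oral defense committee members for their suggestions and comments, which made this thesis more complete.
Next, I thank those who gave me a great deal of help: my seniors 鶴章, 仁峰, 朝暉, 世安, 立偉, and Guiri from Hungary; my classmates in the Information Media Laboratory 宗恆, 俊永, 弘昕, Linda, Martin, 育緯, and 晴慧; the junior members of the Information Media Laboratory 家昇, 經翔, 谷谷, and Jack; 重甫 of 618 and 文芩 of 307; and my good friends from the Experimental High School 孫民, 吉哥, 鈞哥, 剛哥, 以信, and 小 P, for their encouragement and assistance during my research. There are many more people to thank than I can list here, and I can only keep my gratitude in my heart.
To the teachers and friends who shared these two years with me on the NCTU campus: your presence made my graduate studies richer and left me with many fond memories.
Finally, I especially thank my parents for my education and upbringing and for all their encouragement and support, which allowed me to complete my master's degree without worry. I dedicate this thesis to my family and to all the teachers and friends who care about me, and I hope they share in this honor.
Contents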
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Objective
1.3 Thesis Organization
CHAPTER 2 DIGITAL IMAGE STABILIZATION
2.1 Motion Estimation
2.1.1 Local Motion Vector (LMV) and Irregular Condition Detection
2.1.2 Irregular Motion Vector (IMV)
2.1.3 Global Motion Vector
2.2 Motion Compensation
2.3 Implementation of Local Motion Estimation
CHAPTER 3 CELLULAR NEURAL NETWORK
3.1 CNN Theory
3.2 CNN Architecture
3.3 CNN Circuit Design
3.4 CNN Template Consideration
3.4.1 Image Difference
3.4.2 Global Minimum
CHAPTER 4 CIRCUIT DESIGN OF CNN-BASED LMV ESTIMATION
4.1 D/A Converter
4.2 Local Analog Memory
4.3 Voltage to Current Converter
4.4 CNN Core and Template (A,B) Design
4.5 Adaptive Bias Circuit
4.6 ASCNN Processing Unit
4.7 Global Output Connected Chain
4.8 ASCNN Controller
CHAPTER 5 SIMULATION RESULTS
5.1 Simulation Results of CNNUM
5.2.1 Design flow
5.2.2 Simulation Results
5.2.3 Chip Specification and Layout
5.2.4 Testing consideration
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS
REFERENCES
APPENDIX
APPENDIX A Chip Information
List of Tables
Table 1: CNN Adaptive Threshold Template Analysis
Table 2: 8-bit D/A Converter Specification Table
Table 3: Specification of Unit Gain Buffer
List of Figures
Fig. 1 The architecture of the proposed image stabilization technique.
Fig. 2 The block diagram of LMVs and IMV estimation.
Fig. 3 (a) The original image is divided into 4 regions, and then each region is cut into an array of 19×25 pixels again. (b) All the pixels are mapped with the central point in each sub-image (t-1).
Fig. 4 The accumulated results of all |Msub(t)| in the prescribed region form a 19×25 difference value array.
Fig. 5 Various correlation curves corresponding to image sequences with different conditions. (a) Normal case that each region of the incoming frame has its own obvious minimum position. (b) Case of only y coordinate has valley shape distribution.
Fig. 6 Examples of minimum projections of correlation curve from x and y directions in four regions: (a) regular image sequence and (b) ill-conditioned image sequence.
Fig. 7 Illustration of the proposed inverse triangle method.
Fig. 8 Areas for background detection and evaluation.
Fig. 9 The block diagram of the proposed CMV generation method.
Fig. 10 Computational complexity of the DIS flow.
Fig. 11 The design concept of ASCNN chip architecture.
Fig. 12 The dynamic route of state in CNN.
Fig. 13 The output of sigmoid function.
Fig. 14 The circuit of a CNN cell.
Fig. 15 Piecewise linear function. (a) Schematic view and (b) Transfer characteristics of two current limiters in the cascade.
Fig. 16 The CNN cell with fixed weights (templates).
Fig. 17 Simulation of CNN inverse template. (a) Input of gray-scale image. (b) The initial state of CNN. (c) Output after difference processing.
Fig. 18 Searching global minimum by using CNN adaptive threshold template.
Fig. 22 8-bit current mode D/A converter.
Fig. 23 The output voltage variation due to input change of (8-bit) 256 steps.
Fig. 24 DNL analysis.
Fig. 25 SNR analysis of 8-bit DAC. (a) Sine wave output waveform of DAC; (b) DFT analysis of DAC output form with 140 sampling points; (c) The logarithm value for the x-coordinate in (b).
Fig. 26 Layout of 8-bit current mode D/A converter.
Fig. 27 The single structure cell LAM.
Fig. 28 The characteristic curve of MOS capacitor.
Fig. 29 Layout of MOS capacitor.
Fig. 30 Schematic of voltage to current converter.
Fig. 31 Layout of voltage current converter.
Fig. 32 VCC voltage to current transfer curve.
Fig. 33 Schematic of the CNN cell.
Fig. 34 Layout view of the CNN cell.
Fig. 35 5-bit cascade current mirror using a source follower level shifter.
Fig. 36 Schematic of the unit gain buffer.
Fig. 37 Layout of the unit gain buffer.
Fig. 38 Layout of 5-bit cascade current mirror using a source follower level shifter.
Fig. 39 Components of ASCNN processing unit.
Fig. 40 Schematic of an ASCNN processing unit (inside the rectangle) with CNN bias circuit.
Fig. 41 Symmetric layout of ASCNN processing unit. (a) Odd column processing unit layout. (b) Even column processing layout.
Fig. 42 (a) Layout of ASCNN array (part view). (b) Connections of CNN cells.
Fig. 43 Structure of global output connect chain (3 x 3 array).
Fig. 44 Loading SAD information into each LAM location.
Fig. 45 Complete FSM for the ASCNN controller design.
Fig. 46 Inversion template.
Fig. 47 First addition template.
Fig. 49 Chip sets of CNNUM.
Fig. 50 Adaptive threshold template.
Fig. 51 (a) The output of 19×25 difference value array image. (b) The output of (a) is fed into CNNUM using adaptive minimized threshold template with Eq. (7). (c) Vertical output of (b) with eq. (5-1). (d) Horizontal output of (b) with eq. (5-2).
Fig. 52 ASCNN IC design flow.
Fig. 53 (a) Pre-simulation and post-simulation results of the difference values which were stored in the LAM. (b) Post-simulation of the difference values which were stored in the corresponding rectangular LAM. 0.6 volt pre-charge for each LAM cell is needed to pass the nonlinear region of the MOS capacitor.
Fig. 54 32 levels CNN bias current which ranges from 0uA to 20.15uA.
Fig. 55 Simulation of CNN output inside the dot line. (a) The output of 19×25 difference value array image. (b) (c) Row and column global connect chains output corresponding to minimum SAD position. (d) Global connect chains for the rest of 16 outputs.
Fig. 56 Simulation results of the CNN control circuit. (a) The initial state. (b) The loading state. (c) The CNN active state. (d) The global minimum position by raising VB.
Fig. 57 The overall simulation for the 19 times 25 array. (a) 25 columns of global output connect chains. (b) 19 rows of global output connect chains.
Fig. 58 Layout of mixed-signal ASCNN chip. (a) Topology of ASCNN chip. (b) ASCNN chip layout with 48 bonding pads.
Fig. 59 ATPG reports from Tetra-max.
Fig. 60 Digital and analog part testing consideration of ASCNN chip.
CHAPTER 1
INTRODUCTION
Video image stabilizers that compensate for camcorder shake have become indispensable in consumer video camcorders [1]. Image stabilization (IS), also known as vibration reduction, is a digital camcorder technology that helps prevent image blur. It reduces the vibration caused by camcorder shake, a slow shutter speed, or the use of a long telephoto lens without a tripod. Image stabilization is finding its way into more consumer and professional digital camcorders, and various image-stabilizing systems have been developed to protect picture quality from degradation caused by hand movement. With IS processing, scene points remain motionless in spite of camcorder motion, which makes it easier, for example, for an operator to select a region. Unwanted positional fluctuations of the video sequence degrade visual quality and impede subsequent processes in applications such as motion coding, video compression, and feature tracking.
1.1 Motivation
Basically, IS techniques can be classified into two processing methods: optical image stabilization (OIS) and electronic image stabilization (EIS). Optical image stabilization (see APPENDIX B) uses mechanical motion compensation to physically move the lens, and hence the image that falls on the image sensor, in the direction opposite to the camcorder shake. Camcorder makers offering optical stabilizers include Sony, Panasonic, and Canon [2], but this feature is generally reserved for high-end models. Optical image stabilization for consumer video cameras has been proposed by Holder [4] and Oshima [5]. Both systems measure angular velocity with gyro sensors, but they differ in how they compensate for it [6]. A common disadvantage of Holder's and Oshima's systems is that they rely on mechanical parts such as gyros, and they control deflection coils (Holder) or a gimbal mechanism (Oshima) for motion compensation. These mechanical parts make the IS system more expensive, larger, and heavier.
An electronic image stabilizer, also called a digital image stabilizer (DIS), exploits an image sensor that has more pixels than the video image requires and stabilizes the image by digital image processing. The video image is like a “window” that moves around within the larger frame of the image sensor. When camcorder shake moves the image up, EIS moves this “video window” down to compensate. Many DIS algorithms have been proposed. Chang et al. [7] use optical flow to remove translational and rotational motion disturbances. The optical flow technique estimates the local motion vector field of the image and yields the velocity of each pixel in the current frame. Ko [8] proposes a gray-coded bit-plane DIS algorithm to estimate the motion vector under irregular conditions caused by moving objects and intentional panning. ITRI [9] has developed a DIS prototype system implemented with an FPGA and a DSP. The system is composed of software and hardware blocks that apply the gray-coded bit-plane matching algorithm to video sequences. The DIS technique has been widely used for the computation of ego-motion [10] and for video compression [11].
DIS consists of a motion estimation system and a motion compensation system. Motion estimation based on the block matching algorithm (BMA) plays an important role in DIS [13]-[16]. The full-search (FS) BMA under the mean absolute difference (MAD) and mean square error (MSE) criteria can be considered an optimal solution for block motion estimation [8]. For motion compensation, accumulated motion vector estimation [18] and frame position smoothing (FPS) [28]-[31] are two of the most popular approaches. Accumulated motion vector estimation must compromise between stabilization and preservation of intentional panning, since panning causes a steady-state lag in the motion trajectory [28]-[31]. FPS reconstructs the actual long-term camera motion smoothly by filtering out jitter components, using either a filter with an appropriate cut-off frequency [28] or an adaptive fuzzy filter that continuously improves stabilization performance [31].
The full-search block matching algorithm requires complicated, time-consuming computation and is hard to implement in hardware. Several computationally efficient DIS algorithms,
objective of these algorithms is to reduce the computational complexity, in comparison with the full-search block-matching method, without losing too much accuracy and reliability. However, RPM still spends many computation cycles searching for the global minimum position, and EPM requires a large amount of computation because of the preprocessing needed to generate edge maps [8].
Analysis of the DIS algorithm shows that motion estimation accounts for 80% of the computational complexity, and motion compensation for the remaining 20% [32]. In contrast to the heavy arithmetic load of RPM (which is used in the proposed motion estimation), motion compensation only makes a few decisions on the LMVs and then calculates the compensating motion vector. Hence we focus on a cost-effective hardware design for motion estimation.
1.2 Objective
Most DIS systems are implemented on a PC or an FPGA. Chip implementations of IS for consumer camcorders exist only for the gyro control sensor of optical stabilization systems. We aim to design a novel VLSI architecture for the local motion estimation of a DIS system. Because of the complex computation in motion estimation, a DIS system is hard to run in real time. To further reduce the computational complexity of finding LMVs, a mixed-signal cellular neural network (CNN) architecture is considered [20]. Compared with conventional digital technology, cellular neural/nonlinear network (CNN)-based computing can realize image processing tasks in the range of trillions of operations per second (TeraOPS) in a cost-effective implementation [21]. CNNs are analog nonlinear dynamic processor arrays in which direct interconnections among the basic processing units are restricted to a finite local neighborhood [22]. By changing the weights of the local interconnections between neighboring CNN cells, many image processing tasks can be realized within the CNN framework. Because of its inherently parallel architecture, a CNN can perform these tasks at high speed. In addition, the uniformity and local connectivity of CNNs make them especially suited to VLSI implementation [23][24][25].
In this thesis, we propose an application-specific CNN (ASCNN) chip that greatly reduces the heavy computation of motion estimation. The RPM method is used to find the local motion vectors. The pre-processed image information first passes through an 8-bit D/A converter and is stored in the local analog memory (LAM), which is designed to hold the image difference information. The CNN then searches for the global minimum position, and the global output connected chains, which connect the CNN outputs, decode its x and y coordinates as the LMV address. With the aid of the CNN technique [20], [27], the global minimum position required by the RPM method can be generated and stored more easily than with traditional DSP computation. A reliable local motion vector extraction method based on the CNN architecture is thus designed for the determination of the global motion vector and for image compensation processing in practical applications.
1.3 Thesis Organization
This thesis is organized as follows. Chapter 2 describes the models of the proposed image stabilization algorithm and gives a computational analysis of the DIS system. Chapter 3 introduces the CNN algorithm, the hardware design of the CNN core, and the template design for the CNN-based local motion estimator. Chapter 4 describes the circuit design of the application-specific CNN (ASCNN) chip. Chapter 5 shows the simulation results obtained with HSPICE, the CNN universal machine (CNNUM), and ModelSim. Chapter 6 presents conclusions and future work.
CHAPTER 2
DIGITAL IMAGE STABILIZATION
The architecture of the proposed image stabilization technique, shown in Fig. 1, is divided into two processing blocks: motion estimation and motion compensation. The motion estimation block consists of three estimators: the local motion vector (LMV) estimator, the ill-conditioned motion vector (IMV) estimator, and the global motion vector (GMV) estimator. The motion compensation unit consists of compensating motion vector (CMV) estimation and image compensation. The two incoming consecutive images (at time (t-1) and time (t)) are first divided into four regions. An LMV is derived in each region by the representative point matching (RPM) algorithm [12], [19]. The motion estimation block also contains a reliability detection function that generates an ill-conditioned motion vector for irregular image conditions such as a lack of features or a large low-contrast area. The GMV estimation determines a global motion vector among the LMVs, the IMV, and other pre-selected motion vectors through a background-based evaluation function. Finally, the CMV is generated from the resulting GMV, and the image sequence is compensated according to the CMV in the motion compensation unit.
The proposed digital image stabilizer thus contains a motion estimation step and a motion compensation step. The design blocks are described as follows.
Fig. 1 The architecture of the proposed image stabilization technique.
2.1 Motion Estimation
The motion estimation unit shown in Fig. 1 contains the LMV, IMV, and GMV estimators. As shown in Fig. 2, the LMV and IMV estimation generates the LMVs and the IMV for global motion vector estimation. The LMVs are obtained from the correlation between two consecutive images by the representative point matching (RPM) algorithm [12], [19]. The IMV is obtained from the LMVs by evaluating the corresponding confidence indices through the irregular condition detection and the proposed IMV generation algorithm.
Fig. 2 The block diagram of LMVs and IMV estimation.
2.1.1 Local Motion Vector (LMV) and Irregular Condition Detection
First, we obtain the local motion vectors by the representative point matching method. A macro-block of 19×25 pixels is the basic processing unit of the algorithm. For a given video sequence, a specific boundary region of the incoming frame is discarded first and serves as the compensating area. The remaining frame is then separated into four regions, each of which yields one local motion vector in the later steps. Each region is further cut into many sub-regions of 19×25 pixels. We take the value of the center pixel of each sub-region as its representative point value, so every sub-region has its own representative point value. The previous representative point value is subtracted from the present sub-region image and the absolute value is taken. The position of the minimum value in the 19×25 absolute difference matrix is considered the position to which the previous representative point has moved because of vibration. All of the absolute difference matrices are then summed into a sum of absolute differences (SAD) matrix and analyzed statistically. The vector from the center point to the position of the minimum SAD value is taken as the local motion vector of the region.
Our test image sequence is a tennis-player video clip with 312×200 pixels per frame. The steps and the results of the algorithm are listed below.
[Fig. 2 block labels: Original Images; Representative Point Matching (RPM); Representative Points Buffer; LMVs Estimation (four regions); Minimum Projections of x and y Directions (four regions); Inverted Triangle Method; Irregular Condition Detection; Generation of IMV; outputs: LMVs, IMV]
• Segmentation: Segment the prescribed region of frames (t-1) and (t) into sub-images of 19×25 pixels each, as shown in Fig. 3(a).
• Mapping: Map all the pixels to the central point of each sub-image (t-1). The mapping array is called the representative point macro-block (RPM), as shown in Fig. 3(b).
• Subtraction: The operation is defined by |Msub(t)| := |Sub-image(t) - Representative_point_image(t-1)|, which gives the absolute difference matrix Msub(t) (19×25 pixels).
• Addition: Add all the |Msub(t)| in the prescribed region to form a 19×25 difference value matrix, as shown in Fig. 4. Fig. 4 shows the SAD matrix in a 3D view, where the z-coordinate is the absolute difference value. The lower the SAD value, the closer the position is to the LMV.
• Minimum: Find the position of the minimum absolute difference in the prescribed region and calculate the vector from the center point to it. This vector is called the local motion vector of the prescribed region. In Fig. 4, the minimum position lies in the upper-left part of the array; it is discussed further in Chapter 5.
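The steps above can be sketched in a few lines of Python. This is only an illustrative software model under stated assumptions, not the chip's data path: the function name `rpm_sad`, the list-of-rows image format, and the convention that the block has 19 rows and 25 columns are assumptions made for the example.

```python
def rpm_sad(prev, curr, block_h=19, block_w=25):
    """Representative point matching for one region (illustrative sketch).

    prev, curr: 2-D lists of gray levels (rows of pixels) of identical size.
    Returns (sad, lmv): the block_h x block_w SAD matrix and the vector
    (dx, dy) from the block centre to the minimum SAD position.
    """
    rows, cols = len(prev), len(prev[0])
    cy, cx = block_h // 2, block_w // 2          # centre = representative point
    sad = [[0] * block_w for _ in range(block_h)]
    for top in range(0, rows - block_h + 1, block_h):
        for left in range(0, cols - block_w + 1, block_w):
            rep = prev[top + cy][left + cx]      # representative point at t-1
            for r in range(block_h):
                for c in range(block_w):
                    # Subtraction and Addition steps: accumulate |Msub(t)|
                    sad[r][c] += abs(curr[top + r][left + c] - rep)
    # Minimum step: first position with the smallest accumulated difference
    best = min(((r, c) for r in range(block_h) for c in range(block_w)),
               key=lambda rc: sad[rc[0]][rc[1]])
    return sad, (best[1] - cx, best[0] - cy)
```

In this sketch a positive `dx` means the scene content moved to the right between the two frames; the sign convention is a choice of the example, not something fixed by the text.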
Fig. 3 (a) The original image is divided into 4 regions, and then each region is cut into an array of 19×25 pixels again. (b) All the pixels are mapped with the central point in each sub-image (t-1).
Fig. 4 The accumulated results of all |Msub(t)| in the prescribed region form a 19×25 difference value array (panels: Region 1 to Region 4).
Analysis of the correlation curves of image sequences under various conditions shows that the shape of the correlation curve is related to the reliability of motion detection. Fig. 5 shows correlation curves for image sequences under different conditions. Fig. 5(a) shows a normal case in which each region of the incoming frame has its own obvious minimum position, which can be distinguished in both the x and y coordinates. Fig. 5(b) shows a valley-shaped distribution: seen along the y coordinate, each region has an obvious minimum position, but the x coordinate is unreliable because it lacks a clear critical point.
Fig. 5 Various correlation curves corresponding to image sequences with different conditions. (a) Normal case that each region of the incoming frame has its own obvious minimum position. (b) Case of only y coordinate has valley shape distribution.
Because the correlation curve is related to the reliability of motion detection, we propose a strategy that combines the minimum projections of the correlation curve in the x and y directions (minimum projections) with the inverse triangle method to detect irregular conditions in each region at low computational complexity. The minimum projections can be written as
$x_{i\_\min}(p) = \min_{q} R_i(p,q), \qquad y_{i\_\min}(q) = \min_{p} R_i(p,q),$ (2.1)
where $R_i(p,q)$ is the correlation value of region $i$ at position $(p,q)$, and $x_{i\_\min}(p)$ and $y_{i\_\min}(q)$ are the minimum projections of the correlation curve in the x and y directions in region $i$, respectively. The concept derives from the intuition that a highly reliable curve for determining the LMV has one sharp and obvious peak, with no other equivalent peaks in the same curve.
Fig. 6 shows the x and y projections of the correlation curve of each region in Fig. 5. In Fig. 6(a), each region has an obvious minimum position in both x and y. In Fig. 6(b), the x projection contains many local minimum points, and only the y coordinate can distinguish the minimum position.
[Fig. 6 plots: x and y minimum projections for Region 1 to Region 4, panels (a) and (b)]
Fig. 6 Examples of minimum projections of correlation curve from x and y directions in four regions: (a) regular image sequence and (b) ill-conditioned image sequence.
To judge the reliability of the motion vector from Fig. 6, we combine the inverse triangle method with the minimum projections of the correlation curve to find the reliability indices. When the distance between local minima is larger than a specific value, the minimum position is not reliable.
Fig. 7 Illustration of the proposed inverse triangle method.
Based on this criterion, the algorithm is designed as follows. In the first step, we find $T_{i\_\min}$, the global minimum of the minimum projection curve in region $i$, by Eq. (2.2). In the second step, we calculate $S_{xi}$ and $S_{yi}$ by Eq. (2.3), where offset is the altitude of the inverse triangle; $n_{xi}$ and $n_{yi}$ are the numbers of elements of $S_{xi}$ and $S_{yi}$, respectively (Eq. (2.4)); and $d_{xi}$ and $d_{yi}$ are the distances between the two vertices of the base of the inverse triangle, obtained by Eq. (2.5). The confidence levels in the x and y directions are calculated by Eq. (2.6). Since multiple peaks seriously degrade the determination of reliability, a penalty for multiple peaks is included in Eq. (2.6) to improve the discrimination of reliability. The example shown in Fig. 7 is a curve with twin peaks, which receives a penalty of $d_{xi} - n_{xi}$. In the third step, we determine the confidence indices of $x_i$ and $y_i$ in region $i$ through a threshold denoted TH. A smaller confidence level represents higher reliability. In the final step, we sum the counts of reliable motion components of x and y in the four regions as in Eq. (2.7) to obtain $Num(x_i)$ and $Num(y_i)$, $i = 1 \sim 4$.
The procedure is as follows.
Step 1.
$T_{i\_\min} = \min_{p}\big(x_{i\_\min}(p)\big)$ or $T_{i\_\min} = \min_{q}\big(y_{i\_\min}(q)\big)$. (2.2)
Step 2.
Calculate the confidence levels, $x_{i\_conf}$ and $y_{i\_conf}$.
$S_{xi} = \{\, p \mid x_{i\_\min}(p) < T_{i\_\min} + \mathit{offset} \,\}, \qquad S_{yi} = \{\, q \mid y_{i\_\min}(q) < T_{i\_\min} + \mathit{offset} \,\},$ (2.3)
$n_{xi} = \text{number of } S_{xi}, \qquad n_{yi} = \text{number of } S_{yi},$ (2.4)
$d_{xi} = \max_{p} S_{xi} - \min_{p} S_{xi}, \qquad d_{yi} = \max_{q} S_{yi} - \min_{q} S_{yi},$ (2.5)
$x_{i\_conf} = 2 d_{xi} - n_{xi}, \qquad y_{i\_conf} = 2 d_{yi} - n_{yi}.$ (2.6)
Step 3.
Set the threshold, TH, for determining the reliability indices.
If $x_{i\_conf}$ < TH Then $x_i$ is reliable,
Else $x_i$ is unreliable,
End if.
If $y_{i\_conf}$ < TH Then $y_i$ is reliable,
Else $y_i$ is unreliable,
End if.
Step 4.
Count the reliable $x_i$ and $y_i$ in the four regions.
$Num(x_i) = \text{sum of } (x_i \text{ is reliable}), \qquad Num(y_i) = \text{sum of } (y_i \text{ is reliable}), \qquad i = 1 \sim 4.$ (2.7)
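Steps 1 to 3 can be illustrated with a short Python sketch. It is a minimal model under assumptions: the function names are hypothetical, the correlation array is indexed as `R[q][p]`, and the confidence level is read as 2d - n (the width d of the inverse-triangle base plus the multiple-peak penalty d - n mentioned in the text).

```python
def min_projections(R):
    """Eq. (2.1): minimum projections of a correlation array R[q][p]."""
    x_min = [min(row[p] for row in R) for p in range(len(R[0]))]  # min over q
    y_min = [min(row) for row in R]                               # min over p
    return x_min, y_min

def confidence(proj, offset):
    """Confidence level of one projection curve; smaller is more reliable."""
    t_min = min(proj)                                   # Eq. (2.2)
    s = [k for k, v in enumerate(proj) if v < t_min + offset]  # Eq. (2.3)
    n = len(s)                                          # Eq. (2.4)
    d = max(s) - min(s)                                 # Eq. (2.5)
    return 2 * d - n                                    # Eq. (2.6), as read here

def is_reliable(proj, offset, th):
    """Step 3: compare the confidence level with the threshold TH."""
    return confidence(proj, offset) < th
```

For a single narrow valley the set S is contiguous, so d and n nearly cancel and the level stays small; for two separated valleys d grows while n does not, which raises the level and marks the component as unreliable. The `offset` must be positive so that S contains at least the minimum position.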
2.1.2 Irregular Motion Vector (IMV)
Irregular motion vectors can be detected and excluded by the minimum projection and inverse triangle method. However, an ill-conditioned image sequence, for example one with a lack of features, a large low-contrast area, a moving object, or a repeated pattern, may contain only a few usable MVs in the four regions (most of the MVs are irregular). The usable regular MVs must therefore be recombined to form an ill-conditioned motion vector (IMV). For this purpose, a median function is used to extract a motion vector in each direction under ill conditions.
The calculation of the IMV is described in detail as follows.
Case 1. If $Num(x_i(t)) = 4$ then
$V_{ill\_x}(t) = Med\big(V_{a\_x}(t), V_{b\_x}(t), V_{c\_x}(t), V_{d\_x}(t), GMV_x(t-1)\big)$,
Case 2. If $Num(x_i(t)) = 3$ then
$V_{ill\_x}(t) = Med\big(V_{a\_x}(t), V_{b\_x}(t), V_{c\_x}(t)\big)$,
Case 3. If $Num(x_i(t)) = 2$ then
$V_{ill\_x}(t) = Med\big(V_{a\_x}(t), V_{b\_x}(t), GMV_x(t-1)\big)$,
Case 4. If $Num(x_i(t)) = 1$ then
$V_{ill\_x}(t) = V_{a\_x}(t)$,
Case 5. If $Num(x_i(t)) = 0$ then
$V_{ill\_x}(t) = \gamma \times GMV_{avgx}(t-1)$, (2.8)
where $Num(x_i(t))$ is the number of x components of reliable LMVs, $V_{ill\_x}(t)$ is the x component of the IMV, $V_{a\_x}(t)$, $V_{b\_x}(t)$, $V_{c\_x}(t)$, and $V_{d\_x}(t)$ are the x components of reliable LMVs in different regions, $Med(\cdot)$ in Eq. (2.8) is the median operation, $GMV_x(t-1)$ is the x component of the previous GMV, $t$ is the frame number, and $\gamma$ is the attenuation coefficient, $0 < \gamma < 1$. The $GMV_{avgx}(t)$ can be calculated by
$GMV_{avgx}(t) = \varsigma \times GMV_{avgx}(t-1) + (1-\varsigma)\, GMV_x(t), \qquad 0 < \varsigma < 1.$ (2.9)
Then we apply a similar process to obtain $V_{ill\_y}(t)$. The resultant IMV is represented by
$V_{ill}(t) = \begin{bmatrix} V_{ill\_x}(t) \\ V_{ill\_y}(t) \end{bmatrix}.$ (2.10)
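The five cases of Eq. (2.8) can be sketched for one component as follows. The function names and the default values of `gamma` and `zeta` are hypothetical (the text only requires both to lie between 0 and 1), and the running average is written on the assumption that Eq. (2.9) is a first-order recursive average of the GMV.

```python
from statistics import median

def imv_component(reliable, gmv_prev, gmv_avg_prev, gamma=0.9):
    """One component (x or y) of the IMV, following the cases of Eq. (2.8).

    reliable:     list of reliable LMV components (0 to 4 entries)
    gmv_prev:     same component of the previous GMV, GMV(t-1)
    gmv_avg_prev: running average GMV_avg(t-1)
    gamma:        attenuation coefficient, 0 < gamma < 1 (value assumed)
    """
    n = len(reliable)
    if n == 4:
        return median(reliable + [gmv_prev])   # Case 1: median of five values
    if n == 3:
        return median(reliable)                # Case 2
    if n == 2:
        return median(reliable + [gmv_prev])   # Case 3: median of three values
    if n == 1:
        return reliable[0]                     # Case 4
    return gamma * gmv_avg_prev                # Case 5: no reliable component

def update_gmv_avg(gmv_avg_prev, gmv, zeta=0.5):
    """Running average of the GMV as read from Eq. (2.9); zeta value assumed."""
    return zeta * gmv_avg_prev + (1 - zeta) * gmv
```

Adding the previous GMV in Cases 1 and 3 keeps the number of median inputs odd, so the median is always one of the candidate values rather than an average of two.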
2.1.3 Global Motion Vector
The LMV in each region may represent the global motion vector, the motion vector of a moving object, or even an erroneous vector. An erroneous vector may be caused by an ill condition or by a mixture of global motion and moving-object motion. A reliable global motion vector is normally selected from the LMVs and the IMV. In the worst case, however, when the estimates of the LMVs and the IMV are all faulty because of a very noisy image sequence, adopting an erroneous GMV introduces artificial shaking. Including the zero motion vector (ZMV) in the evaluation prevents this case. Similarly, for a very noisy image sequence with panning, the previous GMV is the best choice if the estimates of the LMVs and the IMV are all faulty. In the proposed IS technique, seven motion vectors, namely the four LMVs, the IMV, the ZMV, and the previous GMV, referred to as pre-selected motion vectors (pre_MV), are used to estimate the GMV of the current frame. In general, one of the LMVs is the most probable GMV for a regular image; the IMV is the most probable GMV for an ill-conditioned image; the ZMV prevents a worse compensation result caused by faulty MVs; and the previous GMV is useful under panning. In this thesis, a background-based evaluation function is proposed to choose among them. Fig. 8 shows the areas used for background-based evaluation. Five regions located on the surroundings of the image are selected for the evaluation. The reason is that, in most cases, the foreground object is located at the center of the image, so the surroundings of the image are the best candidates for background detection.
Fig. 8 Areas for background detection and evaluation
$$ SAD_{B_i,c} = \sum_{(X,Y) \in B_i} \left| I(t-1, X, Y) - I(t, X+X_c, Y+Y_c) \right|, \quad 1 \le i \le 5, \; 1 \le c \le 7 $$ (2.11)
where $I(t-1,X,Y)$ is the intensity of the point $(X,Y)$ at frame t-1, $B_i$ is the i-th background region in the image, and $X_c$, $Y_c$ are the components of the seven pre-selected motion vectors ($pre\_MV_c$) in the x and y directions. Each $pre\_MV_c$ obtains its $SAD_{B_i,c}$ in each region. A smaller $SAD_{B_i,c}$ represents a higher probability of being the desired motion vector among these pre-selected motion vectors. Five-region peer-to-peer evaluation prevents the situation in which some partial high-contrast image regions dominate the evaluation result. In this algorithm, each region has an equal priority to determine the result. The $pre\_MV_c$ with the smallest $S_c$ is the desired GMV and it can be expressed as
$$ GMV = pre\_MV_i, \quad \text{for } i = \arg\left(\min_c S_c\right), $$ (2.12)
where $S_c = \sum_{i=1}^{5} S_{i,c}$. The score for each $pre\_MV_c$ in region i is denoted as $S_{i,c}$. Hence, $S_c$ is the important index to determine the GMV.
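The evaluation of Eqs. (2.11) and (2.12) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text above does not define how the score $S_{i,c}$ is derived from $SAD_{B_i,c}$, so the sketch assumes it is the rank of candidate c inside region i (0 for the smallest SAD), which is one way to give every region equal priority. The function names, the region format `(x0, y0, x1, y1)` and the frame layout `frame[y][x]` are hypothetical.

```python
def sad(prev, curr, region, mv):
    """Eq. (2.11): sum of absolute differences over one background region."""
    x0, y0, x1, y1 = region          # hypothetical format, end-exclusive
    dx, dy = mv                      # candidate motion vector (X_c, Y_c)
    return sum(abs(prev[y][x] - curr[y + dy][x + dx])
               for y in range(y0, y1) for x in range(x0, x1))


def select_gmv(prev, curr, regions, candidates):
    """Eq. (2.12): pick the candidate with the smallest summed score S_c."""
    totals = [0] * len(candidates)
    for region in regions:
        sads = [sad(prev, curr, region, mv) for mv in candidates]
        # ASSUMPTION: S_{i,c} is the rank of candidate c in region i.
        order = sorted(range(len(candidates)), key=lambda c: sads[c])
        for rank, c in enumerate(order):
            totals[c] += rank
    best = min(range(len(candidates)), key=lambda c: totals[c])
    return candidates[best], totals
```

Because only ranks are summed, a single high-contrast region cannot dominate the decision, which matches the peer-to-peer intent described above.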
2.2 Motion Compensation
It is necessary to generate the compensating motion vectors (CMVs) for removing the undesired shaking motion while keeping the steady motion of the image sequence. The conventional compensating motion vector estimation was given by [18]
$$ CMV(t) = k\,(CMV(t-1)) + (\alpha\, GMV(t) + (1-\alpha)\, GMV(t-1)), $$ (2.13)
where t represents the frame number, $0 < k < 1$ and $0 \le \alpha \le 1$. In this case, a large lag arises under steady panning, which reduces the available effective image area. The CMVs are generated by Eq. (2.13) with the clipper function [34] as
$$ CMV(t) = clipper(CMV(t)) = \frac{1}{2}\left( |CMV(t) + l| - |CMV(t) - l| \right), $$ (2.14)
where l is the boundary limitation, i.e., the maximum window shift allowance. The lag can be reduced to a certain range; however, this also decreases the performance of shaking compensation because the picking window operates near the boundary area. To overcome this drawback, we combine an inner feedback-loop integrator with the clipper function to reduce the steady-state lag for steady motion while keeping the CMV within the appropriate range. Fig. 9 shows the block diagram of the proposed CMV generation method. There is an integrator in the inner feedback loop, which eliminates the steady-state lag of the CMV in the panning condition. That means, by employing the integrator, shaking components of images with constant panning as well as those in regular images can be stabilized. The proposed CMV computation procedure is given by
$$ CMV(t) = K \cdot CMV(t-1) + [\alpha \cdot GMV(t) + (1-\alpha) \cdot GMV(t-1)] - \beta \cdot CMV\_I(t-1), $$ (2.15)
$$ CMV\_I(t) = CMV\_I(t-1) + CMV(t) \quad \text{and} \quad CMV(t) = clipper(CMV(t)), $$
where $[0\ 0]^T \le K, \alpha, \beta \le [1\ 1]^T$ and clipper() is defined as Eq. (2.14).
Fig. 9 The block diagram of the proposed CMV generation method.
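The effect of the inner integrator can be illustrated with a small Python sketch of Eqs. (2.14) and (2.15) for one axis. It is a sketch under assumptions, not the implementation used in this work: the parameter values `k`, `alpha`, `beta` and `limit` are hypothetical, and the integrator is assumed to accumulate the clipped CMV.

```python
def clipper(v, limit):
    """Eq. (2.14): saturate v to the range [-limit, +limit]."""
    return 0.5 * (abs(v + limit) - abs(v - limit))


def cmv_sequence(gmvs, k=0.9, alpha=0.5, beta=0.05, limit=10.0):
    """Eq. (2.15) for one axis; all parameter values are hypothetical."""
    cmv = cmv_int = prev_gmv = 0.0
    out = []
    for gmv in gmvs:
        raw = (k * cmv
               + (alpha * gmv + (1.0 - alpha) * prev_gmv)
               - beta * cmv_int)
        cmv = clipper(raw, limit)
        cmv_int += cmv        # ASSUMPTION: integrate the clipped value
        prev_gmv = gmv
        out.append(cmv)
    return out
```

For a constant GMV (steady panning), the integrator term grows until it cancels the panning input, so the CMV returns toward zero instead of settling at a lag that the clipper would have to cut off.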
2.3 Implementation of Local Motion Estimation
After programming the DIS system, we find that it does an excellent off-line job compared with the RPM fuzzy set theory [32], but it is hard to implement the algorithm in practical consumer camcorders. Although several DIS systems can do the on-line job by using a PC [45], there is still a long way to go toward DIS hardware design.
Therefore it is necessary to analyze the DIS computation load before the system is implemented on practical camcorders. We record the computation time of each DIS step and compare them. The measurements might lack accuracy because they depend on the test videos, but we can still recognize that most of the computation load belongs to the motion estimation step. Fig. 10 shows the percentage of the computation load of each block in the DIS system. The motion estimation step takes about 80% of the processing time and the motion compensation takes the remaining 20%. More than half of the computation time belongs to the local motion estimation step. This is because the RPM method in the LMV step needs to load the image SAD values into memory first and then find the minimum value pixel by pixel. To find the global minimum position, a DSP processor compares each value with the stored minimum, stores the smaller one, and jumps to the next pixel. The position of the minimum must also be stored in memory. The larger the processing unit is, the longer the processing time it takes. Our strategy is to design an application-specific chip that greatly reduces the computation time of LMV estimation. The shorter the processing time of the LMV estimation step, the higher the capability of real-time operation for image stabilization processing.
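The sequential search that a DSP has to perform can be sketched as follows. The function name and the array layout `sad[y][x]` are hypothetical; the point is only that the number of compare-and-store steps grows with the size of the processing unit.

```python
def sequential_argmin(sad):
    """Scan a 2-D SAD array pixel by pixel, as a sequential processor would.

    Returns ((x, y), minimum value, number of visited pixels).
    """
    best_val, best_pos, steps = None, None, 0
    for y, row in enumerate(sad):
        for x, value in enumerate(row):
            steps += 1                       # one compare (and maybe store)
            if best_val is None or value < best_val:
                best_val, best_pos = value, (x, y)
    return best_pos, best_val, steps
```

For a 19×25 array the loop visits 475 pixels, and each visit costs at least one comparison plus possible memory writes for the value and its position.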
The LMV estimation chip is designed with CNN technology to solve the heavy computation time problem. Compared with conventional digital technology, CNN-based computing is capable of realizing these TeraOPS-range image processing tasks in a cost-effective implementation. The design concept of the ASCNN chip is shown in Fig. 11. By using an 8-bit D/A converter, the absolute image difference, which ranges from 0 to 255, can be stored in the CNN local analog memories (LAM). With the aid of CNN technology, the LMV position can be found much more easily than with a DSP processor. The CNN theory and chip design are introduced in Chapter 3 and Chapter 4.
Fig. 10 Computational complexity of the DIS flow. [Block labels: Original Images; Motion Estimation (~80%), consisting of LMVs Estimation (~63%) and Global Motion Vector Estimation (~17%); Motion Compensation (~20%), consisting of Compensation Motion Vector Estimation and Image Compensation; Compensated Images. Signal labels: LMVs, GMV, CMV, Store in Buffer.]
Fig. 11 The design concept of ASCNN chip architecture.
CHAPTER 3
Cellular Neural Network
The original Cellular Neural/Nonlinear Network (CNN) paradigm was first introduced by Chua and Yang [20]. CNN technology is both a revolutionary and an experimentally proven computing paradigm. The two most fundamental ingredients of the CNN are the use of analog processing cells with continuous signal values and local interaction within a finite radius. The CNN possesses some of the key features of neural networks and has important potential applications in areas such as image processing and pattern recognition. This chapter first introduces the CNN theory and architecture, then the CNN circuit, and finally the inversion and adaptive threshold properties of the CNN, which are used to calculate the LMV.
3.1 CNN Theory
CNN can be considered an implementable alternative to fully connected neural networks and a remarkable improvement in the hardware implementation of artificial neural networks. Local interconnection and simple synaptic operators are the most attractive features of the CNN for VLSI implementation in high-speed, real-time applications [35], and CNNs are widely used in several application fields, such as image processing and pattern recognition. Several hardware implementations of the CNN have been reported in the literature [37], [38], [39].
The state equation of CNN can be represented by
( )
( )
( )
( )
, , , ; , , , ; , , , , ( , ) , ( , ) i j i j i j k l k l i j k l k l i j k l Nr i j k l Nr i j x t x t A y t B u t I ∈ ∈ = − +∑
+∑
+ , (3.1)( )
(
( )
)
1(
( )
( )
)
1 1 2 y t = f x t = x t + − x t − , (3.2)where i, j refers to a grid point associated with a cell on the 2-D grid, and k, l
∈
Nr(i,j) is a gridpoint in the neighborhood within a radius r of the cell i, j. A and B are the nonlinear cloning templates [40]. Fig. 12 shows the dynamic route of state in CNN. The feature of the Eq. (3.2) has been plotted at Fig. 13.
In many applications, the templates (A, B) and the threshold I are translation invariant. In the case of single-variable A and B functions, the linear (space-invariant) template is represented by the additive terms in Eq. (3.1). When the template is space invariant, each cell is described by a simple identical cloning template defined by two (2r + 1) × (2r + 1) real matrices A and B, as well as the constant term I. In addition, as a very special case, if the input and the initial state values are sufficiently small and f is piecewise linear, then the dynamics of the CNN array are linear. Unlike in other standard analog processing arrays or neural networks, the one-to-one geometric (topographic) correspondence between the processing elements and the processed signal-array elements (e.g., pixels) is of crucial importance. Moreover, the template has a geometrical meaning that can be exploited to provide geometric insights and simpler design methods.
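The state equation (3.1) with the output function (3.2) and a space-invariant 3×3 template can be explored numerically with a simple forward-Euler loop. The following Python sketch is an illustration only: the step size, the number of steps and the zero boundary condition are assumptions, and the helper names are hypothetical.

```python
def f(x):
    """Eq. (3.2): piecewise-linear output, saturating at +/-1."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_step(x, u, A, B, I, dt):
    """One forward-Euler step of Eq. (3.1) with 3x3 templates A and B."""
    rows, cols = len(x), len(x[0])
    y = [[f(v) for v in row] for row in x]
    new = [row[:] for row in x]
    for i in range(rows):
        for j in range(cols):
            acc = -x[i][j] + I
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    k, l = i + di, j + dj
                    # ASSUMPTION: cells outside the array contribute zero.
                    if 0 <= k < rows and 0 <= l < cols:
                        acc += A[di + 1][dj + 1] * y[k][l]
                        acc += B[di + 1][dj + 1] * u[k][l]
            new[i][j] = x[i][j] + dt * acc
    return new


def cnn_run(u, A, B, I, x0=0.0, dt=0.05, steps=400):
    """Integrate from a uniform initial state and return the outputs y."""
    x = [[x0] * len(u[0]) for _ in u]
    for _ in range(steps):
        x = cnn_step(x, u, A, B, I, dt)
    return [[f(v) for v in row] for row in x]
```

With a centre-only template (feedback 2, control 1, I = 0) and zero initial state, each output settles at the sign of its own input, which shows how a cloning template turns the array into a bank of identical comparators.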
3.2 CNN Architecture
The basic circuit unit of CNN is called a cell. It contains linear and nonlinear circuit elements, which typically are linear capacitors, linear resistors, linear and nonlinear controlled sources, and independent sources. The structure of CNN is similar to that found in cellular automata, and each cell in a CNN is connected only to its neighbor cells. Adjacent cells can interact directly with each other. Cells not directly connected together may affect each other indirectly because of the propagation effects of the continuous-time dynamics of the network. A typical example of a cell C(i, j) is shown in Fig. 14, where the suffixes u, x, and y denote the input, state, and output, respectively.
Fig. 14 The circuit of a CNN cell.
The differential equation governing a CNN in Eq. (3.1) is rewritten as follows:
$$ C \frac{dv_{xij}(t)}{dt} = -\frac{1}{R_x} v_{xij}(t) + \sum_{C(k,l) \in N_r(i,j)} A(i,j;k,l)\, v_{ykl}(t) + \sum_{C(k,l) \in N_r(i,j)} B(i,j;k,l)\, v_{ukl}(t) + I, \quad 1 \le i \le M;\ 1 \le j \le N, $$ (3.3)
where $y \in R^N : -1 \le y \le 1$, and A is an M-by-N (M=N) real symmetric matrix defined as
$$ A = \begin{bmatrix} A_0 & A_1 & 0 & \cdots & 0 \\ A_1 & A_0 & A_1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & A_1 & A_0 & A_1 \\ 0 & \cdots & 0 & A_1 & A_0 \end{bmatrix} $$ (3.4)
Here $I = [I_1\ I_2\ \ldots\ I_N]^T$ is an N-by-1 constant vector and the input vector $v_u$ can be defined in a similar way. $A_0$ and $A_1$ are two m×m Toeplitz matrices with elements determined by a given cloning template. The synapse weights of the shift-invariant CNN can be described by the feedback and feed-forward cloning templates:
$$ T_A = \begin{bmatrix} a_2 & a_1 & a_2 \\ a_1 & a_0 & a_1 \\ a_2 & a_1 & a_2 \end{bmatrix}, \qquad T_B = \begin{bmatrix} b_2 & b_1 & b_2 \\ b_1 & b_0 & b_1 \\ b_2 & b_1 & b_2 \end{bmatrix} $$ (3.5)
where all elements are normalized to $T_x = 1/R_x$, and $a_0 = A(i,j;i,j)/T_x > 1$. The matrix B can be defined in a similar way. Then, the maximum value of x in the steady state is the sum of the absolute values of all inputs from the neighborhood cells,
$$ x_{max} = |x_0| + \sum_{i,j=1}^{3} |T_A(i,j)| + \sum_{i,j=1}^{3} |T_B(i,j)|, $$ (3.6)
where $|u| \le 1$, $|y| \le 1$ and $x_0 = I/T_x$. The neuron cell should be able to handle a state voltage in the range $|x| \le |x_{max}|$.
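As a quick numerical check of the state range, the bound can be evaluated for a given template. The sketch below assumes that Eq. (3.6) reads as $|x_0|$ plus the sums of the absolute template entries; the function name and the example values are hypothetical.

```python
def x_max(TA, TB, I, Tx=1.0):
    """State bound: |I/Tx| + sum|TA(i,j)| + sum|TB(i,j)| (assumed reading)."""
    return (abs(I / Tx)
            + sum(abs(v) for row in TA for v in row)
            + sum(abs(v) for row in TB for v in row))


# Example: centre-only template with feedback 2, control 1 and I = 1.
TA = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
TB = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
bound = x_max(TA, TB, I=1.0)
```

In this example the cell circuit would have to handle normalized states up to 4 times the output saturation level.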
3.3 CNN Circuit Design
The current-mode approach [40], [42] is used in CNN circuit design because it has superior mathematical addition properties. The summation of weighted currents is simply done by appropriate transistor sizing. The piecewise-linear function is achieved by cascading two current limiters as shown in Fig. 15.
(a) (b)
Fig. 15 Piecewise linear function. (a) Schematic view and (b) Transfer characteristics of two current limiters in the cascade [36].
The limiting operation on the input current, denoted by Ix, first takes place at a negative value Ix = -IQ and then at a positive value Ix = IQ. For Ix ≤ -IQ, no current flows through the transistors M3 and M4. Therefore, IDS5 = IDS6 = 2IQ and Iy = -IQ, where IDS5 represents the drain-to-source current of M5, and so on. For Ix > -IQ, IDS3 = IDS4 = IQ + Ix and IDS5 = IDS6 = (IQ - Ix) produce the output current Iy = IQ - IDS6 = Ix. However, if Ix > IQ, then IDS5 = IDS6 = 0 and Iy = IQ.
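The three operating regions described above can be checked with a behavioural model of the two cascaded limiters. This is a sketch, not a circuit simulation: it assumes that the second stage carries 2·IQ minus the current of the first stage and cuts off at zero, which reproduces the three cases stated in the text.

```python
def limiter(ix, iq):
    """Behavioural model of the cascaded current limiters; returns Iy."""
    # First limiter: M3/M4 carry IQ + Ix and are cut off for Ix <= -IQ.
    ids3 = max(0.0, iq + ix)
    # ASSUMPTION: second stage carries 2*IQ - IDS3, cut off at zero,
    # so IDS5 = IDS6 = IQ - Ix inside the linear range.
    ids6 = max(0.0, 2.0 * iq - ids3)
    return iq - ids6                 # Iy = IQ - IDS6
```

For IQ = 20 (in uA), inputs of -30, 5 and 30 give outputs of -20, 5 and 20, i.e. the piecewise-linear characteristic of Eq. (3.2) scaled to the bias current.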
Figure 16 shows a detailed schematic diagram of a neuron cell [36]. The synaptic weight is realized by M11 - M14 for a0 and M15 - M16 for a1. Four copies of a current mirror are used to provide the weight for the four neighboring cells. The external input current, bias current, feedback current, and the currents from the neighboring cells are summed at the drain terminal of M1. The offset circuit provides a bias current which is set by the bias voltage VBI. The output voltage generator is made of a simple current comparator using a cascade of two inverters. The input currents from the neuron circuit, the weighting circuit and the reference voltage set by VBB are compared to produce an output Vy which represents the sign of the neuron output.
Fig. 16 The CNN cell with fixed weights (templates).
3.4 CNN Template Consideration
3.4.1 Image Difference
The first step of the RPM method subtracts the representative-point pixel value of the past frame from the pixels of the present sub-region. We can implement the subtraction step with image inversion and current addition. The inversion template [43] is listed below. The input of the CNN is the grayscale representative sub-region.
$$ T_A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad T_B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$ (3.7)
Figure 17 shows the simulation result of the CNN inversion template [44]. Fig. 17(a) shows the input of the CNN. Fig. 17(b) shows the initial state of the CNN, which is combined with the input by subtraction.
(a) (b) (c)
Fig. 17 Simulation of CNN inversion template. (a) Input of gray-scale image. (b) The initial state of CNN. (c) Output after difference processing.
Because we use a current-mode CNN as the processing core, the addition step can set the initial state of the CNN to zero and directly combine the input currents of the inverted representative sub-region and the present sub-region.
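Why a centre-only template inverts a gray-scale input can be seen from the steady state of Eq. (3.1). The short derivation below is a sketch that assumes the centre entries of the inversion template in Eq. (3.7) are $a = -1$ and $b = -2$ and that the bias is $I = 0$; both are readings of the template rather than values confirmed elsewhere in the text.

```latex
\begin{align}
\dot{x} &= -x + a\,y + b\,u, \qquad a = -1,\; b = -2,\; I = 0 \\
\dot{x} &= -x - x - 2u = -2\,(x + u) \qquad (|x| \le 1 \;\Rightarrow\; y = x) \\
\dot{x} = 0 \;&\Rightarrow\; x^{*} = -u, \qquad y^{*} = f(x^{*}) = -u \quad (|u| \le 1)
\end{align}
```

Since $\partial \dot{x} / \partial x = -2 < 0$, this equilibrium is stable, so under these assumptions the output settles at the negated input for every gray level in the linear range.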
3.4.2 Global Minimum
Searching for the minimum position in a specified area not only takes time but also consumes considerable power. The basic processing step is to compare each value with the previous one and store the minimum. The larger the area to be examined, the more clock cycles, i.e., power, it takes. Egusa [17] proposed using an analog circuit to find the global minimum value, but the circuit only suits applications with few inputs. Therefore, we propose using the CNN adaptive threshold template, which can find the global minimum position in a larger array within fewer clock periods. The adaptive threshold template is listed below.
$$ T_A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad T_B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$ (3.8)
The adaptive threshold template not only simplifies the CNN state equation (3.1), but also makes the template easier to implement in VLSI. With the template of Eq. (3.8), the state equation becomes
$$ \dot{X} = -X + AY + Bu + I = -X + 2Y + u + I. $$ (3.9)
We then set the initial state of the CNN to zero, which is reasonable for circuit design and removes the need for any initialization circuit in the CNN core. We set the sigmoid function to saturate at ±20uA, because every current source of the CNN core is 20uA. The CNN adaptive threshold template analysis is shown in Table 1.
Table 1: CNN Adaptive Threshold Template Analysis
Case 1: Iu = -10uA and x (initial) = 0; dx/dt = -x + 2x + (-10) + Ibias = x - 10 + Ibias.
Case 2: Iu = 10uA and x (initial) = 0; dx/dt = -x + 2x + (+10) + Ibias = x + 10 + Ibias.
Ibias | Case 1 | Case 2
-20 | dx/dt = -10 + (-20) = -30; saturates, y still restricted to -20 | dx/dt = 10 + (-20) = -10; now x = 0 - 10 = -10, unstable; dx/dt = -10 + 10 + (-20) = -20; now x = -10 - 20 = -30, and y restricted to -20, stable
-10 | dx/dt = -10 + (-10) = -20; now x = 0 - 20 = -20 = y, stable | dx/dt = 10 + (-10) = 0; now x = 0 + 0 = 0, critical point
0 | dx/dt = -10 + 0 = -10; now x = 0 - 10 = -10, unstable; dx/dt = -10 - 10 = -20 | dx/dt = 10 + 0 = 10; now x = 0 + 10 = 10, unstable; dx/dt = 10 + 10 + 0 = 20
10 | dx/dt = -10 + 10 = 0; now x = 0 + 0 = 0, critical point | dx/dt = 10 + 10 = 20; now x = 20 = y, stable
20 | dx/dt = -10 + 20 = 10; now x = 0 + 10 = 10 = y, unstable; dx/dt = 10 - 10 + 20 = 20; now x = 10 + 20 = 30, and y restricted to 20, stable | dx/dt = 10 + 20 = 30; now x = 0 + 30 = 30, and y restricted to 20, stable
We use the property of Case 2 to implement the adaptive threshold template for searching the global minimum position in the RPM method. With the aid of the CNN array, a new searching method has been developed for the DIS algorithm. As shown in Fig. 18, the SAD values are plotted in a 3-D view; the CNN output of a cell changes from logic 1 to logic 0 when its difference value is below the threshold level, while the other cells remain at logic 1. If none of the positions in the processing array flips its output logic, the CNN bias control circuit tunes the threshold to a higher level until the minimum input in the array is lower than the bias current.
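The search procedure described above can be summarized by a behavioural Python sketch. Each cell is modelled as an ideal comparator that flips when its stored difference value is below the current threshold, and the bias control raises the threshold level by level. The uniform step size, the full-scale value of 1024 and the function name are assumptions made for illustration.

```python
def find_min_by_threshold(values, levels=32, full_scale=1024):
    """Raise the threshold step by step until at least one cell flips.

    Returns (list of flipped (x, y) positions, number of levels used).
    An empty list means no cell flipped within the available levels.
    """
    step = full_scale / levels       # ASSUMPTION: uniform threshold steps
    for n in range(1, levels + 1):
        threshold = n * step
        # All cells are evaluated at once in the array; the comprehension
        # only emulates that parallel comparison.
        flipped = [(x, y)
                   for y, row in enumerate(values)
                   for x, v in enumerate(row)
                   if v < threshold]
        if flipped:
            return flipped, n
    return [], levels
```

The number of iterations is bounded by the number of threshold levels and does not depend on the array size. Two cells whose values fall into the same threshold step flip together, so the resolution of the search is one threshold step.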
CHAPTER 4
CIRCUIT DESIGN OF CNN-BASED LMV ESTIMATION
The process of finding the LMV is computationally intensive, requiring billions of operations for each image. The most complex operations occur in 1) computing the motion vector and the difference value and 2) storing the difference value together with its position information if it is smaller than any previous value. Since these operations slow down the computation, the motion computation is carried out by a CNN, whose architecture suits this task, with a fixed template and a tunable bias current circuit for each cell.
The test image sequence is a tennis player video of 312×200 pixels. Since image sensors can capture more pixels than the video image requires, a band of boundary pixels is first cut from each captured image and reserved as the compensation area. Removing the boundary area is called the pre-processing step, after which each image is 300×190 pixels. The block diagram of the motion estimation that finds the LMVs with CNN processing is shown in Fig. 19. Before entering CNN processing, the captured image is cut into four regions, and an LMV is found for each region. Each region is then divided into 30 sub-regions as mentioned in Chapter 2. Each sub-region is a processing block of 19×25 pixels. We first store the image value (0~255) of the center point of every sub-region in the prescribed region of the (t-1)th image. According to the RPM method described in Chapter 2, the absolute difference information is computed from the t-th and (t-1)th sub-images. Through the digital-to-analog converter, the difference information is stored in the Local Analog Memories (LAM). The absolute difference arrays of the 30 sub-regions of each processing region are stacked into the CNN LAM, and the memory voltages vary with the difference values stored in the LAM. The CNN does not begin to compute motion vectors until the LAM has accumulated the difference information of all 30 sub-regions. The CNN processor then checks the global minimum position by using a 32-level threshold bias. The minimal difference value and its position information are found within 32 clock cycles and then latched in the location registers.
The processing time is not affected by the size of the CNN. Therefore, the larger the difference array is, the greater the speed advantage over a DSP processor.
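The data flow into the LAM can be sketched in Python as follows. The sketch assumes that one 150×95 region is tiled by 6×5 sub-regions of 25×19 pixels (which gives the 30 sub-regions mentioned above), that the representative point is the centre pixel of each sub-region, and that frames are indexed as `frame[y][x]`; the function names are hypothetical.

```python
def rpm_accumulate(prev, curr, x0=0, y0=0, cols=6, rows=5, w=25, h=19):
    """Accumulate |current pixel - representative point| over all sub-regions.

    ASSUMPTION: the region is tiled by cols x rows sub-regions of w x h pixels.
    Returns an h x w array, the software analogue of the LAM contents.
    """
    acc = [[0] * w for _ in range(h)]
    for r in range(rows):
        for c in range(cols):
            oy, ox = y0 + r * h, x0 + c * w
            rep = prev[oy + h // 2][ox + w // 2]   # centre pixel, frame t-1
            for y in range(h):
                for x in range(w):
                    acc[y][x] += abs(curr[oy + y][ox + x] - rep)
    return acc


def lmv_from(acc):
    """Offset of the minimum from the array centre, as (dx, dy)."""
    h, w = len(acc), len(acc[0])
    y, x = min(((y, x) for y in range(h) for x in range(w)),
               key=lambda p: acc[p[0]][p[1]])
    return x - w // 2, y - h // 2
```

If the whole region moves by (dx, dy) between the two frames, every representative point reappears at the same offset inside its sub-region, so the accumulated array has its minimum at that offset from the centre.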
The system shown in Fig. 19 includes windowing (RPM and SAD), the 19×25 CNN array and LAM, the bias control circuit, and the addressing decoder.
Fig. 19 The flow of CNN-based local motion estimation.
Figure 20 shows the architecture of the application-specific CNN (ASCNN) design. In the windowing component of Fig. 19, the sequential images are segmented into 19×25-pixel sub-regions for each region, and the absolute difference [17] of two images is calculated. Through an 8-bit D/A converter, the difference value is loaded and accumulated into the LAM, which consists of switched MOS capacitors. The bias current circuit adjusts the threshold of the CNN array according to the values of the global output connected chains. If all difference values are higher than the given bias, a higher current is fed into the CNN input from the bias circuit. The process detects the X and Y position information almost immediately and stores it in the registers.
[Figure block labels: Capture single image; Cut four regions; Cut 30 sub-images for a process element (300×190 image, 25×19 sub-image); (t-1)th sub-image, with the representative point at (10,13) for each sub-image; t-th sub-image; Pixel-by-pixel subtraction; ABS; 8-bit D/A; Local memory array (accumulate 30 sub-image values); Input data to a 19 by 25 CNN array; Column decoder; Row decoder; Adaptive bias circuit; X Reg; Y Reg.]
Fig. 20 AS-CNN chip architecture.
4.1 D/A Converter
Analyzing the minimum image difference value of the input video sequence is necessary for designing the D/A converter. Fig. 21 shows the minimum pixel difference value of each frame versus the frame number. The mean of the minimum difference values is 529, but the data vary from video to video. Therefore the upper bound of the minimum difference is set to 1024. This upper bound corresponds to four maximum charging inputs, i.e., 256×4, before the LAM reaches 3.3 volts. Note that there is an exception at sequence No. 48, where the minimum value exceeds 1024. This is because there is a large movement of the whole region, caused either by intentionally moving the camcorder or by an object so large in this region that it is considered as the background. The LMV located in this region is not dependable and should not be stored. This kind of situation can be detected and discarded by the CNN controller.
Fig. 21 Analysis of minimum difference value for tennis player video frames. [Axes: Frame No. (0 to 90) versus difference (pixel count) (0 to 1400).]
An 8-bit D/A converter is used to translate the absolute image difference code into an analog current and load it into the local analog memory. First, the DAC input passes through registers in order to synchronize with the digital control circuit, which is triggered by a 20 MHz clock.
The DAC is made of eight sets of current mirrors shown in Fig. 22 and the output stage is the 475 sets of LAM. Table 2 lists the DAC specification to keep the function accuracy of the input stage design.
Table 2: 8-bit D/A Converter Specification Table.
Parameter | Value
Model | Application Specific CNN 8-bit DAC
Output Loading | 50Ω
Resolution | 8-bit
Relative Accuracy (0~70 °C) | | 0.5 LSB |
Output range | 0~255
Full output for LAM | 0~5.5mV; 0~3.3V
DNL | Max DNL < 0.25 LSB
INL | 0.1615 LSB
SNR | 55.8148 dB
Digital Input: Data Input, Voltage | Logic 1: 3.3V; Logic 0: 0V
Digital Input: Control Input, Voltage | Logic 1: 3.3V; Logic 0: 0V
Digital Input: Input capacitance | 4pF
Power Supply: Operating Voltage Range | VDD = 3.3V
Power Supply: MAX @ 11111111 | 6.9513 mW
Power Supply: current (Icc) | 2.1064 mA
Operating Temperature | 0 ~ 70 °C
1. Output Swing
Figure 23 shows the output voltage variation of 256 levels with 50Ω loading. The charging voltage of the analog memory in the output stage is proportional to the input current and the loading time. The larger the image difference value is, the higher the memory voltage will be.
2. DNL
Figure 24 shows the DNL analysis between the output voltage in Fig. 23 and the ideal voltage curve, i.e., V_ideal = 50Ω × 0.43091/1000 × t. The value 0.43091e-3 is the slope of the output voltage for an input change from 0 to 1. The Y coordinate represents the difference between the two curves relative to one LSB. The largest DNL in Fig. 24 is about 0.25 LSB. The X coordinate represents time; the time step is 50 ns as the input data change from 0 to 255.
Fig. 24 DNL analysis.
3. INL
The INL is accumulated from all DNL data in Fig. 24. The result shows that INL = 0.1615 LSB < 0.5 LSB.
4. SNR
Because the input working frequency is set to 20 MHz, i.e., fclk = 20 MHz, we apply a sine-wave input with a frequency of 20 MHz/28 to the DAC and record the output waveform, as shown in Fig. 25(a). We then sample the stable output voltage at 25 ns and obtain 140 sampling points. The DFT analysis is shown in Fig. 25(b). We take the logarithm of the x-coordinate, as in Fig. 25(c), and find the following powers:
Power DC = 0.1479 Power Sine = 0.0369
Power Harmonic = 1.6947e-007 Power Noise = 9.6831e-008
SNR_dB = 55.8148dB > 49 dB
Fig. 25 SNR analysis of 8-bit DAC. (a) sine wave output wave form of DAC; (b) DFT analysis of DAC output form with 140 sampling points; (c) The logarithm value for the x- coordinate in (b).
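The figures of merit discussed in items 2 to 4 can be reproduced from measured data with a few lines of Python. This is a sketch under assumptions: the DNL here uses an end-point LSB computed from the first and last output levels (the analysis above compares against an ideal line instead), the INL is taken as the running sum of the DNL, and the SNR is the ratio of the sine power to the noise power without the harmonic power. The function names are hypothetical.

```python
import math


def dnl_inl(levels):
    """DNL and INL in LSB from a list of measured output levels."""
    # ASSUMPTION: end-point LSB, i.e. the average step of the measured curve.
    lsb = (levels[-1] - levels[0]) / (len(levels) - 1)
    dnl = [(b - a) / lsb - 1.0 for a, b in zip(levels, levels[1:])]
    inl, running = [], 0.0
    for d in dnl:
        running += d                  # INL accumulated from the DNL data
        inl.append(running)
    return dnl, inl


def snr_db(p_signal, p_noise):
    """Signal-to-noise ratio in dB from two power values."""
    return 10.0 * math.log10(p_signal / p_noise)


snr = snr_db(0.0369, 9.6831e-8)       # powers reported for Fig. 25
```

With the rounded powers listed above, `snr` evaluates to about 55.81 dB, which agrees with the reported 55.8148 dB to within the rounding of the sine power.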
The layout of the DAC is shown in Fig. 26. The 8-bit input comes from the output of the synchronization registers, and the maximum current output is 2.1064mA, which is designed so that each LAM cell can be charged five times before reaching the 3.3 volt upper bound.
Fig. 26 Layout of 8-bit current mode D/A converter
4.2 Local Analog Memory
Local analog memory (LAM) is designed to store the image difference value. The basic structure of a LAM cell is shown in Fig. 27. It consists of a transmission gate controlled by the CNN controller and a 2 pF MOS capacitor. […] defined in input stage. Vctrl_P and Vctrl_N are switch signals that come from the position decoder. Although the MOS capacitor introduces nonlinearity problems, we still choose it rather than the poly capacitor for area reasons. For a 2 pF poly capacitor design, the nonlinearity problems are much easier to solve, but the area is 2314um2, while the MOS capacitor costs only 1800um2 [51]. Therefore using the MOS capacitor saves considerable on-chip area. The characteristic curve of the MOS capacitor is shown in Fig. 28. Since the MOS capacitor is nonlinear, its capacitance varies by 600 fF while the gate voltage Vgs changes from 0 to 0.65 volts. A specific voltage, i.e., 0.65 volts, should therefore be pre-charged before loading information into each capacitor. The layout view is shown in Fig. 29. A symmetric layout style is used for every two LAM cells to reduce mismatch during fabrication.
However, the accuracy of the MOS capacitor value is not important. As long as the global LAM array has uniform capacitor properties, the CNN processor will be able to find the correct global minimum position.
Fig. 28 The characteristic curve of MOS capacitor.
4.3 Voltage to Current Converter
A voltage-to-current converter (VCC), shown in Fig. 30, is required to transform the LAM voltage into the CNN current input. By properly designing the W/L ratio, the output of the VCC to the CNN is limited to 20uA. The maximum power consumption of the VCC is 104.6uW when the input signal is 3.3 volts. Fig. 31 shows the layout view of the VCC. A common-centroid arrangement is used for the current mirror devices M1 and M2.
Transistor M2 with Vsh (set to 2.85 volts) is used to shield switching noise from the output stage and to prevent the LAM voltage from leaking. Analyzing the current mirror of Fig. 30, we can write
$$ I_{D1} = \frac{1}{2} \mu_p C_{ox} \left(\frac{W}{L}\right)_1 (V_{GS} - V_{TH})^2 (1 + \lambda V_{DS1}) $$ (4.1)
$$ I_{D2} = \frac{1}{2} \mu_p C_{ox} \left(\frac{W}{L}\right)_2 (V_{GS} - V_{TH})^2 (1 + \lambda V_{DS2}) $$ (4.2)
$$ \frac{I_{D2}}{I_{D1}} = \frac{(W/L)_2}{(W/L)_1} \cdot \frac{1 + \lambda V_{DS2}}{1 + \lambda V_{DS1}} $$ (4.3)
where λ is the channel length modulation coefficient.
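The influence of channel-length modulation on the mirrored current in Eq. (4.3) can be evaluated directly. In the Python sketch below, the function name and all numerical values are hypothetical and serve only to show the size of the effect.

```python
def mirror_current(i_d1, wl1, wl2, vds1, vds2, lam):
    """Eq. (4.3): output current of a mirror with channel-length modulation."""
    return i_d1 * (wl2 / wl1) * (1.0 + lam * vds2) / (1.0 + lam * vds1)


# Hypothetical example: 10 uA reference, 1:2 mirror, lambda = 0.05 1/V.
matched = mirror_current(10e-6, 1.0, 2.0, vds1=1.0, vds2=1.0, lam=0.05)
skewed = mirror_current(10e-6, 1.0, 2.0, vds1=1.0, vds2=2.0, lam=0.05)
```

With equal drain-source voltages the ratio is set by the W/L ratio alone (20 uA in this example); a 1 V difference in V_DS raises the output by roughly 5% for the assumed λ.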
Figure 32 shows the VCC voltage-to-current transfer curve. With a 50Ω output load, the current changes from 0 to 140uA while Vin changes from 0 to 3.3 volts. A very low sub-threshold current arises as Vin changes from 0 to 0.5 volts, because Vin is lower than the threshold voltage (VTH) of the Min transistor.