Department of Electronics Engineering, Institute of Electronics, Master's Program

The Design of the Autonomous Ratio Memory Cellular Nonlinear Network

without Elapsed Operation for Pattern Learning and Recognition

Student: Wei-Te Chou (周維德)

Advisor: Prof. Chung-Yu Wu (吳重雨)

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

A Thesis

Submitted to Department of Computer and Information Science, College of Electrical Engineering and Computer Science

National Chiao Tung University in Partial Fulfillment of the Requirements

for the Degree of Master

in

Electronics Engineering

December 2007

Hsinchu, Taiwan, Republic of China

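ABSTRACT (CHINESE)

In the field of pattern recognition, associative memory is a popular recognition method that can restore a noisy pattern to a clean, noise-free pattern, and the ratio-memory cellular nonlinear network has been shown to be one way of implementing an associative memory. The challenge the ratio-memory cellular nonlinear network currently faces is how to raise its recognition rate in high-noise environments.

This thesis presents the analysis and design of the Autonomous Ratio-Memory Cellular Nonlinear Network (ARMCNN) architecture and its application to associative memory and image recognition. "Autonomous" means that, in the recognition phase, the noisy input signal is stored in each cell as an initial voltage instead of being applied as a fixed input voltage. In addition, the design obtains the required ratio weights without an elapsed operation. In the pattern learning phase, the earlier Ratio-Memory Cellular Neural Network (RMCNN) generates the ratio weights by comparing each absolute weight with the mean of the absolute weights on the four neighboring sides of the cell: a weight larger than the mean is preserved, and otherwise it is ignored. The newly proposed ARMCNN instead preserves only the largest of the absolute weights on the four neighboring sides. Simulation results show that the ARMCNN has a higher recognition rate than the RMCNN.

In addition to simulating the ARMCNN architecture and its application to associative memory and image recognition in Matlab and C, a 9x9 ARMCNN is designed in the TSMC 0.35um 2P4M Mixed-Signal process, implemented, and measured. In the same process, the area of this design is reduced to 0.28 times that of the previous design, the RMCNN without elapsed operation (from 4.56mm x 3.90mm to 2.24mm x 2.24mm).

In the measurements, all three learned patterns (一, 二, and 四, that is, one, two, and four) are recognized successfully, and some imperfections in the recognition are also discussed in the thesis. The circuit is redesigned, and Hspice simulation verifies that the new circuit does remedy this defect.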

The Design of the Autonomous Ratio-Memory Cellular Nonlinear Network for Pattern Learning and Recognition

Student: Wei-Te Chou

Advisor: Prof. Chung-Yu Wu

Department of Electronics Engineering & Institute of Electronics

National Chiao-Tung University

ABSTRACT

Associative memory has attracted significant attention in the field of pattern recognition and recovery. It has been shown that the cellular nonlinear network with ratio memory (RMCNN) can be used to implement an associative memory. However, the existing RMCNN system still has imperfections that require further improvement. For example, the pattern recognition rate of the RMCNN drops quickly as the environmental noise level rises. Moreover, the die area of the existing chip is large (4.56mm x 3.90mm), which makes it more vulnerable to process variation. Therefore, chip area reduction and optimization are necessary.

A new type of CNN associative memory called the Autonomous Ratio-Memory Cellular Nonlinear Network (ARMCNN) is proposed and analyzed. Like the previous design, the proposed ARMCNN needs no elapsed operation to perform weight enhancement. During the recognition period, the noisy input patterns are sent into the cells as initial cell state voltages, which, compared with constantly injecting the noisy input patterns, yields a better recognition rate in simulation.

During the pattern learning period, the ratio weights of the RMCNN were originally generated by comparing the four neighboring absolute weights with their mean value; the absolute weights larger than the mean value remain. In the ARMCNN, however, only the strongest absolute weights (possibly more than one) remain. Furthermore, the proposed ARMCNN inherits the features of the RMCNN, such as the feature enhancement effect and the absence of an elapsed operation (EO): the ratio weights are generated directly after the patterns are learned.

In this thesis, the circuit of the ARMCNN w/o EO is designed and a 9x9 ARMCNN is implemented in the TSMC 0.35um 2P4M mixed-signal process. Compared with the previous chip, the RMCNN w/o elapsed operation, the die area shrinks from 4.56mm x 3.90mm to 2.24mm x 2.24mm. It is only 0.28 times as large as the previous chip in the same process technology, which greatly reduces the impact of process variation. In the experiments, all three learned patterns are recognized successfully. However, some imperfections in pattern recovery remain and are discussed later in this thesis. The circuit is redesigned to correct these imperfections.

CONTENT

TABLE CAPTIONS
FIGURE CAPTIONS

CHAPTER 1 INTRODUCTION
1.1 Background of Cellular Nonlinear Network
1.2 Review of Ratio Memory Cellular Nonlinear Network
1.3 Research Motivation and Thesis Organization

CHAPTER 2 ARCHITECTURE AND CIRCUIT IMPLEMENTATION
2.1 Operational Principle and Architecture
2.2 Circuit Implementation
2.2.1 V-I Converter
2.2.2 Comparator
2.2.3 Digital Components
2.2.4 Output Stage and Input Pattern Interface
2.2.5 Cell for Global Maximum Absolute Weight Determination

CHAPTER 3 SIMULATION RESULT
3.1 Behavior Simulation Result
3.2 Hspice Simulation Result

CHAPTER 4 EXPERIMENTAL RESULTS
4.1 Layout Description
4.2 Experimental Environment Setup
4.3 Experimental Result
4.4 Cause of the Imperfection Experimental Result

CHAPTER 5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Works

TABLE CAPTIONS

Table 2.1 Template A of ratio weights and the corresponding absolute-weights
Table 4.1 The function of each control signals
Table 4.2 The summary comparison of this work and the previous work

FIGURE CAPTIONS

Fig. 2.1 The block diagram of ARMCNN
Fig. 2.2 The general architecture of ARMCNN
Fig. 2.3 The detail architecture of RMCNN
Fig. 2.4 Architecture of RMCNN in learning period
Fig. 2.5 The connection relationships of COMP, Counter_L and RM
Fig. 2.6 Architecture of RMCNN in recognition period
Fig. 2.7 The V-I converter T1
Fig. 2.8 The V-I converter in T2D
Fig. 2.9 Weight: Generation of ratio current
Fig. 2.10 The schematic of V-to-I converter T3 and current mode comparator COMP
Fig. 2.11 The dimension of the current mode comparator COMP
Fig. 2.12 The schematic diagram of the counters in this chip
Fig. 2.13 The schematic diagram of the asynchronous flip-flop
Fig. 2.14 The schematic diagram of the Local Counter & Global Counter
Fig. 2.15 A counting example of the counter
Fig. 2.16 The schematic diagram of the detector and the tri-state buffer
Fig. 2.17 The schematic diagram of the driver circuit
Fig. 2.18 The state diagram with corresponding ratios
Fig. 2.19 The schematic of the 4-bit decoders
Fig. 2.20 The schematic diagram of the output stage
Fig. 2.21 The circuit diagram of the unit gain buffer
Fig. 2.22 The pattern input interface formed by the shift registers
Fig. 2.23 The input stage of a single pixel
Fig. 2.25 The off-chip current measuring circuit
Fig. 3.1 The Chinese Characters “one”, “two”, “three”, “four”, and “five”
Fig. 3.2 The recognition rates comparison between RMCNN & ARMCNN
Fig. 3.3 The Transferring curve of the V-I Converter T1 and State Resistor / Capacitor
Fig. 3.4 The Transferring curve of the V-I Converter T2D and the Weighting Circuit W
Fig. 3.5 The Simulation Result of Comparator COMP
Fig. 3.6 The Frequency Response of the OP-Amp that performed as unit gain buffer
Fig. 3.7 The absolute weight generation of three correlated patterns
Fig. 3.8 The absolute weight generation of three reverse correlated patterns
Fig. 3.9 Post-sim recognition result of ‘四’
Fig. 3.10 Post-sim recognition result of ‘二’
Fig. 3.11 Post-sim recognition result of ‘一’
Fig. 4.1 The layout of one cell and two neighboring RM blocks
Fig. 4.2 The die photo of this chip
Fig. 4.3 The package diagram of the ARMCNN
Fig. 4.4 The setup of experimental environment
Fig. 4.5 The timing diagram of the controlling signals
Fig. 4.6 Experimental result of recognition and recovery of ‘四’
Fig. 4.7 Experimental result of recognition and recovery of ‘二’
Fig. 4.8 Experimental result of recognition and recovery of ‘一’
Fig. 4.9 The diagram of the top left 3 x 3 cell arrays
Fig. 4.10 Desired ratio weights and actual ratio weights of Fig 4.9
Fig. 4.11 Presim results of pattern ‘四’ for the original design and the modified design
Fig. 4.12 Presim results of pattern ‘二’ for the original design and the modified design

CHAPTER 1

INTRODUCTION

1.1 Background of Cellular Nonlinear Network

The cellular nonlinear network (CNN) introduced by Chua and Yang [1]-[2] has been considered one of the potential architectures for future nano-electronic systems. One of the CNN's important applications is associative memory. So far, several research works have explored the application of CNNs as neural associative memories for pattern learning, recognition, and association [3]-[9]. As for hardware implementation, a special learning algorithm and a digital hardware implementation for the CNN were proposed in [8] to solve the sensitivity problems caused by the limited precision of analog weights. A CMOS chip implementation of a CNN associative memory was also reported in [9].

The learning circuitry can be integrated on-chip with the CNN system. There are several advantages of on-chip learning: 1) No host computer is needed to perform the learning task off-line. This makes the interface of neural system chips simpler for many practical applications; 2) The spatial-variant template weights can be on-chip learned without being loaded from outside to the CNN chips. In other words, the long loading time, complex global interconnection between cells, and analog weight storage elements to perform the loading operation for large numbers of spatial-variant template weights can be avoided; 3) The adaptability to the process variations of CNN chips can be enhanced.

To implement associative memories, both the ratio memory (RM) [10]-[22] and the generalization of Hebb's postulate of learning [23]-[24] have been incorporated into the CNN structure to form the RMCNN with space-variant templates for pattern recognition. The ratio memory (RM) enhances important ratio weights and removes less important ones through the effect of feature enhancement [10]-[22]. In the RMCNN of [10]-[13], the input signal current is applied to the neuron throughout the recognition process and the initial state of each neuron is set to zero. In this thesis, the design of the Autonomous RMCNN [27] is proposed to improve the recognition rate; it modifies the Hebbian learning rule. In addition, the input signal is stored as the initial state of the cell, and no constant input is applied to the neuron throughout the recognition process.

1.2 Review of Ratio Memory Cellular Nonlinear Network

1.2.1 Ratio memory Cellular Nonlinear Network with Elapsed Operation

In the previous work, the ratio memory cellular nonlinear network (RMCNN), the cell state $x_{ij}(t)$, its derivative $\dot{x}_{ij}(t)$, and the cell output $y_{ij}(t)$ of a regular cell can be expressed as [1]-[2]

$$\dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{C(k,l)\in N_r(i,j)} a_{ijkl}(t)\, y_{kl}(t) + \sum_{C(k,l)\in N_r(i,j)} b_{ijkl}(t)\, u_{kl}(t) + z_{ij} \tag{1.1}$$

$$y_{ij}(t) = f(x_{ij}(t)) = \begin{cases} +1 & \text{if } x_{ij}(t) > +1 \\ x_{ij}(t) & \text{if } -1 \le x_{ij}(t) \le 1 \\ -1 & \text{if } x_{ij}(t) < -1 \end{cases} \tag{1.2}$$

where $x_{ij}(t)$ is the state of cell(i,j), and $u_{kl}(t)$ is the input of cell(k,l) in the r-neighborhood system $N_r(i,j)$ of cell(i,j). In this thesis, i or k is the row number and j or l is the column number of an MxN CNN cell array, so cell(i,j) is the cell in the ith row and jth column. The r-neighborhood system $N_r(i,j)$ of cell(i,j) is defined as the set of all cells, including cell(i,j) and its neighboring cells, that satisfy the following property.

$$N_r(i,j) = \{\, C(k,l) \mid 1 \le k \le M,\ 1 \le l \le N,\ |k-i| + |l-j| \le r \,\} \tag{1.3}$$

The term r is called the radius, or the number of neighboring layers. In our design, r is 1. $a_{ijkl}(t)$ is the template A weight (coefficient), which correlates the cell output $y_{kl}(t)$ to the cell state $x_{ij}(t)$. $b_{ijkl}(t)$ is the template B weight (coefficient), which correlates the cell input $u_{kl}$ to the cell state $x_{ij}$, and $z_{ij}$ is the threshold or bias of cell(i,j).
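As an illustration of the output function in Eq. (1.2) and of the neighborhood definition, the following minimal Python sketch evaluates the piecewise-linear output and enumerates an r-neighborhood. The function names `f` and `neighborhood` are chosen here for illustration only and do not come from the thesis; cells are indexed from 1, as in the text.

```python
def f(x):
    """Piecewise-linear cell output: the state clamped to the range [-1, 1]."""
    return max(-1.0, min(1.0, x))


def neighborhood(i, j, M, N, r=1):
    """Cells C(k, l) with 1 <= k <= M, 1 <= l <= N and |k - i| + |l - j| <= r.

    The result includes cell (i, j) itself, as in the definition of N_r(i, j).
    """
    return [(k, l)
            for k in range(1, M + 1)
            for l in range(1, N + 1)
            if abs(k - i) + abs(l - j) <= r]


if __name__ == "__main__":
    print(f(2.5), f(0.25), f(-3.0))
    print(neighborhood(5, 5, 9, 9))   # interior cell: itself plus four neighbors
    print(neighborhood(1, 1, 9, 9))   # corner cell: itself plus two neighbors
```

With r = 1, an interior cell of the 9x9 array has four neighbors and a corner cell has two, which matches the template A structure used in the rest of the chapter.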

The template B and the threshold $z_{ij}$ are constant and space-invariant. They are set to

$$B_{ij}(t) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{1.4}$$

$$z_{ij}(t) = 0 \tag{1.5}$$

That means the input of every cell influences only the cell itself. In an r-neighborhood system $N_r(i,j)$, the inputs of the neighboring cells do not influence the central cell. The threshold $z_{ij}$ is zero everywhere. The template A is spatial-variant and time-variant [12]-[13], and the template $A_{ij}$ can be written as:

$$A_{ij} = \begin{bmatrix} 0 & a_{ij(i-1)j}(0) & 0 \\ a_{iji(j-1)}(0) & 0 & a_{iji(j+1)}(0) \\ 0 & a_{ij(i+1)j}(0) & 0 \end{bmatrix} \tag{1.6}$$

That means only four cells are correlated to the central cell: the cells on its upper, lower, left, and right sides. In the original RMCNN with elapsed operation [13], the weights in template A are produced by the equations below.

$$a_{ijkl}(0) = \frac{\sum_{p=1}^{m} \int_{T_P} u_{ij}^{p}\, u_{kl}^{p}\, dt}{\mathrm{sum}} \tag{1.7}$$

$$kl \in \{\, (i-1)j,\ i(j-1),\ i(j+1),\ (i+1)j \,\} \tag{1.8}$$

$$\mathrm{sum} = \sum_{kl} \sum_{p=1}^{m} \int_{T_P} u_{ij}^{p}\, u_{kl}^{p}\, dt \tag{1.9}$$

where $u_{ij}^{p}$ is the p-th pattern input of cell(i,j) and, similarly, $u_{kl}^{p}$ is the p-th pattern input of cell(k,l). The relationship between ij and kl is given by Eq. (1.8). $T_P$ is the learning time for the RMCNN to learn the p-th pattern, and the total learning time for the RMCNN to learn m patterns is $T_L = \sum_{p=1}^{m} T_P$. $a_{ijkl}$ is called the ratio weight, and the numerator of $a_{ijkl}$ is called the absolute weight.

The boundary cells are not correlated to four cells. For example, the boundary cells at the corners are correlated to only two cells. Thus the boundary condition of the boundary cells can be written as

$$x_{i^*j^*}(t) = 0, \quad u_{i^*j^*}(t) = 0 \tag{1.10}$$

where $i^*j^*$ means that the cell is a boundary cell.

This work has the advantages of a longer memory retention time and a feature enhancement characteristic that improves the recognition rate. However, the elapsed period changes as the learning patterns change, so the elapsed operation makes the pattern recognition process inconvenient.

1.2.2 Ratio memory Cellular Nonlinear Network w/o Elapsed Operation

Because of the inconvenience of the elapsed operation, the RMCNN w/o elapsed operation (EO) was proposed, which yields the same pattern recognition rates with a simpler circuit structure than the RMCNN with EO. It requires no additional elapsed period to obtain the feature-enhanced ratio weights; instead, the ratio weights are generated directly after pattern learning. Moreover, the RMCNN has a longer template-weight storage time, or equivalently a longer pattern recognition time, which is one of the advantages of the RM. Because the feature enhancement effect of the RM separates the learned weights well and reduces the insignificant weights to zero, more patterns can be stored and recognized in the RMCNN than in the CNN associative memory without RM.

The equation used to distinguish which ratio weights increase and which ratio weights decrease can be written as [24]

$$I_{Mss}(t) = \frac{\sum_{j=1}^{n} I_{aw(j)}(t)}{n} \tag{1.11}$$

where $I_{Mss}(t)$ is the mean of the absolute memory currents and $I_{aw(j)}(t)$ is the jth absolute memory current. If $I_{aw(j)}(t)$ is larger than $I_{Mss}(t)$, the ratio memory current increases gradually; otherwise, the ratio weight decreases. In this way the increasing and decreasing ratio weights are detected. After the comparing operation, the increasing weights are set to an appropriate value (1, 1/2, 1/3, or 1/4) and the decreasing weights are set to zero directly. This equation is used to determine the final ratio weights directly, in place of the elapsed operation. The new Hebbian learning algorithm can be written as below:

Step 1: find the absolute-weight template A, $S_{ij}(p)$, after p patterns are learned

$$S_{ij}(p) = \begin{bmatrix} 0 & ss_{ij(i-1)j}(p) & 0 \\ ss_{iji(j-1)}(p) & 0 & ss_{iji(j+1)}(p) \\ 0 & ss_{ij(i+1)j}(p) & 0 \end{bmatrix}$$

$$ss_{ijkl}(p+1) = ss_{ijkl}(p) + u_{ij}^{p+1}\, u_{kl}^{p+1}$$

where (k,l) can be (i+1, j), (i, j+1), (i-1, j), or (i, j-1).

Step 2: find the absolute mean of the absolute weights in a template

$$M_{ss} = \mathrm{mean}(ss_{ijkl})$$

Step 3: generate the ratio weights

$$\begin{cases} a_{ijkl} = \dfrac{1}{PN_{N_r(i,j)}} & \text{if } ss_{ijkl} > M_{ss} \\[6pt] a_{ijkl} = 0 & \text{if } ss_{ijkl} < M_{ss} \end{cases}$$

where $u_{ij}^{p+1}$ and $u_{kl}^{p+1}$ are the inputs of cell(i,j) and cell(k,l), respectively, and $PN_{N_r(i,j)}$ is the number of preserved weights in $N_r(i,j)$ with r = 1.
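Steps 2 and 3 above can be sketched for a single cell as follows. This is a minimal illustration under stated assumptions, not the circuit behavior: the function name `mean_threshold_ratio_weights` and the dictionary keys are hypothetical, and a weight exactly equal to the mean is treated as not preserved, a case the text leaves unspecified.

```python
def mean_threshold_ratio_weights(ss_cell):
    """Ratio weights of one cell from its neighboring absolute weights.

    ss_cell maps a neighbor label to the absolute weight ss_ijkl.
    Weights above the mean M_ss are preserved and share the value 1/PN,
    where PN is the number of preserved weights; the rest become 0.
    Assumption: a weight equal to the mean is not preserved.
    """
    m_ss = sum(ss_cell.values()) / len(ss_cell)                  # Step 2
    kept = [kl for kl, ss in ss_cell.items() if ss > m_ss]
    return {kl: (1.0 / len(kept) if kl in kept else 0.0)         # Step 3
            for kl in ss_cell}


if __name__ == "__main__":
    # One dominant weight: only "up" exceeds the mean of 1.0.
    print(mean_threshold_ratio_weights({"up": 3, "down": 1, "left": -1, "right": 1}))
    # Two equally strong weights share the ratio 1/2.
    print(mean_threshold_ratio_weights({"up": 3, "down": 3, "left": -1, "right": -1}))
```

The possible non-zero results (1, 1/2, 1/3, 1/4) follow directly from how many of the four neighboring weights survive the comparison.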

The measurement results of the RMCNN w/o EO were as follows. Three Chinese characters were learned: 一, 二, and 四 (one, two, and four, respectively). Unfortunately, the pattern ’四’ failed to be recognized and recovered. Further investigation into the cause of this imperfection showed that a small current influences the absolute weights on the capacitor Cw during the pattern transferring period, which leads to wrong absolute weights; as a result, wrong ratio weights were generated and stored. The newly proposed ARMCNN chip corrects this mistake. In addition, the die area of the previous RMCNN chip was considerably large (4.56mm x 3.90mm), which makes it suffer more from the impact of process variation. Therefore, chip area reduction and optimization are necessary.

1.3 Research Motivation and Thesis Organization

To improve the image pattern recognition rates of the RMCNN, the Autonomous Ratio-Memory Cellular Nonlinear Network (ARMCNN) [27] is proposed and analyzed. In the ARMCNN, the input currents of the noisy input patterns are used to pre-charge the capacitors of the neurons (Cij) to produce the initial cell state voltages at the beginning of the recognition process. After pre-charging, all the input currents are removed from the neurons. Since no B-template is used and the neuron capacitors store the initial state voltages, the proposed RMCNN is called the autonomous RMCNN (ARMCNN). The ARMCNN inherits the features of the RMCNN, such as the feature enhancement effect and the absence of an elapsed operation (the ratio weights are generated directly after the patterns are learned). Mathematical analysis and simulations are performed for both the ARMCNN and the RMCNN. They show that the ARMCNN has a higher recognition rate and can learn and recognize more patterns.

The operational principle and circuit architecture of the proposed ARMCNN are described in Chapter Two, where the prediction models of the recognition rate are shown as well. Chapter Three discusses the simulation results of the ARMCNN, which include the behavior simulations using C/C++ and the transistor-level simulations using Hspice. Chapter Four describes the layout and the setup of the measurement environment. Finally, the conclusion and future work are discussed in Chapter Five. As a demonstrative example, a 9x9 ARMCNN without elapsed operation (ARMCNN w/o EO) is realized in TSMC 0.35um 2P4M Mixed-Signal technology. Both simulation and experimental results have verified the superior characteristics of the ARMCNN system.

CHAPTER 2

ARCHITECTURE AND CIRCUITRY

2.1 Operational Principle and Architecture [27]

The architecture of the ARMCNN is similar to that of the RMCNN [1]-[4]. The operation of the ARMCNN can be divided into three phases: the pattern learning phase, the ratio-weight generation phase, and the pattern recognition phase. One difference between the ARMCNN and the RMCNN lies in the recognition operation: the noisy pattern to be recognized and recovered is treated as the initial values of the cell state voltages in the ARMCNN, and as the neuron input in the RMCNN. The operational principle of the ARMCNN is described below.

In the autonomous ratio-memory cellular nonlinear network (ARMCNN), the dynamic equations of the cell state voltage $x_{ij}(t)$ and its derivative $\dot{x}_{ij}(t)$ can be expressed as

$$\begin{aligned} \dot{x}_{ij}(t) &= -x_{ij}(t) + \sum_{C(k,l)\in N_r^0(i,j)} \frac{w_{ijkl}\, y_{kl}(t)}{C_{ij}} \\ y_{ij}(t) &= f(x_{ij}(t)) \\ -u_{ij}R_{ij} &\le x_{ij}(0) \le u_{ij}R_{ij} \end{aligned} \tag{2.1}$$

where $N_r^0(i,j)$ is the set of r-neighboring cells $N_r(i,j)$ without the cell $C(i,j)$. The r-neighborhood system $N_r(i,j)$ of the cell $C(i,j)$ is defined as the set of all cells, including $C(i,j)$ and its neighboring cells, that satisfy the following property

$$N_r(i,j) = \{\, C(k,l) \mid 1 \le k \le M,\ 1 \le l \le N,\ |k-i| + |l-j| \le r \,\} \tag{2.2}$$

The term r is an integer called the radius of the neighborhood layer, and r is equal to one in our design. Moreover, $x_{ij}(0)$ is the initial value of the cell state voltage; it has a value of $u_{ij}R_{ij}$ ($-u_{ij}R_{ij}$) for a black (white) pixel and an intermediate value for a grey pixel.

The weight $w_{ijkl}$ is the template A weight $a_{ijkl}(t)$, which correlates the cell output $y_{kl}(t)$ to the cell state $x_{ij}(t)$. The template A is spatial-variant and time-variant [18]-[19], and the template $A_{ij}$ can be written as

$$A_{ij} = \begin{bmatrix} 0 & a_{ij(i-1)j}(0) & 0 \\ a_{iji(j-1)}(0) & 0 & a_{iji(j+1)}(0) \\ 0 & a_{ij(i+1)j}(0) & 0 \end{bmatrix} \tag{2.3}$$

That means only the outputs of four neighboring cells are correlated to the central cell. The four neighboring cells are the cells on the upper, lower, left, and right sides. Since the proposed ARMCNN uses the neuron capacitors to store the initial state voltages, no template B weight is used to correlate the cell input $u_{kl}$ to the cell state $x_{ij}$, and the template B coefficients are set to zero. That is to say, no constant input is injected during the pattern recognition phase.

$$B_{ij}(t) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{2.4}$$

In the learning period, assume that there are m patterns to be learned in the ARMCNN. The absolute weight $s_{ijkl}$ can be determined by the modified Hebbian rule as [1]-[4]

$$s_{ijkl} = \sum_{p=1}^{m} u_{ij}^{p}\, u_{kl}^{p} \tag{2.5}$$

where $u_{ij}^{p}$ is the pixel input at the ith row and jth column of the pth pattern out of the m input patterns, and $u_{kl}^{p}$ is the pixel input of a cell in the set $N_r^0(i,j)$. After all input patterns are learned, the absolute weight $s_{ijkl}$ is compared with the strongest learned weight. Those weights that are equal to the strongest absolute weight are kept and transformed into the ratio weight $a_{ijkl}$ as

$$a_{ijkl} = \frac{1}{PN_{N_r(i,j)}} \tag{2.6}$$

where $PN_{N_r(i,j)}$ represents the number of preserved weights. On the other hand, absolute weights that are less than the strongest learned weight are set to zero and disregarded. This procedure determines and stores the final ratio weights between cells. The new Hebbian learning algorithm can be written as follows:

Step 1: find the absolute-weight template A, $s_{ij}(p)$, after p patterns are learned

$$\begin{gathered} s_{ij}(p) = \begin{bmatrix} 0 & ss_{ij(i-1)j}(p) & 0 \\ ss_{iji(j-1)}(p) & 0 & ss_{iji(j+1)}(p) \\ 0 & ss_{ij(i+1)j}(p) & 0 \end{bmatrix} \\ ss_{ijkl}(p) = ss_{ijkl}(p-1) + u_{ij}^{p}\, u_{kl}^{p} \end{gathered} \tag{2.7}$$

where (k,l) can be (i+1, j), (i, j+1), (i-1, j), or (i, j-1).

Step 2: determine the global maximum value of the absolute weights in the system

$$G_{ss} = \max(ss_{ijkl}) \tag{2.8}$$

Step 3: compare the absolute weights with the global maximum weight. Those equal to the strongest absolute weight are kept; those less than the strongest absolute weight are set to zero.

Step 4: transform the remaining absolute weights into the ratio weights

$$\begin{cases} a_{ijkl} = \dfrac{1}{PN_{N_r(i,j)}} & \text{if } ss_{ijkl} = G_{ss} \\[6pt] a_{ijkl} = 0 & \text{if } ss_{ijkl} < G_{ss} \end{cases} \tag{2.9}$$

where $u_{ij}^{p}$ and $u_{kl}^{p}$ are the inputs of cell(i,j) and cell(k,l), respectively, and $PN_{N_r(i,j)}$ is the number of preserved weights in $N_r(i,j)$ with r = 1. The boundary cells are exceptional cases, since they are not correlated to all four neighboring cells (up, down, right, and left); they may be correlated to only two or three neighboring cells. Therefore, the condition of the boundary cells can be expressed as

$$x_{i^*j^*}(t) = 0, \quad u_{i^*j^*}(t) = 0 \tag{2.10}$$

where the notation $i^*j^*$ means that the cell is a boundary cell.

Table 2.1 shows some sample templates A of absolute weights and ratio weights. Only the strongest absolute weights are set to one after the comparison with the global maximum absolute weight Gss; the others are disregarded and set to zero. With this technique, template A suppresses the unimportant weights and enhances the significant weights, giving a feature-enhanced template. After the comparing operation, the remaining weights are set to an appropriate value (1, 1/2, 1/3, or 1/4) and the others are set to zero. This determines the final ratio weights directly, without an elapsed operation.

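$$\begin{array}{ccc} \text{Absolute-weights (learned)} & \text{Absolute-weights (enhancement)} & \text{Ratio weights} \\[6pt] z_{44} = \begin{bmatrix} 0 & \frac{1}{3} & 0 \\ \frac{1}{3} & 0 & \frac{1}{3} \\ 0 & 1 & 0 \end{bmatrix} & z_{44} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} & w_{44} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \\[18pt] z_{51} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & \frac{1}{3} \\ 0 & \frac{1}{3} & 0 \end{bmatrix} & z_{51} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} & w_{51} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \\[18pt] z_{62} = \begin{bmatrix} 0 & \frac{1}{3} & 0 \\ 1 & 0 & 1 \\ 0 & \frac{1}{3} & 0 \end{bmatrix} & z_{62} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} & w_{62} = \begin{bmatrix} 0 & 0 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 0 & 0 \end{bmatrix} \end{array}$$

Table 2.1 Some sample templates A of absolute-weights and ratio-weights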
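The ARMCNN learning algorithm (Hebbian accumulation, global maximum, ratio weights) can be sketched in a few lines of Python. This is a behavioral illustration under assumptions, not the thesis's simulation code: the function names are hypothetical, pixels are assumed to be encoded as +1 (black) and -1 (white), indices start at 0, and neighbors outside the array are simply skipped rather than modeled as zero-valued boundary cells.

```python
def learn_absolute_weights(patterns):
    """Accumulate ss[(i, j)][(k, l)] = sum over patterns of u_ij * u_kl
    for the up, down, left and right neighbors of every cell."""
    rows, cols = len(patterns[0]), len(patterns[0][0])
    ss = {}
    for i in range(rows):
        for j in range(cols):
            ss[(i, j)] = {}
            for k, l in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= k < rows and 0 <= l < cols:
                    ss[(i, j)][(k, l)] = sum(p[i][j] * p[k][l] for p in patterns)
    return ss


def global_max(ss):
    """Gss: the largest absolute weight anywhere in the array."""
    return max(w for cell in ss.values() for w in cell.values())


def armcnn_ratio_weights(ss):
    """Keep only weights equal to Gss; each cell's kept weights share 1/PN."""
    gss = global_max(ss)
    ratio = {}
    for ij, cell in ss.items():
        kept = [kl for kl, w in cell.items() if w == gss]
        ratio[ij] = {kl: (1.0 / len(kept) if kl in kept else 0.0) for kl in cell}
    return ratio


if __name__ == "__main__":
    # Two hypothetical 2x2 training patterns (+1 = black, -1 = white).
    patterns = [[[1, 1], [-1, -1]],
                [[1, 1], [1, -1]]]
    ss = learn_absolute_weights(patterns)
    print(global_max(ss))
    print(armcnn_ratio_weights(ss)[(0, 0)])
```

In this toy example only the link between the two top cells reaches the global maximum, so it is the only weight preserved; a cell with two weights at the maximum would receive 1/2 on each, as in the last row of Table 2.1.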

In this work, a 9x9 ARMCNN is implemented and measured. Fig 2.1 shows the block diagram of the ARMCNN w/o elapsed operation (EO) and the controlling relationship between the blocks. A 9x9 shift register is used to store image patterns. The patterns are generated by a pattern generator and are fed into the shift registers in series. Once an image pattern is stored in the registers completely, the pattern is loaded into the ARMCNN w/o EO in parallel for pattern learning. After all patterns are learned and the ratio weights are generated, the ARMCNN w/o EO enters the recognition phase. The recognition result is read out through the output stage, which is controlled by two decoders: Column_Decoder and Row_Decoder. Since there is only one pin dedicated to output readout, the state of each cell is output in series.

Fig 2.1 The block diagram of ARMCNN and controlling relationship between every block (blocks: Pattern generator, 9x9 Shift registers, 9x9 ARMCNN, Output stage, Decoder_R, Decoder_C; signals: input patterns in series, input patterns in parallel, state voltage of every cell, selecting rows, selecting columns, controlling signals, output in series; a boundary marks the blocks implemented on chip)

Fig 2.2 The general architecture of ARMCNN
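The data path described above (serial load into the shift registers, parallel transfer to the array, serial readout through one pin) can be modeled behaviorally as follows. This is only a sketch under assumptions: the function names are hypothetical, and the bit ordering (the newest bit enters at position 0, readout proceeds row by row) is chosen for illustration, since the ordering used on the chip is not stated in this section.

```python
def shift_in(bits, size=81):
    """Clock bits one at a time into a serial shift register."""
    reg = [0] * size
    for b in bits:
        reg = [b] + reg[:-1]   # new bit enters at position 0, the others move on
    return reg


def parallel_load(reg, rows=9, cols=9):
    """Transfer the register contents to the cell array in one step."""
    return [reg[r * cols:(r + 1) * cols] for r in range(rows)]


def serial_readout(states, rows=9, cols=9):
    """Select one cell at a time (row decoder, then column decoder)."""
    return [states[r][c] for r in range(rows) for c in range(cols)]


if __name__ == "__main__":
    pattern_bits = [1 if n % 2 == 0 else -1 for n in range(81)]
    grid = parallel_load(shift_in(pattern_bits))
    print(len(grid), len(grid[0]))
    print(serial_readout(grid)[:9])
```

Loading a full pattern therefore takes 81 shift clocks, while the transfer to the cells takes a single step.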

The general architecture of the ARMCNN w/o EO and the connections between cells and RMs are shown in Fig 2.2. Each cell connects with four neighboring RMs (UP, DOWN, RIGHT, and LEFT). Every RM stores the ratio weight between two pixels. The detailed block diagram of two neighboring cells and the RM between them is shown in Fig. 2.3, where cell(i,j) is the cell in the ith row and jth column, and $u_{ij}^{p}$ is the input voltage of cell(i,j) for the pth pattern. The blocks T1 and T3 are voltage-to-current converters. The block T2D is also a V-I converter, except that its output is an absolute current. Moreover, T2D detects the sign of the state voltage $Vx_{ij}$ and stores it separately. The block W uses current mirrors to scale the output current of the cell by a ratio (1x, 1/2x, 1/3x, or 1/4x). The ratio is determined by the result of the local counter, Counter_L, according to how many weights are preserved. The capacitor Cw stores the absolute weight during the learning period; the resulting voltage Vcw is then converted into a current, which the current comparator COMP compares with the current of the global maximum absolute weight. COMP is a simple current comparator that decides whether the ratio weight is kept. The output of Counter_L controls W to weight the output of each cell.

Fig 2.3 The detailed architecture of two neighboring cells, Cell(i,j) and Cell(k,l), and the RM between them (blocks T1, T2D, W, T3, COMP, Counter, and Cw; signals clk1 to clk4 and Vref)

The operation of the ARMCNN can be divided into three phases: learning, ratio-weight generation, and recognition. In the learning period, clk1 is set high and clk2 is set low. The architecture in the learning phase is shown in Fig 2.4, where the input voltage $u_{ij}^{p}$ of cell(i,j) for the pth pattern is converted into the current $Iu_{ij}^{p}$ and sent to node $x_{ij}$. The current $Iu_{ij}^{p}$ can be expressed as

$$Iu_{ij}^{p} = \begin{cases} Iu_{sat} & \text{when } u_{ij}^{p} > 1.9\,\mathrm{V} \\ Gm_{T1} \times (u_{ij}^{p} - 1.5) & \text{when } 1.5\,\mathrm{V} < u_{ij}^{p} < 1.9\,\mathrm{V} \\ 0 & \text{when } u_{ij}^{p} = 1.5\,\mathrm{V} \\ -Gm_{T1} \times (1.5 - u_{ij}^{p}) & \text{when } 1.1\,\mathrm{V} < u_{ij}^{p} < 1.5\,\mathrm{V} \\ -Iu_{sat} & \text{when } u_{ij}^{p} < 1.1\,\mathrm{V} \end{cases} \tag{2.11}$$

where $Gm_{T1}$ is the transconductance of the V-I converter T1. The voltage level 1.5V is defined as zero, so the current flows in opposite directions depending on whether $u_{ij}^{p}$ is larger or smaller than 1.5V. If $u_{ij}^{p}$ becomes larger than 1.9V or less than 1.1V, the output current $Iu_{ij}^{p}$ of T1 saturates and remains at $Iu_{sat}$. In this work, $Iu_{sat}$ is chosen to be the minimum current required to keep the circuit working properly, which is about 6.5uA.
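The transfer characteristic of T1 can be written as a short behavioral model. The saturation current (6.5uA) and the voltage levels (1.1V, 1.5V, 1.9V) are taken from the text; the value of the transconductance is an assumption made here so that the linear region meets the saturation level continuously at 1.9V and 1.1V, and the name `t1_current` is hypothetical.

```python
IU_SAT = 6.5e-6    # saturated output current of T1, in amperes (from the text)
V_ZERO = 1.5       # voltage level defined as zero, in volts
V_HIGH = 1.9       # upper saturation threshold, in volts
V_LOW = 1.1        # lower saturation threshold, in volts

# Assumption: Gm_T1 is chosen so that the linear segment reaches IU_SAT
# exactly at V_HIGH. The thesis does not state the value in this section.
GM_T1 = IU_SAT / (V_HIGH - V_ZERO)


def t1_current(u):
    """Output current of the V-I converter T1 for an input voltage u."""
    if u > V_HIGH:
        return IU_SAT
    if u < V_LOW:
        return -IU_SAT
    return GM_T1 * (u - V_ZERO)


if __name__ == "__main__":
    for u in (0.5, 1.3, 1.5, 1.7, 2.5):
        print(u, t1_current(u))
```

Under this assumption the transconductance is 16.25uA/V, and an input of 1.7V gives half of the saturation current.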

Fig 2.4 The architecture of ARMCNN in learning phase (Cell(i,j), RM, and Cell(k,l), with the currents Iu, Iy, and Iw of both cells, the capacitor Cw, and the blocks T1, T2D, W, T3, COMP, and Counter)

The current $Iu_{ij}^{p}$ flows to the node $x_{ij}$ and is converted to a state voltage $Vx_{ij}^{p}$ through the resistor $R_{ij}$ and the capacitor $C_{ij}$, which are the resistance and capacitance associated with the neuron cell(i,j). Then T2D outputs an absolute current $Iy_{ij}^{p}$ and a sign $sign(Vx_{ij}^{p})$ according to the state voltage $Vx_{ij}^{p}$. Since the function of T2D is similar to that of T1 and T2D has an absolute-value circuit, the output current $Iy_{ij}^{p}$ and the sign $sign(Vx_{ij}^{p})$ can be written as

$$Iy_{ij}^{p} = \begin{cases} Iy_{sat} & \text{when } Vx_{ij}^{p} > 1.9\,\mathrm{V} \\ Gm_{T2D} \times (Vx_{ij}^{p} - 1.5) & \text{when } 1.5\,\mathrm{V} < Vx_{ij}^{p} < 1.9\,\mathrm{V} \\ 0 & \text{when } Vx_{ij}^{p} = 1.5\,\mathrm{V} \\ Gm_{T2D} \times (1.5 - Vx_{ij}^{p}) & \text{when } 1.1\,\mathrm{V} < Vx_{ij}^{p} < 1.5\,\mathrm{V} \\ Iy_{sat} & \text{when } Vx_{ij}^{p} < 1.1\,\mathrm{V} \end{cases} \tag{2.12}$$

$$sign(Vx_{ij}^{p}) = \begin{cases} 0\,\mathrm{V} & \text{if } Vx_{ij}^{p} < 1.5\,\mathrm{V} \\ 3\,\mathrm{V} & \text{if } Vx_{ij}^{p} > 1.5\,\mathrm{V} \end{cases} \tag{2.13}$$

where $Gm_{T2D}$ is the transconductance of T2D and the current $Iy_{sat}$ is the saturated output current of T2D. It is designed to be around 6.5uA as well. The difference between T1 and T2D is that $Iy_{ij}$ always flows in the same direction, whether $Vx_{ij}$ is larger or smaller than 1.5V. Moreover, the sign of $Vx_{ij}$ is detected by a detector in T2D and sent to the block W.
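The behavior of T2D described here (an absolute output current plus a separate sign signal) can be sketched as below. The numbers 6.5uA, 1.5V, 0V, and 3V come from the text; the transconductance value is an assumption that makes the linear region reach saturation 0.4V away from 1.5V, the sign at exactly 1.5V is arbitrarily taken as 0V because the text does not define it, and the name `t2d` is hypothetical.

```python
IY_SAT = 6.5e-6           # saturated output current of T2D, in amperes
V_ZERO = 1.5              # voltage level defined as zero, in volts
GM_T2D = IY_SAT / 0.4     # assumption: saturation is reached at 1.1 V and 1.9 V


def t2d(vx):
    """Return (absolute output current, sign voltage) for a state voltage vx."""
    iy = min(IY_SAT, GM_T2D * abs(vx - V_ZERO))   # same direction on both sides
    sign = 3.0 if vx > V_ZERO else 0.0            # assumption: 0 V at exactly 1.5 V
    return iy, sign


if __name__ == "__main__":
    for vx in (0.5, 1.3, 1.7, 2.5):
        print(vx, t2d(vx))
```

Two state voltages placed symmetrically around 1.5V, such as 1.3V and 1.7V, give the same current magnitude and differ only in the sign signal.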

According to the signs of the two neighboring state voltages $Vx_{ij}$ and $Vx_{kl}$, the output current charges or discharges the capacitor Cw. W is the weighting circuit, which scales the input current $Iy_{ij}$ by a ratio: 1x, 1/2x, 1/3x, or 1/4x. In the learning period, the block W is set to a default state, which multiplies $Iy_{ij}$ by 1/4. The reason for choosing 1/4x as the default state is that it helps to control the length of the learning time needed to charge or discharge the capacitor Cw. The capacitor Cw is a 1.5pF poly-poly capacitor implemented on chip. The capacitance of Cw is chosen as a compromise between weight storage time and capacitor chip area. The current $Iy_{sat}$ is chosen as the smallest current with which the V-I converter operates properly, which is about 6.5uA in this work. The smaller the current $Iy_{sat}$, the better the voltage $Vw_{ijkl}$ on the capacitor Cw can be controlled. In this design, the learning time of a pattern is set to 100ns.

After all patterns are learned, an absolute weight is generated and stored on the capacitor Cw. If two neighboring cells have a positive relationship, for example if they are both black or both white, the capacitor Cw between them is charged. On the other hand, if they have a negative relationship, for example if one is black and the other is white, the capacitor Cw is discharged. Thus the voltage $Vw_{ijkl}$ stored on Cw can be expressed as

$$Vw_{ijkl}(p+1) = \begin{cases} Vw_{ijkl}(p) + \dfrac{1}{2}\,\dfrac{Iy_{sat} \times t}{Cw} & \text{when the signs of } Vx_{ij} \text{ and } Vx_{kl} \text{ are the same} \\[8pt] Vw_{ijkl}(p) - \dfrac{1}{2}\,\dfrac{Iy_{sat} \times t}{Cw} & \text{when the signs of } Vx_{ij} \text{ and } Vx_{kl} \text{ are not the same} \end{cases} \tag{2.14}$$

The voltage $Vw_{ijkl}(p)$ stored on the capacitor Cw represents the absolute weight after the pth pattern is learned. Since the output current of block W is set to $\frac{1}{4} Iy_{sat}$ in the learning period, and two W blocks charge or discharge Cw at the same time, the voltage change is $\frac{1}{2}\,\frac{Iy_{sat} \times t}{Cw} \left( = 2 \times \frac{1}{4} \times \frac{Iy_{sat} \times t}{Cw} \right)$. The time for each pattern to be learned is 100ns in this design.

The block T3 is also a V-I Converter, which transfers the voltage Vwijkl into the current

form IT3 and sends this current to a simple current mode comparator COMP. The block COMP compares IT3 with a global maximum current IMAX, which is corresponding to the

largest value of absolute weight among the system. If IT3 is equal or larger than IMAX, COMP

outputs a logic “high” signal to the local counter Counter_L, which means the ratio weight Eq.(2.14)

(27)

between the two pixels should be preserved. Otherwise, if IT3 is less than IMAX, COMP outputs

a logic “low” signal to Counter_L, which means the relation between the two pixels is not strong enough and is of no interest.
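As a rough numerical check of the learning step and the comparison described above, the following sketch accumulates the weight voltage over a few patterns and applies the COMP rule. It is a hedged illustration: the function names are hypothetical, the starting level of Vw is taken as a zero reference, and the comparison is done on weight magnitudes on the assumption that T3 maps a larger magnitude to a larger current.

```python
IY_SAT = 6.5e-6     # saturated T2D output current (A), from the text
C_W = 1.5e-12       # weight capacitor Cw (F), from the text
T_LEARN = 100e-9    # learning time per pattern (s), from the text

# Two W blocks, each at the default 1/4 ratio, drive Cw together.
DELTA_VW = 2 * (IY_SAT / 4) * T_LEARN / C_W   # about 0.217 V per pattern


def learn_weight(signs_ij, signs_kl, vw_start=0.0):
    """Accumulate Vw over the learned patterns.

    vw_start is a hypothetical initial level, used here as the zero reference.
    """
    vw = vw_start
    for s_ij, s_kl in zip(signs_ij, signs_kl):
        # Same sign: charge Cw. Different sign: discharge Cw.
        vw += DELTA_VW if s_ij == s_kl else -DELTA_VW
    return vw


def comp_output(weight_magnitude, global_max, tol=1e-9):
    """Model of COMP: logic high when the weight reaches the global maximum."""
    return weight_magnitude >= global_max - tol


if __name__ == "__main__":
    vw_max = 3 * DELTA_VW                      # three patterns, all in agreement
    vw = learn_weight([1, 0, 1], [1, 1, 1])    # one disagreement out of three
    print(DELTA_VW, vw, comp_output(abs(vw), vw_max))
```

With the values given in the text, each pattern moves Vw by roughly 0.22V, so three learned patterns give a largest possible magnitude of about 0.65V.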

The interconnection between the COMPs and Counter_L is described in Fig 2.5. Since every cell connects with its four nearest neighbor cells, there are four COMPs in one cell, and every COMP sends a logic signal to Counter_L. At the end of the learning period, Counter_L counts how many logic “high” signals are given by the four COMPs. If there is (are) only one (two) logic “high”, only one (two) ratio weight(s) should be preserved, and so on. Counter_L then controls W to weight the output current of T2D as $1 \times Iy_{ij}$ ($\frac{1}{2} \times Iy_{ij}$). Similarly, according to the total number of logic “high” signals counted in one cell, Counter_L may control the block W to weight the output current of T2D as $\frac{1}{3}Iy_{ij}$ or $\frac{1}{4}Iy_{ij}$.

Fig 2.5 The inter-connection relationships between COMPs, Counter_L and RMs

Moreover, the logic output of COMP controls the switch sw_COMP as shown in Fig 2.3. When no inter-relation between two neighboring cells is generated, the output contribution paths between them should be cut off; in other words, the ratio weight between the two pixels is zero. For example, if the logic output of COMP between two neighboring pixels is low, which means the ratio weight is zero, the switch sw_COMP is turned off. Therefore, the output contributions of these two pixels are isolated from each other in the recognition period.
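The counting rule above can be summarized in a few lines. The sketch below is illustrative only, with hypothetical function names: it takes the four COMP outputs of one cell, derives the ratio that Counter_L would select in block W, and returns the per-neighbor weight after the sw_COMP switches are taken into account.

```python
from fractions import Fraction


def ratio_weight(comp_outputs):
    """Return the ratio applied by block W given the four COMP outputs of a cell."""
    if len(comp_outputs) != 4:
        raise ValueError("a cell has exactly four COMPs")
    highs = sum(1 for c in comp_outputs if c)
    # With no logic high, every sw_COMP is open, so the effective weight is zero.
    return Fraction(0) if highs == 0 else Fraction(1, highs)


def effective_weights(comp_outputs):
    """Per-neighbor weight: the cell ratio where COMP is high, zero where it is low."""
    ratio = ratio_weight(comp_outputs)
    return [ratio if c else Fraction(0) for c in comp_outputs]


if __name__ == "__main__":
    print(effective_weights([1, 0, 1, 0]))
```

Whenever at least one weight is preserved, the preserved weights of a cell add up to one, which is what makes them ratio weights.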

The ratio weight is generated as Counter_L counts the total number of logic “high” signals from the COMPs and sets the block W to the appropriate weighting. After that, the operation enters the recognition phase. The architecture in the recognition period is shown in Fig 2.6, where clk1 is set to low and clk2 is set to high. The state of each switch sw_COMP is controlled by COMP. A noisy image pattern is input to perform recognition and recovery, where $u_{ij}^{noi}$ and $u_{kl}^{noi}$ represent the input voltages of the noisy pattern of cell(i,j) and cell(k,l), respectively. They are input to T1 and converted to the currents $Iu_{ij}^{noi}$ and $Iu_{kl}^{noi}$. These currents are then converted to the state voltages $Vx_{ij}^{noi}$ and $Vx_{kl}^{noi}$ through the resistor Rij (Rkl) and the capacitor Cij (Ckl). T2D converts the state voltages into the currents $Iy_{ij}^{noi}$ and $Iy_{kl}^{noi}$. In accordance with the ratio weights generated previously, the output current of each cell is weighted by 1x, 1/2x, 1/3x, 1/4x, or 0x and contributes to its corresponding neighbor cells. For instance, if the weight is set to 1x (1/2x), only one (two) of the four neighbor cells is (are) correlated to this cell. Thus only one (two) neighbor cell(s) will contribute current to cell(i,j), and so on. In the case where two neighboring cells have no correlation to each other, the COMP outputs a logic “low” signal to the local counter, and this signal also turns off the output contribution path between them.

Fig 2.6 Architecture of ARMCNN in recognition period

According to KCL, the dynamic equations of the cell state voltage $Vx_{ij}(t)$ and its derivative $\dot{Vx}_{ij}(t)$ can be expressed as Eq.(2.1) and Eq.(2.2). In addition, the weighting of the output currents can be derived from the following equation:

$$
Iw_{kl} = a_{ijkl} \times Iy_{kl}, \qquad
a_{ijkl} \in \left\{1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, 0\right\}, \qquad
(k,l) \in \{(i, j-1),\ (i, j+1),\ (i-1, j),\ (i+1, j)\}
$$

where aijkl is the template A ratio weight coefficient generated by the block W. The coefficient aijkl can be 1, 1/2, 1/3, 1/4 or 0, which corresponds to 1, 2, 3, 4 or 0 preserved weights for the cell, respectively. The current $Iw_{kl}$ is the resultant current that contributes to cell(i,j). It is equal to the output current $Iy_{kl}$ of the neighbor cell times the coefficient aijkl.
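The neighborhood indexing in the equation above can be made concrete with a short sketch. The names are hypothetical, indices are taken as 1-based on the 9x9 array, and cells on the border are assumed simply to have fewer neighbors; the thesis text in this section does not state how borders are handled.

```python
def four_neighbors(i, j, size=9):
    """Neighbors (k, l) of cell (i, j), in the order used by the equation above.

    Indices are 1-based; positions outside the array are skipped (an assumption).
    """
    candidates = [(i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j)]
    return [(k, l) for k, l in candidates if 1 <= k <= size and 1 <= l <= size]


def neighbor_currents(i, j, iy, a):
    """Return {(k, l): Iw_kl} with Iw_kl = a[(i, j, k, l)] * iy[(k, l)].

    Coefficients missing from `a` are treated as zero (path cut off).
    """
    return {(k, l): a.get((i, j, k, l), 0.0) * iy[(k, l)]
            for k, l in four_neighbors(i, j)}


if __name__ == "__main__":
    # Every cell outputs the saturated current; two of four weights are preserved.
    iy = {(k, l): 6.5e-6 for k in range(1, 10) for l in range(1, 10)}
    a = {(5, 5, 5, 4): 0.5, (5, 5, 4, 5): 0.5}
    print(neighbor_currents(5, 5, iy, a))
```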


2.2 Circuit Implementation

In this work, several circuits are employed. The voltage-to-current converter and the current weighting circuit are discussed in section 2.2.1. A simple current-mode comparator is described in section 2.2.2. Section 2.2.3 covers the digital components, namely the counter, decoder, voltage detector and driver, which are needed for counting and control. Finally, the shift registers, which function as the input pattern interface, and the output stage circuits are described in section 2.2.4.

2.2.1 V-I Converter

As shown in Fig 2.3, the blocks T1, T2D, and T3 are all V-to-I converters. The circuit of T1 and the state resistance Rij / capacitance Cij is implemented as shown in Fig 2.7, where the MOS dimensions are written next to the MOS names in micrometers (um). In Fig 2.7, the left side of the circuit is a differential pair with a cascode current mirror, and the right side is the state resistor / capacitor, formed by the diode loads (MR1 and MR2) and the MOS capacitor (Mc), respectively. The purpose of the state resistors is to limit the operating range. The voltage Vb1 is a constant bias voltage, which is set to 1.5V. The reference voltage Vref, against which the input voltage is compared, is also set to 1.5V. If the input voltage Vin is larger than Vref, the output current Io flows from left to right (M4 → M6 → MR2) and causes the state voltage Vxij to rise to 1.9V. On the other hand, if the input voltage Vin is smaller than Vref, the output current Io flows from right to left (MR1 → M2 → M7) and causes the state voltage Vxij to drop to 1.1V. In other words, the state voltage Vxij ranges from 1.1V to 1.9V.

Device dimensions in Fig 2.7 (um): M1 (3/.35), M2 (3/.35), M3 (4/2), M4 (4/2), M5 (4/.35), M6 (4/.35), M7 (2/2), M7b (2/2), MR1 (2/8), MR2 (2/8), Mc (2/8). Other labels: Vb1, Vin, Vref, Vx, VDD, clk3, IBIAS.

Fig 2.7 T1: Voltage to current converter

Fig 2.8 shows the circuit of the T2D block, a voltage to absolute current converter. The left side of T2D is a differential pair, which is the same as in T1, and the right side of T2D is the absolute output current structure. The constant bias voltage Vb2 is 1.5V, and the constant bias voltage Vb1 and the reference voltage Vref are the same as in T1. The operating principle of T2D is as follows. When the input voltage Vin is larger than Vref, the MOS M2 is cut off and the current flows from MOS M3 through M5 and M1 to M7. Since M2 is cut off, the cascode current mirror (M3 ~ M6) mirrors this current to M4 and on to the right side of T2D. Note that the parasitic capacitor at the source of M10 (the input of the inverter) is charged to high. Consequently, the MOS M11 is turned on and M10 is cut off. Therefore, the current Io flows through MOS M12 to M14. The other cascode current mirror (M12 ~ M15) forces MOS M8 to carry the same amount of current as MOS M14 does. Finally, the absolute output current Io,abs is mirrored from the MOS M8 to M94. Note that the switches M10 and M11 will not turn on at the same time. For MOS M10 to be on, Va - Vp > Vth is required, that is, Vp < Va - Vth, whereas for MOS M11 to be on, Vp - Va > Vth is required, that is, Vp > Va + Vth. These two conditions cannot hold simultaneously. In addition, if the differential pair provides no current flow into or out of node Vp, both switches are off.

Fig 2.8 T2D: Voltage to absolute current converter

On the other hand, if the input voltage Vin is smaller than Vref, the MOS M1 is cut off and the cascode current mirror (M3 ~ M6) is off. Since M1 is operating in the cutoff region and the current source M7 is forcing a current of 6.5uA to flow to ground, the direction of the output current Io is from right to left. Moreover, the parasitic capacitor at the source of M10 (the input of the inverter) is discharged to low. As a consequence, the MOS M10 is turned on and M11 is cut off. The other cascode current mirror (M12 ~ M15) is off as well. A current of 6.5uA flows from M8 through M10 and M2 to M7. The absolute output current Io,abs is again mirrored from the MOS M8 to M94. Note that in both cases, the absolute output current Io,abs flows in the same direction whether the input voltage Vin is larger than Vref or not. Therefore, T2D is called a voltage to absolute current converter.

Device dimensions in Fig 2.8 (um): M1 (3/.35), M2 (3/.35), M3 (4/2), M4 (4/2), M5 (4/.35), M6 (4/.35), M7 (2/2), M7b (2/2), M8 (12/2), M94 (12/2), M10 (1.4/.35), M11 (4.2/.35), M12 (4/.8), M13 (4/.8), M14 (4/2), M15 (4/2). Other labels: Vb1, Vin, Vref, IBIAS, Iy, Vp, Va.

The weight circuit, shown in Fig 2.9, generates the desired ratio of the output current from T2D. The possible current ratios are 1x, 1/2x, 1/3x, and 1/4x. In the practical design, the weight circuit is directly combined with T2D to form the desired ratio. Note that the MOS M94 in Fig 2.8 and the MOS M94 in Fig 2.9 are the same device. Since we wish the generated current ratios to be precise, the MOS M91, M92, M93, and M94 do not use the minimum length, so as to avoid the impact of channel length modulation. The four current paths are controlled by the circuit DRIVER. At most one path flows to the MOS M40 at a time, and the other three paths are conducted to ground through a dummy MOS Mdummy. Note that during the period of pattern transferring, all four paths are conducted to ground through Mdummy to ensure that the capacitor Cw is neither charged nor discharged. The use of Mdummy and DRIVER corrects the mistake made in the previous design. The upper part of the weight circuit is a sign detector, which detects the signs of the state voltages of the cell itself and of its neighbor neuron. If two neighbor neurons have the same sign, the XOR outputs a logic low to turn on the MOS M52 and M53 and turn off the MOS M54 and M55. This charges the capacitor Cw. On the other hand, if two neighbor neurons have different signs, the XOR outputs a logic high to turn on the MOS M54 and M55 and turn off the MOS M52 and M53. This discharges the capacitor Cw. The detailed circuitry of Detector and DRIVER is described in section 2.2.3.

Device dimensions in Fig 2.9 (um): M91 (3/2) m=1, M92 (3/2) m=2, M93 (3/2)+(1/2) m=1, M94 (3/2) m=4, M40 (2/2), M41 (2/2), M51 (4/2), M52 (4/2), M53 (4/.4), M54 (2/.4), M55 (2/2), Mdummy (1.2/.35). Other labels: Sign Detector, Detector (x3), VinT2(i,j), VinT1(k,l), VinT3(k,l), clk2, clk3, VDD.

Fig 2.9 Weight: Generation of ratio current
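The device sizes listed for Fig 2.8 and Fig 2.9 are consistent with the four stated ratios, which can be checked with a short calculation. The sketch rests on assumptions that the text does not spell out: the first number of each dimension pair is read as the channel width, `m` is read as a multiplier, "(3/2)+(1/2)" is read as a 3um and a 1um device in parallel, and the mirror ratio is taken as proportional to total width at equal length.

```python
from fractions import Fraction

M8_WIDTH = 12   # um, from M8 (12/2) in Fig 2.8 (assumed to be the mirror input)

# Total width of each output branch in Fig 2.9 (um), under the reading above.
BRANCH_WIDTHS = {
    "M91": 3 * 1,   # (3/2), m=1
    "M92": 3 * 2,   # (3/2), m=2
    "M93": 3 + 1,   # (3/2)+(1/2), m=1
    "M94": 3 * 4,   # (3/2), m=4
}


def mirror_ratio(name):
    """Ideal current ratio of a branch relative to M8 (width ratio at equal length)."""
    return Fraction(BRANCH_WIDTHS[name], M8_WIDTH)


if __name__ == "__main__":
    for name in sorted(BRANCH_WIDTHS):
        print(name, mirror_ratio(name))
```

Under these assumptions M91, M92, M93 and M94 give 1/4, 1/2, 1/3 and 1, matching the ratios 1/4x, 1/2x, 1/3x and 1x named in the text.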

2.2.2 Comparator (with T3)

The V-I converter T3 is the same as T2D, except that T3 is followed by a current-mode comparator whereas T2D is followed by a weight circuit. The schematic diagram of T3 with the current-mode comparator is shown in Fig 2.10. The output of the V-I converter T3 is sent directly to the comparator COMP. The comparator chosen here is a simple current-mode comparator; the simple structure is used to save chip area. Its operating principle is as follows. If the output current of T3, IoT3, is larger than or equal to the global maximum current Imax, which means the absolute weight should be preserved, the logic output of COMP is high. On the other hand, if IoT3 is smaller than Imax, which means the absolute weight is of no significance, the logic output of COMP is low. Since the output of COMP must be high when IoT3 is equal to Imax, the sizes of Mn2 and Mn4 are designed to be a little smaller than those of Mn1 and Mn3. Doing so makes the logic output high even if IoT3 equals Imax.

Fig 2.10 The schematic of V-to-I converter T3 and current mode comparator COMP

Device dimensions in Fig 2.11 (um): Mc1 (2.2/.4), Mc2 (2/.4), Mc3 (2.2/1), Mc4 (2/1). Other labels: IREF, IT3.

Fig 2.11 The dimension of the current mode comparator COMP

2.2.3 Digital Components

In the ARMCNN system, some digital components are also employed to achieve the desired functions. The local counter Counter_L counts the total number of preserved weights in a cell. The global counter Counter_G controls the switches sw1 ~ sw6, which are described later in this section. The voltage detector Detector detects the value of the state voltage and outputs a logic signal (low or high). The last component is the weight selection circuit Driver, which decides which of the four current paths is conducted. The decision depends on the 2-bit logic output of the local counter Counter_L. In this section, all digital circuits are discussed one by one in detail.

Fig 2.12 The schematic diagram of the counters in this chip

Counter

In Fig 2.13, the counters Counter_L and Counter_G are both formed by two asynchronous reset flip-flops as shown in Fig 2.12. The schematic diagram of the asynchronous flip-flop is shown in Fig 2.14. Compared with a digital flip-flop (DFF), the asynchronous reset flip-flop ensures correct function at slow operating speed. In addition, it does not have the static power consumption problem that the DFF has. The switch S_en enables the counting operation, as described in Fig 2.15, where CLK is the clock signal and RST is the reset signal. If the signal RST is set to low, b0 and b1 are always low. The signal S_en must be set to high during the counting operation. If not, b0 and b1 do not change even if CLK is toggling. Note that the signal b0 represents the logic output of the counter’s LSB and b0_bar its complement, while the signal b1 represents the logic output of the counter’s MSB and b1_bar its complement.

Fig 2.14 (a) Local Counter (b) Global Counter

There are six switches used in the local counters, and they are controlled by the combinational logic from the global counter. This can be simplified by using only four switches in the local counters: the four outputs “b0, b0bar, b1, b1bar” can be used to control the four switches directly. This not only reduces the number of switches used, but also simplifies the layout routing.

Detector

The detector detects the voltage level of each state voltage and transforms it into a logic signal (either high or low), so that the signal can be handled by the combinational logic. The detector is formed by a tri-state element and a simple inverter, as shown in Fig 2.16. Since the input voltage to the detector is an analog signal ranging from 1V ~ 2V, a simple inverter chain would draw a constant leakage current, because both the PMOS and the NMOS of the first inverter are on at the same time. The tri-state buffer avoids this constant current leakage at the expense of an additional control signal VB1.

Fig 2.16 The schematic diagram of the detector and the tri-state buffer

Driver

The driver circuit controls the current ratio paths of the weight circuit. It allows only one path to charge / discharge the capacitor Cw while the other three paths are conducted to ground. In addition, during the pattern transferring period, the capacitor Cw should not be charged / discharged, so the driver circuit should conduct all four paths to ground. This can be done by ORing the signals clk1 and clk2, which represent the learning period and the recognition period, respectively. The OR of clk1 and clk2 produces the logic signal control, which is high when either of them is high. The signal control is therefore high only during learning and recognition and low during the pattern transferring period. Please refer to Fig 2.17.

Fig 2.17 The schematic diagram of the driver circuit

A NAND gate generates a logic low only if its inputs are all high. The three inputs of the NAND gate are D1, D0, and control. The signal control is described above, and the signals D1 and D0 are the outputs of the local counter Counter_L. The combination of the signals D1 and D0 and their complements decides which of the four ratio paths is conducted to the capacitor Cw while the other three ratio paths are conducted to ground. Note that the logic output D controls the path to the capacitor Cw, and the logic output DB controls the path to ground.

As shown in Fig 2.18, in the default state (D1, D0: 11) the weight circuit multiplies Iyij by 1/4. All states return to the default state if the signal reset is triggered. According to the output of the counter, the state changes as follows:

If the counter counts one, the state changes to (D1, D0: 00), with the ratio set to Iyij multiplied by 1.

If the counter counts two, the state changes to (D1, D0: 01), with the ratio set to Iyij multiplied by 1/2.

If the counter counts three, the state changes to (D1, D0: 10), with the ratio set to Iyij multiplied by 1/3.
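The state assignment listed above, together with the NAND-based path selection, can be sketched as follows. This is a hedged illustration with hypothetical names. Two points are assumptions rather than statements from the text: the 2-bit counter is taken to wrap around, so that a fourth count returns to state 11 (1/4x), and each ratio path is taken to be selected by one 3-input NAND of D1, D0 (or their complements) and control, with an active-low output.

```python
from fractions import Fraction

# Ratio selected by each (D1, D0) state, as listed in the text.
RATIO_BY_STATE = {
    (1, 1): Fraction(1, 4),   # default state after reset
    (0, 0): Fraction(1, 1),
    (0, 1): Fraction(1, 2),
    (1, 0): Fraction(1, 3),
}


def state_after(count):
    """(D1, D0) after `count` increments from the default state 11.

    Assumption: the 2-bit counter wraps, so a fourth count returns to 11 (1/4x).
    """
    value = (0b11 + count) % 4
    return (value >> 1) & 1, value & 1


def nand(*inputs):
    """Logic low only when every input is high."""
    return 0 if all(inputs) else 1


def path_select(d1, d0, control):
    """Active-low select line per state; a hypothetical wiring of four 3-input NANDs."""
    lines = {}
    for s1, s0 in RATIO_BY_STATE:
        lit1 = d1 if s1 else 1 - d1   # D1 or its complement
        lit0 = d0 if s0 else 1 - d0   # D0 or its complement
        lines[(s1, s0)] = nand(lit1, lit0, control)
    return lines


if __name__ == "__main__":
    for count in range(5):
        state = state_after(count)
        print(count, state, RATIO_BY_STATE[state], path_select(*state, control=1))
```

When control is low (pattern transferring), no select line goes low, which corresponds to all four paths being conducted to ground.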


Fig 2.18 The state diagram with corresponding ratios

Decoder

Since there is only one output pin used in this work, two 4-bit decoders, Decoder_Row and Decoder_Column, are implemented to control the switches of the output stage. They allow only one of the state voltages of the 9x9 neurons to be output at a time. Decoder_Column controls the column switches SWC11 ~ SWC19 (SWC21 ~ SWC29, SWC31 ~ SWC39, etc.), while Decoder_Row controls the row switches SWR1 ~ SWR9. This structure allows every pixel to be read out one by one. The schematics of both decoders are the same. As shown in Fig 2.19, the decoder is formed by nine 4-input AND gates and four inverter-string buffers.
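The decoder function described above amounts to a one-hot selection of nine lines from a 4-bit address. A minimal sketch follows, with hypothetical names; the mapping of line n to address n (counted from zero) and the row-by-row scan order are assumptions, since the text does not give the address assignment.

```python
def decode(address, outputs=9):
    """One-hot decode of a 4-bit address, one 4-input AND per output line.

    Line n is taken to respond to address n (0-based), which is an assumption.
    """
    bits = [(address >> b) & 1 for b in range(4)]
    lines = []
    for n in range(outputs):
        wanted = [(n >> b) & 1 for b in range(4)]
        # AND of each address bit or its complement, as selected by the line index
        lines.append(int(all(bit == w for bit, w in zip(bits, wanted))))
    return lines


def scan_order(size=9):
    """Row and column addresses that read out all pixels one at a time."""
    return [(row, col) for row in range(size) for col in range(size)]


if __name__ == "__main__":
    print(decode(3))
    print(len(scan_order()))
```

Addresses 9 to 15 select no line in this sketch, so at most one row switch and one column switch are closed for any address pair.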


2.2.4 Output Stage and Input Pattern Interface

In this work, an output stage circuit is used because there is only one pin available for output. Therefore, the 9x9 neurons (81 neurons in total) share the same output pin. The schematic diagram of the output stage is shown in Fig 2.20, where the nodes x11 ~ x99 are the node xij in Fig 2.3. The MOS M11 ~ M99 act as level shifters to drive the parasitic capacitance of the switches and metal lines. The switches SWC11 ~ SWC99 and SWR1 ~ SWR9 are controlled by Decoder_Column and Decoder_Row, respectively. The state voltage of each pixel is read out one by one. The arrow enclosed in a circle represents a current source. In the output stage, all pixels share the same current source. Since only one pixel is conducted at a time, one current source is enough for this design. Furthermore, using fewer current sources reduces power consumption. The unity-gain buffer is an op-amp in negative feedback and is used to drive the load of the output pad. The circuit of the unity-gain buffer is shown in Fig 2.21.

Fig 2.20 The schematic diagram of the output stage

Labels in Fig 2.21: 80uA, VDD, Vin, Vout, and four devices sized (80/0.7).

Fig 2.21 The circuit diagram of the unit gain buffer

In this work, since arbitrary learning patterns have to be input, shift registers are used as the pattern input interface. As shown in Fig 2.22, each block represents a static flip-flop. The operation in the learning period is as follows. At the beginning of the learning period, the control signals clk3 and newp are turned on and the node ptn inputs the learning pattern pixel by pixel. After the flip-flop clock DFF toggles nine times (because the cell array is nine cells wide), the signal pin turns on to input the learning pattern into each pixel. Before the signal pin turns on, the signal newp turns off to prevent the pattern from changing due to a glitch. After the pattern is learned, the operation above is repeated to input the next pattern, until all patterns are learned.

Labels in Fig 2.22: D, Q, Clk, Reset (flip-flops); ptn, DFF, reset, pin, newp, clk3, noi, R, Vnoi, CUin, CNoi; nodes Ui1, Uij-1, Uij, Uij+1, Ui8, Ui9.

Fig 2.23(a) is one part of Fig 2.22, namely the input stage of a single pixel. Fig 2.23(b) shows how the clean learning pattern is mixed with noise in the recognition period. The capacitor CUIN is an embedded poly-poly capacitor of 0.45pF, and the capacitor CNOI is an embedded poly-poly capacitor of 0.1pF. In the learning period, the capacitor CNOI is pre-charged to a voltage level VNOI and the control signal noi stays off. At the beginning of the recognition period, the noisy pattern is input to perform recognition and recovery. This is done by first storing the clean learning pattern in the shift registers and then turning off the signal clk3 to isolate each static flip-flop. When the signal noi turns on, charge sharing occurs between the capacitors CUIN and CNOI. As a result, the resultant voltage on the node Uij can be set by adjusting the capacitance ratio of CUIN and CNOI.

Labels in Fig 2.23: D, Q, Clk, Reset; ptn1, DFF, reset, pin, newp, clk3, noi, R, Vnoi, Uij-1, Uij; CUin = 0.45pF, CNoi = 0.1pF; Noisy Pattern Generation.

Fig 2.23 (a) The input stage of a pixel (b) The structure used to mix the pattern with noise
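The charge sharing between CUIN and CNOI can be estimated with the usual charge-conservation formula. The sketch below is an idealized calculation, not a circuit simulation: the function names are hypothetical, parasitic capacitance and switch charge injection are ignored, and the example voltages are arbitrary.

```python
C_UIN = 0.45e-12   # F, from the text
C_NOI = 0.10e-12   # F, from the text


def shared_voltage(v_u, v_noi, c_uin=C_UIN, c_noi=C_NOI):
    """Voltage on Uij after CUIN (at v_u) and CNOI (at v_noi) are connected.

    Ideal charge conservation; parasitic capacitance is ignored.
    """
    return (c_uin * v_u + c_noi * v_noi) / (c_uin + c_noi)


def noise_fraction(c_uin=C_UIN, c_noi=C_NOI):
    """Share of the final voltage that comes from the pre-charged noise level."""
    return c_noi / (c_uin + c_noi)


if __name__ == "__main__":
    print(shared_voltage(3.0, 0.0))   # clean pixel at 3 V, noise level at 0 V
    print(noise_fraction())
```

With 0.45pF and 0.1pF, the pre-charged level VNOI contributes about 18% of the final voltage, so a 3V pixel mixed with a 0V noise level ends up near 2.45V in this idealized model.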

2.2.5 Circuit for Global Maximum Absolute Weight Determination

The preservation of an absolute weight is determined by comparing it with the global maximum absolute weight. If the weight is less than the global maximum absolute weight, the weight is set to zero; otherwise, the weight is preserved. Consequently, a circuit is designed to determine the global maximum absolute weight. The global maximum absolute weight is in current form, since the comparator COMP is a current comparator. This circuit is a simplified replica of the cell used in the system, consisting only of the circuits that determine the maximum current: the V-I converter T2D, the current weighting circuit W, and the capacitor Cw.

Fig 2.24 The circuit to determine the global maximum absolute weight

Labels in Fig 2.24: T2D, W, T3, Cw, Vref, clk1, clk4, VIN, Vcw, IT3, pad (x3).

To generate the absolute weight, we set VIN = 3V (or 0V) and turn the switch Msw on and off three times. The resultant voltage stored on the capacitor Cw (Vcw) is the global maximum absolute weight and can be read out through a unity-gain buffer to the pad Vcw. The voltage Vcw is then converted into the current IT3 by the block T3. In measurement, an op-amp together with a resistor forming a closed loop can be used to measure the current IT3:

$$
I_{T3} = \frac{V_{T3} - Vref}{R}
\tag{2.16}
$$

Fig 2.25 The off-chip current measuring circuit

CHAPTER 3

SIMULATION RESULTS

3.1 Behavior Simulation Results [27]

Based upon the mathematical equations of a 9x9 ARMCNN system, behavior simulations are performed using C/C++ programs. One hundred noisy patterns of a character are generated for a fixed standard deviation of the noise level. The recognition rate (RR) of a group of m patterns is defined as the number of successful recognitions divided by 100 x m at a fixed standard deviation of the noise level. The patterns for learning are the Chinese characters “one”, “two”, and “four”, which are shown in Fig 3.1. The recognition rates of both the traditional RMCNN and the autonomous RMCNN are compared in section 3.1.1.

Fig 3.1 The Chinese Characters “one”, “two”, and “four”
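The recognition-rate definition above can be written out directly. The sketch below is not the thesis's C/C++ simulator; the function names are hypothetical, pixels are assumed to be represented as +1 / -1 values, and the noise is assumed to be zero-mean Gaussian, since the text specifies only its standard deviation.

```python
import random


def add_noise(pattern, sigma, rng):
    """Add zero-mean Gaussian noise of standard deviation sigma to each pixel.

    Pixels are taken as +1 / -1 values, an assumption of this sketch.
    """
    return [pixel + rng.gauss(0.0, sigma) for pixel in pattern]


def recognition_rate(successes, m, trials_per_pattern=100):
    """RR = successful recognitions / (100 x m)."""
    return successes / (trials_per_pattern * m)


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = add_noise([1.0, -1.0] * 4, 0.3, rng)
    print(noisy)
    # Example: 270 successes over 3 patterns x 100 noisy trials each
    print(recognition_rate(270, 3))
```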

The simulated recognition rates (RR) versus the standard deviation of the noise level are shown in Fig 3.2. It can be seen from Fig 3.2 that the proposed ARMCNN can recognize at most four patterns. If five patterns are learned and recognized, the RR drops to zero. The RR of ARMCNN increases slightly as the number of patterns to be recognized decreases. In comparison with the traditional RMCNN, the RR of ARMCNN is much greater than that of RMCNN for σ > 0.3.

Fig 3.2 The recognition rates of recognizing Chinese characters ONE, TWO, FOUR, and at different input approaches.

3.2 Hspice Simulation Results

The simulation of the V-I converter T1 and the state resistor / capacitor (Rij / Cij) is shown in Fig 3.3. The tail current is chosen as the minimum current at which the circuit still functions properly; in this work, it is set to 5uA. The transfer curve shows that T1 is linear for input voltages between 1.2V and 1.8V. If the input voltage of T1 falls outside this range (i.e. smaller than 1.2V or larger than 1.8V), the output voltage of T1 saturates (i.e. at 1.2V or 1.8V, respectively). Consequently, the voltage level 1.8V (1.2V) is defined as +1 (-1). The output voltage is then used as the input voltage for T2D.

Fig 3.3 The Transferring curve of the V-I Converter T1 and State Resistor / Capacitor
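The saturating transfer curve and the +1 / -1 convention described above can be idealized as a clamp followed by a linear rescaling. This is only a sketch under stated assumptions: the names are hypothetical, and the curve is assumed to be continuous, which together with the stated saturation levels implies a unity slope between 1.2V and 1.8V. The simulated curve in Fig 3.3 will deviate from this ideal shape near the corners.

```python
V_LOW, V_HIGH = 1.2, 1.8   # saturation levels of T1 reported for Fig 3.3 (V)
V_MID = 1.5                # reference level (V)


def t1_state_voltage(v_in):
    """Idealized T1 transfer curve: linear inside 1.2 V ~ 1.8 V, clamped outside."""
    return max(V_LOW, min(V_HIGH, v_in))


def bipolar_level(vx):
    """Map a state voltage to the +1 / -1 scale defined in the text."""
    return (vx - V_MID) / (V_HIGH - V_MID)


if __name__ == "__main__":
    for v_in in (0.0, 1.2, 1.5, 1.65, 3.0):
        vx = t1_state_voltage(v_in)
        print(v_in, vx, bipolar_level(vx))
```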

Fig 3.4 shows the simulation results of the V-I converter T2D. Since the output of T2D is an absolute current, the direction of the output current is the same whether the input voltage of T2D is larger or smaller than 1.5V. In addition, the transfer curve is symmetric about 1.5V. Note that the output current of T2D and the output of the weighting circuit at the 1x ratio differ slightly. This is due to the different Vds seen by the PMOS M8 in T2D and the PMOS M94 in the W block.

Fig 3.4 The Transferring curve of the V-I Converter T2D and the Weighting Circuit W

Annotations in Fig 3.4: input range for T2D (1.2V ~ 1.8V); curves for 1/4x ISAT, 1/3x ISAT, 1/2x ISAT, and 1x ISAT.


Table 2.1 shows some sample template A of absolute-weights and ratio-weights. It is  clear that only the strongest absolute weights are set to one after comparing with the global  maximum absolute weight Gss, and the others are disregarded and set to zero
Fig 2.1 The block diagram of ARMCNN and controlling relationship between every block