A learnable cellular neural network structure with ratio memory for image processing


…architecture, the RMCNN as the associative memory can generate the absolute weights and then transform them into the ratioed A-template weights as the ratio memories for the recognition of noisy input patterns. It is found from simulation results that, due to the feature-enhancement effect of the RM, the RMCNN under constant leakage on template coefficients can store and recognize more patterns than CNN associative memories without RM but with the same learning rule and the same constant leakage on space-variant template coefficients. For 9 × 9 (18 × 18) RMCNNs, three (five) patterns can be learned, stored, and recognized. Based upon the RMCNN architecture, an experimental chip of a CMOS 9 × 9 RMCNN is designed and fabricated by using 0.35-μm CMOS technology. The measurement results have successfully verified the correct functions of the RMCNN.

Index Terms—Cellular neural network (CNN), divider, multiplier, ratio memory (RM).

I. INTRODUCTION

DUE to the advantageous feature of local connectivity, the cellular neural network (CNN) introduced by Chua and Yang [1] is very suitable for very large-scale integration (VLSI) implementation and thus enables many applications [2], [3]. So far, many research works on the applications of CNNs as neural associative memories for pattern learning, recognition, and association have been explored [4]–[10]. Among them, many innovative algorithms and software simulations of CNN associative memories were reported [4]–[8]. As to the hardware implementation, special learning algorithms and digital hardware implementation for CNNs were proposed in [9] to solve the sensitivity problems caused by the limited precision of analog weights. Moreover, a CMOS chip implementation of a CNN associative memory was also reported in [10].

In realizing CNN associative memories, the learning circuitry can be integrated on-chip with CNNs. The major advantages of on-chip learning are: 1) no host computer is needed to perform the learning task off-line, which makes the interface of neural system chips simple for many practical applications; 2) the space-variant template coefficients can be learned on-chip so that … can be avoided; 3) the adaptability to the process variations of CNN chips can be enhanced.

Manuscript received February 3, 2001; revised July 12, 2001. This work was supported by the National Science Council of the Republic of China under Grant NSC89-2218-E-009-064. This paper was recommended by Associate Editor P. Szolgay.

The authors are with the Integrated Circuits and Systems Laboratory, Department of Electronics Engineering, National Chiao Tung University, Hsinchu City, 30050 Taiwan, R.O.C. (e-mail: cywu@alab.ee.nctu.edu.tw; p8611828@alab.ee.nctu.edu.tw).

Digital Object Identifier 10.1109/TCSI.2002.805697

The ratio memory (RM) of Grossberg outstar structure [11]–[13] has been used in both feedforward and feedback neural network ICs for image processing [14]–[17]. It is found that the RM in neural network ICs has the advantages of long memory time and image feature enhancement under constant leakage on stored weights [14]–[17].

In this paper, both the RM and a modified Hebbian learning function [18] are implemented in the CNN structure with space-variant templates and constant leakage on stored template coefficients [19] for pattern learning, storing, and recognition. The proposed CNN with RM is called the RMCNN. It has the advantages of on-chip learning as mentioned above. Since most of the on-chip learning circuits can be shared between the RM and the CNN core circuits, the extra chip area required for on-chip learning circuits is small. Moreover, the RMCNN can have a longer template-coefficient storage time or, equivalently, pattern recognition time, which is one of the advantages of RM. Due to the feature-enhancement effect of the RM, which well separates the learned weights and decreases the insignificant weights to zero, more patterns can be stored and recognized in the RMCNN as compared to the CNN associative memory without RM but with space-variant template coefficients, the same constant leakage on template coefficients, and the same learning rule. As a demonstrative example, a 9 × 9 RMCNN is realized in CMOS technology. Both simulation and experimental results have verified the advantageous characteristics of the RMCNN. In Section II, both the model and the architecture of the RMCNN are described. In Section III, the detailed CMOS circuit design is presented. In Section IV, both MATLAB and HSPICE simulation results are demonstrated to verify the correct functions of the RMCNN. The measurement results are presented in Section V. Finally, the conclusion is given.

II. MODEL AND ARCHITECTURE

In a CNN, the cell state $x_{ij}(t)$, its derivative $dx_{ij}(t)/dt$, and the cell output $y_{ij}(t)$ for a regular cell $C(i,j)$ can be expressed as [1]–[3]

$$C\frac{dx_{ij}(t)}{dt}=-\frac{1}{R_x}x_{ij}(t)+\sum_{C(k,l)\in N_r(i,j)}A(i,j;k,l)\,y_{kl}(t)+\sum_{C(k,l)\in N_r(i,j)}B(i,j;k,l)\,u_{kl}+I \tag{1}$$

$$y_{ij}(t)=f\bigl(x_{ij}(t)\bigr)=\begin{cases}1,&\text{if }x_{ij}(t)>1\\ x_{ij}(t),&\text{if }\lvert x_{ij}(t)\rvert\le 1\\ -1,&\text{if }x_{ij}(t)<-1\end{cases} \tag{2}$$

where $y_{kl}$ is the cell output from the cell $C(k,l)$ in the $N_r(i,j)$-neighborhood system of the cell $C(i,j)$ and $u_{kl}$ is the cell input from the cell $C(k,l)$ of $N_r(i,j)$. In an $M\times N$ CNN cell array, the $N_r(i,j)$-neighborhood system of the cell $C(i,j)$ is defined as the set of all cells, including $C(i,j)$ and its neighboring cells, that satisfy the following property:

$$N_r(i,j)=\bigl\{C(k,l)\mid \max(\lvert k-i\rvert,\lvert l-j\rvert)\le r,\ 1\le k\le M,\ 1\le l\le N\bigr\} \tag{3}$$

where $r$ is an integer called the radius or the number of neighborhood layers. In (1), $A(i,j;k,l)$ is the coefficient or weight of the A template, which correlates to $y_{kl}$; $B(i,j;k,l)$ is the coefficient or weight of the B template, which correlates to $u_{kl}$; and $I$ is the bias or threshold of the cell $C(i,j)$. In many applications of CNNs, the coefficients of the A, B, and I templates are usually constants, independent of time and cell position. However, these template coefficients can be time-variant or space-variant in general. In (2), the output function $f$, called the ramp function, is a nonlinear function that limits the maximum absolute output to 1.
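For readers who prefer an executable reference, the following NumPy sketch integrates the state equation (1) with forward Euler for an r = 1 neighborhood and applies the ramp function (2); the template values, grid size, and step size are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def ramp(x):
    # Output function (2): a ramp that clips the state to [-1, 1].
    return np.clip(x, -1.0, 1.0)

def cnn_step(x, u, A, B, I, dt=0.05, Rx=1.0, C=1.0):
    """One forward-Euler step of the CNN state equation (1) with an
    r = 1 neighborhood and zero (fixed) boundary cells."""
    y = ramp(x)
    yp = np.pad(y, 1)        # zero-padded outputs of boundary cells
    up = np.pad(u, 1)        # zero-padded inputs of boundary cells
    M, N = x.shape
    fb = np.zeros_like(x)    # feedback term:    sum A(i,j;k,l) * y_kl
    ff = np.zeros_like(x)    # feedforward term: sum B(i,j;k,l) * u_kl
    for dk in range(3):
        for dl in range(3):
            fb += A[dk, dl] * yp[dk:dk + M, dl:dl + N]
            ff += B[dk, dl] * up[dk:dk + M, dl:dl + N]
    dxdt = (-x / Rx + fb + ff + I) / C
    return x + dt * dxdt

# Illustrative run on a 9x9 array with an assumed space-invariant template.
rng = np.random.default_rng(0)
u = rng.choice([-1.0, 1.0], size=(9, 9))
x = np.zeros((9, 9))
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 0.0]])
B = np.zeros((3, 3)); B[1, 1] = 1.0
for _ in range(400):
    x = cnn_step(x, u, A, B, I=0.0)
y = ramp(x)  # settled outputs saturate toward +/-1, as (2) dictates
```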

To incorporate the learning capability into a CNN, its structure has to be modified to realize a learning rule and variable templates. In this work, the Hebbian rule for unsupervised learning [19] is adopted with the necessary modification to accommodate the local connectivity of CNNs. Suppose there are $p$ exemplar patterns to be learned in a learnable CNN. The learned weight $z_{ij,kl}$ in the learning period can be determined according to the modified Hebbian rule by the summation of all products of the two activations or outputs from two correlated pixels or nodes. Thus, $z_{ij,kl}$ can be written as

$$z_{ij,kl}=\sum_{m=1}^{p}y_{ij}^{m}\,y_{kl}^{m},\qquad C(k,l)\in N_r'(i,j) \tag{4}$$

where $y_{ij}^{m}$ is the pixel activation or output of the cell at the $i$th row and $j$th column of the $m$th pattern out of the $p$ input patterns, with its value normalized between $-1$ and $1$; $y_{kl}^{m}$ is the pixel activation or output of the cell $C(k,l)$ in the set $N_r'(i,j)$; and $N_r'(i,j)$ is the set of $r$-neighboring cells without the cell $C(i,j)$. The learned weight is then transformed into the ratioed weight $a_{ij,kl}$ [14]–[17] as

$$a_{ij,kl}=\alpha\,\frac{z_{ij,kl}}{\sum_{C(m,n)\in N_r'(i,j)}\lvert z_{ij,mn}\rvert} \tag{5}$$

where $\alpha$ is a constant. With the ratioed weight in (5), the RM [14]–[17] can be realized in a CNN. The resultant CNN with the modified Hebbian rule and RM is called the RMCNN. In the RMCNN, the cell state is written as

$$C\frac{dx_{ij}(t)}{dt}=-\frac{1}{R_x}x_{ij}(t)+G_A\sum_{C(k,l)\in N_r'(i,j)}a_{ij,kl}\,y_{kl}(t)+b\,u_{ij}+I \tag{6}$$

where $u_{ij}$ is the input of the patterns to the RMCNN for processing and $b$ is a constant.

As compared with the original CNN cell state in (1), the A-template coefficient can be expressed as

$$A(i,j;k,l)=G_A\,a_{ij,kl},\qquad C(k,l)\in N_r'(i,j) \tag{7}$$

where $G_A$ is a constant. Note that the A template is time variant and space variant. For an RMCNN with $r=1$ and four nearest neighboring cells, the learned A template of the cell $C(i,j)$ just after the learning is denoted as $A_{ij}(T_L)$ and can be expressed as

$$A_{ij}(T_L)=G_A\begin{bmatrix}0&a_{ij,i-1j}(T_L)&0\\ a_{ij,ij-1}(T_L)&0&a_{ij,ij+1}(T_L)\\ 0&a_{ij,i+1j}(T_L)&0\end{bmatrix} \tag{8}$$

$$a_{ij,kl}(T_L)=\alpha\,\frac{z_{ij,kl}(T_L)}{\sum_{C(m,n)\in N_r'(i,j)}\lvert z_{ij,mn}(T_L)\rvert} \tag{9}$$

$$z_{ij,kl}(T_L)=\sum_{m=1}^{p}y_{ij}^{m}\,y_{kl}^{m} \tag{10}$$

where $T_W$ is the learning time for the RMCNN to learn the $m$th pattern, and the total learning time for the RMCNN to learn the $p$ patterns is $T_L=pT_W$.
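To make (4), (5), and (8)–(10) concrete, here is a small NumPy sketch that computes the learned weights and the ratioed template entries for the r = 1, four-nearest-neighbor case used in the RMCNN; the random ±1 pattern set and 9 × 9 array size are placeholder assumptions, and the constant scale factors of (5) and (8) are omitted.

```python
import numpy as np

# Four nearest neighbors of C(i,j): the offsets of N_1'(i,j) in this sketch.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def learn_weights(patterns):
    """Modified Hebbian rule (4): z[i,j,k] = sum_m y_ij^m * y_kl^m,
    one entry per neighbor offset. patterns: (p, M, N) array in {-1, +1}."""
    p, M, N = patterns.shape
    z = np.zeros((M, N, len(OFFSETS)))
    for m in range(p):
        y = patterns[m]
        for k, (di, dj) in enumerate(OFFSETS):
            shifted = np.zeros_like(y)   # neighbor pixel y[i+di, j+dj]
            shifted[max(0, -di):M - max(0, di), max(0, -dj):N - max(0, dj)] = \
                y[max(0, di):M - max(0, -di), max(0, dj):N - max(0, -dj)]
            z[:, :, k] += y * shifted    # product of the correlated pixel pair
    return z

def ratio_weights(z):
    """Ratio memory (5)/(9): each weight divided by the sum of the absolute
    weights of the same cell's neighborhood (scale constant omitted)."""
    denom = np.abs(z).sum(axis=2, keepdims=True)
    return np.divide(z, denom, out=np.zeros_like(z), where=denom > 0)

patterns = np.sign(np.random.default_rng(1).standard_normal((3, 9, 9)))
a = ratio_weights(learn_weights(patterns))  # ratioed A-template entries
```

For each cell, the four ratioed entries then sum to 1 in absolute value, which is exactly the normalization that the ratio memory maintains in hardware.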

On the other hand, the B template can be written as

$$B(i,j;k,l)=\begin{cases}b,&C(k,l)=C(i,j)\\ 0,&\text{otherwise}\end{cases}\qquad\text{for }C(k,l)\in N_r(i,j) \tag{11}$$

where $b$ is a constant. As may be seen from (11), the B template is a static and space-invariant matrix. Similarly, the threshold template $I$ is also a space-invariant and static template. The boundary conditions of the boundary cells in the RMCNN can be written as

$$x_{ij}=u_{ij}=y_{ij}=0,\qquad C(i,j)\in\partial C \tag{12}$$

where $\partial C$ denotes the boundary cells. The initial state of the RMCNN is set as

$$x_{ij}(0)=0. \tag{13}$$

The architecture of the RMCNN is shown in Fig. 1, where the RM is used to realize the A-template weights among the cells and there are only four nearest neighboring cells. The detailed block diagram of two neighboring CNN cells and their RM in the RMCNN is shown in Fig. 2. In Fig. 2, the block T1 is a V-I converter used to convert the voltage of input patterns into current. The current of input patterns is summed with the four weighted outputs from neighboring cells during the recognition period and converted into voltage through the resistor $R_x$ and the parasitic capacitor $C_x$ to form the cell state $x_{ij}$. The block T2d is a V-I converter with a one-half absolute-value circuit and a sign-detection circuit, used to generate the absolute value of the output current $\lvert I_{y,ij}\rvert$ and to detect the sign of $x_{ij}$, respectively. The CNN cell is formed by T1, T2d, $R_x$, and $C_x$, as indicated in Fig. 2.


Fig. 1. The architecture of the RMCNN.

Fig. 2. The detailed architecture of two neighboring cells and their RMs in the RMCNN.

The block Mul/Div [20] in Fig. 2 is a combined four-quadrant multiplier and two-quadrant divider circuit. It is used to perform the multiplication of (4) and realize the modified Hebbian learning rule during the learning period. It is also used to realize both the RM in (5) and the multiplication in (6) during the recognition period. The resultant absolute weight during the learning period is stored in the capacitor $C_z$. The block T2l transfers the absolute value of the voltage stored in $C_z$ to $C_{zs}$ and stores its sign in the latch circuit. The resistor in parallel with $C_{zs}$ represents the inevitable leakage associated with $C_{zs}$. The block T3 is also a V-I converter, used to convert the voltage of $C_{zs}$ into current during the recognition period. The output current of T3 is sent to the sum block to perform the summing function with the currents from the other three neighboring cells. The summed current is sent to the Mul/Div block for the ratio-memory generation. The above circuits form the RM among CNN cells.

Fig. 3. Architecture of the RMCNN during the learning period.

During the learning period, with the learning control signal set high and the recognition control signal set low, the configuration of the RMCNN is shown in Fig. 3. In Fig. 3, the input patterns are read sequentially into the cell, and the input voltage of the $m$th input pattern is sent to T1 to be converted into the current $I_{ij}^{m}$ and then to T2d to extract its absolute current value and sign. Then, the converted absolute currents $\lvert I_{ij}^{m}\rvert$ and $\lvert I_{kl}^{m}\rvert$ of two neighboring cells are sent to the four-quadrant multiplier in the Mul/Div block to generate the signed product. The generated product in the current mode charges the capacitor $C_z$ for the period $T_W$ to form the voltage on $C_z$. This operation is repeated for the $p$ patterns to sum the voltages on $C_z$. Finally, the weight voltage stored on $C_z$ at $t=T_L$, when the learning period ends, can be written as

$$V_{z,ij,kl}(T_L)=\sum_{m=1}^{p}\frac{I_{ij}^{m}\,I_{kl}^{m}}{I_b}\,\frac{T_W}{C_z} \tag{14}$$

where $I_{ij}^{m}$ is the current of the pixel at the $i$th row and $j$th column of the $m$th pattern out of the $p$ input patterns, $I_{kl}^{m}$ is the current of the input pattern to the cell $C(k,l)$ of the neighboring cells, $I_b$ is a constant bias current, $V_{z,ij,kl}(T_L)$ is the weight voltage stored on $C_z$ at $t=T_L$, and $T_W$ is the learning time of each learned pattern. Comparing (14) and (4), it can be realized that the learned weight $z_{ij,kl}$ of the modified Hebbian rule is realized by (14). Through T2l, the absolute value of the weight voltage, denoted as $\lvert V_{z,ij,kl}\rvert$, is stored on the capacitor $C_{zs}$, whereas the sign of $V_{z,ij,kl}$ is stored in the latch circuit of T2l.

In the elapsed period, the leakage current $I_{leak}$ associated with $C_{zs}$ gradually decreases the stored voltage $\lvert V_{z,ij,kl}\rvert$. Since the leakage current is nearly constant, the change of $\lvert V_{z,ij,kl}\rvert$ can be written as

$$\Delta\lvert V_{z,ij,kl}(t)\rvert=-\frac{I_{leak}}{C_{zs}}\,(t-T_L). \tag{15}$$

In the recognition period, with the learning control signal low and the recognition control signal high, the architecture of the RM is shown in Fig. 4. In Fig. 4, the voltage of the pattern to be recognized is input to T1 and converted into the current $I_{u,ij}$. The absolute weight voltage stored on $C_{zs}$ is converted into current through T3 and summed with the currents from the other neighboring cells. The summed current, the weight current, and the cell output current are sent to the Mul/Div block to obtain the current corresponding to the A-template term in (6), which is then summed with the currents from the other neighboring cells, the input current $I_{u,ij}$, and the threshold current to form the cell state current $I_{x,ij}$.
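The interplay of the constant-leakage decay (15) and the ratio normalization (5), that is, the feature-enhancement effect, can be reproduced with a few lines. The 2-pF capacitor and 0.8-fA leakage below follow the values used later in the MATLAB simulations, while the initial weight voltages are arbitrary assumptions.

```python
import numpy as np

C_ZS = 2e-12          # storage capacitor (2 pF, from the simulation section)
I_LEAK = 0.8e-15      # assumed constant leakage current (0.8 fA)
RATE = I_LEAK / C_ZS  # decay rate of each stored voltage: 0.4 mV/s

# Arbitrary absolute weight voltages of one cell's four neighbors (volts).
v0 = np.array([0.9, 0.5, 0.2, 0.1])

for t in (0.0, 850.0, 2500.0):
    v = np.maximum(v0 - RATE * t, 0.0)  # eq. (15): linear decay, floored at 0
    s = v.sum()
    a = v / s if s > 0 else np.zeros_like(v)
    print(f"t={t:6.0f} s  ratioed weights = {np.round(a, 2)}")

# The smaller entries reach zero first, so the surviving ratioed weights
# grow toward 1 before everything finally decays away, which mirrors the
# feature-enhancement behavior described in the text.
```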


Fig. 4. The architecture of the RMCNN during the recognition period.

The current $I_{x,ij}$ is converted into the voltage $x_{ij}$ through the resistor $R_x$. Thus, $x_{ij}$ can be expressed as shown in (16), at the bottom of the page, where $G$ is the empirical gain; ideally, $G$ equals the designed gain $G_A$. The ratioed weight in (16) is generated by the two-quadrant divider in the block Mul/Div, with its sign equal to the sign of $V_{z,ij,kl}$ latched in T2l, whereas the multiplication of $\lvert I_{y,kl}\rvert$ and the ratioed weight is generated by the four-quadrant multiplier of Mul/Div by using the latched sign in T2l and the sign detected in T2d. Comparing (16) with (6), it can be seen that the function of (6) is realized in the architecture of Fig. 4.

The generated $x_{ij}$ is sent to T2d to generate the current $\lvert I_{y,ij}\rvert$ and the sign voltage $V_{SY,ij}$ as shown in (17) and (18), at the bottom of the page, where $g_m$ is the transconductance of T2d. It can be seen from (2), (17), and (18) that the block T2d realizes the output function $y_{ij}=f(x_{ij})$ by separating its magnitude and sign: the magnitude is generated as the current $\lvert I_{y,ij}\rvert$, and the sign is detected as the voltage $V_{SY,ij}$.

Generally, the learned A-template matrix is asymmetrical. According to the simulation results, the learned A template leads to stable behavior. The above-mentioned stability will be formally proved later.

III. CMOS CIRCUIT REALIZATION

A. V-I Converters and Sign Detectors

The CMOS circuits of T2d and T2l are shown in Fig. 5, where Fig. 5(a) shows the circuit of the V-I converter with the one-half absolute-value circuit. The V-I converter, which is also used in the blocks T1 and T3, is a CMOS differential amplifier M1–M7 with source resistance to increase the linear range. The two source resistors are realized by the M5 and M6 devices operated in the linear region with the gate bias voltage Vbvic1. The output current Iovic is sent to the one-half absolute-value circuit formed by M8–M13 to generate the absolute-value current Ioabs with a unified flow direction. In Fig. 5(a), Vbvic1, Vbvic, Vbabsn, and Vbabsp are constant bias voltages.

The sign of $V_{z,ij,kl}$ is detected and latched by the CMOS dynamic latch circuit of Fig. 5(b) in the block T2l, whereas the sign of the input voltage is detected by the four cascaded CMOS inverters in the block T2d, with the output voltage denoted as $V_{SY}$ in Fig. 5(c). When the input signal of the latch circuit in Fig. 5(b) or the detect circuit in Fig. 5(c) is larger than the inverter threshold voltage (1.5 V), the circuit output is high (3 V); otherwise, when the input signal is smaller than the threshold voltage (1.5 V), the circuit output becomes low (0 V). To avoid the effect of inverter threshold-voltage variations, the input signal levels are kept well separated from the threshold voltage.

In the learning period, the learning control signal is high (3 V) and the latch control signal is low (0 V) in Fig. 5(b). The signs of the input voltages are detected by the circuit of Fig. 5(c) in T2d and used to determine the sign of $z_{ij,kl}$ in (4), whereas the sign of the weight voltage $V_{z,ij,kl}$ is detected by the circuit of Fig. 5(b). In the recognition period, the sign of $V_{z,ij,kl}$, or equivalently the sign of the stored weight, denoted as $V_{SW}$ in Fig. 5(b), is further latched by setting the learning control signal low (0 V) and the latch control signal high (3 V). The latched sign is used in generating the first term in (16).

Fig. 6 shows the HSPICE simulation result of the V-I converter with the one-half absolute-value circuit, designed using 0.35-μm single-poly quadruple-metal (SPQM) N-well CMOS technology.

$$x_{ij}(t)=R_x\Biggl[\,G\sum_{C(k,l)\in N_r'(i,j)}\frac{\lvert V_{z,ij,kl}\rvert}{\sum_{C(m,n)\in N_r'(i,j)}\lvert V_{z,ij,mn}\rvert}\,\operatorname{sgn}\bigl(V_{z,ij,kl}\bigr)\,\lvert I_{y,kl}(t)\rvert+b\,I_{u,ij}+I\,\Biggr] \tag{16}$$

$$\lvert I_{y,ij}(t)\rvert=\begin{cases}I_{y,\max},&\text{if }g_m\,x_{ij}(t)>I_{y,\max}\\ g_m\,\lvert x_{ij}(t)\rvert,&\text{if }\lvert g_m\,x_{ij}(t)\rvert\le I_{y,\max}\\ I_{y,\max},&\text{if }g_m\,x_{ij}(t)<-I_{y,\max}\end{cases} \tag{17}$$

$$V_{SY,ij}=\begin{cases}3\text{ V},&\text{if }x_{ij}(t)\ge 1.5\text{ V}\\ 0\text{ V},&\text{if }x_{ij}(t)<1.5\text{ V}\end{cases} \tag{18}$$

(5)


Fig. 5. (a) The circuit of the V-I converter of the blocks T1 and T3 and the one-half absolute-value circuit used in the blocks T2l and T2d. (b) The latch circuit used in the block T2l. (c) The sign-detector circuit used in the block T2d.

Fig. 6. The HSPICE simulation result of the circuit in Fig. 5(a).

It can be seen from Fig. 6 that the voltage Vin of the cell state is converted into the positive current Ioabs. The maximum linearity error of Ioabs is 15%, which occurs at the largest input difference between Vin and Vref. It is found that this error is acceptable in the RMCNN.

B. Combined Analog Multiplier and Divider

Fig. 7. CMOS circuit of the block Mul/Div.

The combined four-quadrant analog multiplier and two-quadrant divider in Fig. 2 can be realized in the current mode by the CMOS circuit shown in Fig. 7 [20]. In Fig. 7, the currents $I_1$ and $I_2$ for multiplication are input through the pMOS current sources M14i/M14 and M15i/M15/M16, respectively, whereas the divisor current $I_b$ is input through M24i/M24. The parasitic vertical PNP bipolar junction transistors (BJTs) Q1, Q2, Q3, and Q4 are adopted to perform the functions of multiplication and division by using the relation between the emitter current and the emitter-base voltage

$$I_E=I_S\,e^{V_{EB}/V_T}\qquad\text{or}\qquad V_{EB}=V_T\ln\frac{I_E}{I_S} \tag{19}$$

where $I_S$ is the emitter saturation current and $V_T$ is the thermal voltage. The OP AMP Ao has a closed-loop feedback via the nMOS device M21. Thus, the two emitter voltages are virtually the same. With the buffered direct-injection circuit [21], the output current can be read out through the pMOS current mirrors M19, M25, and M26 and the nMOS current mirror M29 and M30 to form the output current $I_{omd}$.


Since the two emitter voltages are virtually equal, we have

$$V_{EB1}+V_{EB3}=V_{EB2}+V_{EB4}. \tag{20}$$

Using the equation in (19), the relation among $I_1$, $I_2$, $I_b$, and $I_{omd}$ can be obtained from (20) as

$$I_1\,I_2=I_b\,I_{omd}. \tag{21}$$

Neglecting the base currents, the output current $I_{omd}$ can be expressed in terms of $I_1$, $I_2$, and $I_b$ as

$$\lvert I_{omd}\rvert=\frac{\lvert I_1\rvert\,\lvert I_2\rvert}{I_b}. \tag{22}$$

In (22), only the magnitudes of the input current signals are used to form the magnitude of the output current signal. The sign of $I_{omd}$ must be determined separately to realize the complete function of the four-quadrant multiplier and two-quadrant divider. In Fig. 7, the signal "selpn" is used to determine the sign of the output current $I_{omd}$. The signal "selpn" is obtained from the XNOR gate with the three different input signs. In the learning period, the learning control signal is high and the recognition control signal is low; the output "selpn" is determined by the sign voltages from the block T2d to realize the sign of the product in (14). In the recognition period, the learning control signal is low and the recognition control signal is high; "selpn" is determined by the voltages VSY and VSW from the blocks T2d and T2l, respectively, to determine the sign of the A-template term in (16). If the signal "selpn" is high (low), the sign is negative (positive) and the MOS device M28 (M27) is turned on to set the corresponding direction of $I_{omd}$.
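As a quick sanity check of (19)–(22), the following snippet verifies numerically that the translinear loop equality (20) holds exactly when the output current takes the value predicted by (22). The bias values are chosen inside the operating ranges quoted below, and the saturation current is an arbitrary assumption.

```python
import math

V_T = 0.02585   # thermal voltage at room temperature, volts
I_S = 1e-16     # assumed emitter saturation current, amps

def v_eb(i_e):
    # Eq. (19): V_EB = V_T * ln(I_E / I_S)
    return V_T * math.log(i_e / I_S)

i1, i2, ib = 4e-6, 3e-6, 20e-6   # currents within the stated ranges
io = i1 * i2 / ib                # eq. (22): expected output, 0.6 uA

# Translinear loop (20): the two EB-voltage sums agree when (22) holds.
lhs = v_eb(i1) + v_eb(i2)
rhs = v_eb(ib) + v_eb(io)
assert abs(lhs - rhs) < 1e-12
```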

The BJT devices used in Fig. 7 are the parasitic vertical BJTs in the 0.35-μm N-well CMOS process. The current gain of the parasitic BJTs is about 6–17. It is not large enough to neglect the effect of the base currents of the BJTs Q3 and Q4 on the emitter currents of the BJTs Q1 and Q2, respectively. Thus, extra circuits are needed to bypass the base currents from entering the emitters of Q1 and Q2. In Fig. 7, the BJTs Q13 and Q24 have the same emitter currents as Q3 and Q4, respectively. Thus, Q13 (Q24) has the same base current as Q3 (Q4). The current-mirror circuits M17/M18 (M22/M23) are used to mirror the base current of Q13 (Q24) to Q3 (Q4). Thus, the base current of Q3 (Q4) is bypassed from Q1 (Q2), and the translinear relation can be more accurately maintained to realize (22).

In the learning period, the Mul/Div block functions as a multiplier to implement the multiplication function of (4). The HSPICE simulation results of the multiplier function of the Mul/Div circuit in Fig. 7 are shown in Fig. 8, where the device parameters of the 0.35-μm SPQM N-well CMOS technology are used. It is found that in the actual operation ranges of the two input currents, from 0.5 μA to 6 μA and from 1.2 μA to 6 μA, with $I_b$ kept at 20 μA, the multiplication error can be kept under 5.5%. In the recognition period, since the gain $G_A$ in (8) is chosen as 4 in the IC design, most of the output currents $\lvert I_y\rvert$ from the neighboring cells are kept at the maximum absolute value, as in (17). Thus, most of the corresponding input current of the Mul/Div block becomes a constant current, and the Mul/Div block functions as a divider. The HSPICE simulation results of the divider function of the Mul/Div circuit in Fig. 7 are shown in Fig. 9. In the actual operation ranges of the input currents, from 1.2 to 6 μA and from 0.3 to 6 μA, with the remaining input kept at 6 μA, the output current can be as high as 60 μA. Under the actual operation condition, the division error can be kept under 10%.

Fig. 8. HSPICE simulation results of the Mul/Div circuit with I = 20 μA.

Fig. 9. HSPICE simulation results of the Mul/Div circuit with I = 6 μA.

Fig. 10. The CMOS readout circuit for the cell state signal $X_{ij}$.

The above errors of the Mul/Div circuit are also dependent on the variations of device parameters. However, it is found from simulation results that these errors have insignificant effects on the operation of RMCNN because of on-chip learning and RM operation.

C. The Complete Circuit

By using the above CMOS circuits as building blocks, the architecture of the RMCNN in Fig. 2 is implemented with an array size of 9 × 9. In the implemented 9 × 9 CMOS RMCNN, the capacitors $C_z$ and $C_{zs}$ for absolute-weight voltage storage are realized by nMOS gate capacitors. Because the output signals of the blocks in Fig. 2 are in the current mode, the summing and distribution block is realized by directly connecting the output nodes of the related blocks to the input of the master stage of a CMOS current mirror to perform the summing function; the mirrored output current is then distributed out through the multiple slave stages.

A layer of boundary cells is designed to surround the 9 × 9 regular cell array. In the boundary cells, both the state $x_{ij}$ and the input $u_{ij}$ are zero. Thus, the output of the boundary cell is also zero. Since the boundary cells only have to send a zero signal voltage to the neighboring regular cells or other boundary cells, this can be realized by setting the weights from the boundary cells to the other cells to zero. Thus, the associated RM blocks can be removed.

Fig. 11. The correct patterns of the Chinese characters (a) "One," (b) "Two," and (c) "Four."

To read out the neuron state signal $X_{ij}$, a suitable readout circuit, shown in Fig. 10, is designed in the 9 × 9 CMOS RMCNN. In the readout circuit, the inputs of nMOS-input CMOS single-stage OP AMPs used as unity-gain buffers are connected to the node $X_{ij}$ in Fig. 2. The buffer output is connected to the input of the source-follower driver through the switch controlled by the column select control signal CS. In the readout operation, CS is raised high column by column so that $X_{ij}$ is sent to the input of the nMOS source follower formed by M31 and M32, with M32 biased as the current source. Through the source follower, the neuron state signal $X_{ij}$ can be read out column by column to the output pad and the large off-chip load.

IV. SIMULATION RESULTS

A. 9 × 9 RMCNN

The MATLAB software is used to simulate the behavior of the RMCNN as an associative memory. In the MATLAB simulation, 9 × 9 neurons are used to form the RMCNN with $r=1$. Thus, it can process patterns with 81 pixels. To consider the leakage-current effect, a constant leakage current of 0.8 fA is applied to the capacitor of 2 pF, so the stored voltage decreases as in (15). The 2-pF capacitor is implemented on the chip; the value of 2 pF is chosen as a compromise between the weight storage time and the capacitor chip area. The test patterns applied for learning and recognition are the patterns of the Chinese characters "one," "two," and "four," as shown in Fig. 11. The learned A templates in (8) are space-variant templates. In Table I, the learned A template of the cell C(4,4) at $t=850$ s is listed with the corresponding learned matrix of the absolute weights. Due to the leakage current, both the absolute and the ratioed weights change with time, as shown in Fig. 12(a) and (b), respectively. For the ratioed weights, the decreasing absolute weights lead to the feature-enhancement effect [14]–[17], which makes the smaller (larger) ratioed weights approach 0 (1) at $t=850$ s, as shown in Fig. 12(b) and Table I. After the elapsed period of 850 s, the smaller absolute weights have decreased to 0. Note that the time for the ratioed weights to become 1 or 0 depends on the leakage current: a larger (smaller) leakage current makes this time shorter (longer).

After the three patterns in Fig. 11 have been learned in the learning period and the elapsed period of 850 s has passed, both the correct patterns in Fig. 11 and 300 noisy patterns are applied to the 9 × 9 RMCNN with $r=1$ for recognition and recovery. The noisy test patterns are the learned patterns corrupted by noise, where the noise is normally distributed with mean 0 and standard deviation 0.25. It is found that all correct and noisy input patterns can be recognized and recovered correctly by the RMCNN after the three patterns have been learned and 850 s have elapsed. After 2500 s, all the ratioed weights have decayed to 0, as shown in Fig. 12(b), and the RMCNN can no longer recognize input patterns. The total recognizable time in this case is thus 1650 s, from 850 s to 2500 s. If only two of the patterns in Fig. 11 are learned by the 9 × 9 RMCNN with $r=1$, both correct and noisy input patterns can be recognized for 2500 s right after the two patterns are learned.

When the noise standard deviation of the noisy test patterns is increased to 0.3 (0.4), the average probability of accurate recognition decreases to 97% (60%). Thus, the recognition rate degrades as the noise standard deviation increases.
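The noisy-pattern protocol described above is straightforward to reproduce: zero-mean Gaussian noise of a chosen standard deviation is added to the bipolar pixels before each recognition trial. The sign-threshold scoring below is our assumption about how recognition accuracy was tallied.

```python
import numpy as np

def make_noisy(pattern, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma to a
    bipolar (+/-1) pattern, matching the noise model of the simulations."""
    return pattern + rng.normal(0.0, sigma, size=pattern.shape)

def recognition_rate(recovered_outputs, stored_pattern):
    """Fraction of recovered outputs whose thresholded pixels equal the
    stored pattern (assumed scoring rule)."""
    hits = sum(np.array_equal(np.sign(y), stored_pattern)
               for y in recovered_outputs)
    return hits / len(recovered_outputs)

rng = np.random.default_rng(0)
stored = np.sign(rng.standard_normal((9, 9)))   # placeholder 9x9 pattern
trials = [make_noisy(stored, 0.25, rng) for _ in range(300)]
# Each trial would be applied to the RMCNN for recovery; recognition_rate()
# then scores the recovered outputs against the stored pattern.
```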

Fig. 12. For the 9 × 9 RMCNN, (a) the simulated absolute weights of the cell C(4,4) stored in the capacitor $C_{zs}$ versus time and (b) the corresponding ratioed weights of the cell C(4,4) versus time.

As compared to the 9 × 9 CNN without RM, but with the same learning rule and constant leakage on the coefficients of the space-variant templates, only the two patterns "one" and "two" in Fig. 11(a) and (b), respectively, can be learned, and only correct input patterns can be recognized, from 0 to 1200 s. If the pattern "four" in Fig. 11(c) is used, only one pattern can be learned and recognized by the 9 × 9 CNN without RM.

From the above results, it is found that some ratioed weights are not well separated right after the three patterns are learned by the 9 × 9 RMCNN, so pattern recognition and recovery are not yet successful. After 850 s, the feature-enhancement effect makes the ratioed weights well separated and decreases the insignificant weights to zero. Thus, pattern recognition and recovery can be performed successfully, even with noisy inputs, for all three stored patterns. If only two patterns are learned, the ratioed weights are well separated right after learning, so no elapsed time is required before pattern recognition and recovery. If the CNN has only absolute weights without RM, there is no feature-enhancement effect under constant leakage, and the number of recognizable patterns is reduced to one for complicated patterns and two for simple patterns. Besides, the noisy input patterns cannot be recovered without the feature-enhancement effect of the RM.

The above results on the storage capacity of the 9 × 9 RMCNN are obtained with the patterns in our test set. If different patterns are used, the results might be different.

The HSPICE simulation of the complete CMOS 9 × 9 RMCNN circuit designed in Section III is performed by using the device parameters of the 0.35-μm SPQM N-well CMOS technology. The control-timing diagram used in the HSPICE simulation is shown in Fig. 13. In the learning period, the learning control signal is set to the high level (3 V) and the recognition control signal to the low level (0 V).

Fig. 13. The control-timing diagram in the HSPICE simulation of the 9 × 9 RMCNN with $r=1$.

Fig. 14. The noisy pattern of the Chinese character "four" corresponding to the correct pattern in Fig. 11(c).

Thus, the circuit architecture of Fig. 3 is formed. The control signals ni1, ni2, and ni4 in Fig. 13 are sequentially set to a high level so that the three patterns in Fig. 11, with the black (white) signal level of 2.5 V (0.5 V), are input to the circuit for learning. In the elapsed period, both control signals are set low (0 V). The learned absolute weight is stored on the capacitor $C_{zs}$ and decays with time under the inevitable leakage current. In the recognition period, the learning control signal is set low, the recognition control signal is set high, and the circuit architecture of Fig. 4 is formed. Both control signals ni4 and nin in Fig. 13 are set to a high level so that the noisy pattern shown in Fig. 14 can be applied to the 9 × 9 RMCNN for recognition. After enough elapsed time, the column select signals are sequentially set to a high level. Thus, the states $X_{ij}$ can be read out column by column. The HSPICE-simulated output waveforms for the noisy pattern are shown in Fig. 15, where the high (low) voltage of 1.2 V (0 V) represents the black (white) level. It can be seen from Fig. 15 that the recognized result is the recovered correct pattern of Fig. 11(c). Thus, the above MATLAB simulation results have been verified by the HSPICE simulations of the real circuits.

B. 18 × 18 RMCNN

The behavior simulation of the 18 × 18 RMCNN with $r=1$ is also performed. The patterns used for learning and recognition are the patterns of the five Chinese characters shown in Fig. 16. The learned A templates of the cells C(4,4) and C(10,4) are listed in Table I with their corresponding absolute weights.


Fig. 15. HSPICE simulation output waveforms of the neuron state $X_{ij}$ in the 9 × 9 RMCNN with the input noisy pattern "four" for recognition.

Fig. 16. Correct patterns of the five Chinese characters (a) "Up," (b) "Soil," (c) "Work," (d) "Mountain," and (e) "Farm," which are learned and stored in the 18 × 18 RMCNN.

Due to the leakage current, the smaller absolute weights decay to zero at 1500 s, as shown in Fig. 17(a) and Table I. Meanwhile, the feature-enhancement effect [14]–[17] makes the coefficients of the A template converge to 1, 0.5, or 0, as shown in Fig. 17(b) and Table I. In the template of the cell C(10,4), the two largest terms are left at 1500 s; thus, the corresponding coefficients converge to 0.5 instead of 1, according to (16).

After the five patterns in Fig. 16 have been learned and the elapsed period of 1500 s has passed, both the correct patterns and 500 noisy patterns, with noise normally distributed with mean 0 and standard deviation 0.25, are applied to the 18 × 18 RMCNN for recognition and recovery. The recognizable time for the five correct patterns runs from 1500 s after these patterns have been learned to 2500 s. If only four of the patterns in Fig. 16 are learned, the recognizable time for correct patterns runs from 1250 s to 2500 s. For the recognition of noisy patterns, 98% accuracy in recognition and recovery can be achieved for five patterns, whereas 99% accuracy is achieved for four patterns. In the case of five learned patterns, when the noise standard deviation is 0.3, the average probability of accurate recognition drops to 85%; it is only 50% when the noise standard deviation is increased to 0.35.

Fig. 17. For the 18 × 18 RMCNN, (a) the absolute weights and (b) the ratioed weights of the cell C(4,4) versus time.

From the simulated recognition rates of the 9 × 9 and 18 × 18 RMCNNs, it is realized that the 18 × 18 RMCNN can learn more patterns, but its tolerance to the noise standard deviation is lower than that of the 9 × 9 RMCNN.

In the 18 × 18 CNN without RM, but with the same learning rule and constant leakage on the coefficients of the space-variant templates, only two patterns can be learned, and only the correct patterns can be recognized immediately after learning; after further elapsed time, even the correct patterns cannot be recognized. Thus, the 18 × 18 CNN without RM has less capability in pattern learning, storing, and recognizing. This is quite different from the case of RMCNNs, where increasing the size from 9 × 9 to 18 × 18 increases the number of noisy patterns for learning and recognition from three to five. The main reason for this difference is that, as the number of stored patterns is increased with the array size of CNNs, the total number of space-variant templates is increased. The CNN associative memories without RM cannot keep all these templates well separated, so exact pattern recognition and recovery cannot be realized. But the feature-enhancement effect of the RMCNN retains the simple features of the space-variant template coefficients and keeps them well separated. Thus, more patterns can be learned, stored, and recognized.

Fig. 18. Photograph of the fabricated CMOS 9 × 9 RMCNN chip.

Due to the unique feature-enhancement effect, the RMCNN can learn, store, recognize, and recover the same number of black-and-white (B/W) patterns with fewer weight connections among neurons as compared with the Hopfield neural network with RM and constant leakage on template coefficients [17]. For example, the 18 × 18 RMCNN can process five B/W patterns, as can the 9 × 9 Hopfield neural network with RM. But the 18 × 18 RMCNN has 1296 weight connections, while the 9 × 9 Hopfield network with RM has 6480 weight connections. The circuit complexity of the RMCNN is thus about one fifth of that of the Hopfield network with RM. As compared to other CNN associative memories without RM and without leakage on the stored template coefficients during the recognition operation [4], [5], [7], [8], the maximum numbers of stored and recognizable patterns are 25 (12) for a 9 × 9 CNN with 49 (25) synaptic connections per cell [4], [7], [8] and two for a 6 × 6 CNN with three synaptic connections per cell [5]. It is found from this work that both the leakage on the stored template coefficients and the noise of the input patterns have a strong effect on the maximum number of stored and recognizable patterns.
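The connection counts in the comparison above follow directly from the two topologies, four A-template weights per cell for the r = 1 RMCNN versus all-to-all weights for the Hopfield network, as this small check confirms:

```python
# Weight-connection counts quoted in the comparison above.
rmcnn_cells = 18 * 18                  # 18 x 18 RMCNN
rmcnn_connections = rmcnn_cells * 4    # four nearest neighbors per cell
assert rmcnn_connections == 1296

hopfield_neurons = 9 * 9               # 9 x 9 Hopfield network with RM
hopfield_connections = hopfield_neurons * (hopfield_neurons - 1)
assert hopfield_connections == 6480    # every neuron connects to all others

print(hopfield_connections / rmcnn_connections)  # 5.0 -> "one fifth"
```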

V. EXPERIMENTAL RESULTS

To verify the function of the RMCNN, an experimental chip of the 9 × 9 CMOS RMCNN circuit using the proposed architecture is designed and fabricated by using 0.35-μm single-poly quadruple-metal (SPQM) N-well CMOS technology. The photograph of the fabricated chip is shown in Fig. 18. It includes 9 × 9 regular cells, one surrounding layer of boundary cells, 144 RMs, and 9 rows of readout circuits. To compensate for the inevitable process-variation effects on the circuit parameters and guarantee the correct operation of the RMCNN chip, the gain $G_A$ is set to 4, as realized by the current ratio of the current mirrors M19/M25 and M19/M26 in Fig. 7.

Firstly, the three correct patterns in Fig. 11 are learned, and the learned absolute weights are stored on the 2-pF capacitors of the fabricated 9 × 9 RMCNN chip. As expected, the fabricated 9 × 9 RMCNN chip cannot recognize the correct test patterns just after it has learned the three patterns. After about 10 min, three noisy patterns are input to the 9 × 9 RMCNN chip for recognition. The measured output waveforms of the cell states for the noisy pattern "four" are shown in Fig. 19, where the minimum readout time of a cell state signal is 1 μs. In Fig. 19, the first two signal waveforms are the column select signals CS1 and CS9, which select the corresponding first and ninth columns for readout. The other signal waveforms are the measured cell state outputs of each row. The high (low) signal level of 1.2 V (0 V) represents the black (white) level. As may be realized from the waveforms of Fig. 19, the noisy pattern "four" has been recovered to the correct pattern shown in Fig. 11(c). Similarly, the noisy patterns corresponding to Fig. 11(a) and (b) can be recognized and recovered correctly. If the input is a correct pattern, it is still recognized.

Fig. 19. Measured output waveforms for the input noisy pattern "four" in the fabricated CMOS 9 × 9 RMCNN chip.

TABLE II. SUMMARY OF THE CHARACTERISTICS OF THE FABRICATED CMOS 9 × 9 RMCNN CHIP.

The characteristics of the fabricated CMOS 9 × 9 RMCNN chip are summarized in Table II. The chip area of a single pixel, including one regular neuron cell and two RMs, is … μm × … μm. The chip area of a single RM block, including the capacitors $C_z$ and $C_{zs}$, is … μm × … μm. The chip area of the 2-pF capacitor is … μm × … μm. The quiescent power dissipation is 120 mW, whereas the dynamic power dissipation is 120 mW–140 mW. The total readout time of the CMOS 9 × 9 RMCNN is 9 μs.


VI. CONCLUSION

…on the absolute template coefficients, the RMCNN can learn and recognize the same number of patterns with fewer weight connections as compared to the Hopfield neural network with the same RM and constant leakage on the template coefficients. Moreover, the proposed RMCNN can learn and recognize more patterns as compared to the CNN associative memories without RM but with the same learning rule and the same constant leakage on the coefficients of the space-variant templates. Based upon the designed architecture and CMOS circuits of the RMCNN, an experimental chip of the CMOS 9 × 9 RMCNN has been designed and fabricated by using 0.35-μm CMOS technology. The experimental results have successfully verified the correct function of the 9 × 9 RMCNN. Since the proposed RMCNN has advantageous features in learning, storing, and recognizing image patterns, it is suitable for many applications of neural associative memory in real-time image processing.

ACKNOWLEDGMENT

The authors would like to thank the Chip Implementation Center (CIC), National Science Council (NSC), Taiwan, R.O.C., for their support in chip fabrication, and the reviewers for their valuable suggestions.

REFERENCES

[1] L. O. Chua and L. Yang, "Cellular neural networks: Theory," IEEE Trans. Circuits Syst., vol. 35, pp. 1257–1272, Oct. 1988.

[2] L. O. Chua and L. Yang, "Cellular neural networks: Applications," IEEE Trans. Circuits Syst., vol. 35, pp. 1273–1290, Oct. 1988.

[3] T. Roska, "Analog events and a dual computing structure using analog and digital circuits and operators," in Discrete Event Systems: Models and Applications, P. Varaiya and A. B. Kurzhanski, Eds. New York: Springer-Verlag, 1988, pp. 225–238.

[4] D. Liu and A. N. Michel, "Cellular neural networks for associative memories," IEEE Trans. Circuits Syst. II, vol. 40, pp. 119–121, Feb. 1993.

[5] M. Brucoli, L. Carnimeo, and G. Grassi, "An approach to the design of space-varying cellular neural networks for associative memories," in Proc. 37th Midwest Symp. Circuits and Systems, vol. 1, 1994, pp. 549–552.

[6] A. Lukianiuk, "Capacity of cellular neural networks as associative memories," in Proc. 4th IEEE Int. Workshop Cellular Neural Networks and Their Applications, 1996, pp. 37–40.

[7] H. Kawabata, M. Nanba, and Z. Zhang, "On the associative memories in cellular neural networks," in Proc. IEEE Int. Conf. Systems, Man and Cybernetics, vol. 1, 1997, pp. 929–933.

[8] P. Szolgay, I. Szatmari, and K. Laszlo, "A fast fixed-point learning method to implement associative memory on CNNs," IEEE Trans. Circuits Syst. I, vol. 44, pp. 362–366, Apr. 1997.

[9] R. Perfetti and G. Costantini, "Multiplierless digital learning algorithm for cellular neural networks," IEEE Trans. Circuits Syst. I, vol. 48, pp. 630–635, May 2001.

[10] A. Paasio, K. Halonen, and V. Porra, "CMOS implementation of associative memory using cellular neural network having adjustable template coefficients," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 6, 1994, pp. 487–490.

[17] C.-Y. Wu and C.-H. Cheng, "The design of CMOS modified Hopfield neural network for pattern recognition," in Proc. Int. Symp. Multimedia Information, 1997, pp. 585–590.

[18] J. A. Freeman and D. M. Skapura, Neural Networks—Algorithms, Applications and Programming Techniques. Reading, MA: Addison-Wesley, 1992.

[19] C.-Y. Wu and C.-H. Cheng, "The design of cellular neural network with ratio memory for pattern learning and recognition," in Proc. IEEE 6th Int. Workshop Cellular Neural Networks and Their Applications, 2000, pp. 301–307.

[20] C.-Y. Wu and C.-H. Cheng, "A new analog multiplier-divider with compact structure for CMOS neural network applications," in Proc. 1st Asia Pacific Conf. ASICs, 1999, pp. 315–317.

[21] C.-Y. Wu, C.-C. Hsieh, F.-W. Jih, T.-P. Sun, and S.-J. Yang, "A new share-buffered direct-injection readout structure for infrared detector," in Proc. SPIE Infrared Technology XIX, vol. 2020, 1993, pp. 57–64.

Chung-Yu Wu (S'76–M'76–SM'96–F'98) was born in 1950. He received the M.S. and Ph.D. degrees from the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C., in 1976 and 1980, respectively.

Since 1980, he has served as a consultant to high-tech industry and research organizations and has built up strong research collaborations with high-tech industries. From 1980 to 1983, he was an Associate Professor at National Chiao-Tung University. From 1984 to 1986, he was a Visiting Associate Professor in the Department of Electrical Engineering, Portland State University, Portland, OR. Since 1987, he has been a Professor at National Chiao-Tung University. From 1991 to 1995, he served as the Director of the Division of Engineering and Applied Science of the National Science Council, Taiwan, R.O.C. Currently, he is the Centennial Honorary Chair Professor at National Chiao-Tung University. He has more than 250 published technical papers in international journals and conferences and holds 19 patents, including nine U.S. patents. His research interests focus on nanoelectronics; low-voltage, low-power mixed-mode circuits and systems for giga-scale system applications; cellular nonlinear networks and neural sensors; RF communication circuits and systems; biochips; and bioelectronics.

Dr. Wu is a member of the Eta Kappa Nu and Phi Tau Phi honorary scholastic societies. He was a recipient of the IEEE Third Millennium Medal, the Outstanding Academic Award of the Ministry of Education in 1999, the Outstanding Research Award of the National Science Council in 1989–1990, 1995–1996, and 1997–1998, the Outstanding Engineering Professor Award of the Chinese Engineer Association in 1996, and the Tung-Yuan Science and Technology Award in 1997.

Chiu-Hung Cheng (S'97) was born in Taipei, Taiwan, R.O.C., in 1971. He received the B.S. and M.S. degrees from the Department of Electronics Engineering, National Chiao-Tung University, Hsinchu, Taiwan, R.O.C., in 1995 and 1997, respectively, where he is currently working toward the Ph.D. degree.

His research interests include analog integrated circuit design and CNN integrated circuit design.
