
1.3 Research Motivation and Thesis Organization

After the learning period, the weights aijkl(0) in Eq. (1.5) are not used directly. Instead, the weights aijkl(T) after the elapsed period are used [18]-[19]. The weight aijkl(T) can be written as

aijkl(T) = sgn(ssijkl(0)) * max(|ssijkl(0)| - c(T), 0) / sum over C(k,l) in Nr(i,j) of max(|ssijkl(0)| - c(T), 0)

where c(T) is the amount of the absolute-weight decay. After the elapsed process, all absolute weights decay, and some of them even decay to zero. However, not all of the ratio weights aijkl decay: some ratio weights are enhanced while the others decay. After this elapsed period, the important ratio weights become larger and the trivial weights become smaller. Table 1.1 shows an example template A of absolute weights and the corresponding ratio weights [18]-[19].

Before the elapsed period, the template A of ratio weights A44(0s) is the learning result according to Eq. (1.5), and ss44 is the numerator of Eq. (1.5). Neither A44(0s) nor ss44(0s) has zero elements. Obviously, after the elapsed period, some of the elements in ss44(850s) decay to zero.

Computing the corresponding ratio weights with Eq. (1.5) then gives A44(850). In A44(850), the important ratio weight increases from 1/2 to 1, and the others decrease to 0. Thus the template A becomes a feature-enhanced template, and with this characteristic the recognition rate is improved.

The original design, the RMCNN with elapsed operation, needs an elapsed period to obtain the feature-enhanced ratio weights Aij, and the length of this elapsed period must be controlled well. If the elapsed period is too long, all of the ratio weights decay to zero and the circuit loses its recognition function. If the elapsed period is too short, good feature-enhanced ratio weights cannot be obtained: some weights that should decay to zero do not decay completely.

When the learning patterns change, the best length of the elapsed period changes too. It is then necessary to tune the elapsed period in software whenever the circuit must learn different patterns. This step makes the operation of the circuit insufficiently automatic.

We develop a new RMCNN w/o EO. This new structure generates the feature-enhanced ratio weights directly after the learning period. When the learning patterns change, the elapsed time need not be adjusted, and the new structure can recognize noisy patterns directly after the learning period. In this thesis, Chapter 2 describes the architecture and the CMOS circuit implementation. Chapter 3 presents the HSPICE and MATLAB simulation results. The experimental results and the layout description are given in Chapter 4. Finally, Chapter 5 gives the conclusion and future work.

Table 1.1 Template A of ratio weights and the corresponding absolute-weights
(9x9 RMCNN: ratio-weight and corresponding absolute-weight templates; table images not reproduced)

CHAPTER 2

ARCHITECTURE AND CIRCUIT IMPLEMENTATION

2.1 Operational Principle and Architecture

It is known that the ratio memory (RM) can suppress the unimportant weights and enhance the significant weights to obtain the feature-enhancement characteristic [18]-[19]. Since the absolute weights decrease with the leakage current, the significant ratio weights increase whereas the unimportant ratio weights decrease. For example, if two of the four weights in template A increase and the others decrease, the two increasing weights finally increase up to 1/2. Similarly, three (four) significant weights increase to 1/3 (1/4).
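The feature-enhancement effect of the decay can be illustrated with a small numeric sketch (a software illustration with invented weight values, not the thesis circuit): when every absolute weight loses the same amount, only the largest weights survive, and their ratio weights grow toward 1/k.

```python
def ratio_weights(abs_w):
    """Normalize absolute weights into ratio weights."""
    total = sum(abs_w)
    return [w / total if total else 0.0 for w in abs_w]

# Four absolute weights in one template A; two of them are significant.
abs_w = [5.0, 4.8, 1.0, 0.5]
print(ratio_weights(abs_w))        # all four ratio weights are nonzero

# Uniform decay: each step removes the same amount from every weight.
for _ in range(4):
    abs_w = [max(w - 1.0, 0.0) for w in abs_w]

# The two trivial weights have decayed to zero; the two significant
# ratio weights each approach 1/2.
print(ratio_weights(abs_w))
```

Decaying further would eventually leave only the single largest weight, which is why the length of the elapsed period must be controlled carefully.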

After the leakage current decays the absolute weights, some ratio weights increase and some decrease. The criterion used to distinguish which ratio weights increase and which decrease can be written as [20]

|ssijkl| >= M : the ratio weight increases,
|ssijkl| <  M : the ratio weight decreases,

where M is the mean of the absolute values of the absolute weights in the template. In this way the increasing and decreasing ratio weights are detected. After the comparing operation, the increasing weights are set to an appropriate value (1, 1/2, 1/3, or 1/4) and the decreasing weights are set to zero directly. This criterion determines the final ratio weights directly, without the elapsed operation. The new Hebbian learning algorithm can be written as below:

Step 1: find the absolute-weight template A, ssijkl(p), after p patterns are learned:

ssijkl(p) = ssijkl(p-1) + uij^p * ukl^p

Step 2: find the mean of the absolute values of the absolute weights in a template:

M = mean over C(k,l) in Nr(i,j) of |ssijkl(p)|

Step 3: generate the ratio weights:

aijkl = sgn(ssijkl(p)) / P_Nr(i,j)  if |ssijkl(p)| >= M
aijkl = 0                           otherwise

where uij^p and ukl^p are the inputs of cell(i,j) and cell(k,l), respectively, P_Nr(i,j) is the number of preserved weights in Nr(i,j), and r = 1.
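As a cross-check, the three steps can be modeled in a few lines of software (a behavioral sketch only; the pattern data below are invented, with pixels coded as +1/-1):

```python
from statistics import mean

def learn_ratio_weights(u_ij, neighbors):
    """u_ij: inputs of cell(i,j) over p patterns; neighbors: one list each."""
    # Step 1: Hebbian absolute weights after p patterns.
    ss = [sum(ui * uk for ui, uk in zip(u_ij, u_kl)) for u_kl in neighbors]
    # Step 2: mean of the absolute weights in the template.
    m = mean(abs(s) for s in ss)
    # Step 3: preserve weights whose magnitude reaches the mean; the
    # preserved ones share the value 1/P (P = number preserved).
    keep = [abs(s) >= m for s in ss]
    p_preserved = sum(keep)
    return [(1 if s > 0 else -1) / p_preserved if k else 0.0
            for s, k in zip(ss, keep)]

# cell(i,j) and its four neighbors over four learned patterns (invented)
a = learn_ratio_weights(
    [1, 1, -1, -1],
    [[1, 1, -1, -1], [1, 1, -1, 1], [1, -1, 1, -1], [-1, 1, 1, -1]],
)
print(a)   # two weights preserved, each 1/2
```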

A 9x9-array-size RMCNN is implemented in this thesis. Fig. 2.1 shows the block diagram of the RMCNN w/o EO and the control relationships between the blocks. The 9x9 shift register is used to store the learning patterns. The learning patterns are generated by a pattern generator and are input into the shift registers in series. When a learning pattern is completely stored in the register, the pattern is input into the RMCNN w/o EO in parallel for pattern learning. After every pattern is learned, the RMCNN w/o EO enters the recognition period.

The recognized result is sent to the output stage, which is controlled by two decoders, and the output stage outputs the state of each cell in series. The decoder Decoder_C selects the columns, and Decoder_R selects the rows of the 9x9 array of the output stage.

The general architecture of the RMCNN is shown in Fig. 2.2, which shows the connections between the cells and the RMs. Each cell connects with four RMs (on its up, left, right, and down sides), and every RM supports the ratio weight between two pixels. With the 3-V power supply in the circuit, 1.5 V is defined as zero, whereas 2.1 V (0.9 V) is defined as +1 (-1).
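The level coding can be summarized by a small helper (a software convenience for the discussion, not part of the chip; the function names are ours):

```python
V_ZERO, V_STEP = 1.5, 0.6   # volts: 1.5 V is logic zero, 0.6 V per unit

def pixel_to_voltage(x):
    """Map a pixel value in {-1, 0, +1} to the cell input voltage."""
    return V_ZERO + V_STEP * x

def voltage_to_pixel(v):
    """Map a cell voltage back to the nearest pixel value."""
    return round((v - V_ZERO) / V_STEP)
```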

Fig. 2.1 The block diagram of RMCNN

Fig. 2.2 The general architecture of RMCNN

The detailed block diagram of a cell and an RM is shown in Fig. 2.3. In Fig. 2.3, cell(i,j) is the cell in the ith row and jth column, and uij^p is the cell(i,j) input voltage of the pth pattern. The blocks T1 and T3 in the cell(i,j) are V-I converters that change voltage to current. T2D contains a detector to detect the sign of the state xij. The T2D block is also a V-I converter, and its output is an absolute current. The sign of the T2D input voltage is detected and stored separately. The block W uses current mirrors to multiply the cell outputs by 1, 1/2, 1/3, or 1/4. One of the four weights is chosen by Counter_L according to how many weights are preserved. The capacitor Cw stores the absolute weight in the learning period, and the V-I converter T3 transfers the voltage on Cw to an absolute current fed to the COMP block. COMP is a simple comparator: it compares the mean of the four absolute memory currents with the absolute memory current and decides whether the ratio weight should be kept. The Counter_L controls the block W to weight the output of each cell. The block T3, the capacitor Cw, and several switches form the RM; the other blocks form the CNN cell.

Fig. 2.3 The detailed architecture of RMCNN

The operation of this circuit is divided into two parts: the learning period and the recognition period. In the learning period, clk1 is high and clk2 is low, and the architecture in the learning period is shown in Fig. 2.4. In the learning period, the cell(i,j) input voltage uij^p of the pth pattern is transferred to the current

Iuij = GmT1 * (uij^p - 1.5 V)

where GmT1 is the transconductance of the V-I converter T1. The voltage level 1.5 V is defined as zero, so the current flows in the opposite direction when uij^p is larger or smaller than 1.5 V. When uij^p is larger than 2.1 V or smaller than 0.9 V, the output current Iuij of T1 becomes saturated and stays at the current Iusat, which is about 5.5 uA.

Fig. 2.4 Architecture of RMCNN in learning period

The current Iuij flows to the node xij and is converted into a voltage Vxij through the resistor Rij and the capacitor Cij. T2D outputs an absolute current Iyij and a sign(Iyij) according to the value of Vxij. Since the structure of T2D is similar to that of T1 and T2D has an absolute-value circuit, the output current Iyij and the sign(Iyij) can be written as

Iyij = GmT2D * |Vxij - 1.5 V|,  sign(Iyij) = sign(Vxij - 1.5 V)

where GmT2D is the transconductance of T2D and the current Iysat is the saturated output current of T2D; it is also about 5.5 uA. Note that Iyij always flows in the same direction whether Vxij is larger or smaller than 1.5 V. The sign of Vxij is detected by a detector in T2D and sent to the block W. The current Iyij flows into the block W. According to the signs of the input voltages Vxij and Vxkl, the output current of W charges or discharges the capacitor Cw. The block W is set to a default state in the learning period; the default state multiplies Iyij by 1/4.

The choice of this default state is just for circuit-design convenience, and the length of the learning time can be controlled to charge or discharge the capacitor Cw. The capacitor Cw is a MOS capacitor, and its capacitance is 2 pF. The capacitance of Cw and the value of Iysat are the same as in the RMCNN with elapsed operation [18]. To consider the leakage-current effect, a constant leakage current of 0.8 fA is applied to the 2-pF capacitor Cw, so the voltage Vwaijkl decreases. The 2-pF capacitor Cw is implemented on the chip; the value of 2 pF is chosen as a compromise between the weight-storage time and the capacitor chip area, since the capacitance of Cw cannot be too small because of the leakage current. The current Iysat is chosen as the smallest current that lets the V-I converter operate properly. Iysat must be small because the voltage Vwaijkl stored on Cw must be charged or discharged slowly so that the value of Vwaijkl can be controlled finely. Thus Iysat is chosen as 5.5 uA, and the learning time of a pattern is 100 ns.

This charging or discharging of Cw is the learning behavior and generates the absolute weight on the capacitor Cw. When the inputs of the neighboring cells cell(i,j) and cell(k,l) are both white or both black in a learning pattern, the capacitor Cw between these two cells is charged. Otherwise, when the inputs of these two cells are of opposite colors, the capacitor Cw is discharged. The voltage Vwaijkl stored on Cw can be written as

Vwaijkl(p) = Vwaijkl(p-1) + (1/2) * Iysat * Tlearn / Cw, when the signs of Vxij and Vxkl are the same
Vwaijkl(p) = Vwaijkl(p-1) - (1/2) * Iysat * Tlearn / Cw, when the signs of Vxij and Vxkl are not the same    Eq. (2.5)

Vwaijkl(p) means the voltage level after the pth pattern is learned. The output current of the block W is (1/4) Iysat, and two W blocks charge or discharge a Cw at the same time. Thus after each pattern is learned, the voltage change is (1/2) * Iysat * Tlearn / Cw, where the learning time Tlearn of each pattern is 100 ns.
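Plugging in the stated values (Iysat = 5.5 uA, Cw = 2 pF, Tlearn = 100 ns, and the 0.8-fA leakage) gives the per-pattern voltage step and the leakage droop:

```python
I_YSAT  = 5.5e-6   # A, saturated T2D output current
CW      = 2e-12    # F, weight-storage capacitor
T_LEARN = 100e-9   # s, learning time per pattern
I_LEAK  = 0.8e-15  # A, constant leakage current on Cw

# Two W blocks each deliver Iysat/4, so the net charging current is Iysat/2.
dv_per_pattern = (I_YSAT / 2) * T_LEARN / CW   # V per learned pattern
leak_rate = I_LEAK / CW                        # V/s of droop on Vwaijkl

print(f"dV per learned pattern: {dv_per_pattern * 1e3:.1f} mV")
print(f"leakage droop: {leak_rate * 1e6:.0f} uV/s")
```

The droop (hundreds of microvolts per second) is many orders of magnitude slower than the learning steps (over a hundred millivolts per pattern), which is consistent with Cw holding the weights over the learning sequence.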

After every pattern is input to the circuit, the capacitor Cw stores the absolute voltage weight Vwaijkl. Then T3 converts the voltage to a current and sends this current to the current-mode comparator COMP. The COMP compares two currents, Ioj and Iom: Ioj is the current transferred from T3, and Iom is the mean of all the absolute-weight currents in one template A. If Ioj is larger than Iom, COMP gives the Counter_L a "logic high", which means that the ratio weight between the two pixels should be preserved.

The connection between COMP and Counter_L is shown in Fig. 2.5. Since each cell connects only with the four nearest cells, there are four COMPs in one cell. Every COMP gives a logic output to Counter_L. At the end of the learning period, Counter_L counts how many "logic high" signals are given by the four COMPs. If there is (are) only one (two) "logic high", only one (two) ratio weight(s) should be preserved, and Counter_L controls the W to weight the output current of T2D as Iyij ((1/2) Iyij). Similarly, according to the output situation of the COMPs in one cell, the Counter_L may control the block W to weight the output current of T2D as (1/3) Iyij or (1/4) Iyij. The logic output of the COMP in cell(i,j) (cell(k,l)) also controls the switch sw2 (sw1) in Fig. 2.2. For example, if the logic output of the COMP in cell(i,j) is low (which means the ratio weight should be zero), the switch sw2 turns off. Then the information from cell(k,l) in the recognition period is isolated, which is equivalent to setting a ratio weight in the template A to zero.
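The Counter_L decision described above amounts to a small lookup (a behavioral sketch; the function name is ours):

```python
def counter_l_weight(comp_outputs):
    """comp_outputs: four booleans, one per COMP in the cell (True = keep)."""
    preserved = sum(comp_outputs)
    if preserved == 0:
        return 0.0            # no ratio weight preserved for this cell
    return 1.0 / preserved    # weight factor 1, 1/2, 1/3 or 1/4

# Two COMPs report "logic high": the T2D output is weighted by 1/2.
print(counter_l_weight([True, True, False, False]))
```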

Fig. 2.5 The connection relationships of COMP, Counter_L and RM

At the end of the learning period, every Counter_L counts how many "logic high" signals are sent from the COMPs and controls the W appropriately.

After the learning period, the operating process enters the recognition period. In this period, the input pattern is a noisy pattern. The architecture in the recognition period is shown in Fig. 2.6, in which clk1 is low and clk2 is high. The states of the switches sw1 and sw2 are controlled by COMP. In this period, uij^noi and ukl^noi are the input voltages of the noisy pattern. uij^noi is input to T1 and transferred to the current Iuij^noi. Iuij^noi and the output currents Iwkl from the neighboring cells C(k,l) (C(k,l) in Nr(i,j)) flow to the node xij, which gives

Cij * dVxij(t)/dt = -Vxij(t)/Rij + sum over C(k,l) in Nr(i,j) of aijkl * Iykl(t) + Iuij^noi    Eq. (2.6)

where aijkl is the template A ratio weight generated by W. Eq. (2.6) implements the RMCNN mathematical equation Eq. (1.1). Because the template B and the threshold are set according to Eq. (1.4) and Eq. (1.5), there is no threshold term in Eq. (2.6) and no extra coefficient on the input.

Fig. 2.6 Architecture of RMCNN in recognition period
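The dynamics of Eq. (2.6) can be explored with a toy Euler integration (a two-cell software sketch with invented weights and inputs and normalized R = C = 1; it is not a model of the actual analog transients):

```python
def f(x):
    """Piecewise-linear CNN output function, saturating at +/-1."""
    return max(-1.0, min(1.0, x))

def step(x, a, u, dt=0.01, R=1.0, C=1.0):
    """One Euler step of C dx/dt = -x/R + sum(a * f(neighbor)) + u."""
    new_x = []
    for i, xi in enumerate(x):
        feedback = sum(w * f(x[j]) for j, w in a[i].items())
        new_x.append(xi + dt * (-xi / R + feedback + u[i]) / C)
    return new_x

# Two cells that reinforce each other with ratio weight 1 (toy data).
x = [0.2, -0.1]                     # initial states
a = [{1: 1.0}, {0: 1.0}]            # preserved ratio weights per cell
u = [0.5, 0.4]                      # noisy-pattern inputs
for _ in range(2000):
    x = step(x, a, u)
print([f(xi) for xi in x])          # both cells settle to the +1 state
```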

The circuit of T1 and Rij is shown in Fig. 2.7. In all of the circuit-implementation figures, the MOS size is written next to the MOS number; the unit of the MOS size is micrometers. In Fig. 2.7, the left side is a differential-pair structure, and the right side is a MOS resistor. The voltages Vb1 and Vref are constant bias voltages: Vb1 is 2.5 V and Vref is 1.5 V. The MOSs M5 and M6 act as large resistances to enlarge the linear operating range. When the input voltage Vin is larger than Vref, the output current Io flows from left to right and the voltage Vxij rises. Similarly, when Vin is smaller than Vref, the voltage Vxij falls.

Fig. 2.7 The V-I converter T1

Fig. 2.8 shows the circuit of T2D. T2D is similar to T1, but it has a detector and an absolute-output-current structure. The circuit of the detector is shown in Fig. 2.9. The detector is just an inverter chain; it is used to detect the sign of the T2D input, and the function of the detector is described by Eq. (2.6). In Fig. 2.8, the left side is also a differential-pair structure, and the right side is the absolute-output-current structure. The constant bias voltage Vb2 is 1.5 V, and the constant bias voltages Vb1 and Vref are the same as in T1. When the input voltage Vin is larger than Vref, the current Io flows from left to right; then the MOS M10 in Fig. 2.8 turns off and the MOS M11 turns on. The current is mirrored by the current mirror M12 and M13 and flows through M8. Then the MOS M94 mirrors the current of M8 and outputs the current Ioabs. Similarly, if the voltage Vin is smaller than Vref, M10 turns on and M11 turns off, and the output current Ioabs is mirrored by the current mirror M8 and M94 directly. Whether the input voltage Vin is larger than Vref or not, the flowing direction of Ioabs is always the same, so the circuit has an absolute output current. The usage of the MOS M26 will be explained in Section 4.3.

Fig. 2.8 The V-I converter in T2D

Fig. 2.9 The detector in T2D

The circuit of the block W is shown in Fig. 2.10. Actually, the block W is combined with T2D; in order to show the MOS sizes of these two circuits, the diagrams are drawn separately. Note that the MOS M94 in Fig. 2.10 and the M94 in Fig. 2.8 are the same MOS. The complete circuit diagram of T2D and W is shown in Fig. 2.11. The function of W is to weight the output of T2D, and current mirrors are used to do so. In Fig. 2.10, because M94, M91, M92 and M93 form current mirrors, the minimum channel length is not used, to avoid a strong channel-length-modulation effect. In Fig. 2.11, the drain current of M94 is (1/4) Ioabs, but the size of M94 is not exactly 1/4 that of M8, because even with a 1-um channel length the drain-source voltage drops Vds of M8 and M94 still influence the current accuracy. Thus the channel width of M94 is adjusted to improve the current accuracy, and the sizes of M92 and M93 are adjusted similarly. A better method to let the current mirror operate accurately is to use parallel-connected MOSs: a small MOS is chosen as a unit MOS first; then the M8 in T2D uses twelve unit MOSs connected in parallel and M94 uses three, while M91 uses twelve unit MOSs, M92 uses six, and M93 uses four. This modified structure has a more accurate mirrored current.
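The unit-MOS counts above set the ideal mirror ratios directly (a quick check, assuming all branch currents are referenced to M8, which carries the full T2D current):

```python
# Unit-MOS counts as stated in the text: parallel copies of one unit device.
UNITS = {"M8": 12, "M94": 3, "M91": 12, "M92": 6, "M93": 4}

def mirror_ratio(out_dev, ref_dev):
    """Ideal mirrored-current ratio set by the parallel unit-MOS counts."""
    return UNITS[out_dev] / UNITS[ref_dev]

assert mirror_ratio("M91", "M8") == 1.0     # full-weight path
assert mirror_ratio("M92", "M8") == 0.5     # the 1/2 weight path
assert mirror_ratio("M93", "M8") == 1 / 3   # the 1/3 weight path
assert mirror_ratio("M94", "M8") == 0.25    # the 1/4 default weight path
```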

The switches Sw_a, Sw_b, Sw_c, Sw_d, Sw_e and Sw_f are controlled by Counter_L.

According to the output of the counter, only one of these switch paths turns on at a time. The XOR gate in Fig. 2.10 is used to control the flowing direction of the output current. In the learning period, VinT1(k,l) is input to the XOR gate; in the recognition period, VinT3ijkl is input to it.

Fig. 2.10 The CMOS circuit of W

Fig. 2.11 The overview of T2D and W

The V-I converter T3 is similar to T2D, and its circuit is shown in Fig. 2.12. In Fig. 2.3, T3 has four outputs: two of the four outputs are sent to COMP, and the others are sent for summation. Thus the circuit in Fig. 2.12 has four outputs, and the MOS sizes of the current mirrors (M9s1, M9s2, M9s3, M9s4 and M8) are the same.

Fig. 2.12 The CMOS circuit of T3

2.2.2 Comparator

Fig. 2.13 shows the circuit of the comparator. In order to save the area of the whole chip, a simple current-mode comparator is used. In Fig. 2.13, if the input current IMss is larger than Iaw, the logic output Vout is low; otherwise, Vout is high. The port IMss is used to receive the mean of the summed currents, and the port Iaw receives the absolute-weight current transferred from T3. In the above algorithm, if the absolute-weight current equals the mean of the summed currents, the ratio weight should be preserved too; that means the logic output of the comparator should be high when IMss equals Iaw. Because the usages of IMss and Iaw are fixed, the sizes of Mc3 and Mc4 are designed slightly smaller than those of Mc1 and Mc2. This size difference makes the logic output high even when IMss equals Iaw.

Fig. 2.13 The CMOS circuit of comparator (COMP)

As described in Section 2.1, the mean of the four absolute-weight currents must be computed, which means a summed current must be divided by four. Since there is no divider in this circuit, the dividing behavior is implemented by the wire connection of the COMPs.

The detail is shown in Fig. 2.14, in which two of the T3 output ports are drawn and the others are omitted. The four output currents of T3i(j+1), T3i(j-1), T3(i+1)j and T3(i-1)j are summed at the node N and form the current Isum. Because the MOSs Mc1 and Mc2 in Fig. 2.13 are diode-connected, they are always in the saturation region. The input impedance of the Iin1 port is very large and is not sensitive to the drain-source voltage drop Vds or the flowing current. In Fig. 2.14, the node N is connected to the inputs of all four comparators. Because of the similar input impedances of the four comparators, the current Isum flows into the four comparators evenly. Thus the currents flowing into Mc11, Mc12, Mc13 and Mc14 are each (1/4) Isum, and the current (1/4) Isum is the mean of the summed current.
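The divide-by-four behavior can be captured in one line (an idealized sketch assuming perfectly matched comparator inputs; the current values are invented):

```python
def comparator_mean(currents, n_inputs=4):
    """Share of the summed current seen by each matched comparator input."""
    return sum(currents) / n_inputs

# Four absolute-weight currents from the T3 stages (invented values, amperes).
i_mean = comparator_mean([2.0e-6, 1.2e-6, 0.5e-6, 0.3e-6])
print(i_mean)   # approximately 1.0e-6 A, the mean of the four currents
```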

Process variation is considered in the RMCNN w/o EO. If the capacitor Cw and all of the V-I converters have process variation, the COMP cannot obtain the accurate current. However,
