

2.2.4 Output stage and input pattern interface

The output stage is shown in Fig. 2.20. The nodes x11~x99 correspond to the nodes xij in Fig. 2.4.

M11~M99 act as level shifters to drive the parasitic capacitance of the switches and metal lines. The unity-gain buffer is a negative-feedback OP used to drive the output pad; its circuit is shown in Fig. 2.22. Two 4-bit decoders control the switches Swc11~Swc99 and Swr1~Swr9. One decoder controls the column switches Swc11~Swc19 (Swc21~Swc29, Swc31~Swc39, etc.), and the other controls the row switches Swr1~Swr9. This structure reads out the pixels one by one.

Some current sources in the output stage of Fig. 2.20 can be shared. The modified output stage is shown in Fig. 2.21, where every MOS in the same row shares one current source. This modification saves considerable power.

Fig. 2.20 The output stage

Fig. 2.21 The modified output stage

In order to input arbitrary learning patterns, a shift-register input interface is used.

Fig. 2.23 shows the input interface. DFF_N is a negative-edge-triggered D-type flip-flop. At the beginning of the learning period, clk1 and newp turn on and ptni inputs the learning pattern pixel by pixel. After the CLK of DFF_N toggles nine times (because the cell array has 9 columns), pin turns on to input the learning pattern into each cell. When pin turns on, newp turns off to prevent the pattern from changing if a glitch occurs on the CLK of DFF_N. After the first pattern is learned, clk1 and newp turn on again and pin turns off. The shift registers then transfer the stored learning pattern, and learning of the second pattern starts.
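The shift-in behavior above can be sketched as a small behavioral model. This is a Python stand-in for illustration only, not the circuit netlist; the class name and signal handling are assumptions:

```python
# Behavioral sketch of the shift-register input interface described above.
# One negative-edge-triggered DFF per column, so a 9-pixel row needs nine clocks.
class ShiftRegisterRow:
    def __init__(self, n_cols=9):
        self.stages = [0] * n_cols  # DFF_N outputs, one per column

    def clock(self, ptni, newp=True):
        """One falling edge of CLK: the chain shifts only while newp is high."""
        if newp:  # newp low cuts the chain, so a glitch cannot corrupt the data
            self.stages = [ptni] + self.stages[:-1]

    def read_into_cells(self):
        """Models pin turning on: the stored row is applied to the cells."""
        return list(self.stages)

row = ShiftRegisterRow()
pattern = [1, 0, 1, 1, 0, 0, 1, 0, 1]
for px in pattern:
    row.clock(px)               # nine falling edges load the nine pixels
loaded = row.read_into_cells()  # first pixel in has reached the last stage
```

Note that after nine clocks the first pixel shifted in sits at the far end of the chain, which is why pin only turns on once the whole row has been loaded.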

Fig. 2.24 is one part of Fig. 2.23 and shows how noise is mixed with the learning pattern in the recognition period. The capacitance Cgp is the gate capacitance of M1 in Fig. 2.7 plus other parasitic capacitance. In the learning period, the capacitance Cnoi is pre-charged to Vnoi and noi stays off. When the recognition period starts, the innocent pattern is already stored in the shift register and clk1 turns off to isolate the D flip-flop. Then noi turns on and charge sharing occurs between Cgp and Cnoi, so the voltage on node Nd settles to a mid-level value whose amplitude can be adjusted by the capacitance ratio of Cgp and Cnoi.
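The mid-level voltage on Nd follows directly from charge conservation. A minimal sketch with assumed capacitance values (the chip's actual Cgp and Cnoi are not stated in the text):

```python
# Charge-conservation estimate for node Nd: when noi closes, Cgp (at v_gp)
# and Cnoi (pre-charged to v_noi) share charge. The capacitance values below
# are assumptions for illustration only.
def charge_share(v_gp, v_noi, c_gp, c_noi):
    """Voltage on Nd after charge sharing: (Cgp*Vgp + Cnoi*Vnoi)/(Cgp + Cnoi)."""
    return (c_gp * v_gp + c_noi * v_noi) / (c_gp + c_noi)

# A white pixel (0.9 V) disturbed toward Vnoi = 2.1 V with equal capacitances
v_nd = charge_share(0.9, 2.1, 1.0, 1.0)   # lands at the 1.5 V mid level
```

Making Cgp larger relative to Cnoi pulls Nd back toward the stored pixel voltage, which is how the noise amplitude is tuned by the capacitance ratio.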

Fig. 2.22 The unity-gain buffer in the output stage

Fig. 2.23 The pattern input interface formed by shift registers

Fig. 2.24 The structure used to mix noise with the innocent pattern

CHAPTER 3

SIMULATION RESULT

3.1 Matlab Simulation Result

MATLAB is used to simulate the behavior of the CNN with ratio memory (RMCNN) as an associative memory. In the MATLAB simulation, 9x9 cells form the RMCNN with r = 1, so it can process patterns with 81 pixels. The three learning patterns are shown in Fig. 3.1: the Chinese characters "one" (一), "two" (二), and "four" (四).

Normal-distribution and uniform-distribution noise are each mixed with the clear patterns, and the MATLAB simulation shows that all three patterns can be recovered.

Fig. 3.2 shows the three patterns mixed with normal distribution noise. Fig. 3.3 shows the three patterns mixed with uniform distribution noise.
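As a rough stand-in for the Matlab noise-mixing step, the following NumPy sketch adds normal or uniform noise to a ±1 pattern. The clip to [-1, 1] and the bar-shaped test pattern are assumptions made only for this illustration:

```python
import numpy as np

# NumPy stand-in for the Matlab noise-mixing step. Pixels use the thesis
# convention +1 (black) / -1 (white).
rng = np.random.default_rng(0)

def add_noise(pattern, kind="normal", level=0.5):
    if kind == "normal":
        noise = rng.normal(0.0, level, size=pattern.shape)    # std dev = level
    else:
        noise = rng.uniform(-level, level, size=pattern.shape)
    return np.clip(pattern + noise, -1.0, 1.0)

clean = -np.ones((9, 9))   # all-white 9x9 field
clean[4, :] = 1.0          # one black row, roughly the character "一"
noisy = add_noise(clean, "normal", 0.5)
```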

Fig. 3.1 The three clear learning patterns

Fig. 3.2 Patterns mixed with normal distribution noise (standard deviation: 0.5)

Fig. 3.3 Patterns mixed with uniform distribution noise

The design in this thesis implements a method that generates ratio weights without the elapsed operation. Table 3.1 compares the ratio weights generated by the elapsed operation with those generated by this design. In the RMCNN with elapsed operation, the absolute weights stored on capacitors decay through leakage current. To model this effect, a constant leakage current of 0.8 fA is applied to the 2 pF capacitor Css. In Table 3.1, the elapsed time is 800 s. Some small ratio weights generated by the elapsed operation do not decay completely to zero, and some of the largest ratio weights do not grow to one, so the ratio weights are not sufficiently feature-enhanced. With a longer elapsed time (for example, 850 s), the ratio weights generated by the elapsed operation can be fully feature-enhanced; but if the elapsed time is too long, the ratio weights disappear because all of the absolute weights decay to zero. The RMCNN w/o EO does not have this problem: no elapsed time needs to be tuned, and the circuit obtains the best feature-enhanced ratio weights directly.
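The decay figures above can be checked directly: a constant 0.8 fA leakage on the 2 pF capacitor Css removes dV = I*t/C volts over an elapsed time t. A minimal sketch:

```python
# Checking the decay numbers above: a constant 0.8 fA leakage discharging the
# 2 pF weight capacitor Css loses dV = I*t/C volts over an elapsed time t.
I_LEAK = 0.8e-15   # leakage current, A
C_SS   = 2e-12     # weight-storage capacitor Css, F

def decay_volts(t_seconds):
    """Voltage lost from Css after t seconds of constant leakage."""
    return I_LEAK * t_seconds / C_SS

dv_800 = decay_volts(800)   # 0.32 V lost after the 800 s elapsed time
dv_850 = decay_volts(850)   # 0.34 V after the 850 s alternative
```

This makes the trade-off concrete: a few tens of seconds of extra elapsed time changes the stored weights by tens of millivolts, which is why the elapsed time must be tuned so carefully in the EO design.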

In the MATLAB simulation, not every noisy pattern can be recognized; if the mixed noise is too strong, the RMCNN fails to recognize the pattern. Two kinds of noise are simulated in this thesis: normal distribution and uniform distribution. If the standard deviation of the noise is larger than 0.3, the recognition rate drops below 90%.

The recognition rate is also simulated. Ninety random noisy patterns (thirty for each Chinese character) are generated by MATLAB and recognized. Fig. 3.4 shows the recognition rates of three algorithms. "CNN without RM" means the algorithm recognizes noisy patterns directly after the learning process; without feature-enhanced ratio weights, its recognition rate is the worst, and the Chinese character "four" can never be recognized. The recognition rates of "RMCNN with elapsed operation" and "RMCNN without elapsed operation" are similar. In Fig. 3.4, the elapsed time of "RMCNN with elapsed operation" is 800 s, so the recognition rate of "RMCNN without elapsed operation" is slightly better. If the elapsed time is 850 s, the two recognition rates are exactly the same because they produce the same ratio weights.
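The recognition-rate experiment can be outlined as follows. The nearest-stored-pattern matcher and the random ±1 placeholder patterns are illustrative stand-ins only; they are NOT the RMCNN dynamics or the actual Chinese-character patterns:

```python
import numpy as np

# Outline of the recognition-rate experiment: thirty noisy copies per stored
# pattern, count how many are recovered by a trivial nearest-pattern matcher.
rng = np.random.default_rng(1)

def recognition_rate(stored, n_trials=30, sigma=0.3):
    correct, total = 0, 0
    for idx, p in enumerate(stored):
        for _ in range(n_trials):
            noisy = p + rng.normal(0.0, sigma, size=p.shape)
            # recover by nearest stored pattern (Euclidean distance)
            dists = [np.linalg.norm(noisy - q) for q in stored]
            correct += int(np.argmin(dists) == idx)
            total += 1
    return correct / total

patterns = [np.sign(rng.normal(size=81)) for _ in range(3)]  # three 81-pixel patterns
rate = recognition_rate(patterns)
```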

Table 3.1 The ratio weights generated by (1) RMCNN with elapsed operation (2) RMCNN w/o EO

[Table 3.1 contents: the 9x9 ratio-weight matrices, one column generated with the elapsed operation and one generated without it.]

Fig. 3.4 Recognition rates from the MATLAB simulation: (1) CNN without RM, (2) RMCNN with elapsed operation, (3) RMCNN w/o EO

3.2 Hspice Simulation Result

The simulation of T1 and Rij in Fig. 2.7 is shown in Fig. 3.5. When the input voltage is between 0.9 V and 2.1 V, the transfer curve in Fig. 3.5 is linear; if the input voltage of T1 is below 0.9 V or above 2.1 V, the output voltage saturates. This is why, as described in chapter 2, the voltage levels 2.1 V and 0.9 V are defined as +1 and -1. Fig. 3.6 shows the simulation result of T2D. Because T2D outputs an absolute-value current, the direction of the output current is the same whether the input voltage of T2D is above or below 1.5 V. The transfer curve of T2D is linear when the input voltage is between 0.9 V and 2.1 V. The simulation result of COMP is shown in Fig. 3.7, where the input current IMss is swept and Iaw is kept constant. Fig. 3.7 has three rows: the first row is the overall view of the .DC simulation, and the second row is the transfer curve zoomed in to show the dead zone of the COMP. The first and second rows are the transfer curve of Vout in Fig. 2.13, and the third row is the transfer curve of Vout in Fig. 2.13. Fig. 3.7 shows the dead zone of the comparator is about 10 nA.

Fig. 3.5 Transfer curve of the V-I converter T1 and Rij

Fig. 3.6 Transfer curve of the V-I converter T2D

Fig. 3.7 .DC simulation result of the comparator

Fig. 3.8 and Fig. 3.9 show the simulation results of the unity-gain buffer in Fig. 2.20 and Fig. 2.21. Fig. 3.8 shows the frequency response of the OP in Fig. 2.21, and Fig. 3.9 shows the difference between Vin and Vout of the unity-gain buffer in Fig. 2.18. Table 3.2 gives the specifications of the OP in Fig. 2.21.

Fig. 3.8 Frequency response of the OP used as the unity-gain buffer

Fig. 3.9 The voltage difference between Vin and Vout of the unity-gain buffer

Table 3.2 Specifications of the OP used as the unity-gain buffer

DC gain 37.2 dB

3-dB frequency 24 kHz

Unity-gain frequency 1.8 MHz

Load capacitor 20 pF

Bias current 800 µA

The whole-chip recognition process is also simulated in Hspice. Because there are 81 pixels, it is not feasible to show the learning and recognition processes of all of them, so several pixels are shown as examples. All of the pixels were checked, and all are recovered.

Fig. 3.10~Fig. 3.13 show the whole-chip learning and recognition process for four pixels. In Fig. 3.10~Fig. 3.13, the circuit learns the patterns during the "learning period", and "pattern transferring" transfers the learning patterns stored in the shift registers. The timing labeled "counter" means the counter is counting how many ratio weights are preserved. During "noisy pattern read in", the noisy pattern to be recognized is input into the circuit. After "noisy pattern read in", the recognition process starts.

As described in chapter 2, the pure-black voltage level is defined as 2.1 V and the pure-white level as 0.9 V. Fig. 3.10 shows the operation of the pixel in the second row and fourth column, P(2,4), and Fig. 3.11 shows the operation of P(2,2). P(2,2) is a white pixel with noise, and P(2,4) is a white pixel without noise. When "noisy pattern read in" starts, the voltage level of P(2,4) is between 0.9 V and 2.1 V, so it is a gray pixel. When the recognition period begins, the voltage level of P(2,4) is pulled below 0.9 V, so P(2,4) is recognized and recovered. P(2,2) is also pulled below 0.9 V after the recognition period, so P(2,2) is recognized too. Fig. 3.12 shows the operation of P(3,8), and Fig. 3.13 shows the operation of P(3,2). P(3,8) is a black pixel without noise, and P(3,2) is a black pixel with noise. Similarly, when "noisy pattern read in" starts, the voltage level of P(3,2) is between 0.9 V and 2.1 V, meaning P(3,2) is a gray pixel at that time. After the recognition period this pixel is pulled above 2.1 V, showing it is recovered to a pure black pixel. Similarly, P(3,8) is pulled above 2.1 V and is recognized.
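The readout rule used in these waveform descriptions can be stated compactly. Treating the endpoints as black/white is an assumption about boundary handling; the text only defines the two levels:

```python
# Levels at or above 2.1 V read as pure black (+1), at or below 0.9 V as pure
# white (-1), and anything in between as gray.
def classify_pixel(v_out):
    if v_out >= 2.1:
        return "black"
    if v_out <= 0.9:
        return "white"
    return "gray"

states = [classify_pixel(v) for v in (2.3, 1.5, 0.5)]
```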

Fig. 3.10 Recognizing process of the white pixel without noise P(2,4) (Hspice)

Fig. 3.11 Recognizing process of the white pixel with noise P(2,2) (Hspice)

Fig. 3.12 Recognizing process of the black pixel without noise P(3,8) (Hspice)

Fig. 3.13 Recognizing process of the black pixel with noise P(3,2) (Hspice)

CHAPTER 4

LAYOUT DESCRIPTIONS AND EXPERIMENTAL RESULTS

4.1 Layout and Experimental Environment Setup

Fig. 4.1 and Fig. 4.2 show the layout of the chip. Fig. 4.1 shows the layout of one cell and two ratio memories: the central part is the cell, and the left and right sides are the ratio memories. The area of one cell plus two RMs is 400 x 250 um2. Fig. 4.2 shows the whole-chip layout, which uses TSMC standard pads that include the ESD devices, pre-driver, and post-driver. The die area is 4.56 x 3.49 mm2. Fig. 4.3 is the package diagram; the package is an 84-pin LCC84. The die photo is shown in Fig. 4.4. Table 4.1 summarizes the performance of the RMCNN w/o EO compared with the RMCNN with elapsed operation [18].

The area per pixel of the RMCNN w/o EO is smaller than that of the RMCNN with elapsed operation, but its whole-chip area is larger: because the large TSMC standard pads are adopted in the RMCNN w/o EO, the whole chip is larger even though each pixel is smaller.

The measurement environment is shown in Fig. 4.5. The control signals and some input signals are generated by the pattern generator of an HP/Agilent 16702A. The clock of the pattern generator is 12.5 MHz, and the signal rise (fall) time is about 4.5 ns. The output waveform is displayed on a TDK 3054B oscilloscope. The power supply is 3 V.

Fig. 4.1 Layout of one pixel (two RM and one cell)


Fig. 4.2 Layout of the whole chip (pad included)

Fig. 4.3 The package diagram

Fig. 4.4 The die photo of the 9x9 RMCNN without elapsed operation

Table 4.1 Summary of the RMCNN w/o EO compared with the RMCNN with elapsed operation

Item | RMCNN with EO | RMCNN w/o EO
Technology | 0.35 µm 1P4M mixed-signal process | 0.35 µm 2P4M mixed-signal process
Resolution | 9 x 9 cells | 9 x 9 cells
No. of RM blocks | 144 RMs | 144 RMs
1 pixel | 1 cell + 2 RMs | 1 cell + 2 RMs
Single pixel area | 350 µm x 350 µm | 400 µm x 250 µm
CNN array size (pads included) | 3800 µm x 3900 µm | 4560 µm x 3900 µm
Power supply | 3 V | 3 V
Total quiescent power dissipation | 120 mW | 87 mW
Minimum readout time of a pixel | 1 µs | 100 ns
Elapsed operation | Required | Not required

Fig. 4.5 The environment of measurement

The circuit is controlled by many control signals. Fig. 4.6 shows their timing relationship; the circuit figures in chapter 2 explain how these signals control the circuit. The signals clk1 and clk2 determine the architecture of the circuit: if clk1 is high, the circuit takes the learning architecture shown in Fig. 2.4; if clk2 is high, it takes the recognition architecture shown in Fig. 2.6. Thus clk1 and clk2 must never be high at the same time, or the circuit cannot operate correctly.

In Fig. 4.6, the learning period is marked where clk1 is high, and the recognition period where clk2 is high. Signal R resets the outputs of some sub-circuits. DFF drives the negative-edge-triggered D flip-flops in Fig. 2.23. The signals newp and pin also appear in Fig. 2.23. When newp is low, the connection between shift registers is cut off, so the data in the shift registers cannot be changed by a glitch on DFF; when newp is high, the shift registers can transfer the learning patterns. Thus DFF toggles only when newp is high. Signal pin lets the pattern stored in the shift registers be input into the cells. After the learning period, the ratio weights are generated during the timing labeled "ratio weight generating". In this interval, the signals Cou_L and Cou_G, which appear in Fig. 2.18 and Fig. 2.19, toggle four times to step the outputs of Counter_L and Counter_G from "00" to "11" sequentially. The paths Sw_a~Sw_f and S_en1~S_en6 then turn on one by one, and the ratio weights are generated. After "ratio weight generating", the signals noi and pin, which appear in Fig. 2.23, go high to input the noisy pattern into the cells, and the circuit then starts the recognition period to recover the noisy pattern. Table 4.1 lists the function and usage of all the control signals.
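The phase order described above can be summarized in a hypothetical table. Only the levels the text states are filled in; every other level (and the exact durations) is an assumption for this sketch:

```python
# Hypothetical phase table for the control sequence: learning, pattern
# transferring, ratio weight generating, noisy pattern read in, recognition.
PHASES = [
    ("learning",                {"clk1": 1, "clk2": 0, "newp": 1}),
    ("pattern transferring",    {"clk1": 1, "clk2": 0, "newp": 1}),
    ("ratio weight generating", {"clk1": 0, "clk2": 0}),  # Cou_L/Cou_G toggle 4x
    ("noisy pattern read in",   {"clk1": 0, "clk2": 0, "noi": 1, "pin": 1}),
    ("recognition",             {"clk1": 0, "clk2": 1}),
]

def clk_phases_exclusive(phases):
    """clk1 and clk2 must never be high at the same time."""
    return all(not (s.get("clk1", 0) and s.get("clk2", 0)) for _, s in phases)
```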

Fig. 4.6 The control-timing diagram in the measurement of the 9x9 RMCNN with r = 1.

Table 4.1 The function of every control signal

Control signal | Usage
clk1 | High: learning period active; Low: learning period stopped
R | High: reset the circuit; Low: no reset
DFF | Drives the shift registers (negative-edge-triggered D flip-flops) that store the learning patterns
newp | High: the shift registers can transfer the learning patterns; Low: they cannot
pin | High: the pattern stored in the shift registers is input to the cells; Low: the path between the shift registers and the cells is cut off
Cou_L | Drives the local counter in every cell
Cou_G | Drives the global counter
clk2 | High: recognition period active; Low: recognition period stopped
noi | High: noise is mixed with the pattern in the shift registers; Low: the noise and the innocent pattern are kept isolated

4.2 Experimental Result

The output stage is described in chapter 2 and Fig. 2.20. Only one pad outputs the state of every cell, so the 81 pixels are read out sequentially.

Before pattern recognition, the learning function is checked first. This verification checks whether the learning patterns are written into the shift registers exactly and whether the patterns stored in the shift registers are input to every cell correctly. The pattern is read out directly after it is input into the circuit. Fig. 4.7~Fig. 4.9 show the verification of the learning function: Fig. 4.7 shows the learning pattern "一" in the shift registers, Fig. 4.8 the pattern "二", and Fig. 4.9 the pattern "四". In Fig. 4.7~Fig. 4.9, "Ch 2" is the output data of the chip and "Ch 3" is the LSB of the decoder that controls the switches Swc11~Swc99 in Fig. 2.20. "Ch 1" is a trigger signal and is meaningless in this measurement. The rows are read out sequentially: the first row first, then the second, and so on; each row is marked in Fig. 4.7~Fig. 4.9. The output waveform of "Ch 2" in Fig. 4.7~Fig. 4.9 can be cut and recombined to form a pattern that is more easily discerned. Fig. 4.10~Fig. 4.12 show these recombined output waveforms: the left side of each figure is the pattern supposed to be learned, and the right side is the recombined output waveform. In Fig. 4.7~Fig. 4.12, the output of a black pixel is about 1.5 V, and the output of a white pixel is about 0.2 V.
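The cut-and-recombine step can be sketched as reshaping the sequential single-pad readout into a 9x9 image. The 0.85 V threshold here is an assumed midpoint between the measured levels of about 1.5 V (black) and 0.2 V (white):

```python
# The single output pad delivers the 81 pixel levels row by row, so the
# sampled voltages reshape into a 9x9 image and can be thresholded.
def recombine(samples, n_rows=9, n_cols=9, threshold=0.85):
    assert len(samples) == n_rows * n_cols
    return [
        [1 if samples[r * n_cols + c] > threshold else 0 for c in range(n_cols)]
        for r in range(n_rows)
    ]

samples = [0.2] * 81          # an all-white readout...
for c in range(9):
    samples[4 * 9 + c] = 1.5  # ...except one black row, roughly "一"
image = recombine(samples)
```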

It is obvious that all of the learning patterns are input exactly into the circuit and the shift registers work well. The measurement of the recognition function, however, is not as successful. Fig. 4.13 is the recognition result for the pattern "四" without noise, and Fig. 4.14 shows the recombined output waveform of Fig. 4.13. Some pixels in row 4 and row 5 are clearly not pulled up enough, meaning these pixels are not recovered to pure black or pure white; they remain gray. Although the recognition of the innocent pattern "四" is unsuccessful, the recognition of the patterns "一" and "二" without noise is very successful. Fig. 4.15 and Fig. 4.16 are the measured recognition results for the patterns "一" and "二".

Fig. 4.7 Experimental verification of learning function (“一”)

Fig. 4.8 Experimental verification of learning function (“二”)

Fig. 4.9 Experimental verification of learning function (“四”)

Fig. 4.10 The recombined waveform of the verification of learning function (“一”)

Fig. 4.11 The recombined waveform of the verification of learning function (“二”)

Fig. 4.12 The recombined waveform of the verification of learning function (“四”)

Fig. 4.13 Experimental recognizing result of the clear pattern “四”

Fig. 4.14 The recombined waveform of the experimental recognizing result of the clear pattern “四”

Fig. 4.15 Experimental recognizing result of the clear pattern “一”

Fig. 4.16 Experimental recognizing result of the clear pattern “二”

The recognition results for noisy patterns with noise level 0.5 are shown in Fig. 4.17 and Fig. 4.18: Fig. 4.17 for pattern "一" and Fig. 4.18 for pattern "二". Neither noisy pattern is recognized; noisy patterns with noise level 0.5 are unrecognized in the simulation results as well.

Fig. 4.17 Experimental recognizing result of the noisy pattern “一” with noise level 0.5

Fig. 4.18 Experimental recognizing result of the noisy pattern “二” with noise level 0.5

4.3 Cause of the Imperfect Experimental Result

The cause of the unsuccessful recognition is identified in this thesis. Table 4.2 shows the absolute weights of cell(4,4), which is recognized unsuccessfully, under three simulation conditions. The absolute weight ss44M is simulated by Matlab and is the ideal weight. The other absolute weights are simulated by Hspice, one in the typical-typical corner condition and one in the fast-slow corner condition, and these Hspice-simulated absolute weights are strange. In the actual circuit the absolute weights are stored on the capacitor Cw in Fig. 2.4, and the Hspice simulation shows that the charging and discharging currents are unbalanced. As described in chapter 2, the ratio weights are generated according to the absolute mean of the absolute weights. Table 4.3 shows the ratio weights generated from the absolute weights in Table 4.2. Because of the wrong

