
Chapter 4 Column-Based Low Power Design Techniques

4.1 Ripple Bit-Line Scheme for Read/Write Operation

In the conventional hierarchical bit-line scheme, data are propagated between local bit-lines (LBLs) and global bit-lines (GBLs). However, GBLs and LBLs cannot share the same metal layer because the area budget is tight in ultra-high-density cell designs, so the GBLs require an additional metal layer [4.1]. Furthermore, the long GBLs run across the entire array, which degrades performance and increases the power consumption of read/write operations. The proposed ripple bit-line (RBL) scheme transfers data between divided LBLs step by step without requiring a GBL, and the isolated short LBLs provide a better sensing margin for read operations.

4.1.1 Circuit Implementation & Operation

The configuration of the ripple bit-line read scheme is shown in Fig. 4.2. Each local read scheme consists of a propagation inverter between two local bit-lines, gated by the bank select signal, and one PMOS transistor (M5) that acts as a keeper controlled by the column enable (Col_en) signal. Both the bank select and column enable signals are derived from the pre-decoder. In standby and write modes, all related signals are set high, and the bank select signal (Bank_sel) cuts off both the power and ground connections of the propagation inverter to disable it; otherwise, the data on the LBL would disturb the values stored in the TCAM cells. Thus, the separated capacitance and better noise immunity of the LBL improve the noise margin of the storage cells.

In read mode, the read cycle is divided into a pre-charge phase (CLK_0) and an evaluation phase (CLK_1). During the pre-charge phase, each LBL is charged high by the write circuitry, which includes the pre-charge circuit, and M5 is turned on by Col_en to assist the pre-charge. In addition, the active propagation inverters equalize the LBLs so that they all reach the same voltage level. During the evaluation phase, the pre-charge circuit is turned off by the bit-line pre-charge (BL_pre) signal, and the selected bank is driven by its corresponding control signals to begin the read evaluation. For example, if the data to be read is located in Bank2, Col_en of Bank2 goes high to cut off the charging path of its LBL, and Bank_sel2 is pulled high to disable the propagation inverter between Bank3 and Bank2, disconnecting Bank2 from the preceding bank.

Accordingly, the stored data is transmitted sequentially from LBL2 to LBL0, and the output data is finally converted by the multiplexer. The timing diagram of the read operation is also shown in Fig. 4.2, and the control signals of the ripple bit-line scheme for each operation are listed in Table 4.1.
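To make the ripple sequence concrete, the following is a minimal logic-level sketch in Python, not a circuit model. It assumes that banks are indexed from the output side (bank 0 drives LBL0), that the selected cell places its stored bit directly on its own LBL, and that the final multiplexer restores polarity by selecting either LBL0 or its complement according to how many inverters the data crossed. The names `gated_inverter` and `ripple_read` are hypothetical, and pre-charge dynamics, keeper contention and transistor-level timing are ignored.

```python
def gated_inverter(lbl_in: int, bank_sel: int, lbl_held: int) -> int:
    """Logic-level view of one propagation inverter between two LBLs.

    Bank_sel = 1 cuts off both supply rails, so the inverter is disabled and the
    driven LBL keeps its held value; Bank_sel = 0 drives the inverse of lbl_in.
    """
    return lbl_held if bank_sel else 1 - lbl_in


def ripple_read(stored_bit: int, selected: int, num_banks: int) -> int:
    """Return the bit delivered by the output multiplexer (idealized sketch)."""
    if not 0 <= selected < num_banks:
        raise ValueError("selected bank out of range")
    # End of pre-charge (CLK_0): every LBL is high.
    lbl = [1] * num_banks
    # Assumption: the selected cell puts its stored bit on its own LBL.
    lbl[selected] = stored_bit
    # Evaluation (CLK_1): only the selected bank raises Bank_sel.
    bank_sel = [1 if k == selected else 0 for k in range(num_banks)]
    # Sweep from the far end toward LBL0; inverter k sits between bank k+1 and
    # bank k and is gated by Bank_sel[k]. The disabled inverter at the selected
    # bank isolates the farther banks, so their values never reach LBL0.
    for k in range(num_banks - 2, -1, -1):
        lbl[k] = gated_inverter(lbl[k + 1], bank_sel[k], lbl[k])
    # Assumed multiplexer rule: undo one inversion per hop (selected hops total).
    return lbl[0] ^ (selected % 2)
```

With four banks and the data in Bank2, as in the example above, the bit crosses two inverters and reaches LBL0 in true polarity, so under this sketch's assumptions the multiplexer passes it through unchanged; data from Bank1 or Bank3 arrives inverted and is corrected at the output.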


Table 4.1 Key control signals of the ripple bit-line scheme.

Control Signal | WRITE, CLK_0 | WRITE, CLK_1 | READ, CLK_0 | READ, CLK_1                        | HOLD
BL_pre         | High         | High         | Low         | High                               | High
Col_en         | High         | High         | Low         | Low (unselected) / High (selected) | High
Bank_sel       | High         | High         | Low         | Low (unselected) / High (selected) | High
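Table 4.1 can also be read as a small lookup function. The sketch below encodes it in Python for a single bank; the `Mode` enum and the `control_signals` function are hypothetical names, and logic levels are written as 1 (High) and 0 (Low). In WRITE and HOLD the table lists the same levels for every bank, so the `selected` flag only matters in the READ evaluation phase.

```python
from enum import Enum
from typing import Dict

HIGH, LOW = 1, 0


class Mode(Enum):
    WRITE = "write"
    READ = "read"
    HOLD = "hold"


def control_signals(mode: Mode, phase: int, selected: bool) -> Dict[str, int]:
    """Levels of BL_pre, Col_en and Bank_sel for one bank, following Table 4.1.

    phase 0 is CLK_0 (pre-charge) and phase 1 is CLK_1 (evaluation); HOLD has a
    single column in the table, so its phase is ignored.
    """
    if phase not in (0, 1):
        raise ValueError("phase must be 0 (CLK_0) or 1 (CLK_1)")
    if mode is Mode.READ and phase == 0:
        # Pre-charge: BL_pre low enables pre-charge, Col_en low turns on keeper M5,
        # Bank_sel low keeps the propagation inverters active to equalize the LBLs.
        return {"BL_pre": LOW, "Col_en": LOW, "Bank_sel": LOW}
    if mode is Mode.READ:
        # Evaluation: pre-charge off; only the selected bank raises Col_en and
        # Bank_sel, cutting its charging path and isolating the farther bank.
        level = HIGH if selected else LOW
        return {"BL_pre": HIGH, "Col_en": level, "Bank_sel": level}
    # WRITE (both phases) and HOLD: all high, propagation inverters disabled.
    return {"BL_pre": HIGH, "Col_en": HIGH, "Bank_sel": HIGH}
```

For example, `control_signals(Mode.READ, 1, selected=True)` returns Col_en and Bank_sel high with BL_pre high, matching the selected-bank entries of the READ CLK_1 column.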

However, reading “0” is the worst case of the read operation: the LBLs have been charged high during the pre-charge phase, so it is much harder to propagate data “0” than data “1” to the following LBL. A lower threshold voltage on the pull-down transistors benefits reading “0” because it increases their drain current. Therefore, the pull-down transistors M3 and M4 use low threshold voltage (low-Vt) devices instead of regular-Vt devices. Moreover, a longer channel length for the pull-down transistors M3 and M4 also helps the LBL discharge when reading “0”.
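The benefit of low-Vt pull-down devices can be estimated to first order. The sketch below uses the alpha-power law, I_D proportional to (V_GS - V_t)^alpha (alpha = 2 reduces to the long-channel square law, while short-channel devices typically fall between 1 and 2), to compare the on-current of a low-Vt and a regular-Vt pull-down with its gate at VDD. The function name and the voltages in the example are hypothetical and not taken from this design, and the estimate ignores keeper contention, leakage and body effect.

```python
def drive_current_ratio(vdd: float, vt_low: float, vt_reg: float,
                        alpha: float = 2.0) -> float:
    """Return I_D(low-Vt) / I_D(regular-Vt) for a fully-on pull-down (V_GS = VDD).

    Alpha-power-law estimate; alpha = 2.0 is the long-channel square law.
    """
    if not 0.0 <= vt_low < vt_reg < vdd:
        raise ValueError("expected 0 <= vt_low < vt_reg < vdd")
    return ((vdd - vt_low) / (vdd - vt_reg)) ** alpha
```

For instance, with an illustrative VDD = 1.0 V, regular Vt = 0.45 V and low Vt = 0.30 V, the square law gives ((1.0 - 0.30) / (1.0 - 0.45))^2 ≈ 1.62, i.e. roughly 60% more pull-down current for discharging the LBL when reading “0”.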

4.1.2 Design Consideration

Depending on the size of the TCAM macro, the user can adjust the length of the LBL. In the ripple bit-line scheme, all banks are arranged in cascade (as shown in Fig. 4.2), and the farthest bank lies on the critical charge and discharge path, so the number of bits per LBL is chosen by trading off power against performance. Fig. 4.3 compares the power and delay of the ripple bit-line scheme as a function of the number of TCAM cells in each local bank (M). As M increases, both the area overhead and the power of the RBL decrease, because fewer ripple bit-line buffers are needed. Even though power dissipation decreases, the RBL delay grows significantly, because the larger capacitance and increased leakage current of the LBL slow down its slew rate.

On the other hand, if the LBL is too short, the number of sub-banks (256/M) increases, even though the LBL capacitance is partitioned into small segments. Hence, bit-line power and read delay rise rapidly, since the number of buffers dominates the power and the number of banks, rather than the capacitance, dominates the delay. Therefore, the competing trends in power and delay result in an intermediate bit number at which the total energy is minimized.
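The trade-off described above can be reproduced qualitatively with a toy model. The Python sketch below assumes a 256-cell column (from the 256/M sub-bank count), a worst-case path in which each of the 256/M banks contributes one ripple hop, a per-hop LBL delay that grows as M^2 as a stand-in for the capacitance- and leakage-driven slew degradation, and a power made of a constant bit-line term plus one term per ripple buffer. All coefficients are arbitrary normalized placeholders, `rbl_metrics` is a hypothetical name, and the output is not data from Fig. 4.3.

```python
from typing import Dict

ROWS = 256  # cells per column, implied by the 256/M sub-bank count


def rbl_metrics(m: int, t_buf: float = 1.0, k_lbl: float = 1.0 / 256,
                p_bl: float = 10.0, p_buf: float = 1.0) -> Dict[str, float]:
    """Toy normalized delay, power and power-delay product for M cells per LBL."""
    if m <= 0 or ROWS % m:
        raise ValueError("M must divide the column height")
    banks = ROWS // m
    # Each hop pays a buffer delay plus an LBL term assumed to grow as M^2.
    delay = banks * (t_buf + k_lbl * m * m)
    # Bit-line power assumed constant (same total cells); buffer power scales
    # with the number of ripple buffers.
    power = p_bl + banks * p_buf
    return {"delay": delay, "power": power, "pdp": power * delay}


if __name__ == "__main__":
    for m in (4, 8, 16, 32, 64, 128, 256):
        r = rbl_metrics(m)
        print(f"M={m:3d}  delay={r['delay']:6.1f}  "
              f"power={r['power']:5.1f}  PDP={r['pdp']:7.1f}")
```

With these placeholder coefficients the delay is smallest at M = 16, while the power-delay product is smallest at M = 32 because power keeps falling as buffers are removed. This only illustrates why the delay-optimal and PDP-optimal bank sizes can differ; the actual crossover points depend entirely on the assumed coefficients.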

Fig. 4.3 Power and delay comparisons of the local bit-line scheme (x-axis: number of TCAM cells on each local BL, M).

From Fig. 4.3, 16 TCAM cells per local bit-line give the minimum delay, but the lowest power-delay product occurs at 32 bits. On that basis alone, the local bank would be sized at 32 bits. However, for the ripple search-line scheme (discussed in Section 4.2), fewer bits per local bank save more power owing to continuous don't-care patterns. Since the power consumption at 16 bits is also close to that at 32 bits, each local bit-line is chosen to hold 16 bits. With an appropriate partition of the bit-line length, the ripple bit-line scheme uses simple circuitry without extra timing control, saves power, eliminates the additional GBL metal layer, and further improves area efficiency for high-density designs.
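The final choice combines a quantitative criterion with a design preference. One way to write that rule down, as an assumed formalization rather than the procedure used in this work, is to locate the minimum power-delay product and then take the smallest M whose power stays within a tolerance of that point, since fewer bits per local bank favor the ripple search-line scheme. The function `choose_bank_size`, the 15% default tolerance and the example numbers are all hypothetical.

```python
from typing import Dict, Tuple


def choose_bank_size(metrics: Dict[int, Tuple[float, float]],
                     power_tol: float = 0.15) -> int:
    """Pick the number of cells per local bit-line (M).

    metrics maps M -> (power, delay), e.g. values read off a sweep like Fig. 4.3.
    """
    if not metrics:
        raise ValueError("metrics must not be empty")
    # Minimum power-delay-product point.
    best_pdp_m = min(metrics, key=lambda m: metrics[m][0] * metrics[m][1])
    # Accept any M whose power is within power_tol of that point ...
    power_limit = metrics[best_pdp_m][0] * (1.0 + power_tol)
    candidates = [m for m in sorted(metrics) if metrics[m][0] <= power_limit]
    # ... and prefer the smallest one; best_pdp_m itself always qualifies.
    return candidates[0]


# Placeholder normalized (power, delay) values, NOT data from Fig. 4.3.
example = {8: (1.40, 1.15), 16: (1.10, 1.00), 32: (1.00, 1.05), 64: (0.97, 1.40)}
print(choose_bank_size(example))  # 16
```

Tightening the tolerance (for example to 5%) makes the rule fall back to the PDP-optimal size, which is 32 for these placeholder values.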