Chapter 1 Introduction
1.3 Organization
This thesis is organized as follows. Chapter 2 gives an overview of CAM. It presents a conventional CAM architecture, including CAM cells and CAM word schemes, and describes the applications and prior art of CAM.
The noise-tolerant butterfly match-line scheme is realized in Chapter 3. The proposed butterfly match-line scheme with an XOR-based conditional keeper [17] reduces both search time and search power at the same unity noise gain margin compared to the conventional AND-type match-line scheme. Furthermore, under the same search time criterion, the proposed butterfly match-line scheme not only has better performance but also saves area compared to the conventional AND-type match-line scheme.
A don’t-care based low-power technique is proposed in Chapter 4. Although the technique incurs an area overhead, the overhead becomes acceptable as the number of bits per match-line increases. By exploiting the characteristics of IPv6 addresses, the match-line precharge circuit can be turned off according to the number of don’t-care bits in a search stage of the match-line. Furthermore, a hierarchical search-line scheme controlled by the don’t-care bits is proposed to reduce the switching capacitance on the search-lines. As a result, power consumption is reduced.
An energy-efficient ternary CAM array is implemented in Chapter 5. In this chapter, we reduce the switching capacitances and switching activities to decrease dynamic power consumption. The butterfly match-line scheme, which decreases search delay, is also utilized. A novel AND-type match-line scheme is proposed that combines the butterfly match-line scheme, the don’t-care based low-power technique, the XOR-based conditional keeper, and the pseudo-footless clock and data pre-charge dynamic (PF-CDPD) circuit [18]. A TCAM array composed of the proposed AND-type match-line scheme has a shorter search time and lower search power consumption, and the proposed TCAM scheme incurs only a small area overhead.
Finally, the overall results and conclusions are discussed in Chapter 6.
Chapter 2
An Overview of Content Addressable Memory
This chapter gives an overview of CAM. The conventional CAM architecture, CAM cell circuits, and CAM word schemes are described in Section 2.1, Section 2.2, and Section 2.3, respectively. The applications of CAM are detailed in Section 2.4. Finally, Section 2.5 introduces low-power content-addressable memory design.
2.1 Conventional CAM Architecture
A conventional CAM architecture is usually composed of data memories, address decoders, bit-line pre-charge circuits, word match schemes, read sense amplifiers, address priority encoders, and so on [19]-[21]. Fig. 2.1 shows a simplified block diagram of a CAM. Generally, a CAM has three operation modes: write, read, and search. In write and read operations, a CAM behaves just like an ordinary memory. That is to say, data is manipulated in the CAM array in the same way as in an SRAM array.
Unlike SRAM, a CAM has a special mode: the search mode. The input in Fig. 2.1, called the search word, is broadcast on the search-lines to the table of stored data.
The number of bits in a CAM word is usually large, with existing implementations ranging from 36 to 144 bits. A typical CAM employs a table size ranging from a few hundred entries to 32K entries, corresponding to an address space ranging from 7 bits to 15 bits. Each stored word has a match-line that indicates whether the search word and stored word are identical (the match case) or different (a mismatch case, or miss). The match-lines are fed to an encoder that generates a binary match location corresponding to the match-line that is in the match state. An encoder is used in systems where only a single match is expected. In CAM applications where more than one word may match, a priority encoder is used instead of a simple encoder. A priority encoder selects the highest priority matching location to map to the match result, with words in lower address locations receiving higher priority. The overall function of a CAM is to take a search word and return the matching memory location. One can think of this operation as a fully programmable arbitrary mapping of the large space of the input search word to the smaller space of the output match location.
[Figure 2.1: block diagram. Blocks: address decoder, memory cell array (n words x m bits), bit-line prechargers, word match circuits, address priority encoder, read sense amplifiers. Signals: address input, address output, data input (data lines), data output, search word, read enable, write enable, reset, CLK.]
Fig. 2.1 Conventional CAM Architecture
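The search operation described above can be illustrated with a small behavioral sketch. It is only a software model under stated assumptions, not a description of the circuit: the function names `cam_search` and `address_bits` are hypothetical, stored words are represented as bit strings, and the priority rule (lowest address wins) follows the text.

```python
from typing import List, Optional


def cam_search(table: List[str], search_word: str) -> Optional[int]:
    """Return the lowest matching address, or None on a miss (hypothetical helper)."""
    # One match-line per stored word: True when every bit is identical.
    match_lines = [stored == search_word for stored in table]
    # Priority encoder: words in lower address locations receive higher priority.
    for address, matched in enumerate(match_lines):
        if matched:
            return address
    return None


def address_bits(entries: int) -> int:
    """Number of address bits needed to encode `entries` locations."""
    return max(1, (entries - 1).bit_length())


table = ["1010", "0110", "1010", "0001"]
hit = cam_search(table, "1010")   # two words match; the lower address is reported
miss = cam_search(table, "1111")  # no stored word matches
```

In this model, `address_bits(32 * 1024)` gives the 15-bit upper end of the address range quoted above, and a 128-entry table needs 7 bits.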
2.2 Conventional CAM Cell
This section introduces the conventional CAM cell. A CAM cell serves two basic functions: bit storage (as in RAM) and bit comparison (unique to CAM). Two types of CAM cells are introduced below: the binary CAM (BCAM) cell and the ternary CAM (TCAM) cell.
2.2.1 Binary CAM Cell
Depending on how they operate in search mode, CAM cells (see footnote 2) are classified into two kinds: the NOR-type CAM cell and the AND-type CAM cell (see footnote 3) [22]. Their differences are described as follows.
2.2.1.1 NOR-type CAM Cell
Fig. 2.2 depicts the NOR-type CAM cells that have been widely used in CAM designs in past years. Fig. 2.2 (a) is a 9-transistor structure and Fig. 2.2 (b) is a 10-transistor structure. Table 2.1 shows the truth table of a NOR-type CAM cell. The 9-transistor CAM cell consists of a traditional 6-transistor SRAM cell and a PTL-type compare circuit; the 10-transistor CAM cell is composed of an ordinary 6-transistor SRAM cell and a pull-down XOR comparison circuit.
In a write operation, both the 9-transistor and the 10-transistor CAM cells work in the same way as an SRAM cell. While the word-line is active, the complementary data is forced onto the bit-lines and stored in the latch formed by the two inverters. In a read operation, the bit-lines are first pre-charged high, and whether a bit-line discharges to ground depends on the stored data. After passing through the read sense amplifier, the correct data is sent to the output stage.
In a search operation with the 9-transistor cell, the match-line is first pre-charged high. If the search data is equal to the stored data, node X becomes low, the NMOS transistor Mn is turned off, and the match-line remains floating at its pre-charged level. On the other hand, if the search data does not match the stored data, node X becomes high and turns on Mn, so the match-line is discharged to ground. The 10-transistor CAM cell works on the same principle. During a search operation, the match-line is first pre-charged high. If the search data is equal to the stored data, the match-line remains floating. Conversely, if the search data is not equal to the stored data, a path from the match-line to ground is formed and the match-line is discharged through this path.
2 In this thesis, we refer to the binary CAM simply as CAM. When we mean the ternary CAM, we explicitly call it TCAM.
3 Because the principles and operation of AND-type cells are like those of NAND-type cells, we refer to the NAND-type match-line scheme as the AND-type match-line scheme throughout this thesis.
[Figure 2.2: cell schematics for panels (a) and (b). Labeled nodes: ML, WL, BL/SL and its complement, Qi, Qj, and internal node X in panel (a).]
Fig. 2.2 NOR-type binary CAM cell. (a) 9-transistor BCAM cell and (b) 10-transistor BCAM cell.
Table 2.1 Truth table of NOR-type binary CAM cell.
State      Qi   SL   ML
Zero (0)   0    0    floating
Zero (0)   0    1    0
One (1)    1    0    0
One (1)    1    1    floating
2.2.1.2 AND-type CAM Cell
An AND-type CAM cell behaves like the 9-transistor CAM cell in both write and read operations. The only difference from the 9-transistor CAM cell is the match-line scheme. Fig. 2.3 depicts an AND-type CAM cell and Table 2.2 gives its truth table. When an AND-type CAM cell works in search operation, the match-line is first pre-charged high. However, contrary to the 9-transistor NOR-type CAM cell, the match-line remains floating when the search data does not match the stored data, and it is discharged to ground only when the search data and the stored data match.
[Figure 2.3: cell schematic. Labeled nodes: ML, WL, BL/SL and its complement, Qi, Qj.]
Fig. 2.3 AND-type 9-transistor binary CAM cell.
Table 2.2 Truth table of AND-type binary CAM cell.
State      Qi   SL   ML
Zero (0)   0    0    0
Zero (0)   0    1    floating
One (1)    1    0    floating
One (1)    1    1    0
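The contrast between the two binary cells can be summarized in a short behavioral sketch. This is a logic-level model only, written under the assumption that the match-line has already been pre-charged high; the function names are hypothetical and no transistor-level behavior is modeled.

```python
def nor_cell_discharges(q: int, sl: int) -> bool:
    """NOR-type cell: the match-line is pulled low when stored and search bits differ."""
    return q != sl


def and_cell_conducts(q: int, sl: int) -> bool:
    """AND-type cell: the match-line is pulled low only when the bits are equal."""
    return q == sl


def ml_state(discharged: bool) -> str:
    """Report the match-line as '0' (discharged) or 'floating' (still pre-charged)."""
    return "0" if discharged else "floating"


# Rebuild both truth tables, keyed by (Qi, SL).
nor_table = {(q, sl): ml_state(nor_cell_discharges(q, sl))
             for q in (0, 1) for sl in (0, 1)}
and_table = {(q, sl): ml_state(and_cell_conducts(q, sl))
             for q in (0, 1) for sl in (0, 1)}
```

The two dictionaries reproduce Table 2.1 and Table 2.2 respectively, which makes the polarity difference explicit: a discharged match-line signals a miss in the NOR-type cell and a match in the AND-type cell.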
2.2.2 Ternary CAM Cells
In CAM circuit design, the ternary CAM (TCAM) provides a more powerful data search function [23]-[24]. Unlike the binary CAM, which has two states, one (1) and zero (0), the TCAM cell has an additional state: the don’t care (X) state. Like binary CAM cells, TCAM cells are classified into two kinds: the NOR-type TCAM cell and the AND-type TCAM cell. Both are introduced in the following sections.
[Figure 2.4: cell schematic. Labeled nodes: ML, WL, SL, BL, DL and its complement, Qi, Qj.]
Fig. 2.4 Static NOR-type ternary CAM cell.
Table 2.3 State assignments and truth table for static TCAM cell.
State            Qi   Qj   SL   ML
Zero (0)         0    1    0    floating
Zero (0)         0    1    1    0
One (1)          1    0    0    0
One (1)          1    0    1    floating
Don’t care (X)   0    0    0    floating
Don’t care (X)   0    0    1    floating
Not allowed      1    1    0    -
Not allowed      1    1    1    -
2.2.2.1 NOR-type TCAM Cell
Fig. 2.4 shows a static NOR-type TCAM cell. It consists of two SRAM cells and a 4-transistor comparison circuit. This TCAM cell is designed to store three states, namely zero (0), one (1), and don’t care (X), which are set by Qi and Qj. Table 2.3 shows how the three states are stored and gives the truth table of the static NOR-type TCAM cell. When Qi is low and Qj is high, the TCAM cell is in the “zero” state. In the search operation, as with the BCAM cell, the match-line is first pre-charged high. If the search data is low, the NMOS transistors M1 and M4 are not turned on, so the ML remains floating. On the other hand, when the search data is high, the NMOS transistors M1 and M2 are turned on at the same time, resulting in the match-line being discharged to ground. When Qi is high and Qj is low, the TCAM cell is in the “one” state: when the search data is high, the match-line stays high, and when the search data is low, the match-line is discharged to ground. In particular, when Qi and Qj are both low, the TCAM cell is in the “don’t care” state. Whether the search data is high or low, the NMOS transistors M1 and M3 are not turned on, so the match-line remains floating.
Note that Qi and Qj cannot be high simultaneously; this state is not allowed.
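As a compact restatement of Table 2.3, the following sketch models the static NOR-type TCAM cell at the logic level. The Boolean discharge condition is chosen only because it reproduces the table; which transistor is gated by which signal is not modeled, and the names `ENCODING` and `nor_tcam_ml` are hypothetical.

```python
# State -> (Qi, Qj), following the state assignments of Table 2.3.
ENCODING = {"0": (0, 1), "1": (1, 0), "X": (0, 0)}


def nor_tcam_ml(qi: int, qj: int, sl: int) -> str:
    """Return the match-line state after evaluation: '0' or 'floating'."""
    if qi == 1 and qj == 1:
        # Table 2.3 marks this combination as not allowed.
        raise ValueError("Qi = Qj = 1 is not an allowed state")
    # Assumed logic-level condition: a stored zero (Qj = 1) discharges on SL = 1,
    # a stored one (Qi = 1) discharges on SL = 0, and don't care never discharges.
    discharged = (sl == 1 and qj == 1) or (sl == 0 and qi == 1)
    return "0" if discharged else "floating"


# Example: a stored don't care keeps the match-line floating for either search bit.
dont_care = [nor_tcam_ml(*ENCODING["X"], sl) for sl in (0, 1)]
```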
Another NOR-type TCAM cell is the dynamic TCAM cell [25]-[29], shown in Fig. 2.5. The major difference between the static and the dynamic TCAM cell is that the storage elements, which are two SRAM cells in the static TCAM cell, are replaced by two capacitors in the dynamic TCAM cell. The dynamic TCAM cell works like the static TCAM cell, and Table 2.3 also describes its state assignments and truth table.
[Figure 2.5: cell schematic. Labeled nodes: ML, WL, BL, SL and their complements, Qi, Qj.]
Fig. 2.5 Dynamic NOR-type ternary CAM cell.
2.2.2.2 AND-type TCAM Cell
Fig. 2.6 illustrates a 16-transistor AND-type TCAM cell, which includes two SRAM cells and a comparison circuit composed of three NMOS transistors. The state assignments and truth table of this TCAM cell are given in Table 2.4. The AND-type TCAM cell behaves like a 9-transistor AND-type BCAM cell when it is in the zero (0) and one (1) states. However, when this AND-type TCAM cell is in the don’t care (X) state (Qj is high), the match-line is discharged regardless of whether the search data is high or low.
[Figure 2.6: cell schematic. Labeled nodes: ML, WL, BL/SL and its complement, DL and its complement, Qi, Qj.]
Fig. 2.6 AND-type ternary CAM cell.
Table 2.4 State assignments for TCAM cell.
State            Qi   Qj   SL   ML
Zero (0)         0    0    0    0
Zero (0)         0    0    1    floating
One (1)          1    0    0    floating
One (1)          1    0    1    0
Don’t Care (X)   0    1    0    0
Don’t Care (X)   0    1    1    0
Don’t Care (X)   1    1    0    0
Don’t Care (X)   1    1    1    0
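Table 2.4 can be read as saying that Qj acts as a mask bit: when it is high, the cell passes the match signal regardless of Qi and the search bit. The sketch below encodes that reading at the logic level and extends it to a whole word. It is an illustration under assumptions: the function names and the string representation of ternary words are hypothetical, and X is encoded here as (Qi, Qj) = (0, 1), although Table 2.4 treats any Qi with Qj = 1 as don’t care.

```python
# Ternary symbol -> (Qi, Qj), following the state assignments of Table 2.4.
ENCODE = {"0": (0, 0), "1": (1, 0), "X": (0, 1)}


def and_tcam_cell_conducts(qi: int, qj: int, sl: int) -> bool:
    """True when the cell discharges the match-line (ML = 0 in Table 2.4)."""
    return qj == 1 or qi == sl


def and_tcam_word_matches(stored: str, search: str) -> bool:
    """A word matches only when every cell in the series chain conducts."""
    if len(stored) != len(search):
        raise ValueError("stored and search words must have equal length")
    return all(and_tcam_cell_conducts(*ENCODE[s], int(b))
               for s, b in zip(stored, search))


# A stored word with one don't-care bit matches two different search words.
matches = [w for w in ("100", "101", "111") if and_tcam_word_matches("10X", w)]
```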
2.3 Match-line Structure
In the conventional CAM architecture, CAM word circuits adopt dynamic CMOS circuits to improve data matching performance and reduce hardware cost. The conventional NOR-type and AND-type match-line schemes, designed with dynamic CMOS circuits, are shown in Fig. 2.7 (a) and Fig. 2.7 (b), respectively [30].
2.3.1 NOR-type Match-line
Fig. 2.7 (a) depicts, in schematic form, how NOR cells are connected in parallel to form a NOR match-line, ML. While we show binary cells in the figure, the description of match-line operation applies to both binary and ternary CAM. A typical NOR search cycle operates in three phases: search-line precharge, match-line precharge, and match-line evaluation. First, the search-lines are precharged low to disconnect the match-lines from ground by disabling the pull-down paths in each CAM cell. Second, with the pull-down paths disconnected, the Mpre transistor precharges the match-lines high. Finally, the search-lines are driven to the search word values, triggering the match-line evaluation phase. In the case of a match, the ML voltage, VML, stays high as there is no discharge path to ground. In the case of a miss, there is at least one path to ground that discharges the match-line. The match-line sense amplifier (MLSA) senses the voltage on ML and generates a corresponding full-rail output match result. Several variations of this scheme exist for evaluating the state of NOR match-lines. The main feature of the NOR match-line is its high speed of operation. In the slowest case of a one-bit miss in a word, the critical evaluation path is through the two series transistors in the cell that form the pull-down path. Even in this worst case, NOR-cell evaluation is faster than the AND-type case, where between 8 and 16 transistors form the evaluation path.
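The three-phase NOR search cycle can be sketched as a behavioral model. This is a minimal illustration, assuming words are given as strings over 0, 1, and X (X being a stored don’t care); the function name `nor_match_line` is hypothetical, and the count of active pull-down paths is returned only to make the one-bit-miss worst case visible.

```python
from typing import Tuple


def nor_match_line(stored: str, search: str) -> Tuple[bool, int]:
    """Return (ML stays high, number of active pull-down paths)."""
    if len(stored) != len(search):
        raise ValueError("stored and search words must have equal length")
    # Phases 1 and 2: search-lines are low, so no pull-down path is active,
    # and the match-line is precharged high.
    ml_high = True
    # Phase 3: search-lines take the search word values. Every mismatching
    # cell opens a parallel path to ground (wired-NOR behavior).
    paths = sum(1 for q, sl in zip(stored, search) if q != "X" and q != sl)
    if paths > 0:
        ml_high = False
    return ml_high, paths


match = nor_match_line("1010", "1010")      # no path to ground
one_bit_miss = nor_match_line("1010", "1011")  # slowest miss: a single path
```

A single active path corresponds to the slowest miss described above, since only one cell discharges the whole match-line capacitance.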
2.3.2 AND-type Match-line
Fig. 2.7 (b) shows the structure of the AND match-line. A number of AND CAM cells are cascaded to form the ML (this is, in fact, a floating node, but for consistency we will refer to it as ML). For the purpose of explanation, we use the binary version of the AND cell, but the same description applies to the case of a ternary cell. On the right of the figure, the precharge pMOS transistor, Mpre, sets the initial voltage of the ML to the supply voltage. Next, the evaluation nMOS transistor, Np, turns ON. In the case of a match, all nMOS transistors are ON, effectively creating a path to ground from the ML and hence discharging the ML to ground. In the case of a miss, at least one of the series nMOS transistors is OFF, leaving the ML voltage high. The AND match-line has an explicit evaluation transistor, Np, unlike the NOR match-line, where the CAM cells themselves perform the evaluation.
There is a potential charge-sharing problem in the AND match-line. Charge sharing can occur between the ML and the intermediate nodes. For example, in the case where all bits match except for the leftmost bit in Fig. 2.7 (b), during evaluation there is charge sharing between the ML and nodes Ndn(n-1) through Ndn(1). This charge sharing may cause the ML voltage to drop sufficiently low that the output inverter detects a false match. A technique that eliminates charge sharing is to precharge high, in addition to the ML, the intermediate match nodes. This procedure eliminates charge sharing, since the intermediate match nodes and the ML node are initially shorted. However, it increases the power consumption.
A feature of the AND match-line is that a miss stops signal propagation, so that no power is consumed past the final matching transistor in the serial nMOS chain. Typically, only one match-line is in the match state; consequently, most match-lines have only a small number of transistors in the chain that are ON, and thus only a small amount of power is consumed. Two drawbacks of the AND match-line are a quadratic delay dependence on the number of cells and a low noise margin.
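Two of the effects described for the AND match-line lend themselves to a back-of-the-envelope calculation. The sketch below is an idealized model under loud assumptions, not a result from this thesis: charge sharing is computed by simple charge conservation with the intermediate nodes initially at 0 V and the threshold drop of the nMOS pass transistors ignored, and the quadratic delay is illustrated with the Elmore delay of a uniform RC chain. All names and parameter values are hypothetical.

```python
def shared_ml_voltage(vdd: float, c_ml: float, c_int: float, n_connected: int) -> float:
    """ML voltage after sharing charge with n_connected discharged intermediate nodes.

    Idealized charge conservation: vdd * c_ml = v * (c_ml + n_connected * c_int).
    """
    return vdd * c_ml / (c_ml + n_connected * c_int)


def elmore_delay(r: float, c: float, n: int) -> float:
    """Elmore delay of n identical series RC stages: r * c * n * (n + 1) / 2."""
    return sum(i * r * c for i in range(1, n + 1))


# With equal total capacitance on both sides, the ML drops to half the supply.
v_after = shared_ml_voltage(vdd=1.0, c_ml=8.0, c_int=1.0, n_connected=8)

# Doubling the chain length from 8 to 16 cells roughly quadruples the delay.
delay_ratio = elmore_delay(1.0, 1.0, 16) / elmore_delay(1.0, 1.0, 8)
```

Under these assumptions, a large enough number of connected intermediate nodes can pull the ML below the switching threshold of the output inverter, which is the false-match risk noted above.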
2.4 Applications of CAM
CAMs have been widely used in cache memory systems and in the translation look-aside buffer (TLB) of virtual memory systems in past years. The primary commercial application of CAMs today is to classify and forward Internet protocol (IP) packets in network routers [31]-[36]. In networks like the Internet, a message such as an e-mail or a Web page is transferred by first breaking up the message into small data packets of a few hundred bytes and then sending each data packet individually through the network. These packets are routed from the source, through the intermediate nodes of the network (called routers), and reassembled at the destination to reproduce the original message. The function of a router is to compare the destination address of a packet to all possible routes in order to choose the appropriate one. A CAM is a good choice for implementing this lookup operation due to its fast search capability.
2.4.1 Cache Memory
The cache plays an important role in the memory hierarchy [37]-[38]. Cache is the name given to the first level of the memory hierarchy encountered once the address leaves the CPU. The term is also used to refer to any storage managed to take advantage of locality of access. A cache provides fast access to recently used portions of instructions or data. When the CPU finds a requested data item in the cache, it is called a cache hit. Conversely, if the CPU does not find a needed data item in the cache, it is called a cache miss. Temporal locality means that a requested data item is likely to be needed again in the near future, so it is useful to place the item in the cache, where it can be accessed quickly. A fixed-size collection of data items that contains the requested data item is called a block. Because of spatial locality, there is a high probability that the other data items in the block will be needed soon.
An example of a direct-mapped cache is illustrated in Fig. 2.8. The address has 32 bits and is divided into three parts. The first part is the byte offset, which occupies two bits. The second part is the index, and the third part is the tag. The number of index bits determines the number of cache entries: if the index has N bits, the cache has 2^N entries in which data items can be stored. A lookup first locates the entry selected by the index. The tag stored in that entry is then read out and compared with the tag field of the address. If they are the same and the valid bit is one, a hit signal and the corresponding data are sent out. The tag entries are implemented as a CAM array. The valid bit indicates whether an entry contains a valid address. If the tags are not the same, a miss occurs, which means that the requested data is not in the cache. The requested data may be stored in a lower level of the memory hierarchy. When it is found in the lower level, it is copied into the cache.
Fig. 2.8 A simple cache memory.
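The address split and lookup sequence described for the direct-mapped cache can be sketched as follows. This is a software illustration only, assuming a 32-bit address with a 2-bit byte offset as in the text; the class `DirectMappedCache`, the helper `split_address`, and the 10-bit index used in the example are hypothetical choices, and the CAM-based tag comparison is modeled by a plain equality test.

```python
from typing import Any, List, Optional, Tuple

BYTE_OFFSET_BITS = 2  # the two least significant address bits


def split_address(addr: int, index_bits: int) -> Tuple[int, int]:
    """Split an address into (tag, index), discarding the byte offset."""
    index = (addr >> BYTE_OFFSET_BITS) & ((1 << index_bits) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + index_bits)
    return tag, index


class DirectMappedCache:
    def __init__(self, index_bits: int) -> None:
        self.index_bits = index_bits
        # 2^N entries; None models a cleared valid bit, else (tag, data).
        self.entries: List[Optional[Tuple[int, Any]]] = [None] * (1 << index_bits)

    def lookup(self, addr: int) -> Tuple[bool, Any]:
        """Return (hit, data); hit requires a valid entry with an equal tag."""
        tag, index = split_address(addr, self.index_bits)
        entry = self.entries[index]
        if entry is not None and entry[0] == tag:
            return True, entry[1]
        return False, None

    def fill(self, addr: int, data: Any) -> None:
        """Place data fetched from a lower memory level into the cache."""
        tag, index = split_address(addr, self.index_bits)
        self.entries[index] = (tag, data)


cache = DirectMappedCache(index_bits=10)
first = cache.lookup(0x12345678)           # miss: the entry is not yet valid
cache.fill(0x12345678, "payload")
second = cache.lookup(0x12345678)          # hit: same index, same tag
conflict = cache.lookup(0x12346678)        # miss: same index, different tag
```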
2.4.2 Translation Look-aside Buffer
The translation look-aside buffer (TLB) is widely used in virtual memory systems. A TLB is like a cache that holds only page table mappings [37]-[38]. Its function is to