• 沒有找到結果。

A simple mesh architecture of Network on Chip is shown in Figure 1.5. This research focuses on physical layer and data-link layer of NoC protocols. The goal of this research is to achieve a low latency, low power and reliable interconnect architecture. The architecture will be applied to each transmission stage between two adjacent switchs. The design of NoC protocols should consider each stage properties together to achieve better performance. Based on this concept, we adopt serialization technique to implement packet-based transmission, which is the most significant difference of Network-on-Chip architecture to other on-chip bus approachs.

PE

Switch Link wires Network interface

Figure 1.5: A simple architecture of Network on Chip

The traditional rail-to-rail voltage signaling will be no longer suitable for low power interconnect design. Reduced signal swing can significantly reduce power consumption on link wires. However, as the technology trends lead us to smaller voltage swings and larger coupling capacitances effect, it means the interconnect will be more suspect to muti-noise sources in future. It is important to carefully design and tradeoff each performance metrics. To guarantee the reliability is considered a necessary and important issue especially in future nanometer design.

Furthermore, reliability can be traded off for energy. According to this concept, we want to save more energy on link wires based on guaranteed reliability bound. The whole architecture should tolerant to process-variation, to make sure the circuits functional work in different cases.

1.4 Organization of Thesis

The rest parts of thesis are organized as follows: In Chapter 2, we mention the background of nanometer interconnects. Two high performance signaling modes are introduced. The low power design concept of on-chip interconnect is introduces.

Finally, we mention the advantage of using serialization technique for link wires. In Chapter 3, a unified framework of joint coding scheme concept is introduced and some previous related work of different coding schemes is listed. The proposed self-corrected green coding scheme based on triplication bus model is presented in

this Chapter also. In Chapter 4, a self-calibrated voltage scaling technique for reliable Interconnections in Network-on-Chip is proposed. The concept of high error coverage and low test overhead MAF-based test mechanism and process-variation aware modify double sampling data checking technique are introduced. According to these schemes, we show how the voltage scaling technique able to tradeoff energy and reliability. In Chapter 5, simulation result and related analyses is presented. Compare to other coding approach on energy saving to un-coded word and error-rate analysis.

Also the process-variation aware on link wires in different transmission cases is presented. Finally, conclusion is summarized in Chapter 6.

Chapter 2 Background

2.1 Nanoscale interconnect

According to ITRS prediction illustrated in Figure 2.1, the gate between the interconnection delay and the gate delay will increase to 9:1 with the 65nm technology [36]. Increasing of power dissipation consumed in charging and discharging the interconnect wires on a chip. Soon the interconnect will domain the performance such as power consumption, speed, and area. It means that interconnection will affect the system more in future SoC design rather than logic circuits on a chip.

Figure 2.1: Relative delay comparison of wires versus process technology

New design approach like network on chips (NoC) provides a more process independent interconnection architecture. However, we still need to understand more detail physics effect in the on-chip interconnect of Deep-Sub-Micron technology.

Four factors are referred to cause new challenges in DSM [37]: (1) Increasing of operation frequency so that the inductance effect can’t be ignored anymore. (2)Signal reflection and transmission due to impedance mismatch. (3)Increasing of fringe capacitance due to increased aspect ratio of the interconnection wire. (4) Increasing of resistance due to skin effect of high frequency and contact resistance.

The signal integrity is an important issue for on-chip interconnect. The degradation comes from many different source, intrinsic RLC circuit nature and extrinsic noises.

Intrinsic RLC circuit nature such as RC delay, LC ringing, attenuation of wave amplitude .etc. Extrinsic noises such as crosstalk noise inter-symbol interference (ISI ). The wave overlapping cause ISI and leads to higher error rate on interconnect.

The most significant noise in DSM is crosstalk noise. Through the coupling capacitance, the delay and skew of victim line signals can be affected by adjacent lines (aggressor lines). Crosstalk noise makes difficult to estimate the exact delay deviation.

Interconnect capacitance/resistance became comparable to gate in today’s technology. Significant inductance effects with technology scaling in the future. In DSM, the inductance effect of wires becomes a noise source. When aggressor lines are switching simultaneously, the filed electromagnetic may induces noise on the victim lines. The effect of inductance can be neglected if f << R/(2πL) in most case of on-chip interconnect. According to the simulation results of [38], the inductance effects will change the worst case (delay consideration) transition for nano-scale interconnect wires only when the operation frequency is over 1GHz. We will ignore the inductance effect of on-chip interconnect in this thesis.

2.2 High Performance Signaling

2.2.1 Voltage-mode & Current-mode Signaling

Generally speaking, there are two modes for high performance signaling : Voltage-mode signaling and Current-mode signaling. As shown in Figure 2.2(a), voltage-mode signaling using CMOS inverters as driver and receiver. The load capacitances of wire are charged/discharged by driver. The receiver is active until voltage of receiver’s input node have been charged/discharged to full-level swing.

The voltage-mode signaling is a simple and conservative way of signaling. No static current dissipation, so the power consumption of voltage-mode signaling is proportional to the switching activity of signals. However, driving longer interconnect needs larger driver size, the related work such repeater insertion technique have been proposed to solve the problem. Some paper discussed the optimal numbers and size of repeaters and the optimal wire length and wire segments [39-41]. Try to tradeoff between performance metrics, such as delay, power .etc.

Current-mode signaling circuits as shown Figure 2.2(b).With the same driver circuits as voltage-mode signaling, the current charge the load capacitance of wire.

Current flows through load resistance Mp and Mn. The receiver don’t have to wait the input node of it charged to VDD, it can detect the small voltage at the load resistance changed by the current flow. The current-mode signaling is faster than voltage-mode signaling, because it doesn’t need a full swing level charge/discharge to decide logic value. The maximum data rate of a current-mode signaling is about twice of voltage-mode signaling if two schemes have same number of repeaters. When the load capacitance is large, it seems that the current-mode signaling can achieve better power efficient than voltage-mode signaling. Major portion of power consumption in

current-mode signaling is due to static current ( I2 , as shown in Figure 2.2(d) ) through the wires. If the switching activity of signal is 0.5, current-mode signaling can save about 25% power from voltage-mode signaling. However, if the switching activity of signal is less than 0.4, the voltage-mode signaling is better than current-mode signaling on power saving. Because current-mode signaling don’t charge to full-level swing, the scheme is susceptible to noise sources.

I1

V+1

V2+

Mp

Mn

0

Static current (I2)

V1

I1

V2

I2

(a)

(b)

(c)

time

(d)

time

Figure 2.2: (a)Voltage-mode (b)Current-mode signaling circuits

(c)Voltage waveforms (d)Current waveforms of two type signaling circuits.

2.2.2 Low Power Interconnect Design

Low power design been emphasized in future design. The power consumption of on-chip interconnect can be simply described as :

P = × α C

w

× V

swing

× VDD

driver

× f

(2.1)

where α is the switching activity of signal, Cw is the wire capacitance , Vswing is the voltage swing on the wire, VDDdriver is the supply voltage of the driver and f is the signaling frequency. To design a lower power interconnect, we can minimize each of parameters ( α, Cw , Vswing , VDDdriver , and f ) in Equation (2.1).

(1) Reducing the switching activity

Switching activity α can be reduced by coding schemes. Low power codes (LPC) are first proposed to achieve the goal, such as bus-invert (BI) code [42], partial BI code [43], T0-code [44] and Gray-code [45]. Gray-code and T0-code reduce the switching activity of address bus effectively because the data in the address bus is sequential, But these two schemes is not suitable for data bus. However, these schemes just consider self transitions but ignored coupling capacitances effect.

Crosstalk avoidance codes (CAC) that reduce both self and coupling transitions by forbidding specific transitions, such as Forbidden Overlap Codes (FOC), Forbidden Transition Codes (FTC), Forbidden Pattern Codes (FPC) and One Lambda Codes (OLC) [46-48].

(2) Reducing wire capacitance

In DSM technique, the coupling capacitance will domain the value of equivalent total capacitance. Simply way to reduce wire capacitance techniques is widening the spacing between two adjacent wires. To minimize the wiring area, on-chip serialization technique is an effective way, which will be discussed in Section 2.3.

(3) Reducing voltage swing of signal

Low-swing signal and supply voltage can reduce the power consumption of on-chip interconnect significantly. The swing level of signal must carefully design because it is related to signal noise immunity. Many low-swing schemes was discussed in [], we will introduced some receiver circuits and receiver circuit following.

Four types of low-swing driver as shown in Figure 2.3 . Driver circuits as shown in Figure 2.3(a) and Figure 2.3(b) are able to reduced power to (VDDL/VDD)2, where VDDL is additional reference voltage source. The additional reference voltage source is produced by off-line DC-DC convert. Driver circuit as in Figure 2.3(c) has a signal voltage swing equal to by a voltage drop Vt of transistor. Driver circuit as in Figure 2.3(d) use pulse to control the charge/discharge time on wire load capacitance, the signal voltage swing is equal to . But the value of wire load capacitance is hard to be estimated during design stage. The advantage of driver circuits as shown in Figure 2.3(c) and Figure 2.3(d) is they don’t need extra reference voltage source, but with the disadvantage of the circuits are susceptible to process variation and need re-design in different technology node.

th_N th_P

VDD-V -V

*

pulse

/

I t Cw

Figure 2.3: Four types of low swing driver circuits (a) Conventional (b) NMOS pull-up transistor (c) Transistor Vth drop (d) Pulse-controlled

The low swing receiver design can be implemented in two ways: single-ended conventional level converter as shown in Figure 2.4(a) and differential amplifier as shown in Figure 2.4(b). Differential signaling has better noise immunity, but requires double the numbers of wire. In [49], the author consider different receivers in respect of energy, delay, signal swing level, signal-to-noise ratio and complexity. The driver/receiver circuit implemented to on-chip network should as simple as possible.

We adopt conventional type of driver circuit and single-ended level converter as receiver. Without an additional reference voltage source, the VDDL is produced by CMOS diode-connected voltage drop scheme, more detail will be introduced in Chapter 4. Maybe the different type of driver/receiver circuits can achieve better performance in specific metric, but it isn’t the key point in this thesis.

Figure 2.4: Two types of low swing receiver circuits (a)Single-ended level converter (b) Differential amplifier

2.3 Serialization Technique For Link Wires

The physical transfer unit is a unit into which a packet is divided and transmitted through micro-network. Simply speaking, the phit size is the bit-width of the link wire, I/O and switch size. Large phit size increases network area and energy consumption, especially for switching circuit and buffering units in switch fabrics. Some approaches address signal integrity to protect the NoC interconnection infrastructures against different transient malfunctions [50,51]. However, these approaches could not decode the encoded codes in each switch fabric because of significant delay. The critical depth, moreover, will increase rapidly as well as the bit-width increases. Therefore, the un-decoded code will induce great amount of area and energy dissipation of switching circuits and buffers in switch fabrics.

Joint coding schemes have been consider the effective way to reduce power consumption and at the same time provide a reliable interconnect. However, both

crosstalk avoidance codes and error correction codes enlarge the physical transfer unit (phit) in network-on-chip. According to the disadvantages mention above, we can joint bus and error correction coding scheme with concept of serialization and deserialization technique. Figure 2.5 show a K-to-N serialization with N:1 ratio.

Serialization ratio is defined as I/O bit width divided to phit size. Original data without serializer was sent K bit each cycle, but was sent K/N bit each cycle with serializer. The serializer and deserializer reduce the phit size and further reduce the area and energy consumption of the switch fabrics. However, in order to achieve the same throughput, the serialization technique will increase the operation frequency of interconnection network. On-chip serialization, nevertheless, is a crucial technique for NoC implementation. It reduces overall network area and optimizes power consumption which is well-explained in [52,53].

Figure 2.5: K-to-N serialization with N:1 ratio

Figure 2.6(a) and 2.6(b) are simulated under high loading and low loading of wires, respectively. Despite of loading, the power consumption decreases with the increasing ratio of serializer under low operation frequency. Unfortunately, with he increasing ratio of serializer under higher operation frequency, the power consumption increases because of large driver to provide high driving ability.

Figure 2.6: Average power versus different ratio of serializer and frequency (a) in high loading (b) in low loading of wires

Figure 2.7 shows that the switch and link energy consumption decrease effectively depend on serialization ratio. But queuing buffer energy consumption increase positive proportion to serialization ratio. From [53] the simulation results, 4:1 serialization ratio is an optimized ratio to achieve energy saving for network on-chip interconnect.

Figure 2.7: Energy variation in relation to serialization ratio when the number of processing units (N) = 16 under Mesh and Star NoC topology [53].

We implemented the serializer and deserializer with all-digital self-calibrated multi-phase delay-locked loop in [54]. The shift register based serializer/deserializer architecture was adopted in this thesis, implemented by low swing edge-triggered Flip-Flop. 4-to-1 Serializer circuit and waveform as showed in Figure 2.8(a), and 4-to-1 deserializer circuit and waveform as showed in Figure 2.8(b). The data can operate in quarter of clock frequency. Reducing operation frequency can achieve power saving goal.

Figure 2.8: (a) Shift Register Based Serializer and waveform (b) Shift Register Based Deserializer and waveform

Chapter 3

Self-Corrected Green Coding Scheme

3.1 Preliminary

Joint coding schemes based on the unified framework provide better communication performance. However, the schemes mention above just combine different kinds of codes directly. The intrinsic qualities of crosstalk avoiding coding and error correction coding are mutually exclusive, except for duplicate-add-parity (DAP). The previous works have disadvantages of encoder/decoder hardware overhead, encoder/decoder need a significant propagation delay when numbers of un-coded bit increase. Besides, without the serialization and deserialization technique for link wires, large phit size will increase network area and energy consumption. To achieve a low-power and reliable interconnect, we propose a joint bus and error correction coding scheme with 4-to-1 serializers and deserializer as Figure 3.1 .

Processor Element interfaceinterface ECC decoderECC encoder

Processor Element

Figure 3.1: A joint bus and error correction coding scheme with serializers/deserializer in network-on-chip

In this chapter, we focus on the joint bus and error correction coding scheme, self-corrected green coding scheme. To realize reliable and green interconnection for NoC platforms. Self-corrected green coding scheme is constructed by two stages, which are green bus coding stage and triplication ECC stage. The green bus coding is developed by the joint triplication bus power model to achieve more energy reduction for the triplication ECC. The detail of self-corrected green coding scheme will be described in the following section. It has the characteristics of shorter delay for ECC, more energy reduction and smaller area.

3.2 A Unified Framework of Joint Coding Scheme

For on-chip interconnection, three main problems have to been considered, which are delay, power and reliability. For the delay problem, large propagation delay due to capacitive. Especially long global line, low swing voltage to charge capacitive take a long time. High power consumption of interconnects is due to both parasitic and coupling capacitance. Finally, reliability depends on increased susceptibility to errors due to noise. In advanced technologies, circuits and interconnects become more sensitive to noises as to the lower operation voltage. In addition, the increasing coupling noise, soft-error rate, bouncing noise decrease the reliability also. In view of this, self-calibration circuitry is essential in today’s SoC design. Therefore, coding theory is an effective solution to deal with the three challenges. Joint bus and error correction coding has been an elegant and effective technique to solve the crosstalk effect and further provides a reliability bound for on-chip interconnect.

According to different problems, there are different coding schemes to deal with:

(1) LPC (Low-Power Codes): Reducing transition activity to achieve low power interconnect.

(2) CAC (Crosstalk Avoidance Codes): Avoid specific code patterns or code transitions to reduce delay and power dissipation produced by crosstalk effect.

(3) ECC (Error Control Codes): To guarantee error-free transmission, the code has to provide a reliability bound. The code is able to detect or correct the error bits.

LPC and CAC are hard to separate completely, because they have some similar properties. Sometimes avoid crosstalk between lines will also lower the power consumption. To briefly sum up, LPC and CAC Reducing transition activity and forbidding some transitions which cost large power.

Joint codes architecture have been proposed in [14]. An unified coding framework as shown in Figure 3.2, it’s rules are:

(1) CAC needs to be the outermost code (2) LPC can follow CAC

(3) ECC needs to be systematic

(4) The additional information bits generated by LPC (p) and ECC (m) need to be encode through linear crosstalk code (LXC1/LXC2)

Figure 3.2: Unified coding framework [14]

3.2.1 Related Work On Crosstalk Avoidance Codes

Crosstalk avoidance codes (CACs) can be used to improve signal integrity and also reduce the coupling capacitance effect and hence the reduce energy dissipation of wire segments. CACs reduce the worst-case switching patterns of a wire by ensuring that transition from one codeword to another codeword does not cause adjacent wires to switch in opposite directions . According to the analysis in [18] for the specific case of on-chip buses, the bus lines must be 20mm longer in order for these encoding schemes to be energy efficient in practical implementations. Due to the NoC design, the wire segments between two routers or between router and IP are significantly shorter than the above mentioned limit. [55].

The purpose of crosstalk avoidance code is to reduce the delay of the line to (1+pλ)τ, where p = 1,2 or 3 depend on the maximum coupling (worst case delay (1+4λ)τ ). The following consider four CACs: Forbidden Overlap Codes, Forbidden Transition Codes, Forbidden Pattern Codes and One Lambda Codes.

These CACs achieve different degrees of delay reduction.

First, we define three conditions which help us to analysis the switching activity of codeword, named Forbidden Overlap condition, Forbidden Transition condition and Forbidden Pattern condition. Forbidden Overlap condition represents a codeword transition from 010 to 101 or from 101 to 010 as shown in Figure 3.3(a).

Forbidden Transition condition represents a codeword transition from 01 to 10 or from 10 to 01 as shown in Figure 3.3(b). Forbidden Pattern condition represents a codeword having 010 or 101 patterns.

Figure 3.3: (a) Forbidden Overlap condition (b) Forbidden Transition condition

(1) Forbidden Overlap Codes (FOC)

Maximum coupling can be reduced to p=3. The FOC can be satisfied if and only if a codeword having the bit pattern 010 (or 101) does not make a transition to a

Maximum coupling can be reduced to p=3. The FOC can be satisfied if and only if a codeword having the bit pattern 010 (or 101) does not make a transition to a

相關文件