Self-Calibrated Voltage Scaling Technique

The self-calibrated voltage scaling technique is proposed to reduce the operating voltage of the link wires, saving energy while still guaranteeing reliability. Its architecture is shown in Figure 4.1. It consists of a low swing driver, a level converter, a voltage scaling control unit, a crosstalk-aware test error detection stage, and a run-time error detection stage. Based on the results of the two error detection stages, the voltage control unit adjusts the voltage swing level of the link wires.

Figure 4.1: The architecture of Self-Calibrated Voltage Scaling Technique

Based on the self-corrected green coding scheme, the triplication error correction coding stage provides error correction capability for the link wires. The scheme allows us to decrease the signal voltage swing while achieving the same word error rate as un-coded link wires. While the bit error rate varies in the range from 10^-20 to 10^-10, a 0.7V signal swing on the link wires can still guarantee reliability. This follows from the reliability analysis of the joint coding scheme, which is presented in more detail in Chapter 5. The low swing driver and level converter are implemented with three voltage levels, as shown in Figure 4.2: HV (Vdd), MV (Vdd – Vt), and LV (Vdd – 2Vt). Diode-connected low-Vt PMOS transistors produce the low swing voltages, as shown in Figure 4.2(a). Three control signals, S0~S2, select the voltage swing of the link wires; the correspondence between control signals and voltages is listed in the table in Figure 4.2(a).

Figure 4.2: (a) Low Swing Voltages (b) Driver (c) Level Converter

The control policy and voltage state diagram of the self-calibrated voltage scaling technique are shown in Figure 4.3. The crosstalk-aware test error detection stage is triggered by T_start and the crosstalk-aware test vectors, and the test results are generated by the test error detector. At the beginning, the crosstalk-aware test vectors are transmitted at the lowest voltage level of 0.7V. Because of the error correction coding, the test error detector should report zero errors. If it detects any error, the test vectors are resent at a higher voltage level (0.85V or 1V). Once the result is error free, the initial voltage swing of the link wires is fixed. When the test is finished, T_finish is asserted and the run-time error detection stage is activated.

Figure 4.3: The control policy and voltage state diagram
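The initial calibration described above can be sketched as a simple search over the three swing levels. This is a minimal illustration, not the hardware control unit: `run_maf_test` is a hypothetical hook standing in for "send the crosstalk-aware test vectors and read the test error detector", and the fallback to the highest level when every level fails is our assumption.

```python
from typing import Callable, Sequence

# Swing levels named in the text, lowest first (volts).
SWING_LEVELS = (0.7, 0.85, 1.0)


def calibrate_initial_swing(
    run_maf_test: Callable[[float], int],
    levels: Sequence[float] = SWING_LEVELS,
) -> float:
    """Return the lowest swing level at which the test reports zero errors.

    run_maf_test is a hypothetical hook: it sends the crosstalk-aware test
    vectors at the given swing and returns the error count seen after
    error correction.
    """
    for level in levels:
        if run_maf_test(level) == 0:
            return level  # the point at which T_finish would be asserted
    # Assumption: if even the highest level fails, stay at the highest level.
    return levels[-1]


if __name__ == "__main__":
    # Toy link model (made up for illustration): errors appear below 0.8 V.
    def toy_link(swing: float) -> int:
        return 0 if swing >= 0.8 else 3

    print(calibrate_initial_swing(toy_link))
```

With the toy link model, the search rejects 0.7 V and settles on 0.85 V as the initial swing.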

After the crosstalk-aware test error detection stage, the run-time error detection stage raises V_scale to trigger the scaling mechanism once in every window of N clock cycles.

According to the bit error rate, the voltage control unit can further raise or lower the signal voltage swing during run-time. The bit error rate is defined as the ratio of the erroneous data to the total data transmitted in one window. If the bit error rate is less than 5%, we lower the signal voltage swing by one level, or stay at the lowest safe swing level. If the bit error rate is more than 5% but less than 15%, the swing level stays the same as in the previous window. If the bit error rate is more than 15%, we raise the signal voltage swing by one level, or stay at the highest safe swing level.
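The per-window decision can be written as a small function. The sketch below is illustrative only: the function and constant names are ours, the rate is taken as erroneous transfers divided by total transfers in the window, and a rate of exactly 5% or 15% (which the text does not specify) is assumed to hold the current level.

```python
LEVELS = (0.7, 0.85, 1.0)  # swing levels from the text, in volts
LOW_THRESHOLD = 0.05       # below this rate: step down one level
HIGH_THRESHOLD = 0.15      # above this rate: step up one level


def next_level_index(current: int, error_words: int, total_words: int,
                     lowest_safe: int = 0,
                     highest: int = len(LEVELS) - 1) -> int:
    """Pick the index into LEVELS to use for the next N-cycle window.

    lowest_safe is assumed to be the level found by the initial
    crosstalk-aware test; the controller never goes below it.
    """
    if total_words <= 0:
        raise ValueError("window must contain at least one transfer")
    error_rate = error_words / total_words
    if error_rate < LOW_THRESHOLD:
        return max(current - 1, lowest_safe)
    if error_rate > HIGH_THRESHOLD:
        return min(current + 1, highest)
    return current  # between the thresholds: keep the previous level
```

For example, a window with 2 errors in 100 transfers at 0.85 V moves the link down to 0.7 V, while 20 errors in 100 transfers moves it up to 1.0 V.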

The range of the detected bit error rate depends on the properties of the self-corrected green coding scheme. If the un-coded input data is random, the probability of the forbidden pattern condition of the bus coding scheme (two adjacent lines switching in opposite directions, e.g. ↑↓ or ↓↑) is nearly 15%.

4.3 Crosstalk-Aware Test Error Detection Stage

4.3.1 Built-In Self-Test For On-Chip Interconnect

Design for testability (DFT) is important for future designs. Many circuit designs are pre-verified to ensure functionality. However, it is impossible to consider all noise effects and their combinations during the pre-verification stage. Built-in self-test (BIST) provides an alternative solution for on-chip testing. BIST for on-chip interconnect testing is important because of effects such as crosstalk, switching noise, and ground bounce. These effects will play dominant roles in future designs because of advanced technology, lower supply voltages, and higher clock frequencies.

A BIST circuit is composed of a test pattern generator (TPG), a test error detector (TED), and a control unit. A traditional TPG, such as a Linear Feedback Shift Register (LFSR), generates a pseudo-random pattern sequence. By changing the feedback polynomial of the LFSR, it can generate different subsets of the maximum-length sequence (at most 2ⁿ-1 patterns, obtained with a primitive polynomial when testing n-bit data). A 4-bit LFSR with a primitive polynomial is shown in Figure 4.4.

Figure 4.4: Example of LFSR with primitive polynomials of degree 4
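To make the 2ⁿ-1 bound concrete, the following sketch steps a 4-bit Fibonacci LFSR through one full period. The tap choice (XOR of the two most significant bits) is our assumption and is not necessarily the polynomial drawn in Figure 4.4; depending on the labeling convention it corresponds to x^4 + x + 1 or its reciprocal x^4 + x^3 + 1, both of which are primitive.

```python
from typing import List


def lfsr4_sequence(seed: int = 0b0001) -> List[int]:
    """Return the states of a 4-bit Fibonacci LFSR over one full period."""
    if not 0 < seed < 16:
        raise ValueError("seed must be a non-zero 4-bit value")
    states = []
    state = seed
    while True:
        states.append(state)
        # Feedback bit: XOR of the two most significant bits.
        feedback = ((state >> 3) ^ (state >> 2)) & 1
        # Shift left by one and insert the feedback bit at the LSB.
        state = ((state << 1) | feedback) & 0xF
        if state == seed:
            return states


if __name__ == "__main__":
    seq = lfsr4_sequence()
    print(len(seq), [format(s, "04b") for s in seq])
```

Starting from any non-zero seed, the register visits all 15 non-zero states before repeating; the all-zero state is never reached, which is why the maximum length is 2ⁿ-1 rather than 2ⁿ.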

Techniques for shaping the patterns include: (1) changing the associated feedback polynomial to change the switching probability of the output bits; (2) changing the switching probability of the LFSR input bit to control the switching probabilities of its output bits and their correlation; (3) adding weighting circuits to the TPG to control the correlations of its output bits [60,61]. These schemes try to change the output switching probability so that it approaches the real data switching on the interconnects. However, a pre-characterized test pattern generator design may not be suitable for on-chip interconnect testing.

Test patterns for on-chip interconnects need to cover different pattern transition cases. To test an n-bit bus completely, 2ⁿ * 2ⁿ switching cases must be covered. However, an LFSR-based TPG needs a complicated design and a long test time to achieve high error coverage. A better self-test methodology is therefore needed, one that achieves low hardware overhead, short test time, and high error coverage. For these reasons, we adopt the MAF-based TPG for on-chip interconnect testing.

The effect of crosstalk is significant in deep submicron interconnects. The maximal aggressor fault (MAF) model [62-64] represents six different kinds of crosstalk effects: rising speed-up (Sr), falling speed-up (Sf), rising delay (Dr), falling delay (Df), positive glitch (Gp), and negative glitch (Gn), as shown in Figure 4.5.

When testing n-bit wires, there is one victim line and n-1 aggressor lines. All aggressor lines switch simultaneously to generate a speed-up, delay, or glitch error on the victim line.

Figure 4.5: Maximal Aggressor Fault model (a) Rising speed-up (b) Falling speed-up (c) Rising delay (d) Falling delay (e) Positive glitch (f) Negative glitch case
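The six cases can be tabulated as (victim, aggressor) transitions and expanded into vector pairs for an n-bit bus. The table below follows the usual reading of the MAF model (aggressors switching with the victim speed it up, switching against it delay it, and switching while it is quiet cause a glitch); the data layout and function name are ours and are only a sketch.

```python
from typing import Dict, List, Tuple

# (victim_before, victim_after, aggressor_before, aggressor_after)
MAF_FAULTS: Dict[str, Tuple[int, int, int, int]] = {
    "Sr": (0, 1, 0, 1),  # rising speed-up: aggressors rise with the victim
    "Sf": (1, 0, 1, 0),  # falling speed-up: aggressors fall with the victim
    "Dr": (0, 1, 1, 0),  # rising delay: aggressors fall while the victim rises
    "Df": (1, 0, 0, 1),  # falling delay: aggressors rise while the victim falls
    "Gp": (0, 0, 0, 1),  # positive glitch: victim holds 0, aggressors rise
    "Gn": (1, 1, 1, 0),  # negative glitch: victim holds 1, aggressors fall
}

VectorPair = Tuple[int, str, Tuple[int, ...], Tuple[int, ...]]


def maf_vector_pairs(n: int) -> List[VectorPair]:
    """Return (victim index, fault, vector before, vector after) for an n-bit bus."""
    pairs = []
    for victim in range(n):
        for name, (v0, v1, a0, a1) in MAF_FAULTS.items():
            before = tuple(v0 if i == victim else a0 for i in range(n))
            after = tuple(v1 if i == victim else a1 for i in range(n))
            pairs.append((victim, name, before, after))
    return pairs
```

For a 4-bit bus this yields 6 × 4 = 24 transitions, far fewer than the 2ⁿ * 2ⁿ cases of an exhaustive test.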

4.3.2 Crosstalk-Aware Test Error Detection Stage Work Mechanism & Hardware Implementation

The crosstalk-aware test error detection stage is triggered by T_start and the crosstalk-aware test vectors. Because the test vectors are known in advance, the test error detector can detect erroneous data after error correction coding. The crosstalk-aware test vectors are generated by a test pattern generator based on the maximal aggressor fault (MAF) model. It uses a simple pattern stream to represent six different kinds of crosstalk effects: rising speed-up (Sr), falling speed-up (Sf), rising delay (Dr), falling delay (Df), positive glitch (Gp), and negative glitch (Gn). For n-bit wires under test, there is one victim line and n-1 aggressor lines. All aggressor lines switch simultaneously to generate a speed-up, delay, or glitch error on the victim line. MAF test vectors can achieve high error coverage. In addition, the MAF test can be regarded as an aggressive test that covers the other pattern transition cases. To test n-bit on-chip interconnects, the six fault types must be tested on each individual line, so 6n test pattern transitions are needed to complete the MAF test. The implementation of the test pattern generator is shown in Figure 4.6.

The test pattern generator of the MAF-based self-test methodology is implemented as a finite state machine. It needs at least 8 cycles to complete the six fault tests on one victim line, so the test pattern generator takes 8n cycles to complete an n-bit MAF test. This test time is much shorter than that of the Linear Feedback Shift Register based test methodology. The finite state machine is triggered by the T_start signal, and it generates the victim line's value, the aggressor lines' value, the counter reset (C_reset), and the counter enable (C_enable). After each pass through states S1 to S8, C_enable triggers the victim counter. The select decoder and the output 2-to-1 multiplexers make sure that each data bit (Di) selects the correct value (victim or aggressor value) during the test. Once the state machine is in S8 and the victim counter's value (C_value) equals n-1, the test is finished and the state machine returns to S0.

Figure 4.6: MAF Based Test Pattern Generator (a) 8 states complete faults test of MAF model (b) Hardware implementation.
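The 8-cycle figure can be checked with a small model of the generator. The state order below is one possible walk of (victim, aggressor) values that excites all six faults; it is a hypothetical ordering chosen for illustration and is not necessarily the S1 to S8 sequence of Figure 4.6(a). Six fault transitions cannot be chained back to back, because two of the four (victim, aggressor) states are left more often than they are entered, so at least one linking transition is needed, which gives 7 transitions and 8 states.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[int, int]  # (victim value, aggressor value)

# Hypothetical 8-state walk; the actual S1..S8 order is given in Figure 4.6(a).
WALK: List[State] = [
    (0, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 1), (1, 1), (1, 0),
]

# MAF fault excited by a (previous state, next state) transition.
FAULT_OF: Dict[Tuple[State, State], str] = {
    ((0, 0), (1, 1)): "Sr",
    ((1, 1), (0, 0)): "Sf",
    ((0, 1), (1, 0)): "Dr",
    ((1, 0), (0, 1)): "Df",
    ((0, 0), (0, 1)): "Gp",
    ((1, 1), (1, 0)): "Gn",
}


def faults_covered(walk: List[State] = WALK) -> Set[str]:
    """Return the MAF faults excited by consecutive states of the walk."""
    return {FAULT_OF[p] for p in zip(walk, walk[1:]) if p in FAULT_OF}


def bus_vectors(n: int, walk: List[State] = WALK) -> List[Tuple[int, ...]]:
    """Return the 8n bus vectors: one walk per victim position."""
    vectors = []
    for victim in range(n):      # plays the role of the victim counter C_value
        for v, a in walk:        # plays the role of states S1..S8
            # Multiplexer behaviour: the victim bit takes v, all others take a.
            vectors.append(tuple(v if i == victim else a for i in range(n)))
    return vectors
```

For n = 4 the model produces 32 vectors, matching the 8n cycle count stated above.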

4.4 Run-Time Error Detection Stage

4.4.1 Related Work on Double Sampling Technique and Process-variation Aware on Link Wires

When an error is detected at the receiver side of on-chip interconnects, the data packets are retransmitted. However, retransmission carries a large performance penalty. To overcome this problem, master-slave flip-flop (MSFF) and double sampling data checking (DSDC) provide an effective solution.

A conventional master-slave flip-flop (MSFF) [65] contains a master flip-flop and a slave flip-flop. Both flip-flops work at the same frequency, but the slave flip-flop is positive-edge triggered by a delayed clock (d_clk), as shown in Figure 4.7(a). The data captured by the slave flip-flop is assumed to be correct. An XOR gate compares the data captured by the master flip-flop and the slave flip-flop, and the error flag is raised when the two are not identical. When an error occurs, the control circuits stall the pipeline data flow for 1 clock cycle and the slave flip-flop resends the correct data to the master flip-flop. The advantages of using a master-slave flip-flop are:

(1) Higher operating frequency: if the phase difference between the master flip-flop's clock and the slave flip-flop's delayed clock is 0.5 clock cycle, the design can work at 1.5 times the original design's maximum safe operating frequency. In other words, the technique can achieve a deeper pipeline.

(2) Lower latency penalty when an error is detected: the data error can be corrected by resending data from the slave flip-flop to the master flip-flop with only one clock stall.

Figure 4.7: (a) Master-slave flip-flop (b) Double sampling data checking

The DSM multisource noise is composed of crosstalk, voltage drop, ground bounce, clock skew, IR-drop, substrate noise coupling, etc. The DSDC technique can deal with multisource noise on on-chip interconnects. This differs from conventional testing techniques, such as ATPG methods for crosstalk, which generate worst-case test patterns with the maximal crosstalk effect. Such a test method claims that the circuit functions correctly if and only if it passes the worst-case test. However, testing only the worst case of each individual noise source cannot guarantee the correct function of the circuit.

Double sampling data checking (DSDC), proposed in [66], extends the technique to on-chip buses. Figure 4.7(b) shows the DSDC circuit. The working principle of DSDC is similar to that of MSFF: the input data is sampled into FF1 and compared with the data captured after a time interval ∆t. FF2 is triggered by a delayed clock (the delayed clock lags the normal clock by ∆t). If the noise duration is shorter than the time interval ∆t, the error can be detected by FF2. The time interval ∆t must be carefully designed to make sure the DSDC mechanism works correctly.
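The dependence on ∆t can be illustrated with a toy behavioural model: the line is sampled once at the clock edge and once ∆t later, and the two samples are XORed. This is only a sketch under simplifying assumptions of ours (an ideal waveform given as a function of time, no setup or hold effects); the names are hypothetical.

```python
from typing import Callable


def dsdc_error_flag(signal: Callable[[float], int],
                    t_clk: float, delta_t: float) -> bool:
    """Compare the sample at the clock edge with the one taken delta_t later."""
    ff1 = signal(t_clk)            # captured by FF1 on the normal clock
    ff2 = signal(t_clk + delta_t)  # captured by FF2 on the delayed clock
    return ff1 != ff2              # XOR of the two captured bits


def glitchy_line(start: float, duration: float,
                 settled: int = 1) -> Callable[[float], int]:
    """Toy waveform: the settled value, inverted during [start, start + duration)."""
    def signal(t: float) -> int:
        in_glitch = start <= t < start + duration
        return settled ^ 1 if in_glitch else settled
    return signal


if __name__ == "__main__":
    # Times in ns. The clock edge at 1.0 ns falls inside the disturbance.
    short = glitchy_line(start=0.9, duration=0.2)
    long_ = glitchy_line(start=0.9, duration=0.5)
    print(dsdc_error_flag(short, t_clk=1.0, delta_t=0.3))  # disturbance over before FF2 samples
    print(dsdc_error_flag(long_, t_clk=1.0, delta_t=0.3))  # disturbance outlasts delta_t
```

In this model a disturbance that ends before the delayed sample is flagged, while one that outlasts ∆t corrupts both samples equally and goes unnoticed, which is why ∆t has to be chosen with the expected noise duration and delay variation in mind.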

We have found that the timing delay variation of on-chip interconnects affects the design of ∆t. The timing delay variation is caused by the crosstalk effect, process variation, etc. The crosstalk effect arises because different pattern transitions cause different amounts of propagation delay on the transmission line. The propagation delay on the transmission line is also affected by process variation, which affects both devices and interconnects. Devices are affected by variation in effective channel length, oxide thickness, threshold voltage, etc. Wires are affected by thickness and width variation. In [67], the authors develop a model to analyze the effects of process variation on delay in on-chip bus signaling. The overall delay variation in their simulation results, based on different encoder/decoder schemes, is nearly 80ps~180ps. In [68], the authors consider a process-variation-aware bus coding scheme for delay minimization on interconnects. Their simulation results show that the difference between the maximum and minimum propagation delay before coding (for a 5mm line in 90nm technology) is almost 600ps. With the process-variation-aware bus coding scheme the difference is reduced to nearly 200ps, which is still a significant value. The timing delay variation of on-chip interconnects is therefore at the level of hundreds of picoseconds. Since our interconnect architecture works at a maximum frequency of 1GHz, the timing delay variation of on-chip interconnects cannot be ignored in our design.

4.4.2 Run-Time Error Detection Stage Timing analysis

The run-time error detection stage detects timing variations of the link wires. Timing delay variations of on-chip interconnects are due to crosstalk noise, process variation, temperature variation, and other noise sources. To overcome timing errors, the double data sampling technique and the double sampling data checking (DSDC) technique have been proposed to detect timing errors. However, these techniques are limited by the clock period and by the fixed delay line, respectively. Therefore, the run-time error detection stage is built from a modified double sampling data checking technique with an adaptive delay line, as shown in Figure 4.1. In addition, it provides correction ability through a multiplexer.

The timing constraint analysis and the modified double sampling data checking circuit are shown in Figure 4.8. The waveforms of the circuit in three cases, error free, delay error, and glitch error, are shown in Figure 4.8(a), (b), and (c) respectively. To ensure the correct functionality of the modified double sampling data checking technique, the time interval ∆t has to be set appropriately, taking each pipeline stage into account. If the delay between DFF1 and DFF2 is over 1 clock cycle, it will cause DFF1 to sample erroneous data. The maximum data path delay can be extended to 1 clock cycle plus the time interval ∆t, as shown in Equation (4.1).

$$t_{DFF1} + t_{d} + t_{XOR} + t_{setup3} < \tau_{clk} + \Delta t \qquad (4.1)$$

where t_DFF is the clock-to-Q delay of the D flip-flop, t_d is the data path delay (the path from the input of the low swing driver to the output of the level converter), t_XOR is the XOR propagation delay, and t_setup is the setup time of the D flip-flop.

Figure 4.8: Modified Double Sampling Data Checking Circuit and Waveforms (a) Error-Free (b) Delay Error (c) Glitch Error

DFF3 samples the comparison signal, which compares the data before DFF2 with the data after DFF2. In addition, DFF3 has to sample the comparison signal before the next data arrives. Therefore, ∆t should satisfy Equation (4.2).

$$t_{DFF2} + t_{XOR} + t_{setup3} < \Delta t < t_{DFF2} + t_{d} + t_{XOR} + t_{setup3} \qquad (4.2)$$

Also, the pipeline stages after the DSDC stage must satisfy the basic constraint in Equation (4.3).

$$\Delta t + t_{DFF3} + t_{MUX} + t_{Decoder} + t_{setup4} < \tau_{clk} \qquad (4.3)$$

From Equations (4.1) to (4.3), the upper and lower bounds of the time interval ∆t are derived. With an appropriate time interval ∆t, the run-time error detection stage not only corrects erroneous data, but also provides run-time error rate information that the self-calibrated voltage scaling technique uses to adjust the voltage swing level of the link wires.
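The window for ∆t can be computed directly from the three constraints. The sketch below reads Equation (4.1) as t_DFF1 + t_d + t_XOR + t_setup3 < τ_clk + ∆t, Equation (4.2) as t_DFF2 + t_XOR + t_setup3 < ∆t < t_DFF2 + t_d + t_XOR + t_setup3, and Equation (4.3) as ∆t + t_DFF3 + t_MUX + t_Decoder + t_setup4 < τ_clk. The class and function names are ours, and the delay values in the usage example are made-up numbers, not simulation results.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StageTiming:
    """Delays in picoseconds; field names mirror the symbols in the text."""
    t_clk: float
    t_dff1: float
    t_dff2: float
    t_dff3: float
    t_d: float
    t_xor: float
    t_setup3: float
    t_setup4: float
    t_mux: float
    t_decoder: float


def delta_t_window(t: StageTiming) -> Tuple[float, float]:
    """Return the open interval (lower, upper) allowed for delta_t."""
    lower = max(
        t.t_dff1 + t.t_d + t.t_xor + t.t_setup3 - t.t_clk,  # from (4.1)
        t.t_dff2 + t.t_xor + t.t_setup3,                    # from (4.2)
    )
    upper = min(
        t.t_dff2 + t.t_d + t.t_xor + t.t_setup3,                    # from (4.2)
        t.t_clk - (t.t_dff3 + t.t_mux + t.t_decoder + t.t_setup4),  # from (4.3)
    )
    if lower >= upper:
        raise ValueError("no valid delta_t exists for these delays")
    return lower, upper


if __name__ == "__main__":
    # Illustrative values only, for a 1 GHz clock (1000 ps period).
    example = StageTiming(t_clk=1000, t_dff1=60, t_dff2=60, t_dff3=60,
                          t_d=700, t_xor=40, t_setup3=30, t_setup4=30,
                          t_mux=40, t_decoder=300)
    print(delta_t_window(example))
```

With these illustrative delays the lower bound comes from Equation (4.2) and the upper bound from Equation (4.3); a slower decoder or a longer wire narrows the window until no valid ∆t remains.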

Chapter 5 Simulation Results and Analysis

In this chapter, we present simulation results to demonstrate the improvement in energy and reliability obtained by employing the self-corrected green bus coding scheme. All the simulation results are based on UMC 90nm CMOS technology at 1.0 V. For a 32-bit packet size, the 4:1 serialization technique reduces the phit size from 32 bits to 8 bits.

In addition, the wire length is set to 0.8mm on metal-4 with minimum width and spacing of 0.2um. The simulation results include: (1) Error rate analysis of different error correction coding schemes: we trade off power consumption against reliability and find the lowest signal swing level. (2) Power analysis of different joint coding schemes: we compare the power consumption on the link wires for different joint coding schemes at two signal swing levels, normal (1.0 V) and the lowest signal swing level (determined by the error correction ability of the ECC). (3) Codec overhead of the encoders/decoders of different joint coding schemes: we show the encoder/decoder area, the encode/decode delay time, and the physical transfer unit size of the different joint coding schemes. (4) Process-variation-aware timing analysis of interconnects: we analyze the propagation delay on the link wires due to wire process variation and different transition patterns. We can then use the results as a guide for the design and make sure the double sampling data checking mechanism works correctly.

5.1 Error Rate Analysis On Different Error Correct Coding Schemes

Because error correction coding increases the reliability of on-chip interconnects, designers can trade off power consumption against reliability by reducing the operating voltage. To simplify the cumulative effect of the noise sources, the model assumes that a Gaussian distributed noise voltage V_N with variance σ_N^2 is added to the signal, as shown in Figure 5.1(a). In addition, the errors occurring on different link lines are assumed to be independent. The bit error probability ε is given by Equation (3.3) and Equation (3.4) (Chapter 3), where Vdd is the voltage swing of the signal. For the same σ_N^2, the bit error probability increases as the voltage swing of the signal decreases.

Figure 5.1: (a) Model of the bit error probability ε on single link wire (b) Approximation of bit error probability ε by integration.

However, some error control/correction coding schemes allow us to decrease the voltage swing of the signal and at the same time guarantee reliability, if and only if they satisfy Equation (5.1):

$$P_{uncode}(\varepsilon) \ge P_{ecc}(\hat{\varepsilon}) \qquad (5.1)$$

where ε is the bit error probability at the full swing voltage (1.0 V) and ε̂ is the bit error probability at the lower swing voltage. To obtain the lowest supply voltage for a specific error correction code at the same reliability level as the un-coded case, the supply voltage can be revised as:

$$\hat{V}_{dd} = V_{dd}\,\frac{Q^{-1}(\hat{\varepsilon})}{Q^{-1}(\varepsilon)} \qquad (5.2)$$

The inverse of the Gaussian distribution function is also called the probit function Φ⁻¹(x). The probit function has no closed-form expression, because the Gaussian density has no elementary antiderivative. To work around this, we first approximate the bit error probability numerically while varying the voltage swing of the signal. Integrating from -100 to Vdd/2, we divide the integration range on the x-axis into segments of 0.0001 V each, so that each segment forms a trapezoid. The sum of the areas of all trapezoids approximates the bit error probability, as shown in Figure 5.1(b). In this way, the lowest voltage swing for a specific error correction code that satisfies Equation (5.2) can be obtained.
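The trapezoid summation can be sketched in a few lines and checked against the complementary error function. The sketch rests on assumptions of ours: the noise is zero-mean Gaussian with standard deviation σ_N, a bit error occurs when the noise exceeds half the swing in the adverse direction, and the tail is integrated up to -Vdd/2 from a finite lower limit of a few σ_N (rather than from -100) so that the example runs quickly. The function names are hypothetical.

```python
import math


def gaussian_pdf(x: float, sigma: float) -> float:
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def bit_error_trapezoid(vdd: float, sigma: float,
                        step: float = 1e-4, span_sigmas: float = 12.0) -> float:
    """Approximate the Gaussian tail below -vdd/2 by summing trapezoids."""
    upper = -vdd / 2.0
    lower = upper - span_sigmas * sigma   # assumed finite stand-in for -infinity
    n = max(1, round((upper - lower) / step))
    h = (upper - lower) / n               # actual segment width, close to step
    total = 0.5 * (gaussian_pdf(lower, sigma) + gaussian_pdf(upper, sigma))
    for i in range(1, n):
        total += gaussian_pdf(lower + i * h, sigma)
    return total * h


def bit_error_exact(vdd: float, sigma: float) -> float:
    """Same tail probability written with the complementary error function."""
    return 0.5 * math.erfc(vdd / (2.0 * sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    # Illustrative noise level only: sigma = 0.1 V at a 1.0 V swing.
    print(bit_error_trapezoid(1.0, 0.1), bit_error_exact(1.0, 0.1))
```

Repeating the calculation for a range of swing values gives the error probability as a function of voltage, which can then be searched for the lowest swing that still meets the reliability target.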

When the un-coded link is operated at the full swing supply voltage (1.0 V), different levels of bit error probability ε can be obtained by varying the variance of the Gaussian distribution. Figures 5.2(a) and 5.2(b) show the lowest voltages of specific error correction codes versus the un-coded word-error-rate with k = 8 and k = 32 (k is the bit width), respectively. In Figure 5.2(a), assuming the bit error probability of the un-coded word ε equals 10^-20, the lowest voltages of the Hamming code, the Duplication-Add-Parity code, the CADEC code, and the proposed self-corrected green code are 0.705V, 0.710V, 0.579V, and 0.696V, respectively. In Figure 5.2(b), the lowest supply voltages of all ECC codes increase as the un-coded word-error-rate increases. Compared to the other ECC codes in Figure 5.2(a), however, the proposed S-C Green code has the better characteristic that its lowest supply voltage decreases when the un-coded word-error-rate increases. When k increases, the proposed self-corrected green code approaches the lowest supply voltage of CADEC, and it even obtains a lower operating voltage than CADEC when the un-coded word-error-rate is above the 10^-4 level (blue circle mark in Figure 5.2(b)).

Figure 5.2: Lowest voltage of specific error correction coding versus different un-coded word-error-rate with (a) k = 8 (b) k = 32 respectively.
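As a rough cross-check of the Hamming figure quoted above, the lowest swing can be estimated in closed form. The sketch relies on assumptions that are ours rather than taken from this chapter: the scaled swing is Vdd · Q⁻¹(ε̂) / Q⁻¹(ε), which is the form commonly used in the coding literature for this noise model; the Hamming code for k = 8 has length n = 12; and its word error probability is approximated by C(n, 2) · ε̂², the usual estimate for a single-error-correcting code. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance


def q_inv(p: float) -> float:
    """Inverse of the Gaussian tail function Q(x) = 1 - Phi(x)."""
    return -_STD_NORMAL.inv_cdf(p)


def uncoded_word_error(eps: float, k: int) -> float:
    """1 - (1 - eps)^k, evaluated without cancellation for very small eps."""
    return -math.expm1(k * math.log1p(-eps))


def hamming_bit_error_budget(p_word: float, n: int) -> float:
    """Largest eps_hat with C(n, 2) * eps_hat^2 <= p_word (assumed approximation)."""
    return math.sqrt(p_word / math.comb(n, 2))


def scaled_swing(vdd: float, eps: float, eps_hat: float) -> float:
    """Assumed scaling relation: vdd * Q^-1(eps_hat) / Q^-1(eps)."""
    return vdd * q_inv(eps_hat) / q_inv(eps)


if __name__ == "__main__":
    eps = 1e-20                                    # un-coded bit error probability
    target = uncoded_word_error(eps, k=8)          # reliability to match
    eps_hat = hamming_bit_error_budget(target, n=12)
    print(round(scaled_swing(1.0, eps, eps_hat), 3))
```

Under these assumptions the estimate comes out close to 0.70 V, in line with the 0.705V reported for the Hamming code at k = 8. The other codes would need their own word error expressions, which are not reproduced here.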

5.2 Power Analysis On Different Joint Coding Schemes and Codec Overhead

Figure 5.3(a) shows the energy reduction relative to the un-coded case for different values of λ at the normal signal swing level (1.0V). Compared to the previous joint coding schemes listed in Table 5, such as Hamming Code (HC), FTC+HC, FOC+HC, One Lambda Code (OLC)+HC, Boundary Shift Code (BSC), DAP+shielding (DSAP) [14,17], and CADEC [51], the proposed self-corrected green bus coding (S-C green) achieves the largest energy reduction regardless of the value of λ.

Table 5: Different combination of joint coding schemes

We can further lower the signal swing level of specific codes to their lowest value

