Adaptive Approaches - Improving the Energy Efficiency of Functional Units by Exploiting Data-Dependent Delay

In this category, the supply voltage and the switching activity/capacitance are adjusted adaptively. Adaptive approaches are more flexible than static approaches and, compared with dynamic approaches, they can also adapt to environmental conditions or data correlations. I introduce energy-reduction techniques that lower the supply voltage and the switching activity/capacitance, respectively.

2.4.1 Supply Voltage

• Adaptive voltage scaling (AVS)

AVS is one form of dynamic voltage scaling. It adaptively scales the supply voltage by monitoring the actual silicon speed [23][29], so worst-case characterization is no longer required.

The actual performance is monitored using on-chip structures. The frequency of a ring oscillator is sampled using a counter, as shown in Figure 2-12. The frequency count is then compared with the frequency required by the system, and the difference is passed through the system’s loop filter. Margin has to be built into the ring oscillator to account for all gate types and all operating conditions. A better approach is to use a critical path replica, as shown in Figure 2-12.

Figure 2-12 Architecture of the AVS system
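The control loop described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of [23][29]: the linear speed monitor `ring_oscillator_count`, its constants, the loop gain, and the voltage limits are all assumptions chosen so that the loop visibly settles.

```python
def ring_oscillator_count(vdd, gain_per_volt=1000.0, v_threshold=0.3):
    """Hypothetical speed monitor: counter value per sampling window.

    Assumes the count grows linearly with (vdd - v_threshold); real
    silicon is not linear, this only keeps the sketch simple.
    """
    return max(0.0, gain_per_volt * (vdd - v_threshold))


def avs_settle(target_count, vdd=1.2, loop_gain=0.0005,
               v_min=0.4, v_max=1.2, steps=200):
    """Compare the sampled count with the target, filter, adjust Vdd."""
    for _ in range(steps):
        error = target_count - ring_oscillator_count(vdd)
        # Integrating filter: accumulate a scaled error into the supply
        # setting, clamped to the (assumed) regulator range.
        vdd = min(v_max, max(v_min, vdd + loop_gain * error))
    return vdd


if __name__ == "__main__":
    print(f"settled Vdd: {avs_settle(500.0):.3f} V")
```

With these assumed constants the loop lowers the supply from its starting value until the monitored count matches the target, which is the sense in which AVS removes the need to provision for the worst case.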

2.4.2 Switching Activity and Capacitance

• Bit swapping

The most effective way to reduce the number of transitions in functional units is to increase the correlation of the input data. The bit-swapping method exchanges input bits of a functional unit according to the previous input bit values so that the number of signal transitions is minimized [6].

Figure 2-13 shows an example in which the exclusive-OR gate serves as the selection logic that controls the bit swapping. The previous value of in1 is 4'b0011 and that of in2 is 4'b1100; the next value of in1 is 4'b0100 and that of in2 is 4'b1011. After bit swapping, the next value of in1 becomes 4'b0011 and that of in2 becomes 4'b1100, so neither input toggles.

Figure 2-13 Example of bit swapping
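The example can be reproduced with a small behavioral sketch. The function names and the per-bit swap rule below are assumptions made for illustration (the selection logic in [6] is built from exclusive-OR gates, not a software comparison), and swapping is only legal when exchanging the two operand bits at the same position leaves the result unchanged, as for addition or bitwise AND/OR/XOR.

```python
def transitions(prev, nxt):
    """Number of bit positions that toggle between two values."""
    return bin(prev ^ nxt).count("1")


def bit_swap(prev1, prev2, next1, next2, width=4):
    """Swap operand bits position by position when that lowers toggling."""
    out1 = out2 = 0
    for i in range(width):
        p, q = (prev1 >> i) & 1, (prev2 >> i) & 1
        a, b = (next1 >> i) & 1, (next2 >> i) & 1
        # Toggles at this position with the bits swapped vs. left as they are.
        if (b ^ p) + (a ^ q) < (a ^ p) + (b ^ q):
            a, b = b, a
        out1 |= a << i
        out2 |= b << i
    return out1, out2


if __name__ == "__main__":
    s1, s2 = bit_swap(0b0011, 0b1100, 0b0100, 0b1011)
    print(f"in1={s1:04b} in2={s2:04b}")
    print("toggles:", transitions(0b0011, s1) + transitions(0b1100, s2))
```

For the values in the text, the unswapped inputs toggle three bits each, while the swapped inputs equal the previous ones; the sum of the two operands is the same in both cases.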

• Guarded evaluation

Guarded evaluation places guard logic, consisting of transparent latches with an enable signal, at the inputs of each circuit block that needs to be power managed. When the block must perform useful computation in a clock cycle, the enable signal makes the latches transparent. Otherwise, the latches retain their previous state and block any transitions from propagating into the logic block.

A technique called partially guarded computation is proposed in [30]. It disables part of a circuit based on the dynamic range of the input operands. The circuit is divided into two parts, a most significant part (MSP) and a least significant part (LSP), and only the LSP computation is performed when the input operands fall within the range covered by the LSP. Unnecessary signal transitions are thereby reduced.
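As a rough behavioral illustration of the MSP/LSP split, the sketch below models an unsigned adder whose upper part is only evaluated when an operand does not fit in the lower part. The function name, the 8-bit split, and the unsigned range check are assumptions for this example; the scheme in [30] may differ in detail, for instance in how signed operands are handled.

```python
def guarded_add(a, b, lsp_bits=8):
    """Unsigned add that reports whether the MSP had to be evaluated.

    Returns (sum, msp_active). When both operands fit in the LSP, the
    MSP inputs stay guarded and only the LSP adder switches.
    """
    lsp_mask = (1 << lsp_bits) - 1
    msp_active = (a >> lsp_bits) != 0 or (b >> lsp_bits) != 0

    # The LSP adder always runs; its result includes the carry-out bit.
    lsp_sum = (a & lsp_mask) + (b & lsp_mask)
    if not msp_active:
        return lsp_sum, False

    # Operands exceed the LSP range, so the MSP adder is enabled as well.
    msp_sum = (a >> lsp_bits) + (b >> lsp_bits)
    return (msp_sum << lsp_bits) + lsp_sum, True


if __name__ == "__main__":
    print(guarded_add(200, 100))   # both operands fit in the 8-bit LSP
    print(guarded_add(300, 5))     # 300 needs the MSP
```

The fraction of operations that leave `msp_active` false is what determines how many transitions the guard logic can suppress.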

• Proposed energy-efficient design

Circuit delay is strongly data dependent, and a circuit exhibits its critical path delay only for very specific data sequences [7][8][9]. The proposed design exploits this data-dependent delay to reduce circuit energy. Figure 2-14 (a) depicts an example path delay distribution of the original circuit. The x-axis represents the path delay and the y-axis represents the number of patterns.

In this example, we assume a normal distribution. The distribution shows that the delay of most patterns is smaller than the critical path delay (the clock period) and that only a few patterns activate the critical path. We can therefore optimize the common case, rather than the worst case (the critical paths), against the clock period to reduce energy, as shown in Figure 2-14 (b).

As a result, the delay of some paths (the critical paths) may exceed the clock period, but the circuit energy can be reduced effectively. As long as we can tolerate these critical paths, we gain the energy reduction.

Figure 2-14 Path delay distribution
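Under the normal-distribution assumption used in this example, the share of patterns whose delay exceeds the clock period can be estimated directly. The sketch below is only illustrative: the `slowdown` factor is a hypothetical stand-in for whatever makes the energy-optimized circuit slower, and the numeric values are made up.

```python
from statistics import NormalDist


def slow_fraction(mean_delay, sigma, clock_period, slowdown=1.0):
    """Fraction of patterns whose path delay exceeds the clock period.

    Assumes normally distributed delays that are stretched uniformly by
    `slowdown` (a hypothetical model of the energy-optimized circuit).
    """
    delays = NormalDist(mu=mean_delay * slowdown, sigma=sigma * slowdown)
    return 1.0 - delays.cdf(clock_period)


if __name__ == "__main__":
    for s in (1.0, 1.2, 1.4):
        frac = slow_fraction(0.6, 0.1, clock_period=1.0, slowdown=s)
        print(f"slowdown {s:.1f}: {100 * frac:.3f}% of patterns miss the clock")
```

The more the distribution is pushed toward the clock period, the larger the violating fraction becomes, which is the quantity the rest of this section has to keep under control.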

As shown in Figure 2-14 (b), a% of all input patterns cannot complete a computation within one clock period and may cause computing errors. To tolerate these errors, every pattern that would incur a computing error is executed in two clock cycles (a one-cycle latency penalty). We therefore generate “detection logic” that is responsible for error detection, and augment the circuit with it.

As shown in Figure 2-15, the detection logic receives the same input pattern as the functional unit and outputs a 1-bit “wait” signal. If the “wait” signal is asserted, the input pattern is held in the latches for one more cycle and the output data is not yet available.

Figure 2-15 Conceptual circuit of proposed design

In this scenario, the circuit energy is reduced but performance may also be degraded. To limit the performance penalty, the detection logic needs to detect computation errors exactly. We can also reduce the number of violating paths to lower the performance penalty, but doing so also diminishes the energy reduction. Energy and performance therefore have to be traded off against each other. This trade-off is the main problem I want to solve.
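The performance side of this trade-off can be illustrated with a toy variable-latency adder. Everything in the sketch is an assumption made for illustration and not the detection logic developed in this thesis: delay is approximated by the longest run of carry-propagate positions, and the hypothetical `wait_signal` is asserted when that run reaches a chosen threshold.

```python
from itertools import product


def longest_propagate_run(a, b, width):
    """Longest run of bit positions where a and b differ (carry propagate)."""
    run = best = 0
    for i in range(width):
        if ((a >> i) ^ (b >> i)) & 1:
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


def wait_signal(a, b, width, threshold):
    """Hypothetical detection logic: flag patterns assumed to be slow."""
    return longest_propagate_run(a, b, width) >= threshold


def average_cycles(width, threshold):
    """Average cycles per operation over all input pairs (1 or 2 cycles each)."""
    pairs = list(product(range(1 << width), repeat=2))
    slow = sum(wait_signal(a, b, width, threshold) for a, b in pairs)
    return 1 + slow / len(pairs)


if __name__ == "__main__":
    for t in (2, 3, 4):
        print(f"threshold {t}: {average_cycles(4, t):.4f} cycles/op")
```

If a fraction a of the patterns asserts the wait signal, the average latency is 1 + a cycles, so a detector that flags more patterns than strictly necessary directly costs performance, while a looser timing target raises a itself.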

3 Proposed Energy-Efficient Design

Energy consumption has become a critical issue in modern VLSI designs. To reduce circuit energy, we propose a method that trades a small performance penalty for a large energy reduction. This chapter introduces the proposed energy-efficient design, covering CMOS circuit delay, the template of the variable-latency design, and the proposed design flow.
