

Power Reduction by Register Relabeling for

Crosstalk-Toggling-Free-Coded Instruction Bus

Student: Chun-Han Lin

Advisor: Jean Jyh-Jiun Shann

National Chiao Tung University

Institute of Computer Science and Engineering

Master's Thesis

A Thesis

Submitted to the Institute of Computer Science and Engineering, College of Computer Science,

National Chiao Tung University, in Partial Fulfillment of the Requirements

for the Degree of Master

in

Computer Science and Engineering

November 2010

Hsinchu, Taiwan, Republic of China

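Power Reduction by Register Relabeling for Crosstalk-Toggling-Free-Coded Instruction Bus

Student: Chun-Han Lin    Advisor: Dr. Jean Jyh-Jiun Shann

Master's Program, Institute of Computer Science and Engineering, National Chiao Tung University

Abstract (translated from the Chinese)

As process technology advances to the deep submicron level, the effects of crosstalk on deep-submicron buses receive increasing attention. When the signals on two adjacent bus lines switch in opposite directions, the event is called a crosstalk-toggling transition; it brings not only more power consumption but also longer transmission delay. Many existing studies use encoding in synchronous circuit designs to eliminate crosstalk-toggling transitions from the bus entirely and thereby reduce transmission delay. There remains, however, an opportunity to reduce power consumption further.

This thesis studies how register relabeling can further reduce the power consumption of the instruction bus under the crosstalk-toggling-free Selective Shielding bus encoding method. With no extra hardware and no performance loss, our design consumes on average 95.3% of the energy of an un-encoded instruction bus. In addition, compared with the Selective Shielding method, it reduces the power consumption of the instruction bus by a further 12.1%. Overall, our design preserves the crosstalk-toggling-free property of the original Selective Shielding method while reducing its power consumption.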

Power Reduction by Register Relabeling for

Crosstalk-Toggling-Free-Coded Instruction Bus

Student:Chun-Han Lin Advisor:Jean Jyh-Jiun Shann

Institute of Computer Science and Engineering National Chiao Tung University

Abstract

As process technology scales down to the deep submicron level, crosstalk effects become an increasingly important consideration on deep-submicron buses, especially when adjacent bus lines switch in opposite directions (so-called crosstalk-toggling transitions). Crosstalk-toggling transitions increase not only power consumption but also data transmission delay. While many bus encoding schemes have been proposed that avoid crosstalk-toggling transitions entirely, and thus reduce data transmission delay in synchronous circuit designs, opportunities still exist to reduce power consumption further.

Therefore, we propose a register relabeling algorithm that further reduces instruction bus power consumption on top of the existing Selective Shielding bus encoding scheme, which guarantees that the encoded bus is crosstalk-toggling free. With no extra hardware and no performance loss, the average energy consumption of our design is 95.3% of that of an un-encoded instruction bus using 90nm technology with a 14mm bus length, and is 12.1% less than that of an instruction bus with Selective Shielding coding. In summary, our scheme preserves the crosstalk-toggling-free characteristic of the Selective Shielding method and saves more energy.

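Acknowledgments

I sincerely thank my advisor, Professor Jean Jyh-Jiun Shann (單智君). Under my advisor's careful teaching and guidance, I learned that anything can be explored further to understand its principles, to discover problems, and to solve them, and in this way I developed the ability to do research. I also thank the other head of our laboratory, Professor 鍾崇斌, whose many valuable suggestions and opinions benefited me greatly.

In addition, I thank 翁綜禧, a senior Ph.D. student in our laboratory, who raised questions about every aspect of my work and offered timely advice, and who kept encouraging me to carry on even during the low points of my life. I also thank the senior students, classmates, and junior students of the laboratory: you are not only good research partners but also friends for life.

Finally, I thank my family, who spared no effort in raising and supporting me, and once again I offer my sincerest thanks to everyone who has helped me.

Chun-Han Lin, November 2010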

Table of Contents

Abstract (in Chinese)
Abstract
Acknowledgments
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Importance of Low Power Design
1.2 Sources of Power Consumptions on Buses
1.3 Effects of Crosstalk-Toggling transitions
1.4 Research Motivation
1.5 Research Objective and Approaches
1.6 Organization of This Thesis
Chapter 2 Background and Related Work
2.1 Analytical Model of Power Consumption
2.2 Previous Crosstalk-Toggling-Free Methods for Buses
2.2.1 Simple Shielding Technique
2.2.2 Victor’s Method
2.2.3 Fibonacci Coding
2.2.4 Selective Shielding Technique
2.2.5 Comparison of Different Approaches
2.3 Previous Researches
2.3.1 Selective Shielding Crosstalk-Toggling-Free Technique
2.3.2 4-to-6 Selective Shielding Crosstalk-Toggling-Free Technique
2.3.3 Register Relabeling Power Saving Technique
2.3.4 Summary of Previous Researches
Chapter 3 Proposed Design
3.1 System Overview
3.2 Observations
3.3 Instruction Partition
3.4 Modified Register Relabeling
3.4.1 Register Relabeling Method 1
3.4.2 Register Relabeling Method 2
Chapter 4 Simulation and Analysis
4.1 Experimental Benchmarks
4.2 Experimental Methods
4.2.1 Environment
4.2.2 Experimental Method
4.2.3 Simulated Methods
4.3 Simulation Results and Analysis
4.3.1 Hardware Overhead Analysis
4.3.2 Energy Consumption of Different Techniques
4.3.3 Effects of Register Occurrence Frequencies
Chapter 5 Conclusion and Future Work

List of Figures

Figure 1-1: Self and coupling-capacitance for buses
Figure 1-2: Examples of a crosstalk 1-bit transition and a crosstalk-toggling transition
Figure 2-1: The examples of Fibonacci encoding from f1 to f4
Figure 2-2: Results of TS encoding scheme for Bus_t = Bus_{t-1} ⊕ Data_t
Figure 2-3: System overview of SS
Figure 2-4: SS encoding/decoding algorithm
Figure 2-5: SS encoding/decoding examples
Figure 2-6: System overview of 4-to-6 SS
Figure 2-7: The 2-bit data and the relative 3-bit SS code-words
Figure 2-8: An example of the worst case of bit-swap process
Figure 2-9: An example of the bit-swap process inter adjacent code-words
Figure 2-10: 4-to-6 SS encoding/decoding examples
Figure 2-11: An example of code fragment
Figure 2-12: (a) Example frequency distribution of register pairs (b) RHG from (a)
Figure 2-13: (a) Register relabeling algorithm (b) RHG after register relabeling
Figure 3-1: Overview of system
Figure 3-2: Flowchart of 4-to-6 SS data processing
Figure 3-3: An example of 4-to-6 SS coding
Figure 3-4: (a) MIPS instruction formats (b) Partition of register fields and bits on the same positions
Figure 3-5: An example of classification of register
Figure 3-6: Flowchart of our modified register relabeling algorithm
Figure 3-8: An example of register relabeling method 2
Figure 4-1: Experimental flow
Figure 4-2: The number of bit transitions of different types and techniques (fft)
Figure 4-3: The number of bit transitions of different types and techniques (sor)
Figure 4-4: The number of bit transitions of different types and techniques (lu)
Figure 4-5: The number of bit transitions of different types and techniques (ej)
Figure 4-6: The number of bit transitions of different types and techniques (mmul)
Figure 4-7: The number of bit transitions of different types and techniques (tri)
Figure 4-8: The total number of bit transitions of different types and techniques
Figure 4-9: Energy consumption of different techniques
Figure 4-10: Average energy consumption of different techniques with each portion of energy consumption of transition and/or hardware
Figure 4-11: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (ej)
Figure 4-12: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (lu)
Figure 4-13: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (fft)
Figure 4-14: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (sor)
Figure 4-15: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (tri)
Figure 4-16: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (mmul)
Figure 4-17: (a) Register occurrence frequency (b) Rank order of register occurrence frequency with both register relabeling methods (Average)

List of Tables

Table 2-1: Fibonacci encoding algorithm
Table 2-2: Comparison of different approaches
Table 2-3: The 4-bit data and relative 6-bit code-words
Table 3-1: 4-bit data sorted by the number of 0s and their corresponding 4-to-6 SS code-words
Table 3-2: Relabeling register selection sequence
Table 3-3: MIPS registers categorization
Table 3-4: Relabeling register selection sequence for MIPS ISA
Table 3-5: An example of register relabeling method 1
Table 3-6: MIPS relabelable registers with different relabeling scope
Table 4-1: Benchmark programs

Chapter 1 Introduction

In this chapter, we first introduce the importance of low-power design and then discuss the sources of power consumption on buses and the effects of crosstalk-toggling transitions. The research motivation and objective are introduced next, and the organization of this thesis is described at the end.

1.1 Importance of Low Power Design

As the complexity of system-on-chip (SoC) designs increases, power consumption is becoming one of the most important design issues, especially for embedded systems, where lower power means less heat, lower cooling cost, and longer battery life. In addition, energy efficiency has become an important characteristic of product quality. In mobile devices such as cellular phones and other handheld devices, energy efficiency further determines the usability and acceptance of the product. Since these products are battery-powered and their usage is increasing rapidly, low-power design for these systems has become a very important research topic.

1.2 Sources of Power Consumptions on Buses

The power consumption of bit transitions on bus lines is one of the major sources to the [...] charging and discharging the capacitance for data transmission. The bit transitions can be classified into self-transitions and coupling-transitions according to the capacitance involved [1]. As CMOS processes scale down to the deep submicron level, both self-capacitance and coupling-capacitance need to be taken into account. The capacitance between a bus line and ground is called self-capacitance (Cs), and the capacitance between adjacent bus lines is called coupling-capacitance (Cc). Both capacitances are shown in Figure 1-1.

[Figure: parallel bus lines, with a self-capacitance Cs between each bus line and ground and a coupling-capacitance Cc between each pair of adjacent bus lines]

Figure 1-1: Self and coupling-capacitance for buses

Self-transitions are bit transitions on an individual bus line, which charge and discharge its self-capacitance. Coupling-transitions are bit transitions on adjacent bus lines that create a voltage difference between the lines and thus charge and discharge the coupling-capacitance. Coupling-transitions can be subdivided into two types: crosstalk 1-bit transitions and crosstalk-toggling transitions. Crosstalk 1-bit transitions occur when only one of two adjacent bus lines switches, for example {00 → 01}, {00 → 10}, {11 → 01}, and {11 → 10}. Crosstalk-toggling transitions occur when both adjacent bus lines switch in opposite directions, for example {01 → 10}. The remaining cases, for example {00 → 00}, {11 → 11}, and {00 → 11}, do not trigger any activity on the coupling-capacitance. Figure 1-2 shows examples of a crosstalk 1-bit transition and a crosstalk-toggling transition.

Crosstalk 1-bit transition {00 → 01}: line 1: 0→0, line 2: 0→1
Crosstalk-toggling transition {01 → 10}: line 1: 0→1, line 2: 1→0

Figure 1-2: Examples of a crosstalk 1-bit transition and a crosstalk-toggling transition
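The classification above can be written down as a small sketch. The function name `classify_pair` and the three string labels are illustrative choices, not names used in this thesis; the function only looks at two adjacent bus lines at a time.

```python
def classify_pair(prev: str, curr: str) -> str:
    """Classify the transition of two adjacent bus lines, e.g. "01" -> "10".

    prev and curr are 2-character bit strings holding the values of the
    two lines before and after the transition.
    """
    changed = [p != c for p, c in zip(prev, curr)]
    if changed.count(True) == 1:
        # Only one of the two lines switches.
        return "crosstalk-1-bit"
    if changed == [True, True] and prev[0] != prev[1]:
        # Both lines switch, and they switch in opposite directions.
        return "crosstalk-toggling"
    # Nothing switches, or both lines switch in the same direction:
    # no activity on the coupling-capacitance.
    return "none"


print(classify_pair("00", "01"))  # one line switches
print(classify_pair("01", "10"))  # both lines switch in opposite directions
print(classify_pair("00", "11"))  # both lines switch in the same direction
```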

1.3 Effects of Crosstalk-Toggling transitions

With process technology moving toward the deep submicron level, the coupling-capacitance between adjacent bus lines is becoming ever more prominent: the ratio of coupling-capacitance to self-capacitance increases as the process shrinks [2]. Crosstalk-toggling transitions cause not only more power consumption but also longer data transmission delay. The data transmission delay of a crosstalk-toggling transition is at least twice that of other transitions [3], and its power consumption is at least four times that of other transitions [1]. Thus, the effects of crosstalk-toggling transitions are much more serious than those of other transitions.

1.4 Research Motivation

[...] others, many bus-encoding schemes have been proposed to avoid crosstalk-toggling transitions entirely. The purpose of crosstalk-toggling-free bus encoding schemes is to reduce data transmission delay in synchronous circuit designs. However, opportunities still exist in previous crosstalk-toggling-free bus encoding schemes to reduce the total power consumption on buses while keeping the bus crosstalk-toggling free.

The power consumption of the instruction bus constitutes a large portion of the total power consumption on buses, since a processor typically fetches an instruction every instruction cycle and the bit patterns on the instruction bus are less regular than those on its address bus. However, instructions are fixed at compile time, so there are opportunities to process them in a post-compilation phase. For example, a typical ISA exhibits the regularity that register fields are in fixed positions within the instruction encoding, and the register fields constitute a significant part of an instruction word. Choosing registers appropriately may reduce the power consumption of the instruction bus [4]. Therefore, it is possible to reduce power consumption by generating instructions that consume less power.

1.5 Research Objective and Approaches

In this thesis, instructions are processed statically to further reduce the power consumption of a crosstalk-toggling-free-coded instruction bus with no extra hardware and no performance loss. This goal is achieved by exploiting a characteristic of the crosstalk-toggling-free encoded instruction bus: the transitions depend only on the number of 1s in the code-words. Thus, we generate instructions that have fewer 1s after crosstalk-toggling-free bus encoding. Moreover, register relabeling is used to relabel the registers of instructions, and our modified register relabeling method needs to consider only the register number itself. Furthermore, the relabeling scope may be made smaller, which provides more opportunities to reuse register numbers that have fewer 1s after crosstalk-toggling-free bus encoding, resulting in fewer transitions. Consequently, our approaches suit a crosstalk-toggling-free-coded instruction bus and reduce the bit transitions on the instruction bus for power reduction.

1.6 Organization of This Thesis

The remaining chapters of this thesis are organized as follows. Chapter 2 introduces the sources of power consumption and the analytical model of power consumption, and discusses previous research on crosstalk-toggling-free and power reduction techniques for the instruction bus. In Chapter 3, we present our power reduction techniques for the instruction bus. The experimental environment, simulation results, and related analysis are presented in Chapter 4. Finally, we [...]

Chapter 2 Background and Related Work

The main purpose of this chapter is to provide the necessary background for the concepts

and methods presented in the following chapters. First, we will introduce the analytical model

of power consumption for deep submicron buses. Then, a survey of the related approaches for

crosstalk-toggling free bus encoding scheme and bus power reduction will be presented.

2.1 Analytical Model of Power Consumption

There are three major sources of power consumption in digital CMOS circuits [5]. The first is switching power, consumed in charging and discharging the circuit node capacitances. The second is short-circuit power, caused by the direct-path short-circuit current that arises when both the NMOS and PMOS transistors are active simultaneously and conduct current directly from supply to ground. The third is leakage power, which arises from substrate injection and sub-threshold effects; it is determined primarily by fabrication technology, and we do not discuss it here [6].

In this thesis, we focus on reducing the switching power for charging and discharging the [...] [1]

$$P_{switching} = P_s + P_c = ST \cdot C_S V_{dd}^2 + XTTr \cdot C_C V_{dd}^2 + 4 \cdot XTTg \cdot C_C V_{dd}^2 = (ST + \lambda \cdot XTTr + 4\lambda \cdot XTTg)\, C_S V_{dd}^2 \tag{1}$$

where the first term, P_s, represents the switching power of self-transitions, and the second term, P_c, represents the switching power of coupling-transitions, which include crosstalk 1-bit transitions and crosstalk-toggling transitions. C_S is the self-capacitance, C_C is the coupling-capacitance, V_dd is the supply voltage, λ is equal to C_C / C_S, ST is the total number of self-transitions, XTTr is the total number of crosstalk 1-bit transitions, and XTTg is the total number of crosstalk-toggling transitions.

Low-power design aims to minimize transitions, capacitances, and V_dd. Once the process technology has been chosen, the capacitances are fixed. According to the power equation, decreasing V_dd is an effective way to reduce switching power. However, the supply voltage is usually determined by system and technology considerations, and decreasing V_dd increases the propagation delay. The remaining important factor is the number of transitions. Reducing the number of bit transitions per transaction reduces the number of capacitances that need to be driven. Bus encoding is a well-known [...] equation of power dissipation cost (PDC) function can be defined as follows:

$$PDC = ST + \lambda \cdot XTTr + 4\lambda \cdot XTTg \tag{2}$$

In this thesis, minimizing the PDC function is the goal of our proposed methods. For a crosstalk-toggling-free bus encoding scheme, XTTg is guaranteed to be 0; thus, our objective is to reduce ST and XTTr as much as possible for power reduction.
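As a rough illustration of how the PDC function could be evaluated, the sketch below counts ST, XTTr, and XTTg over a sequence of bus values and combines them as ST + λ·XTTr + 4λ·XTTg. The helper names `count_transitions` and `pdc`, the bit-string representation of the bus, and the value λ = 2 in the usage line are assumptions made for this example only; they are not taken from the thesis.

```python
def count_transitions(trace):
    """Count (ST, XTTr, XTTg) over a list of equal-length bit strings.

    Each string holds the bus value in one cycle; neighbouring characters
    are treated as physically adjacent bus lines.
    """
    st = xttr = xttg = 0
    for prev, curr in zip(trace, trace[1:]):
        flips = [p != c for p, c in zip(prev, curr)]
        st += sum(flips)                       # self-transitions
        for i in range(len(flips) - 1):
            if flips[i] != flips[i + 1]:
                xttr += 1                      # only one of the two lines switches
            elif flips[i] and prev[i] != prev[i + 1]:
                xttg += 1                      # both switch, in opposite directions
    return st, xttr, xttg


def pdc(trace, lam):
    """Power dissipation cost: ST + lam * XTTr + 4 * lam * XTTg."""
    st, xttr, xttg = count_transitions(trace)
    return st + lam * xttr + 4 * lam * xttg


# Hypothetical 4-line bus going from 0100 to 1000, with an assumed lambda of 2.
print(count_transitions(["0100", "1000"]))
print(pdc(["0100", "1000"], lam=2))
```

With a crosstalk-toggling-free code the third counter stays at 0, so only the first two terms remain to be minimized.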

2.2 Previous Crosstalk-Toggling-Free Methods for Buses

Many methods have been proposed to totally avoid crosstalk-toggling transitions to

reduce transition delays in synchronous circuit design. We briefly describe some of these

techniques and discuss the reason why we focus on one of them.

2.2.1 Simple Shielding Technique

The simplest method to avoid crosstalk-toggling transitions is the simple shielding technique, in which a shield line is inserted between every pair of adjacent bus lines [7]. The shield lines carry no signal transitions, and thus crosstalk-toggling transitions are avoided. No encoder or decoder is required, but n-1 extra bus lines are needed for an n-bit bus, which doubles the area used by the bus. When the bus is routed using scarce top-level metal [...]

2.2.2 Victor’s Method

Building on the concept of the simple shielding technique, Victor’s method provides a theoretical framework for generating crosstalk-toggling-free code-words [8]. Compared with the 2n-1 bus lines required by the simple shielding technique, Victor’s method proves that the lower limit on the number of required bus lines is log2(fm) for an n-bit bus, where fm is the mth Fibonacci number; for example, it requires 46 bus lines for a 32-bit bus. However, the code-words are hard to generate because the method lacks a generalized procedure for generating them.

2.2.3 Fibonacci Coding

The Fibonacci coding scheme is based on the theoretical framework of Victor’s method and gives a recursive procedure to generate code-words [9]. It requires the same number of bus lines, log2(fm), as Victor’s method for an n-bit bus.

Table 2-1 shows the encoding algorithm of Fibonacci coding, and Figure 2-1 shows the examples from f1 to f4. In Fibonacci coding, let a_m a_{m-1} ··· a_2 a_1 be an m-bit crosstalk-toggling-free Fibonacci code-word. Its decimal value is a_m × Fib(m) + a_{m-1} × Fib(m-1) + … + a_2 × Fib(2) + a_1 × Fib(1), where Fib(i), 1 ≤ i ≤ m, is the ith Fibonacci number.

[...] the larger data width, the higher gate delay of the corresponding encoder and decoder will be.

Table 2-1: Fibonacci encoding algorithm

f_1 = {0, 1}
if m is odd
    f_{m+1} = {0x | ∀x ∈ f_m} ∪ {11y | ∀1y ∈ f_m}
else
    f_{m+1} = {1x | ∀x ∈ f_m} ∪ {00y | ∀0y ∈ f_m}

Set   Bit weights   Code-words
f1    1             0, 1
f2    1 1           00, 01, 11
f3    2 1 1         000, 001, 100, 101, 111
f4    3 2 1 1       0000, 0001, 0100, 0101, 0111, 1100, 1101, 1111

Figure 2-1: The examples of Fibonacci encoding from f1 to f4
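The recursion of Table 2-1 and the weighted-sum value of a code-word can be tried out with a short sketch. The function names are hypothetical, and the reading of the recursion (prepend 0 to every code-word and 1 to those starting with 1 when m is odd; prepend 1 to every code-word and 0 to those starting with 0 when m is even) is an interpretation of the table as recovered here, so treat the code as an illustration rather than the reference algorithm of [9]. `min_lines` assumes that the codebook must hold at least 2^n code-words for an n-bit bus; under that assumption it reproduces the 46 bus lines quoted for a 32-bit bus in Section 2.2.2.

```python
def fib_value(code: str) -> int:
    """Decimal value of a code-word a_m ... a_1 with weights Fib(m) ... Fib(1)."""
    fibs = [1, 1]                                  # Fib(1) = Fib(2) = 1
    while len(fibs) < len(code):
        fibs.append(fibs[-1] + fibs[-2])
    weights = reversed(fibs[:len(code)])           # Fib(m) first, Fib(1) last
    return sum(int(bit) * w for bit, w in zip(code, weights))


def fib_codebook(m: int) -> list[str]:
    """Code-words of f_m, built step by step from f_1 = {0, 1}."""
    book = ["0", "1"]
    for k in range(1, m):                          # build f_{k+1} from f_k
        if k % 2 == 1:                             # k is odd
            book = ["0" + x for x in book] + ["1" + w for w in book if w[0] == "1"]
        else:                                      # k is even
            book = ["1" + x for x in book] + ["0" + w for w in book if w[0] == "0"]
    return sorted(book, key=fib_value)


def min_lines(n: int) -> int:
    """Smallest m whose codebook has at least 2**n code-words (assumption)."""
    prev_size, size, m = 1, 2, 1                   # |f_1| = 2
    while size < 2 ** n:
        prev_size, size = size, prev_size + size   # |f_{m+1}| = |f_m| + |f_{m-1}|
        m += 1
    return m


print(fib_codebook(4))
print([fib_value(c) for c in fib_codebook(4)])
print(min_lines(32))
```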

2.2.4 Selective Shielding Technique

The concept of Selective Shielding also comes from the simple shielding technique. It has been shown that if the code-words avoid adjacent-1s data patterns, crosstalk-toggling transitions are avoided after Transition Signaling encoding. Thus, Selective Shielding guarantees that there are no adjacent 1s in the code-words, and the number of required bus lines is 3n/2 for an n-bit bus.

[...] 4-to-6 Selective Shielding also avoids adjacent 1s in the code-words, and likewise requires 3n/2 bus lines for an n-bit bus. The difference between Selective Shielding and 4-to-6 Selective Shielding is that the latter partitions the data into several fields and can encode all fields at the same time. Hence, the gate delay of the encoder and decoder of 4-to-6 Selective Shielding is much less than that of Selective Shielding.

2.2.5 Comparison of Different Approaches

Table 2-2 compares the above approaches. Considering the required bus lines and the coding delay of the encoder and decoder, we choose 4-to-6 Selective Shielding for our method. Therefore, Selective Shielding and its extension, 4-to-6 Selective Shielding, are described in detail in the next section.

Table 2-2: Comparison of different approaches (for n-bit bus)

Approach              # of bus lines required   Delay of encoder   Delay of decoder
Simple Shielding      2n-1                      ─                  ─
Victor’s Method       ≒ (<) 3n/2                N/A                N/A
Fibonacci Coding      ≒ (<) 3n/2                High               High
Selective Shielding   3n/2                      Medium             Medium
4-to-6 SS             3n/2                      Low                Low

2.3 Previous Researches

[...] Selective Shielding (SS) encoding scheme [10][11] and the 4-to-6 Selective Shielding (4-to-6 SS) encoding scheme [10], and one previous study on reducing the switching activity on buses, register relabeling [4], are introduced in the following subsections.

2.3.1 Selective Shielding Crosstalk-Toggling-Free Technique

The goal of the Selective Shielding (SS) encoding scheme is to avoid crosstalk-toggling transitions on a bus by using n/2 extra bus lines for an n-bit bus [10][11]. The basic idea of SS comes from the Transition Signaling (TS) encoding scheme [12]. TS encoding performs an XOR operation on the n-bit previous bus value (Bus_{t-1}) and the n-bit current data (Data_t) and transmits the result as the n-bit current bus value (Bus_t), that is, Bus_t = Bus_{t-1} ⊕ Data_t. Decoding recovers the current data (Data_t) by performing an XOR operation on the previous bus value (Bus_{t-1}) and the current bus value (Bus_t), i.e., Data_t = Bus_{t-1} ⊕ Bus_t. It is observed that a crosstalk-toggling transition may be generated on a TS encoded bus only when the current data has an adjacent-1s data pattern. The results of TS coding for all combinations of Bus_{t-1} and Data_t are shown in Figure 2-2. Since crosstalk-toggling transitions occur on adjacent bus lines, Figure 2-2 shows only 2 bits for each of the previous bus value (Bus_{t-1}), the current bus value (Bus_t), and the current data (Data_t). All possible combinations of the 2-bit Bus_{t-1} are shown on the rows, all possible combinations of the 2-bit Data_t are shown on the columns, and the results of the 2-bit Bus_t (= Bus_{t-1} ⊕ Data_t) are [...] previous bus value and the current bus value may cause a crosstalk-toggling transition.

Bus_{t-1}: previous bus value (rows)
Data_t: current data (columns)
Bus_t: current bus value (table entries)

Bus_{t-1} \ Data_t   00   01   10   11
00                   00   01   10   11
01                   01   00   11   10
10                   10   11   00   01
11                   11   10   01   00

Figure 2-2: Results of TS encoding scheme for Bus_t = Bus_{t-1} ⊕ Data_t
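The TS encoding and decoding rules, and the observation that toggling needs adjacent 1s in the data, can be checked with a few lines of code. This is a sketch only: the function names, the integer representation of bus words, and the initial bus value of 0 are assumptions for the example.

```python
def ts_encode(data_words, initial_bus=0):
    """Transition Signaling: Bus_t = Bus_{t-1} XOR Data_t."""
    bus, out = initial_bus, []
    for data in data_words:
        bus ^= data
        out.append(bus)
    return out


def ts_decode(bus_words, initial_bus=0):
    """Inverse operation: Data_t = Bus_{t-1} XOR Bus_t."""
    prev, out = initial_bus, []
    for bus in bus_words:
        out.append(prev ^ bus)
        prev = bus
    return out


def has_toggling(prev, curr, width):
    """True if any two adjacent lines switch in opposite directions."""
    for i in range(width - 1):
        pair = ((prev >> i) & 0b11, (curr >> i) & 0b11)
        if pair in ((0b01, 0b10), (0b10, 0b01)):
            return True
    return False


data = [0b0101, 0b0010, 0b1001]
bus = ts_encode(data)
print(ts_decode(bus) == data)

# Exhaustive 2-bit check in the spirit of Figure 2-2: list every
# (Bus_{t-1}, Data_t) combination that produces a toggling transition.
print([(format(p, "02b"), format(d, "02b"))
       for p in range(4) for d in range(4) if has_toggling(p, p ^ d, 2)])
```

Because each 1 in Data_t flips exactly one bus line, the number of self-transitions on a TS encoded bus equals the number of 1s in the transmitted word.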

Therefore, if the current data does not have adjacent 1s, crosstalk-toggling transitions are avoided on a TS encoded bus. Based on this idea, SS ensures that there are no adjacent 1s in the code-words, so that the bus is crosstalk-toggling free after TS coding. Figure 2-3 shows the system overview of SS.

Data (n bits) → SS Encoding → (3n/2 bits) → TS Encoding → bus (3n/2 bits) → TS Decoding → (3n/2 bits) → SS Decoding → Data (n bits)

Figure 2-3: System overview of SS

The method avoids adjacent 1s in the data by encoding each “1” as “10”, rather than inserting a shield bit (assume “0”) between every pair of adjacent data bits as the simple shielding technique [7] does. When the data bits are all 1s, the SS encoding method will be the same as the simple [...] 1s present in the data. In order to reduce the number of 1s in the data and thus limit the number of added 0s, the SS technique first counts the number of 1s in the data. If the number of 1s in the n-bit data is at most n/2, each “1” is encoded as “10” directly. Otherwise, the data is inverted first so that the number of 1s in the n-bit data is less than n/2, and then each “1” of the inverted data is encoded as “10”. After that, an invert bit “1” is appended at the LSB of the code-word to denote that the data has been inverted.

Note that at most n/2 extra bus lines are needed to encode an n-bit bus. In order to provide code-words of fixed length (3n/2 bits), 0s are appended at the MSB positions if the length of a code-word is less than 3n/2.

In decoding, the LSB of the code-word is checked first. If the LSB is “0”, each “10” is converted to “1” directly. Otherwise, the data was inverted during encoding; thus, the end-bit is cut first, each “10” is converted to “1”, and then the converted data is inverted. After this decoding process, the leading bits that exceed the original data length (n) are removed.

The encoding and decoding algorithms of the SS technique are shown in Figure 2-4, and [...]

If # of 1s in the n-bit data ≤ n/2
    Each “1” is encoded as “10”.
Else
    Invert the data first.
    Each “1” is encoded as “10”.
    Append an invert bit, “1”, at LSB.
Append 0s at MSB to provide a 3n/2-bit code-word.

(a) Encoding algorithm

If the end-bit of a code-word == 1
    Cut the end-bit, convert “10” to “1”.
    Invert the converted code-word.
Else
    Convert “10” to “1”.
Remove leading bits that exceed the original data length (n).

(b) Decoding algorithm

Figure 2-4: SS encoding/decoding algorithm

Encoding (X = step not needed)

Data        Invert?     “1” to “10”       If inverted, append 1   Append 0s for fixed length (3n/2)
0001 0001   X           00 0100 0010      X                       0000 0100 0010
1101 1000   X           1010 0101 0000    X                       X
1100 1110   0011 0001   001 0100 0010     001 0100 0010 1         X
1011 1111   0100 0000   0 1000 0000       0 1000 0000 1           000 1000 0000 1

Decoding (X = step not needed)

Code-word        If inverted, cut the end-bit   “10” to “1”     Inverted?      Remove the leading bits
0000 0100 0010   X                              00 0001 0001    X              0001 0001
1010 0101 0000   X                              1101 1000       X              X
0010 1000 0101   0010 1000 010                  0011 0001       1100 1110      X
0001 0000 0001   0001 0000 000                  00 0100 0000    11 1011 1111   1011 1111

Figure 2-5: SS encoding/decoding examples
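The algorithm of Figure 2-4 can be sketched in a few lines and checked against the four rows of Figure 2-5. The function names `ss_encode` and `ss_decode` and the bit-string representation are choices made for this illustration; an even data width n is assumed.

```python
def invert(bits: str) -> str:
    return "".join("1" if b == "0" else "0" for b in bits)


def ss_encode(data: str) -> str:
    """Selective Shielding encoding of an n-bit string into 3n/2 bits."""
    n = len(data)
    inverted = data.count("1") > n // 2
    if inverted:
        data = invert(data)                       # now fewer than n/2 ones
    code = "".join("10" if b == "1" else "0" for b in data)
    if inverted:
        code += "1"                               # invert bit at the LSB
    return code.rjust(3 * n // 2, "0")            # pad with 0s at the MSB


def ss_decode(code: str, n: int) -> str:
    """Recover the n-bit data from a 3n/2-bit SS code-word."""
    inverted = code.endswith("1")
    if inverted:
        code = code[:-1]                          # cut the end-bit
    out, i = [], 0
    while i < len(code):
        if code[i] == "1":                        # "10" -> "1"
            out.append("1")
            i += 2
        else:
            out.append("0")
            i += 1
    data = "".join(out)[-n:]                      # drop surplus leading bits
    return invert(data) if inverted else data


for word in ["00010001", "11011000", "11001110", "10111111"]:
    code = ss_encode(word)
    print(word, "->", code, "->", ss_decode(code, len(word)))
```

A non-inverted code-word always ends in 0, because its last symbol is either a data 0 or the 0 of a “10” pair, which is why the LSB alone is enough to tell the two cases apart.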

2.3.2 4-to-6 Selective Shielding Crosstalk-Toggling-Free Technique

[...] hardware is complex and time consuming. If the data is partitioned into several fields and all fields are encoded individually at the same time, the corresponding hardware may be simpler and may save more time than the SS encoding scheme. 4-to-6 Selective Shielding (4-to-6 SS) was proposed for these purposes. It is an extension of the SS encoding scheme with a smaller encoding unit, and it also needs n/2 extra bus lines for an n-bit bus [10]. Figure 2-6 shows the system overview of 4-to-6 SS.

Data (n bits) → 4-to-6 SS Encoding → (3n/2 bits) → TS Encoding → bus (3n/2 bits) → TS Decoding → (3n/2 bits) → 4-to-6 SS Decoding → Data (n bits)

Figure 2-6: System overview of 4-to-6 SS

The 4-to-6 SS encoding scheme first applies the smallest possible encoding unit of SS, encoding 2-bit data into 3-bit code-words, to simplify the corresponding hardware. Figure 2-7 shows the 2-bit data and the corresponding 3-bit code-words. In other words, it applies the SS technique to n/2 2-bit sub-data in parallel to generate n/2 3-bit code-words.

Data   Code-word
00     000
01     010
10     100
11     001

Figure 2-7: The 2-bit data and the relative 3-bit SS code-words

However, when the data contains an 11 pattern followed by a 10 pattern, there are adjacent 1s between the code-words of adjacent partitions; that is, 11 10 is encoded into 001 100. [...] swaps one of the adjacent 1s with another bit to avoid the two adjacent 1s. Assuming that the ith and the (i-1)th bits are the two adjacent 1s, the ith and the (i-3)th bits are swapped. After the swap, if the (i-4)th bit is a “1”, the swapping process is repeated until there are no adjacent 1s between code-words. In the worst case, n/2 − 1 bit-swaps may be required, which increases the coding delay. Figure 2-8 gives an example of the bit-swap process. As Figure 2-8 shows, when the data contains an 11 pattern followed by m 10 patterns, m swaps occur.

n-bit data:            11  10  10  10  ···  10  10
3n/2-bit code-word:    001 100 100 100 ··· 100 100
after swap 1:          000 101 100 100 ··· 100 100
after swap 2:          000 100 101 100 ··· 100 100
  ...
                       000 100 ··· 100 101 100 100
                       000 100 ··· 100 100 101 100
after the last swap:   000 100 ··· 100 100 100 101

Worst case: n/2 − 1 swaps

Figure 2-8: An example of the worst case of bit-swap process
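The rippling swaps of Figure 2-8 can be reproduced with a short sketch that encodes every 2-bit group with the mapping of Figure 2-7 and then repairs adjacent 1s from left to right. The function name and the left-to-right string indexing are assumptions for this example (the text numbers bits from the other end, so "swap the ith and (i-3)th bits" becomes "swap positions p and p+3" here).

```python
TWO_TO_THREE = {"00": "000", "01": "010", "10": "100", "11": "001"}


def ss_2to3_encode(data: str):
    """Encode 2-bit groups into 3-bit code-words, then remove adjacent 1s.

    Returns the code-word string and the number of bit-swaps performed.
    """
    code = list("".join(TWO_TO_THREE[data[k:k + 2]] for k in range(0, len(data), 2)))
    swaps, p = 0, 0
    while p + 1 < len(code):
        if code[p] == "1" and code[p + 1] == "1":
            # Move the left 1 three positions to the right; this may create
            # a new pair of adjacent 1s with the following code-word.
            code[p], code[p + 3] = code[p + 3], code[p]
            swaps += 1
            p += 3
        else:
            p += 1
    return "".join(code), swaps


print(ss_2to3_encode("11101010"))   # 11 followed by three 10 patterns
print(ss_2to3_encode("1110"))
```

For n = 8 the first call needs n/2 − 1 = 3 swaps, which is the worst case described above.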

In order to reduce the number of swaps from n/2 − 1 to 1, 4-to-6 SS partitions the data into 4-bit fields, partitions each 4-bit field into two 2-bit sub-partitions, and then applies SS to the two 2-bit sub-partitions to generate a 6-bit code-word. The bit-swap process is the same as before, i.e., the ith and the (i-3)th bits are swapped if the ith and the (i-1)th bits are both 1s. Under this encoding method, 1110 is encoded into 001100, which has adjacent 1s in its code-word, and thus bit swapping within the code-word is [...] swaps between code-words if the right-hand-side code-word ends with 1 and the same bit-swap process is performed, and thus 1010 is encoded into 010101 to avoid repetitious swaps. Table 2-3 shows the 4-bit data and the corresponding 6-bit code-words. However, when the left-hand-side code-word ends with 1 and the right-hand-side code-word starts with 1, adjacent 1s occur between the code-words. Note that when a right-hand-side code-word starts with 1, its 2nd, 3rd, and 4th bits are 0s. Therefore, once adjacent 1s are encountered between 6-bit code-words, only one bit-swap is needed to remove them. Figure 2-9 shows an example of the bit-swap process.

Table 2-3: The 4-bit data and relative 6-bit code-words

4-bit data   4-to-6 SS code-word
0000         000000
0001         000010
0010         000100
0011         000001
0100         010000
0101         010010
0110         010100
0111         010001
1000         100000
1001         100010
1010         010101
1011         100001
1100         001000
1101         001010
1110         000101
1111         001001

Bit position:       i   i-1  i-2  i-3
Before swap:   ···  1   1    0    0    0  *  *
After swap:    ···  0   1    0    1    0  *  *

Figure 2-9: An example of the bit-swap process inter adjacent code-words

In 4-to-6 SS decoding, if a 6-bit code-word has the “1010**” data pattern, the bit-swap process was applied to it to avoid adjacent 1s between 6-bit code-words. Thus, the bits are swapped back first, and then 4-to-6 SS decoding is performed. Figure 2-10 shows examples of 4-to-6 SS encoding and decoding.

Encoding (X = step not needed)

Data                      0010 1010        0011 1011
4-to-6 SS code-word       000100 010101    000001 100001
If adjacent 1s, swap      X                000000 101001

Decoding (X = step not needed)

4-to-6 SS code-word       000100 010101    000000 101001
If 1010**, swap back      X                000001 100001
Data                      0010 1010        0011 1011

Figure 2-10: 4-to-6 SS encoding/decoding examples
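A table-driven sketch of 4-to-6 SS encoding and decoding follows. It takes the code-words of Table 2-3 as given and applies the single inter-code-word swap described above; the function names and the bit-string representation are assumptions for this illustration, and the data length is assumed to be a multiple of 4.

```python
# Code-words of Table 2-3, indexed by the 4-bit data value 0000 .. 1111.
CODE_WORDS = ("000000 000010 000100 000001 010000 010010 010100 010001 "
              "100000 100010 010101 100001 001000 001010 000101 001001").split()
TABLE = {format(v, "04b"): cw for v, cw in enumerate(CODE_WORDS)}
INVERSE = {cw: data for data, cw in TABLE.items()}


def encode_4to6(data: str) -> str:
    words = [list(TABLE[data[k:k + 4]]) for k in range(0, len(data), 4)]
    for left, right in zip(words, words[1:]):
        if left[-1] == "1" and right[0] == "1":
            # Adjacent 1s between code-words: move the 1 from the end of the
            # left code-word into the third bit of the right one (100*** -> 101***).
            left[-1], right[2] = "0", "1"
    return "".join("".join(w) for w in words)


def decode_4to6(code: str) -> str:
    words = [list(code[k:k + 6]) for k in range(0, len(code), 6)]
    for left, right in zip(words, words[1:]):
        if right[:4] == list("1010"):
            # The pattern 1010** marks a swapped code-word: swap back first.
            left[-1], right[2] = "1", "0"
    return "".join(INVERSE["".join(w)] for w in words)


for word in ["00101010", "00111011"]:
    code = encode_4to6(word)
    print(word, "->", code, "->", decode_4to6(code))
```

Each boundary is handled independently because the swap touches only the last bit of the left code-word and the third bit of the right one, so neither the first bit nor the last bit that the neighbouring boundaries test is changed.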

2.3.3 Register Relabeling Power Saving Technique

In a typical RISC ISA, register fields sit at fixed positions within the instructions and occupy
a large portion of the instruction encoding. If two register numbers in the same bit positions
of two consecutive instructions differ in many bits, and that combination of register numbers
appears often in consecutive instructions, the power consumption will be larger. However,
the registers of a typical RISC ISA are general purpose,


minimize the bit transitions of register fields during instruction fetches by relabeling register

numbers statically [4]. Figure 2-11 shows an example code fragment. Relabeling reduces
the number of bit transitions with no performance penalty.

Before relabeling         Bit transitions on register fields (to the next instruction)
add r3, r2, r4            4
sub r6, r3, r5            5
sub r3, r2, r6            7
mul r4, r4, r5
Total                     16

Relabeling: r3 → r6, r6 → r7

After relabeling          Bit transitions on register fields (to the next instruction)
add r6, r2, r4            3
sub r7, r6, r5            3
sub r6, r2, r7            4
mul r4, r4, r5
Total                     10

Figure 2-11:An example of code fragment
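The counts in Figure 2-11 can be reproduced with a short sketch that sums the Hamming distances between register fields in the same position of consecutive instructions. The helper names `field_transitions` and `relabel` and the tuple representation of instructions are hypothetical; opcode bits are ignored because relabeling does not change them.

```python
def field_transitions(program):
    """Sum of Hamming distances between same-position register fields
    of consecutive instructions."""
    total = 0
    for prev, curr in zip(program, program[1:]):
        total += sum(bin(a ^ b).count("1") for a, b in zip(prev, curr))
    return total


def relabel(program, mapping):
    """Rename register numbers; registers not in the mapping are kept."""
    return [tuple(mapping.get(r, r) for r in instr) for instr in program]


# Register numbers of the four instructions in Figure 2-11, in written order.
original = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
relabeled = relabel(original, {3: 6, 6: 7})  # r3 -> r6, r6 -> r7

print(field_transitions(original))   # 16
print(field_transitions(relabeled))  # 10
```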

The first step of relabeling is to construct a graph called the “Register Histogram
Graph” (RHG). The RHG captures the occurrence frequency of, and the relationship between,
register pairs, i.e., two register numbers in the same bit positions of two consecutive
instructions. Each RHG node represents a register. Each RHG edge indicates that two
register numbers form a register pair, and the weight of each edge is the frequency of that
register pair. Figure 2-12 (a) shows an example of all register pairs that appear in a code
fragment and the frequency of each pair. In Figure 2-12, assume that the architecture
uses registers from register $1 to register $8. Figure 2-12 (b) is an RHG derived from Figure
2-12 (a).


(a) Reg pair: (r7, r8), (r4, r7), (r1, r6), (r1, r7), (r1, r8), (r3, r4), (r4, r8), (r6, r7), (r7, r7)
    Frequency: 3, 2, 1, 1, 1, 1, 1, 1, 1, 1
(b) RHG with nodes r4, r3, r8, r7, r6, r1 and edge weights 1, 1, 2, 3, 1, 1, 1, 1
    Node:register name
    Edge:register pair
    Edge weight:frequency
    Total bit transitions:29

Figure 2-12:(a) Example frequency distribution of register pairs (b) RHG from (a)

The following algorithm uses the RHG to relabel the register numbers [13]. Figure
2-13 shows the RHG after register relabeling. In this example, relabeling starts from the most
frequent edge, the register pair of register $7 and register $8, and relabels them into register $1
and register $3, whose Hamming distance is minimal. Then, for the second most frequent
edge, the register pair of register $7 and register $4, register $7 is already assigned, so
register $4 is relabeled into register $5 so that the Hamming distance to its assigned neighbor
registers is minimized. The remaining relabeling steps proceed in the same way.

Algorithm

• Iterate through the edges starting from the most frequent ones
• Rename the registers yet unassigned so that the Hamming distance to all their assigned neighbors in the graph is minimized


(b) RHG after relabeling with nodes r5, r7, r3, r1, r4, r2 and edge weights 1, 1, 1, 2, 3, 1, 1, 1, 1
    Reassigned results: r7 → r1, r8 → r3, r4 → r5, r1 → r2, r6 → r4, r3 → r7
    Total bit transitions from 29 to 15

Figure 2-13:(a) Register relabeling algorithm (b) RHG after register relabeling
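The totals quoted in Figures 2-12 and 2-13 can be checked by evaluating the weighted Hamming distance over the RHG edges. This is only a cost-evaluation sketch, not the greedy relabeling algorithm itself. The edge weights below follow the text (3 for r7/r8, 2 for r4/r7) and assume a weight of 1 for every other listed pair; the self pair (r7, r7) is left out because it never causes a transition. The names `hamming` and `rhg_cost` are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two register numbers."""
    return bin(a ^ b).count("1")


def rhg_cost(edges, mapping=None):
    """Total bit transitions: edge weight times Hamming distance of its endpoints."""
    mapping = mapping or {}
    return sum(weight * hamming(mapping.get(a, a), mapping.get(b, b))
               for (a, b), weight in edges.items())


# RHG edges of Figure 2-12 (assumed weights, see the lead-in).
edges = {(7, 8): 3, (4, 7): 2, (1, 6): 1, (1, 7): 1,
         (1, 8): 1, (3, 4): 1, (4, 8): 1, (6, 7): 1}

# Reassignment listed in Figure 2-13.
mapping = {7: 1, 8: 3, 4: 5, 1: 2, 6: 4, 3: 7}

print(rhg_cost(edges))           # 29
print(rhg_cost(edges, mapping))  # 15
```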

2.3.4 Summary of Previous Research

SS and 4-to-6 SS are both crosstalk-toggling-free bus encoding schemes, and both need
n/2 extra bus lines for an n-bit bus. Since SS encodes the whole n-bit data at a time, its
corresponding hardware is more complex and more time consuming than that of 4-to-6 SS,
which partitions the data into several fields and encodes all fields at the same time. We focus
on 4-to-6 SS since its corresponding hardware is less time consuming and the partitioning
method can be further complemented by our modified relabeling method.

Register relabeling may reduce the bit transitions of register fields on a traditional
instruction bus. It needs to consider the relationship between the register fields of consecutive
instructions. The 4-to-6 SS encoding scheme brings a different situation for register relabeling


Due to the characteristics of the TS encoding scheme, the fewer 1s the code-words hold,
the lower the number of bit transitions on a TS-encoded bus [12]. Therefore, we may use
this characteristic to modify register relabeling for 4-to-6 SS so that it produces code-words
with fewer 1s and thereby reduces the number of bit transitions on a TS-encoded bus. The detail description


Chapter 3 Proposed Design

This chapter introduces our modified register relabeling design, which reduces the
number of bit transitions on the instruction bus. Section 3.1 gives an overview of the proposed
design. Section 3.2 presents the observations on which the design is founded.
The remaining sections describe the details of the design.

3.1 System Overview

Static Time: Program Binary → Modified Register Relabeling → Relabeled Program Binary
Dynamic Time: Instruction Memory → Instruction Partition → 4-to-6 SS Encoding → TS Encoding → Instruction Bus → TS Decoding → 4-to-6 SS Decoding → Instruction Reversion → CPU

Figure 3-1:Overview of system

The system consists of a static-time phase and a dynamic-time phase. Figure 3-1 illustrates
the system overview.

Our method concentrates mainly on the register fields of instructions. Our modified


for the 4-to-6 SS encoding scheme to produce a relabeled program binary that resides in the
instruction memory at static time. At dynamic time, each instruction is partitioned in the
Instruction Partition step after it is fetched from the instruction memory, in order to combine
the 4-to-6 SS encoding scheme with our modified register relabeling. After that, the coding
process, including data coding (4-to-6 SS Encoding/Decoding) and data transmission
(Transition Signaling Encoding/Decoding) through the instruction bus, is exactly the same as
that of the original 4-to-6 SS encoding scheme. The Instruction Reversion step is the reverse
of the Instruction Partition step.

3.2 Observations

As described in subsection 2.3.2, the 4-to-6 SS encoding scheme brings a different situation
for register relabeling, so its impact must be considered. On a 4-to-6 SS encoded
instruction bus, the current data is converted to a 4-to-6 SS code-word without adjacent 1s, and
then an XOR operation is performed between the previous bus value and the 4-to-6 SS


Current data → 4-to-6 SS encoding → 4-to-6 SS code-word → TS encoding (with the previous bus value) → Current bus value

Figure 3-2:Flowchart of 4-to-6 SS data processing

Due to the characteristics of the TS encoding scheme, if the inputs are a previous result and
“0”, the current result equals the previous result; if the inputs are a previous result
and “1”, the current result equals the inversion of the previous result. Therefore, the
first observation is that the number of self-transitions between the previous bus value and the
current bus value equals the number of 1s in the 4-to-6 SS code-word. Moreover, once a “1”
appears in a 4-to-6 SS code-word, its neighbor bits must be “0”. After TS encoding,
the positions neighboring a self-transition therefore carry no signal transition. Thus, the number of
crosstalk 1-bit transitions between the previous bus value and the current bus value is twice
the number of 1s in the 4-to-6 SS code-word, except for the 1s that appear in the most
significant bit (MSB) and the least significant bit (LSB) positions. Since the crosstalk


has only one adjacent bus line and thus may cause one crosstalk 1-bit transition at most.

Figure 3-3 shows an example of the 4-to-6 SS encoding scheme. In Figure 3-3, after the XOR
operation, the number of self-transitions between the previous bus value and the current bus
value is 2, which equals the number of 1s in the 4-to-6 SS code-word. Furthermore,
the number of crosstalk 1-bit transitions between the previous bus value
and the current bus value is 3, which equals twice the number of 1s in the 4-to-6 SS
code-word minus one for the “1” at the LSB: 2 × 2 − 1 = 3.

(4-to-6 SS code-word, 2 “1s”)    0        0        ‧‧  0        1       0        ‧‧  0     1
(Previous bus value)             a(n-1)   a(n-2)   ‧‧  a(x+1)   a(x)    a(x-1)   ‧‧  a(1)  a(0)
(Current bus value)              a(n-1)   a(n-2)   ‧‧  a(x+1)   ~a(x)   a(x-1)   ‧‧  a(1)  ~a(0)

(~ marks a bus line inverted by a “1” in the code-word.)

Figure 3-3:An example of 4-to-6 SS coding

Therefore, the power cost terms of the power dissipation cost (PDC) in Eq.(2) may be

formulated as follows:

$$ST = \#\text{ of 1s in the 4-to-6 SS code-words} \tag{3}$$

$$XTTr = \left(2 \times \#\text{ of 1s in code-words}\right) - \left(\#\text{ of 1s in MSB and LSB of code-words}\right) \tag{4}$$

where ST is the total number of self-transitions and XTTr is the total number of crosstalk
1-bit transitions. The number of crosstalk-toggling transitions, XTTg, is guaranteed
to be 0 after the 4-to-6 SS encoding scheme. Consequently, the power consumption depends on


built up by our observations.
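The two observations can be checked numerically with a minimal sketch. It assumes that a crosstalk 1-bit transition is counted once per pair of adjacent bus lines in which exactly one line switches, which is the counting that makes Eq. (4) hold; the function names are hypothetical. The example code-word 010001 is the Table 2-3 code-word of data 0111.

```python
import random


def ts_encode(prev_bus, codeword):
    """Transition signaling: new bus value = previous bus value XOR code-word."""
    return [p ^ c for p, c in zip(prev_bus, codeword)]


def self_transitions(prev_bus, curr_bus):
    """Number of bus lines whose value changed."""
    return sum(p != c for p, c in zip(prev_bus, curr_bus))


def crosstalk_1bit_transitions(prev_bus, curr_bus):
    """Adjacent line pairs in which exactly one of the two lines switches
    (assumed counting convention)."""
    toggled = [p != c for p, c in zip(prev_bus, curr_bus)]
    return sum(a != b for a, b in zip(toggled, toggled[1:]))


def predicted(codeword):
    """(ST, XTTr) from Eq. (3) and Eq. (4); valid when no two 1s are adjacent."""
    ones = sum(codeword)
    return ones, 2 * ones - (codeword[0] + codeword[-1])


codeword = [0, 1, 0, 0, 0, 1]  # 010001, with a 1 at the LSB
prev_bus = [random.randint(0, 1) for _ in codeword]  # any previous value works
curr_bus = ts_encode(prev_bus, codeword)

print(self_transitions(prev_bus, curr_bus),
      crosstalk_1bit_transitions(prev_bus, curr_bus))  # 2 3
print(predicted(codeword))                             # (2, 3)
```

The result does not depend on the randomly chosen previous bus value, which is the property the modified register relabeling relies on.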

3.3 Instruction Partition

The purposes of the Instruction Partition are to preserve the opportunity for register
relabeling on register fields and to exploit the characteristics of the 4-to-6 SS encoding
scheme. First, each register field should ideally fit in one partition.
However, the 4-to-6 SS encoding scheme requires 4-bit partitions; thus, each register field
is partitioned into a 4-bit field, and the remaining bits of the register fields are
processed with other fields. Taking the MIPS instruction set as an example, its instruction
formats are shown in Figure 3-4 (a) [14]. The proposed partitions for all register fields of
R-type and I-type, and for the bits in the same positions of I-type and J-type, are shown in
Figure 3-4 (b).

Figure 3-4:(a) MIPS instruction formats (b) Partition of register fields and bits on the same positions

Furthermore, characteristics of 4-to-6 SS encoding scheme should be considered for both


encoding scheme, bits with more 0s in the same partition are more likely to have
fewer 1s in their 4-to-6 SS code-words, and fewer 1s in the 4-to-6 SS
code-words lead to fewer transitions according to our observations in Section 3.2. Table 3-1
shows the characteristics of the 4-to-6 SS code-words. In general, 4-bit data with more 0s
have corresponding 6-bit code-words with more 0s, too.

Table 3-1:4-bit data sorted by the number of 0s and their corresponding 4-to-6 SS code-words

4-bit data    4-to-6 SS code-word    # of 1s
0000          000000                 0
0001          000010                 1
0010          000100                 1
0100          010000                 1
1000          100000                 1
0011          000001                 1
0101          010010                 2
0110          010100                 2
1001          100010                 2
1010          010101                 3
1100          001000                 1
0111          010001                 2
1011          100001                 2
1101          001010                 2
1110          000101                 2
1111          001001                 2

Therefore, our approach is to sort the probabilities of 0s of the remaining bits of register

fields and all other fields after register relabeling at static time, and then partition them into

4-bit fields by the sorted order. In order to avoid extra hardware overheads, we apply fixed


3.4 Modified Register Relabeling

According to our observations, no matter what the previous bus value is, the power
consumption depends only on the number of 1s in the 4-to-6 SS code-words of the current
data. Therefore, rather than depending on the relation between registers, as is the case for
the original register relabeling, the power consumption caused by the 4-to-6 SS encoded
register fields depends only on the register numbers themselves. The basic idea of our modified
register relabeling is to relabel more frequently occurring registers to registers that have fewer
1s after 4-to-6 SS encoding.

In addition, we may count the number of 1s in the 4-to-6 SS code-word of each register in
advance to decide which registers should be selected first when relabeling the frequently
occurring registers. This selection order is called the relabeling register selection sequence. The
relabeling register selection sequence is constructed according to the instruction partition on
register fields. For example, with the instruction partition on register fields shown
in Figure 3-4 (b), the leading 4 bits of each register can be classified according to the number
of 1s in their corresponding 4-to-6 SS code-word. Note that two registers share the
same leading 4 bits and differ only in the least significant bit (LSB). Figure 3-5 shows an example
of the classification of registers. In this example, register $6 and register $7 have the same


r6:0011 0
r7:0011 1
4-to-6 SS code-word of the leading 4 bits:000001

Figure 3-5:An example of classification of register

In Table 3-2, registers are classified according to the number of 1s in the 4-to-6 SS
code-word of the leading 4 bits of their register numbers. The registers with fewer
1s in the 4-to-6 SS code-words of the leading 4 bits of their register numbers
are selected for relabeling first. Then, of two registers with the same number of 1s in the
4-to-6 SS code-words of their leading 4 bits, the one with LSB “0” is chosen first, since it has a
higher probability of having fewer 1s in the code-word than the one with LSB “1”. The selection
sequence column of Table 3-2 gives the selection order for register relabeling in descending
priority. Registers with the same sequence number may be chosen in any order.

Table 3-2:Relabeling register selection sequence

Relabeling selection sequence    # of 1s in the 4-to-6 SS code-word of the leading 4 bits of register number    LSB    Register number (in decimal)
1                                0                                                                              0      r0
2                                0                                                                              1      r1
3                                1                                                                              0      r2 r4 r6 r8 r16 r24
4                                1                                                                              1      r3 r5 r7 r9 r17 r25
5                                2                                                                              0      r10 r12 r14 r18 r22 r26 r28 r30
6                                2                                                                              1      r11 r13 r15 r19 r23 r27 r29 r31
7                                3                                                                              0      r20
8                                3                                                                              1      r21
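The classification in Table 3-2 can be derived mechanically from the code-word table. The sketch below assumes 5-bit register numbers split into the leading 4 bits and the LSB, as in Figure 3-4 (b); the function name `selection_classes` and its `excluded` parameter are hypothetical.

```python
# 4-to-6 SS code-words indexed by the 4-bit value 0..15 (from Table 2-3).
CODEWORDS = [
    "000000", "000010", "000100", "000001", "010000", "010010", "010100", "010001",
    "100000", "100010", "010101", "100001", "001000", "001010", "000101", "001001",
]


def selection_classes(num_regs=32, excluded=()):
    """Group registers by (# of 1s in the code-word of the leading 4 bits, LSB)
    and return the groups in selection order."""
    classes = {}
    for reg in range(num_regs):
        if reg in excluded:
            continue
        leading4, lsb = reg >> 1, reg & 1
        key = (CODEWORDS[leading4].count("1"), lsb)
        classes.setdefault(key, []).append(reg)
    return [classes[key] for key in sorted(classes)]


for rank, regs in enumerate(selection_classes(), start=1):
    print(rank, regs)
```

Calling `selection_classes(excluded=(0, 31))` drops two registers from the grouping and leaves seven groups.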


Due to the constraints of an instruction set architecture (ISA), registers may be
classified as non-relabelable or relabelable. Non-relabelable registers
should not be relabeled and cannot be used as relabeling targets anywhere in the program.
Taking the register usage conventions of the MIPS architecture as an example, registers are
classified by their usage purposes as shown in Table 3-3 [15]. Among the MIPS registers,
register $0 is non-relabelable since it is hardwired to the value zero. Register $31 is
the destination register implicitly used by the instructions JAL, BLTZAL, BLTZALL, BGEZAL, and
BGEZALL, so register $31 is non-relabelable as well.
The remaining registers are relabelable. The resulting relabeling register selection sequence
for the MIPS ISA is shown in Table 3-4.

Table 3-3:MIPS registers categorization

Category           Name         Number              Use
Non-relabelable    $zero        $0                  Always 0
                   $ra          $31                 Return address
Relabelable        $at          $1                  Assembler temporary
                   $k0 - $k1    $26 - $27           Kernel registers
                   $gp          $28                 Global pointer
                   $sp          $29                 Stack pointer
                   $v0 - $v1    $2 - $3             Return value
                   $a0 - $a3    $4 - $7             Argument registers
                   $t0 - $t9    $8 - $15, $24 - $25 Temporary registers
                   $s0 - $s8    $16 - $23, $30      Saved registers


Table 3-4:Relabeling register selection sequence for MIPS ISA

Relabeling selection sequence    # of 1s in the 4-to-6 SS code-word of the leading 4 bits of register number    LSB    Register number (in decimal)
1                                0                                                                              1      r1
2                                1                                                                              0      r2 r4 r6 r8 r16 r24
3                                1                                                                              1      r3 r5 r7 r9 r17 r25
4                                2                                                                              0      r10 r12 r14 r18 r22 r26 r28 r30
5                                2                                                                              1      r11 r13 r15 r19 r23 r27 r29
6                                3                                                                              0      r20
7                                3                                                                              1      r21

(r0 and r31 are non-relabelable and are excluded from the sequence.)

In this thesis, we propose two register relabeling methods. In register relabeling method
1, we gather the occurrence frequency of each relabelable register from a program trace and
then relabel more frequently occurring registers to registers that have fewer 1s after 4-to-6 SS
encoding. In this method, each relabelable register is relabeled to one specific register
consistently for the whole program, which reduces the power consumption while preserving the
correctness of the program.

However, given the register usage convention and the requirement of no performance
degradation, some registers are used independently by each procedure. Such registers in different
procedures may be relabeled into the same register with fewer 1s after 4-to-6 SS encoding,
which reduces the number of 1s in the 4-to-6 SS code-words and saves more power. Thus, in
register relabeling method 2, some registers may be relabeled independently for


consistently for the whole program to keep the correct execution.

The details of these two register relabeling methods are described in the following
subsections.

3.4.1 Register Relabeling Method 1

Figure 3-6 shows the flowchart of our modified register relabeling method 1. The first
step records the occurrence frequency of each relabelable register from a
program trace. The next step sorts the relabelable registers by their occurrence
frequencies. The final step relabels the registers in the sorted order
according to the relabeling register selection sequence shown in Table 3-2. In this method, a
relabelable register is relabeled to another register consistently throughout the program; that is to
say, it is a program-scoped relabelable register.

Table 3-5 shows a relabeling example that follows the MIPS selection sequence in Table 3-4.
In this example, the occurrence frequencies of the relabelable registers are collected and sorted.
Then, following the selection sequence, the registers are relabeled into registers with
fewer 1s after 4-to-6 SS coding.

Gather the frequencies of the relabelable registers for the trace of a program
→ Sort the relabelable registers by the descending order of their occurrence frequencies
→ Relabel registers by the sorted order

Figure 3-6:Flowchart of our modified register relabeling algorithm


Table 3-5:An example of register relabeling method 1

Register before relabeling    Occurrence frequency    Register after relabeling
r8                            12                      r1
r6                            8                       r2
r1                            6                       r4
r4                            6                       r6
r3                            3                       r8
r12                           2                       r16
‧‧‧                           ‧‧‧                     ‧‧‧
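Method 1 amounts to sorting registers by frequency and pairing them with the selection sequence. The following sketch reproduces the rows of Table 3-5 under two assumptions: the sequence is the MIPS sequence of Table 3-4 flattened in ascending order within each sequence number (the text allows any order there), and ties in frequency keep their input order. The names `MIPS_SEQUENCE` and `relabel_method1` are hypothetical.

```python
# Table 3-4 flattened; registers with the same sequence number in ascending order.
MIPS_SEQUENCE = [1,
                 2, 4, 6, 8, 16, 24,
                 3, 5, 7, 9, 17, 25,
                 10, 12, 14, 18, 22, 26, 28, 30,
                 11, 13, 15, 19, 23, 27, 29,
                 20,
                 21]


def relabel_method1(frequencies, sequence=MIPS_SEQUENCE):
    """Map registers, most frequent first, onto the selection sequence."""
    ranked = sorted(frequencies, key=lambda reg: -frequencies[reg])  # stable sort
    return dict(zip(ranked, sequence))


# Occurrence frequencies listed in Table 3-5.
freq = {8: 12, 6: 8, 1: 6, 4: 6, 3: 3, 12: 2}
print(relabel_method1(freq))
# {8: 1, 6: 2, 1: 4, 4: 6, 3: 8, 12: 16}
```

A complete run would supply frequencies for all 30 relabelable registers so that the resulting mapping is a permutation of them; only the six rows shown in Table 3-5 are used here.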

3.4.2 Register Relabeling Method 2

Register relabeling method 1 is program-scoped relabeling, i.e., each relabelable register
is relabeled to one specific register consistently for the whole program. However, the relabeling
scope of some registers may be relaxed to a single procedure, i.e., some registers in
different procedures may be relabeled into the same register to raise the occurrence frequencies
of registers that have fewer 1s after 4-to-6 SS encoding. Figure 3-7 gives examples of
the difference between program-scoped relabeling and procedure-scoped relabeling. Figure 3-7 (a)
shows the rank order of the register occurrence frequencies. In Figure 3-7 (b), program-scoped
relabeling, after the most frequently occurring register $8 is relabeled into register $1, register $7
can only be relabeled into another register, $2. In Figure 3-7 (c), procedure-scoped
relabeling, if the registers of the two procedures are used independently, registers $8 and


$1 to obtain fewer 1s in the code-words after 4-to-6 SS encoding for more power reduction.

(a) Rank order of register occurrence frequencies: 1. r8, 2. r7, 3. r9, 4. r4, ‧‧‧
(b) Program-scoped relabeling: Proc A uses r8, r4 and Proc B uses r7, r9; r8 → r1, r4 → r6, r7 → r2, r9 → r4
(c) Procedure-scoped relabeling: Proc A: r8 → r1, r4 → r2; Proc B: r7 → r1, r9 → r2

Figure 3-7:Program-scoped relabeling vs. procedure-scoped relabeling

Given the register usage convention and the requirement of no performance degradation, some
relabelable registers must be relabeled consistently for the whole program to keep
program execution correct, while other relabelable registers may be relabeled
independently for each procedure. Therefore, in register relabeling method 2, the relabeling
scopes of relabelable registers are classified into program-scoped and procedure-scoped.

For example, among the MIPS registers in Table 3-6, the temporary registers (caller-saved)
and the saved registers (callee-saved) may be further classified as procedure-scoped for
register relabeling. Temporary registers are caller-saved registers in the MIPS calling convention.
That is, if a procedure needs to use a temporary register after a procedure call, the procedure
saves the value of the temporary register before the procedure call and restores the value after


are callee-saved registers in MIPS calling convention. The value of saved registers must be

preserved across procedures. The callee will save the values of saved registers at the

procedure entry and restore at the procedure exit if it needs to use the saved registers.

Therefore, each procedure can use saved registers at will after they are saved at procedure

entry.

The relabeling scope of the remaining relabelable MIPS registers is program-scoped.
Register $1 is reserved for the assembler and thus should be relabeled for the whole program.
Registers $26 and $27 are reserved for the kernel, but they may be relabeled in a stand-alone
application system. The pointer registers, registers $28 and $29, can be relabeled for the
whole program since all procedures refer to the pointer registers by the same register names. The
argument registers, registers $4 - $7, which pass arguments to procedures, and
the return value registers, registers $2 - $3, which carry return values from procedures,
must be kept consistent across procedures; thus, they can be relabeled for the whole


Table 3-6:MIPS relabelable registers with different relabeling scope

Category       Scope              Name         Number                 Use
Relabelable    Program-scope      $at          $1                     Assembler temporary
                                  $k0 - $k1    $26 - $27              Kernel registers
                                  $gp          $28                    Global pointer
                                  $sp          $29                    Stack pointer
                                  $v0 - $v1    $2 - $3                Return value
                                  $a0 - $a3    $4 - $7                Argument registers
               Procedure-scope    $t0 - $t9    $8 - $15, $24 - $25    Temporary registers
                                  $s0 - $s8    $16 - $23, $30         Saved registers

In register relabeling method 2, we consider the relabeling of procedure-scoped registers
and program-scoped registers together for power reduction. For program-scoped registers, the
occurrence frequency of each register is collected in the same way as in
register relabeling method 1. For the procedure-scoped registers, the occurrence
frequencies have to be totaled over all procedures. Instead of summing the frequencies of
the same register number in different procedures, we sum the frequencies of the registers
that hold the same rank in the per-procedure frequency order. Consequently, procedure-scoped
registers and program-scoped registers can be relabeled together, and more procedure-scoped
registers can be relabeled into registers with fewer 1s after 4-to-6 SS encoding for more
power reduction.

Therefore, the frequencies of procedure-scoped registers are gathered in each procedure


frequencies of the registers with the same rank in different procedures. However, among the MIPS
registers, saved registers should not be mixed with temporary registers; otherwise, a procedure
could use a saved register without saving its value at procedure entry and restoring it at procedure
exit. Hence, saved registers and temporary registers should be relabeled separately.

The steps of register relabeling method 2 are described as follows:

• Gather the occurrence frequency of each register
  For procedure-scoped registers, gather the frequencies in each procedure separately. For
  program-scoped registers, gather the frequencies from the whole program.

• Sort and apply an intermediate relabeling within each procedure
  For temporary registers, from high occurrence frequency to low occurrence frequency,
  relabel them to TRn, where n is the rank order according to frequency. For saved
  registers, from high occurrence frequency to low occurrence frequency, relabel them to
  SRn, where n is the rank order according to frequency.

• Sum up the frequencies of the same TRn/SRn of all procedures.

• Sort TR/SR and program-scoped registers by their occurrence frequencies, and relabel
  them by the sorted order according to the relabeling register selection sequence.
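The four steps above can be sketched as follows. Everything about the interface is an assumption for illustration: the function name `relabel_method2`, the dictionaries of per-procedure frequencies, the tuple names such as `("TR", 1)` for the intermediate labels, and the made-up frequencies in the usage example. The target sequence is the first few entries of the MIPS sequence in Table 3-4.

```python
from collections import Counter


def relabel_method2(program_freq, temp_freq_by_proc, saved_freq_by_proc, sequence):
    """Return (program-scoped mapping, per-procedure mappings)."""
    # Step 1 (program scope): whole-program frequencies, tagged with "P".
    virtual_freq = Counter({("P", reg): f for reg, f in program_freq.items()})
    intermediate = {}  # procedure -> {register: ("TR" or "SR", rank)}
    for kind, by_proc in (("TR", temp_freq_by_proc), ("SR", saved_freq_by_proc)):
        for proc, freq in by_proc.items():
            # Step 2: rank registers inside the procedure, most frequent first.
            ranked = sorted(freq, key=lambda reg: -freq[reg])
            for rank, reg in enumerate(ranked, start=1):
                intermediate.setdefault(proc, {})[reg] = (kind, rank)
                # Step 3: sum the frequencies of equal ranks over all procedures.
                virtual_freq[(kind, rank)] += freq[reg]
    # Step 4: sort everything together and follow the selection sequence.
    ordered = sorted(virtual_freq, key=lambda name: -virtual_freq[name])
    target = dict(zip(ordered, sequence))
    program_map = {reg: target[("P", reg)] for reg in program_freq}
    proc_maps = {proc: {reg: target[name] for reg, name in regs.items()}
                 for proc, regs in intermediate.items()}
    return program_map, proc_maps


# Hypothetical frequencies, for illustration only.
SEQUENCE = [1, 2, 4, 6, 8, 16, 24]            # prefix of the Table 3-4 order
program_freq = {29: 20, 4: 6}                 # $sp and $a0, program-scoped
temp_freq = {"A": {8: 12, 9: 3}, "B": {10: 7, 8: 2}}
saved_freq = {"A": {16: 4}, "B": {}}

program_map, proc_maps = relabel_method2(program_freq, temp_freq, saved_freq, SEQUENCE)
print(program_map)  # {29: 1, 4: 4}
print(proc_maps)    # {'A': {8: 2, 9: 6, 16: 8}, 'B': {10: 2, 8: 6}}
```

In this run the most frequent temporary register of each procedure (r8 in A, r10 in B) shares the same target, which is the effect procedure-scoped relabeling is after. Temporary and saved registers are ranked separately, in line with the requirement that they not be mixed.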

Figure 3-8 is an example for the relabeling steps of register relabeling method 2.

Suppose that the instruction partition and the corresponding relabeling selection sequence are
