
Dynamic Voltage Communication Scheduling Technique

From the document: Chung Hua University (pp. 79-84)

6 Energy Efficient Scheduling for Multi-Core System

6.4 Dynamic Voltage Communication Scheduling Technique

With advances in hardware, multi-core systems have become popular in parallel computing. However, it is doubtful whether existing scheduling algorithms achieve good load balancing and performance on such systems. To optimize GEN_BLOCK redistribution on them, the dynamic voltage communication scheduling technique (DVC) is proposed. DVC has two parts. The first part defines four transmission types and a scheduling policy to reduce communication cost. The transmission types are as follows:

• Δ1: The transmission (a local message) occurs when a core sends data to itself.

• Δ2: The transmission occurs when a core sends data to other cores in the same node.

• Δ3: The transmission occurs when a core sends data to cores in another node in the same cluster.

• Δ4: The transmission occurs when a core sends data to cores in other clusters.
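The four transmission types can be expressed as a small classification routine. The sketch below is illustrative only: the `Core` record, its field names, and the assumption that node identifiers are unique within a cluster are introduced here and do not come from the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """Location of a core (hypothetical record, not from the source)."""
    cluster: int
    node: int
    core: int


def transmission_type(src: Core, dst: Core) -> int:
    """Return 1, 2, 3 or 4 for the transmission types Δ1 to Δ4."""
    if src.cluster != dst.cluster:
        return 4  # Δ4: sender and receiver are in different clusters
    if src.node != dst.node:
        return 3  # Δ3: different nodes in the same cluster
    if src.core != dst.core:
        return 2  # Δ2: different cores in the same node
    return 1      # Δ1: local message, a core sends to itself


a = Core(cluster=1, node=1, core=0)
b = Core(cluster=1, node=1, core=1)
c = Core(cluster=1, node=2, core=0)
d = Core(cluster=2, node=3, core=0)
print([transmission_type(a, x) for x in (a, b, c, d)])  # [1, 2, 3, 4]
```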

Because the network bandwidth differs among the four kinds of transmission, mixing them in the same scheduling step may increase the synchronization delay of that step. The scheduling policy therefore handles them at two levels. The first level schedules all local messages in one communication step, based on the local message reduction concept. The second level schedules the remaining messages using a degree-reduction scheme and an adjustable coloring technique, as outlined below.

1. Schedule all local messages in the same scheduling step.

2. For a bipartite graph G with Degreemax > 2, perform degree-reduction scheduling.

3. Set Degreemax = Degreemax − 1. If Degreemax > 2, repeat (2); otherwise the result is a bipartite graph G’ with Degreemax = 2.

4. For a bipartite graph with Degreemax ≤ 2, use coloring theory to schedule the remaining messages in G’ into two steps.

As described with the Quad-Core CPU system structures in the previous chapter, multi-core CPUs can change the voltage of their cores through technologies such as “SpeedStep” and “PowerNow!”. The second part of DVC provides a set of voltage values from which each core selects. The selection refers to the schedule produced by the first part. Each step of the schedule contains messages of different data sizes, and the communication time of the step is determined by the data size of its dominator, the message with the largest cost in the step. It is reasonable for the core handling the dominator to use a higher voltage to shorten the time, while cores handling smaller messages use a lower voltage. Since a lower voltage lengthens the transmission time, the goal of the proposed method is to recommend voltage levels that reduce energy consumption while extending the transmission time only within reasonable bounds. The notation and terminology used by DVC to derive the proper voltage levels are listed as follows:

• Costm: Cost of message m, where m is the index of messages.

• Costs: Cost of step s, where s is the index of steps.

• V: Voltage of each core.

• I: Electric current.

• Rm: The ratio of voltage variation for message m. Rm = (Costm / Costs) * f, where m is the index of messages and f is the adjustable parameter. If Rm ≤ 0.25, f adjusts it to 0.25; if 0.25 < Rm ≤ 0.5, f adjusts it to 0.5; if 0.5 < Rm ≤ 0.75, f adjusts it to 0.75; otherwise Rm = 1.

• Ts: Transmission time of step s.

• EMm: Energy of message m. EMm = V*Rm*I*Ts, where m is the index of messages.

• ESs: Energy of step s, where s is the index of steps. ESs = ΣEMm, where message m ∈ step s.

• ΣESs: The total energy consumption.
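The definitions of Rm, EMm and ESs can be written as a few short functions. This is a minimal sketch under one reading of the notation: the adjustable parameter f is modeled as rounding the ratio Costm / Costs up to the next level in {0.25, 0.5, 0.75, 1}. The function names and the sample costs are hypothetical.

```python
def voltage_ratio(cost_m: float, cost_s: float) -> float:
    """Rm: Cost_m / Cost_s, adjusted up to 0.25, 0.5, 0.75 or 1."""
    ratio = cost_m / cost_s
    for level in (0.25, 0.5, 0.75):
        if ratio <= level:
            return level
    return 1.0


def message_energy(v: float, i: float, t_s: float, r_m: float) -> float:
    """EMm = V * Rm * I * Ts."""
    return v * r_m * i * t_s


def step_energy(costs, v: float, i: float, t_s: float) -> float:
    """ESs: sum of EMm over the messages of one step.

    The dominator (largest cost) defines Cost_s for the step.
    """
    cost_s = max(costs)
    return sum(message_energy(v, i, t_s, voltage_ratio(c, cost_s))
               for c in costs)


# Hypothetical step with three messages; V = I = Ts = 1 for readability.
sample_costs = [1.0, 2.0, 4.0]
print([voltage_ratio(c, 4.0) for c in sample_costs])  # [0.25, 0.5, 1.0]
print(step_energy(sample_costs, 1.0, 1.0, 1.0))       # 1.75
```

Without voltage scaling the same step would cost 3 * V * I * Ts, one unit per message, so the sketch shows where the saving comes from.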

Assume that the data size of the dominator in a step is Costs and that of each other message is Costm. To save energy by adjusting the voltage of cores, Rm is defined to derive the voltage level: it is the ratio of Costm to Costs, adjusted by the factor f. The factor f lets the transmission time of non-dominator messages lengthen without exceeding the length of the scheduled step. Except for the core that handles the dominator, the voltage of most cores is changed according to Rm, which reduces the energy consumption of the cores running at lower voltage. Applying DVC therefore saves energy for a message (EMm), a step (ESs), and the whole schedule.

The following example explains the proposed DVC. Figure 6.3 shows the architecture of multi-core machines in two clusters of a grid system. Circles distinguish the elements in each cluster and in each multi-core PC. Cluster 1 contains one Quad-Core PC and one Dual-Core PC, and cluster 2 contains one Dual-Core PC.

The edges, drawn within and across the circles, represent the relation between source processors (SP) and destination processors (DP) and are generated from the source and destination distribution schemes, which are {5, 24, 23, 7, 31, 22, 18, 4} and {18, 20, 8, 21, 9, 27, 18, 13}, respectively.

[Figure 6.3 graphic: a bipartite graph linking SP0-SP7 to DP0-DP7 through messages m1-m15, with the data size of each message in parentheses. Cluster 1 contains Quad-Core PC1 and Dual-Core PC2; Cluster 2 contains Dual-Core PC3.]

Figure 6.3: The architecture of multi-core machines in two clusters of a grid system.
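The edges of the figure can be regenerated from the two distribution schemes. The sketch below assumes the usual GEN_BLOCK interpretation, in which each processor owns a contiguous block of the array and a message is the overlap between a source block and a destination block. The function name and the tuple layout are hypothetical, and numbering the resulting messages m1 to m15 in order of generation is an inference that agrees with the costs quoted in the text.

```python
def gen_block_messages(src_sizes, dst_sizes):
    """Return (source index, destination index, size) for every message."""
    assert sum(src_sizes) == sum(dst_sizes), "schemes must cover the same array"
    messages = []
    s = d = 0
    src_left, dst_left = src_sizes[0], dst_sizes[0]
    while s < len(src_sizes) and d < len(dst_sizes):
        size = min(src_left, dst_left)  # overlap of the two current blocks
        if size > 0:
            messages.append((s, d, size))
        src_left -= size
        dst_left -= size
        if src_left == 0:               # source block exhausted, move on
            s += 1
            if s < len(src_sizes):
                src_left = src_sizes[s]
        if dst_left == 0:               # destination block filled, move on
            d += 1
            if d < len(dst_sizes):
                dst_left = dst_sizes[d]
    return messages


SRC = [5, 24, 23, 7, 31, 22, 18, 4]
DST = [18, 20, 8, 21, 9, 27, 18, 13]
msgs = gen_block_messages(SRC, DST)
for number, (s, d, size) in enumerate(msgs, start=1):
    print(f"m{number}: SP{s} -> DP{d}, size {size}")
```

With these schemes the routine yields 15 messages, starting with SP0 -> DP0 of size 5 and ending with SP7 -> DP7 of size 4.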

DVC classifies messages according to the definitions of Δ1, Δ2, Δ3 and Δ4. RLR, assumed to be 8, is used to modify the cost of Δ1 and Δ2 messages. DRR, assumed to be 5, is used to modify the cost of Δ4 messages. The Δ1 messages are m1, m3, m5, m7, m9, m11, m13 and m15, with communication costs of 0.625, 1.375, 1, 0.875, 1.125, 1.625, 1.125 and 0.5, respectively. The Δ2 messages are m2, m4, m6, m10 and m14, with communication costs of 1.625, 1.125, 0.75, 1.75 and 1.125, respectively. The only Δ3 message is m8, with a cost of 8, and the only Δ4 message is m12, with a cost of 45.

DVC first schedules the Δ1 messages in step 1, whose length is 1.625, dominated by m11 (Figure 6.4). This avoids synchronization delay and leaves more room for the messages that follow. The Δ2, Δ3 and Δ4 messages are then scheduled in steps 2 and 3. The length of step 2 is 45, dominated by m12. The length of the last step is 1.75, dominated by m10. Note that the two largest messages, m8 and m12, are scheduled together in step 2. Without the operation in step 1, they would fall into separate steps and cause synchronization delay.
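The quoted costs can be reproduced with a few lines of arithmetic. The sketch below rests on two inferences that the text does not state explicitly: the message sizes are taken from the overlap of the two distribution schemes, and the cost rule divides the size by RLR for Δ1 and Δ2, leaves it unchanged for Δ3, and multiplies it by DRR for Δ4. That rule is chosen because it matches every cost listed above; all names are hypothetical.

```python
RLR = 8  # value assumed in the text; applied to Δ1 and Δ2 messages
DRR = 5  # value assumed in the text; applied to Δ4 messages

# Message number -> data size (inferred from the distribution schemes).
SIZES = {1: 5, 2: 13, 3: 11, 4: 9, 5: 8, 6: 6, 7: 7, 8: 8,
         9: 9, 10: 14, 11: 13, 12: 9, 13: 9, 14: 9, 15: 4}

# Transmission type -> message numbers, as classified in the text.
TYPES = {1: (1, 3, 5, 7, 9, 11, 13, 15),
         2: (2, 4, 6, 10, 14),
         3: (8,),
         4: (12,)}


def message_cost(size: int, kind: int) -> float:
    """Communication cost under the inferred RLR/DRR rule."""
    if kind in (1, 2):
        return size / RLR    # transfers inside a node are cheaper
    if kind == 4:
        return size * DRR    # transfers across clusters are more expensive
    return float(size)       # Δ3 keeps the raw data size


COSTS = {m: message_cost(SIZES[m], kind)
         for kind, numbers in TYPES.items() for m in numbers}

for m in sorted(COSTS):
    print(f"m{m}: {COSTS[m]:g}")
```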

A schedule of DVC

Step        Messages (cost)                                                                        Length of step
Step 1      m1(0.625), m3(1.375), m5(1), m7(0.875), m9(1.125), m11(1.625), m13(1.125), m15(0.5)    1.625
Step 2      m2(1.625), m4(1.125), m8(8), m12(45), m14(1.125)                                       45
Step 3      m6(0.75), m10(1.75)                                                                    1.75
Total cost                                                                                         48.375

Figure 6.4: A schedule of DVC.
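The two-level policy can be approximated in a few lines. The sketch below is not the degree-reduction and adjustable coloring scheme itself: it substitutes a simple greedy rule that places each remote message, largest cost first, into the earliest step in which its sender and receiver are both free. The endpoints of each message are inferred from the distribution schemes, and all names are hypothetical. On this example the greedy stand-in happens to produce the same three steps as Figure 6.4.

```python
# (name, source processor, destination processor, cost)
MESSAGES = [
    ("m1", 0, 0, 0.625), ("m2", 1, 0, 1.625), ("m3", 1, 1, 1.375),
    ("m4", 2, 1, 1.125), ("m5", 2, 2, 1.0), ("m6", 2, 3, 0.75),
    ("m7", 3, 3, 0.875), ("m8", 4, 3, 8.0), ("m9", 4, 4, 1.125),
    ("m10", 4, 5, 1.75), ("m11", 5, 5, 1.625), ("m12", 5, 6, 45.0),
    ("m13", 6, 6, 1.125), ("m14", 6, 7, 1.125), ("m15", 7, 7, 0.5),
]


def schedule(messages):
    """Level 1: all local messages in one step. Level 2: greedy placement."""
    local = [m for m in messages if m[1] == m[2]]
    remote = [m for m in messages if m[1] != m[2]]
    steps = [local] if local else []
    remote_steps = []  # entries: (busy senders, busy receivers, members)
    for msg in sorted(remote, key=lambda m: m[3], reverse=True):
        _, src, dst, _ = msg
        for senders, receivers, members in remote_steps:
            if src not in senders and dst not in receivers:
                senders.add(src)
                receivers.add(dst)
                members.append(msg)
                break
        else:  # no existing step can host the message, open a new one
            remote_steps.append(({src}, {dst}, [msg]))
    steps.extend(members for _, _, members in remote_steps)
    return steps


steps = schedule(MESSAGES)
lengths = [max(m[3] for m in step) for step in steps]
for number, (step, length) in enumerate(zip(steps, lengths), start=1):
    print(f"Step {number}: {[m[0] for m in step]} length {length}")
print("Total cost:", sum(lengths))  # 48.375
```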

The recommended voltage values for reducing energy consumption are derived from the DVC schedule. In step 1, Cost1 is 1.625, the cost of the dominator m11. For m1, R1 is 0.5 according to the formula for Rm, so the energy of m1 is 0.5*V*I*T1. After the energy of the other messages in step 1 is evaluated, ES1 is 6*V*I*T1 instead of 8*V*I*T1, so the proposed method improves ES1 by 25%. The Rm values of the messages in each step are given in Figure 6.5. By the same argument, ES2 and ES3 are 2*V*I*T2 and 1.5*V*I*T3, respectively. Therefore, ΣESs = (6*T1 + 2*T2 + 1.5*T3)*V*I instead of (8*T1 + 5*T2 + 2*T3)*V*I.

Value of Rm in each step

Step 1           Step 2           Step 3
Message  Rm      Message  Rm      Message  Rm
m1       0.5     m2       0.25    m6       0.5
m3       1       m4       0.25    m10      1
m5       0.75    m8       0.25
m7       0.75    m12      1
m9       0.75    m14      0.25
m11      1
m13      0.75
m15      0.5
Total    6       Total    2       Total    1.5

Figure 6.5: Values of Rm in each step.

Assume that T1, T2 and T3 are 1.625 s, 45 s and 1.75 s. Then ES1, ES2 and ES3 become 9.75*V*I, 90*V*I and 2.625*V*I, respectively. Instead of consuming 241.5*V*I, DVC consumes only 102.375*V*I, an energy improvement of about 57.6%.
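The totals can be checked with a short calculation. The sketch below takes the step times, the number of messages per step and the Rm totals from Figures 6.4 and 6.5, and expresses energy in units of V*I; the variable names are hypothetical.

```python
# (Ts in seconds, number of messages in the step, sum of Rm over the step)
STEPS = [(1.625, 8, 6.0), (45.0, 5, 2.0), (1.75, 2, 1.5)]

# Without DVC every message runs at full voltage, i.e. Rm = 1 for all of them.
baseline = sum(t * n for t, n, _ in STEPS)
# With DVC each message contributes Rm * Ts.
with_dvc = sum(t * r for t, _, r in STEPS)
saving = (baseline - with_dvc) / baseline

print(f"baseline: {baseline} * V*I")   # 241.5
print(f"with DVC: {with_dvc} * V*I")   # 102.375
print(f"saving:   {saving:.1%}")
```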
