Dynamic Voltage Communication Scheduling Technique

6 Energy Efficient Scheduling for Multi-Core System

6.4 Dynamic Voltage Communication Scheduling Technique

With advances in hardware technology, multi-core systems have become popular in parallel computing. However, it is doubtful whether existing scheduling algorithms achieve good load balancing and performance on such systems. To optimize GEN_BLOCK redistribution on them, the dynamic voltage communication scheduling technique (DVC) is proposed. DVC has two parts. The first part defines four transmission types and a scheduling policy that reduces communication cost. The transmission types are as follows:

Δ₁: The transmission (a local message) occurs when a core sends data to itself.

Δ₂: The transmission occurs when a core sends data to another core in the same node.

Δ₃: The transmission occurs when a core sends data to a core in another node of the same cluster.

Δ₄: The transmission occurs when a core sends data to a core in another cluster.
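Because the four types depend only on where the sending and receiving cores are located, the classification can be written as a small function. The following Python sketch assumes a hypothetical `Core(cluster, node)` record per core and that SPi and DPi denote the same physical core i; the placement in `EXAMPLE` is an assumption modeled on the machines of Figure 6.3, not data given in the text.

```python
from typing import Dict, NamedTuple


class Core(NamedTuple):
    """Hypothetical location record: the cluster and node (multi-core PC) of a core."""
    cluster: int
    node: int


def transmission_type(src: int, dst: int, where: Dict[int, Core]) -> int:
    """Return k such that a message from core `src` to core `dst` is of type Δk."""
    if src == dst:
        return 1  # Δ1: local message, the core sends data to itself
    s, d = where[src], where[dst]
    if (s.cluster, s.node) == (d.cluster, d.node):
        return 2  # Δ2: another core in the same node
    if s.cluster == d.cluster:
        return 3  # Δ3: a core in another node of the same cluster
    return 4      # Δ4: a core in another cluster


# Assumed placement modeled on Figure 6.3: quad-core PC1 and dual-core PC2
# in cluster 1, dual-core PC3 in cluster 2.
EXAMPLE = {
    0: Core(1, 1), 1: Core(1, 1), 2: Core(1, 1), 3: Core(1, 1),
    4: Core(1, 2), 5: Core(1, 2),
    6: Core(2, 3), 7: Core(2, 3),
}
```

With this placement, a transfer from core 4 to core 3 is classified as Δ₃ and one from core 5 to core 6 as Δ₄.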

Because the four kinds of transmission have different network bandwidths, mixing them in the same scheduling step may increase the synchronization delay of that step. The scheduling policy therefore handles them at two levels. The first level schedules all local messages in a single communication step, based on the local message reduction concept. The second level schedules the remaining messages using a degree-reduction scheme and an adjustable coloring technique, as outlined below.

1. Schedule all local messages in the same scheduling step.

2. If the bipartite graph G has Degree_max > 2, perform degree-reduction scheduling.

3. Set Degree_max = Degree_max − 1. If Degree_max > 2, repeat step 2; otherwise, the result is a bipartite graph G′ with Degree_max = 2.

4. For a bipartite graph with Degree_max ≤ 2, use graph coloring to schedule the remaining messages in G′ into two steps.
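The two-level policy can be sketched in Python as below. This is an illustration under stated assumptions rather than the original implementation: `two_level_schedule` is a hypothetical name, a message whose source and destination core coincide is treated as local, the degree-reduction loop of steps 2 and 3 is left out (the function stops if Degree_max > 2 remains), and the "adjustable" part of the coloring is replaced by a simple rule that puts the color class holding each component's largest message into the same step.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# (name, source core, destination core, cost); source == destination marks a
# local (Δ1) message because SPi and DPi are assumed to be the same core i.
Message = Tuple[str, int, int, float]


def two_level_schedule(messages: List[Message]) -> List[List[str]]:
    """Return the scheduling steps as lists of message names.

    Sketch only: it assumes that, once local messages are removed, the graph
    already has Degree_max <= 2 (the degree-reduction loop is not implemented).
    """
    local = [m for m in messages if m[1] == m[2]]
    remote = [m for m in messages if m[1] != m[2]]
    # Level 1: every local message goes into a single step.
    steps = [[m[0] for m in local]] if local else []

    # Two remote messages conflict if they share a source or a destination.
    by_src: Dict[int, List[int]] = defaultdict(list)
    by_dst: Dict[int, List[int]] = defaultdict(list)
    for idx, (_, src, dst, _) in enumerate(remote):
        by_src[src].append(idx)
        by_dst[dst].append(idx)
    groups = list(by_src.values()) + list(by_dst.values())
    if any(len(g) > 2 for g in groups):
        raise NotImplementedError("Degree_max > 2: apply degree reduction first")
    adj: Dict[int, List[int]] = defaultdict(list)
    for g in groups:
        if len(g) == 2:
            adj[g[0]].append(g[1])
            adj[g[1]].append(g[0])

    # Level 2: components are paths or even cycles, so BFS 2-coloring succeeds.
    step_a: List[str] = []
    step_b: List[str] = []
    color: Dict[int, int] = {}
    for start in range(len(remote)):
        if start in color:
            continue
        color[start] = 0
        comp, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    comp.append(v)
                    queue.append(v)
        # Hypothetical "adjustable" rule: the color class that holds the
        # component's largest message goes to step A, grouping large messages.
        big = max(comp, key=lambda k: remote[k][3])
        for k in sorted(comp):
            (step_a if color[k] == color[big] else step_b).append(remote[k][0])
    return steps + [s for s in (step_a, step_b) if s]
```

Level 1 also lowers the degree of the graph. In the example of this section, SP4 sends m₈, m₉ and m₁₀ and DP3 receives m₆, m₇ and m₈, so the full graph has Degree_max = 3; after the local messages m₇ and m₉ move to step 1, Degree_max = 2, and with message endpoints derived from the example's distribution schemes the sketch returns the same three steps as Figure 6.4.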

As introduced with the Quad-Core CPU architectures in the previous chapter, multi-core CPUs can change the voltage of their cores through technologies such as Intel “SpeedStep” and AMD “PowerNow!”. The second part of DVC recommends a voltage value for each core, based on the schedule produced by the first part. Each step of the schedule contains messages of different data sizes, and the communication time of a step is determined by the data size of its dominator, the largest message in the step. It is therefore reasonable for the core handling the dominator to use a higher voltage to shorten the time, while cores handling smaller messages use lower voltages. Since a lower voltage lengthens the transmission time, the goal of the proposed method is to recommend voltage levels that reduce energy consumption while lengthening the transmission time only within reasonable bounds. The notation and terminology that DVC uses to derive the proper voltage levels are listed as follows:



Cost_m: Cost of message m, where m is the index of messages.

Cost_s: Cost of step s, where s is the index of steps.

V: Voltage of each core.

I: Electric current.

R_m: The ratio of voltage variation for message m, R_m = (Cost_m / Cost_s)*f, where m is the index of messages and f is an adjustable parameter that rounds the ratio up to the next quarter: if Cost_m / Cost_s ≤ 0.25, R_m = 0.25; if 0.25 < Cost_m / Cost_s ≤ 0.5, R_m = 0.5; if 0.5 < Cost_m / Cost_s ≤ 0.75, R_m = 0.75; otherwise R_m = 1.

T_s: Transmission time of step s.

EM_m: Energy of message m, EM_m = V*R_m*I*T_s, where m is the index of messages.

ES_s: Energy of step s, where s is the index of steps, ES_s = ΣEM_m over all messages m ∈ step s.

ΣES_s: The total energy consumption of the schedule.

Assume that the data size of the dominator in a step is Cost_s and those of the other messages are Cost_m. To save energy by adjusting the voltage of the cores, R_m is defined to derive the voltage levels.

R_m, the recommended parameter, is the ratio Cost_m / Cost_s multiplied by the factor f. The factor f lengthens the transmission time of non-dominator messages without exceeding the length of the scheduled step. Except for the core handling the dominator, the voltage of each core is suggested to be changed according to R_m, which reduces the energy consumption of the cores running at lower voltage. Applying DVC therefore saves energy when transmitting a message (EM_m), a step (ES_s) and a whole schedule.
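The definitions of R_m, EM_m and ES_s can be checked numerically. A minimal Python sketch, with hypothetical function names, that normalizes V, I and T_s to 1 by default so that ES_s comes out as a multiple of V*I*T_s:

```python
from typing import Sequence


def voltage_ratio(cost_m: float, cost_s: float) -> float:
    """R_m: the ratio Cost_m / Cost_s rounded up to 0.25, 0.5, 0.75 or 1 by f."""
    ratio = cost_m / cost_s
    for level in (0.25, 0.5, 0.75):
        if ratio <= level:
            return level
    return 1.0


def step_energy(costs: Sequence[float], v: float = 1.0, i: float = 1.0,
                t_s: float = 1.0) -> float:
    """ES_s = sum of EM_m = V * R_m * I * T_s over the messages m in step s."""
    cost_s = max(costs)  # the dominator fixes the cost of the step
    return sum(v * voltage_ratio(c, cost_s) * i * t_s for c in costs)
```

For the costs of step 1 in the example below (0.625, 1.375, 1, 0.875, 1.125, 1.625, 1.125, 0.5), `step_energy` returns 6.0, whereas keeping every core at full voltage (R_m = 1) would give 8.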

The following example explains the proposed DVC. Figure 6.3 shows the architecture of multi-core machines in two clusters in a grid system. Circles distinguish the elements of each cluster and of each multi-core PC. Cluster 1 contains one Quad-Core PC and one Dual-Core PC, and cluster 2 contains one Dual-Core PC.

The edges drawn inside and across the circles represent the relations between source processors (SP) and destination processors (DP); they are generated from the source and destination distribution schemes, which are {5, 24, 23, 7, 31, 22, 18, 4} and {18, 20, 8, 21, 9, 27, 18, 13}, respectively.
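The edges of Figure 6.3 follow mechanically from the two distribution schemes: each message is a maximal segment of the array that has a single source owner and a single destination owner. A Python sketch of that derivation (the function name is illustrative), treating SPi and DPi as the i-th blocks of the source and destination schemes:

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def gen_block_messages(src_sizes: Sequence[int],
                       dst_sizes: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Return (SP index, DP index, data size) for each message, in array order."""
    if sum(src_sizes) != sum(dst_sizes):
        raise ValueError("both schemes must cover the same array")
    src_ends = list(accumulate(src_sizes))  # exclusive end of each SP block
    dst_ends = list(accumulate(dst_sizes))  # exclusive end of each DP block
    messages: List[Tuple[int, int, int]] = []
    start, i, j = 0, 0, 0
    while i < len(src_ends) and j < len(dst_ends):
        end = min(src_ends[i], dst_ends[j])
        if end > start:  # skip empty overlaps caused by zero-size blocks
            messages.append((i, j, end - start))
        start = end
        if src_ends[i] == end:
            i += 1
        if dst_ends[j] == end:
            j += 1
    return messages
```

Applied to {5, 24, 23, 7, 31, 22, 18, 4} and {18, 20, 8, 21, 9, 27, 18, 13}, it yields 15 messages m₁ to m₁₅ with data sizes 5, 13, 11, 9, 8, 6, 7, 8, 9, 14, 13, 9, 9, 9 and 4; for instance, m₁₂ goes from SP5 to DP6 with 9 elements.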

[Figure: Cluster 1 contains Quad-Core PC1 and Dual-Core PC2; cluster 2 contains Dual-Core PC3 (SP6, SP7, DP6, DP7). Edges from SPs to DPs are labeled with message names and data sizes.]

Figure 6.3: The architecture of multi-core machines in two clusters in a grid system.

DVC classifies the messages according to the definitions of Δ₁, Δ₂, Δ₃ and Δ₄. RLR, assumed to be 8, is used to modify the cost of Δ₁ and Δ₂ messages, and DRR, assumed to be 5, is used to modify the cost of Δ₄ messages. The messages of Δ₁ are m₁, m₃, m₅, m₇, m₉, m₁₁, m₁₃ and m₁₅, and their communication costs are 0.625, 1.375, 1, 0.875, 1.125, 1.625, 1.125 and 0.5, respectively.

The messages of Δ₂ are m₂, m₄, m₆, m₁₀ and m₁₄, with communication costs 1.625, 1.125, 0.75, 1.75 and 1.125, respectively. The message of Δ₃ is m₈, with cost 8. Finally, the message of Δ₄ is m₁₂, with cost 45. As shown in Figure 6.4, DVC first schedules the messages of Δ₁ in step 1, whose length is 1.625, dominated by m₁₁. This helps avoid synchronization delay and frees more room for the following messages. The messages of Δ₂, Δ₃ and Δ₄ are then scheduled in steps 2 and 3. The length of step 2 is 45, dominated by m₁₂, and the length of the last step is 1.75, dominated by m₁₀. Note that the two largest messages, m₈ and m₁₂, are scheduled together in step 2. Without the first-level operation of step 1, they would be placed in separate steps, resulting in synchronization delay.
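The listed costs are consistent with a simple weighting in which a Δ₁ or Δ₂ message costs its data size divided by RLR, a Δ₄ message costs its data size multiplied by DRR, and a Δ₃ message costs its data size. That exact rule is inferred from the numbers rather than stated in the text, so the Python sketch below (hypothetical names) should be read as a consistency check; the data sizes are those obtained by splitting the array according to the two distribution schemes of Figure 6.3.

```python
from typing import Dict, Tuple

RLR = 8  # value assumed in the example
DRR = 5  # value assumed in the example


def modified_cost(delta: int, size: float, rlr: float = RLR, drr: float = DRR) -> float:
    """Inferred cost of a message of type Δ`delta` (1..4) carrying `size` elements."""
    if delta in (1, 2):
        return size / rlr   # intra-node transfers are cheaper
    if delta == 4:
        return size * drr   # inter-cluster transfers are more expensive
    return float(size)      # Δ3: inter-node transfer inside one cluster


# (type, data size) per message; sizes follow from the distribution schemes.
EXAMPLE: Dict[str, Tuple[int, int]] = {
    "m1": (1, 5), "m2": (2, 13), "m3": (1, 11), "m4": (2, 9), "m5": (1, 8),
    "m6": (2, 6), "m7": (1, 7), "m8": (3, 8), "m9": (1, 9), "m10": (2, 14),
    "m11": (1, 13), "m12": (4, 9), "m13": (1, 9), "m14": (2, 9), "m15": (1, 4),
}
COSTS = {name: modified_cost(d, s) for name, (d, s) in EXAMPLE.items()}
```

Every entry of `COSTS` equals the cost quoted above, for example 1.75 for m₁₀, 8 for m₈ and 45 for m₁₂.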

A schedule of DVC

Step | Messages (cost) | Length of step
Step 1 | m₁(0.625), m₃(1.375), m₅(1), m₇(0.875), m₉(1.125), m₁₁(1.625), m₁₃(1.125), m₁₅(0.5) | 1.625
Step 2 | m₂(1.625), m₄(1.125), m₈(8), m₁₂(45), m₁₄(1.125) | 45
Step 3 | m₆(0.75), m₁₀(1.75) | 1.75
Total cost | | 48.375

Figure 6.4: A schedule of DVC.
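Since the length of a step is the cost of its dominator and the total cost is the sum of the step lengths, Figure 6.4 can be checked with a few lines of Python (the data layout and function names are illustrative):

```python
from typing import Dict, List

# Message costs per step, copied from Figure 6.4.
SCHEDULE: Dict[int, Dict[str, float]] = {
    1: {"m1": 0.625, "m3": 1.375, "m5": 1, "m7": 0.875, "m9": 1.125,
        "m11": 1.625, "m13": 1.125, "m15": 0.5},
    2: {"m2": 1.625, "m4": 1.125, "m8": 8, "m12": 45, "m14": 1.125},
    3: {"m6": 0.75, "m10": 1.75},
}


def step_lengths(schedule: Dict[int, Dict[str, float]]) -> List[float]:
    """Length of each step: the cost of its dominator (largest message)."""
    return [max(msgs.values()) for _, msgs in sorted(schedule.items())]


def dominators(schedule: Dict[int, Dict[str, float]]) -> List[str]:
    """Name of the dominating message of each step."""
    return [max(msgs, key=msgs.get) for _, msgs in sorted(schedule.items())]


LENGTHS = step_lengths(SCHEDULE)
TOTAL_COST = sum(LENGTHS)
```

The three lengths are 1.625, 45 and 1.75, their dominators are m₁₁, m₁₂ and m₁₀, and the total is 48.375.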

The recommended voltage values that reduce energy consumption are derived from the schedule of DVC. In step 1, Cost₁ is 1.625, the cost of the dominator m₁₁. For m₁, the formula for R_m gives R₁ = 0.5 (0.625/1.625 ≈ 0.38, rounded up to 0.5), so the energy of m₁ is 0.5*V*I*T₁. After the energies of the other messages in step 1 are evaluated, ES₁ is 6*V*I*T₁ instead of 8*V*I*T₁, the energy when all eight cores keep full voltage (R_m = 1). The energy consumption ES₁ is thus reduced by 25% by the proposed method. The R_m values of the messages in each step are given in Figure 6.5. By the same argument, ES₂ and ES₃ are 2*V*I*T₂ and 1.5*V*I*T₃, respectively. Therefore, ΣES_s = (6*T₁ + 2*T₂ + 1.5*T₃)*V*I instead of (8*T₁ + 5*T₂ + 2*T₃)*V*I.

Value of R_m in each step

Step | Messages (R_m) | Total
Step 1 | m₁(0.5), m₃(1), m₅(0.75), m₇(0.75), m₉(0.75), m₁₁(1), m₁₃(0.75), m₁₅(0.5) | 6
Step 2 | m₂(0.25), m₄(0.25), m₈(0.25), m₁₂(1), m₁₄(0.25) | 2
Step 3 | m₆(0.5), m₁₀(1) | 1.5

Figure 6.5: Values of R_m in each step.

Assume that T₁, T₂ and T₃ are 1.625 s, 45 s and 1.75 s; then ES₁, ES₂ and ES₃ become 9.75*V*I, 90*V*I and 2.625*V*I. DVC therefore consumes only 102.375*V*I instead of 241.5*V*I (13 + 225 + 3.5), an energy improvement of about 57.6%.
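These totals follow from the R_m values of Figure 6.5 and the assumed step times. A short Python check in units of V*I (names are illustrative); the baseline keeps every core at R_m = 1, so each step's energy is simply its number of messages times T_s:

```python
from typing import Dict, List

# R_m values per step (Figure 6.5) and assumed step times T_s in seconds.
R_VALUES: Dict[int, List[float]] = {
    1: [0.5, 1, 0.75, 0.75, 0.75, 1, 0.75, 0.5],
    2: [0.25, 0.25, 0.25, 1, 0.25],
    3: [0.5, 1],
}
T: Dict[int, float] = {1: 1.625, 2: 45, 3: 1.75}


def total_energy(r_values: Dict[int, List[float]], t: Dict[int, float],
                 scaled: bool = True) -> float:
    """Sum of ES_s in units of V*I; scaled=False keeps every core at R_m = 1."""
    return sum((sum(r) if scaled else len(r)) * t[s] for s, r in r_values.items())


dvc = total_energy(R_VALUES, T)
baseline = total_energy(R_VALUES, T, scaled=False)
saving = 1 - dvc / baseline
print(f"DVC {dvc} V*I, baseline {baseline} V*I, saving {saving:.1%}")
```

It prints a DVC energy of 102.375, a baseline of 241.5 and a saving of 57.6%.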

From the document: Chung Hua University (pp. 79-84)