Performance Evaluation - Energy Efficient Scheduling for Multi-Core System

6 Energy Efficient Scheduling for Multi-Core System

6.5 Performance Evaluation

instead of 8*V*I*T₁. The energy consumption, ES₁, is improved by 25% after applying the proposed method. The R_m values of the messages in each step are given in Figure 6.5. By the same argument, ES₂ and ES₃ are 2*V*I*T₂ and 1.5*V*I*T₃, respectively. Therefore, ΣESs = (6*T₁ + 2*T₂ + 1.5*T₃)*V*I instead of (8*T₁ + 5*T₂ + 2*T₃)*V*I.

Step 1             Step 2             Step 3
Message   R_m      Message   R_m      Message   R_m
m1        0.5      m2        0.25     m6        0.5
m3        1        m4        0.25     m10       1
m5        0.75     m8        0.25
m7        0.75     m12       1
m9        0.75     m14       0.25
m11       1
m13       0.75
m15       0.5
Total     6        Total     2        Total     1.5

Figure 6.5: Values of R_m in each step.

Assume that T₁, T₂ and T₃ are 1.625 s, 45 s and 1.75 s; then ES₁, ES₂ and ES₃ become 9.75*V*I, 90*V*I and 2.625*V*I, respectively. Instead of consuming 241.5*V*I, the schedule consumes only 102.375*V*I with DVC, an improvement of about 57.6% in energy.
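The arithmetic above can be reproduced with a short script. This is a minimal sketch, not the author's implementation: it assumes that the energy of a step is the sum of the R_m values in that step multiplied by the step time, and that the baseline counts every message with a factor of 1, which matches the coefficients 8, 5 and 2 in the text. The names `r_m`, `duration`, `energy_with_dvc` and `energy_baseline` are illustrative only.

```python
# R_m values per step, copied from Figure 6.5.
r_m = {
    "step1": [0.5, 1, 0.75, 0.75, 0.75, 1, 0.75, 0.5],
    "step2": [0.25, 0.25, 0.25, 1, 0.25],
    "step3": [0.5, 1],
}

# Step times T1, T2, T3 in seconds, as assumed in the text.
duration = {"step1": 1.625, "step2": 45, "step3": 1.75}


def energy_with_dvc(r_values, t):
    """Energy of one step in units of V*I: (sum of R_m) * step time."""
    return sum(r_values) * t


def energy_baseline(r_values, t):
    """Assumed baseline: every message counts with a factor of 1."""
    return len(r_values) * t


improved = sum(energy_with_dvc(r_m[s], duration[s]) for s in r_m)
original = sum(energy_baseline(r_m[s], duration[s]) for s in r_m)
saving = 1 - improved / original

print(f"improved = {improved} * V*I")
print(f"original = {original} * V*I")
print(f"saving   = {saving:.1%}")
```

The two totals evaluate to 102.375*V*I and 241.5*V*I, and the saving is the ratio 1 - 102.375/241.5.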

TPDR scheme. For the resulting schedules, this chapter compares both methods and shows the improvement of DVC over TPDR. Each computing node in the simulation environment is assumed to use one core of a quad-core CPU in a computer. For power usage, DVC shows the reduction in power consumption obtained with the suggested voltages.

To compare DVC and TPDR, 1000 test samples with a given array size N were executed on P nodes for each value of NC, where N = 10,000 integers, P ∈ {16, 32, 64, 128} and NC ∈ {8, 16}. With the three data-set configurations described below, a total of 24,000 test samples (1000 × 4 × 2 × 3) were examined in the comparison. To generate the GEN_BLOCK data sets, the sizes of the data blocks were randomly generated according to the lower and upper bounds in Table 6.1. Table 6.1 defines three configurations that represent different degrees of irregularity of the data blocks. Low irregularity is represented by configuration α, whose lower and upper bounds are 1 and 2, stated relative to N/P. Medium irregularity is represented by configuration β, whose lower and upper bounds are 1 and 4, respectively. High irregularity is represented by configuration γ, whose bounds range from 1 to 8.

Table 6.1: Configurations of data sets.

Configuration   Lower bound   Upper bound   Irregularity
α               1             2             Low
β               1             4             Medium
γ               1             8             High

Figure 6.6 shows the results of comparing DVC and TPDR on four sets of nodes with NC = 8. In each panel, there are 1000 cases for each set of nodes. The “DVC” plots give the number of cases in which DVC performs better, the “TPDR” plots give the number of cases in which TPDR wins, and the “Same” plots give the number of cases in which both methods yield the same cost. In Figure 6.6 (a), DVC wins 736 cases while TPDR wins 239 cases on 16 nodes. With more nodes, DVC wins over 91.4% of the cases on 32, 64 and 128 nodes. Because NC is 8, messages transmitted across clusters occur between only one pair of clusters on 16 nodes, which leads to few Δ₄ messages. DVC performs better with more nodes because there are more Δ₄ messages. DVC avoids separating such messages by gathering local messages together in one step, which reduces the unexpected influence on scheduling Δ₄ messages. With configuration β, the differences in message size become larger. In Figure 6.6 (b), DVC wins 770 cases on 16 nodes, with nine ties, because Δ₄ messages again occur between only one pair of clusters. On 32, 64 and 128 nodes, DVC wins over 91.2% of the cases. Figure 6.6 (c) shows that DVC outperforms TPDR in most cases, which indicates that DVC adapts to heterogeneous network environments and different kinds of communication.
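The remark about 16 nodes can be made concrete by counting cluster pairs. The sketch below rests on an assumption taken from the discussion above rather than from a stated definition: NC is treated as the number of nodes per cluster, so P nodes form P/NC clusters, and Δ₄ messages are treated as messages that cross clusters. The function name `cluster_pairs` is hypothetical, and the pair count is only a rough proxy for how many Δ₄ messages a schedule may contain.

```python
from math import comb


def cluster_pairs(p, nc):
    """Number of distinct cluster pairs when p nodes are split into
    clusters of nc nodes each (assumes nc divides p)."""
    clusters = p // nc
    return comb(clusters, 2)


for p in (16, 32, 64, 128):
    print(f"P = {p:3d}: NC = 8 -> {cluster_pairs(p, 8):3d} pairs, "
          f"NC = 16 -> {cluster_pairs(p, 16):3d} pairs")
```

Under this assumption, 16 nodes with NC = 8 give a single cluster pair, while larger node counts give many more pairs across which inter-cluster messages can occur.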


Figure 6.6: Comparisons of DVC and TPDR with NC = 8. (a) Configuration α. (b) Configuration β. (c) Configuration γ.

Figure 6.7 shows the results of the comparison when NC is 16. DVC wins 738 and 872 cases on 16 and 32 nodes, respectively, as shown in Figure 6.7 (a). DVC still performs better than TPDR, but the number of cases it wins drops slightly because a larger NC is accompanied by fewer Δ₄ messages. As the number of nodes increases, DVC outperforms TPDR in over 92% of the cases on 64 and 128 nodes. Figure 6.7 (b) and (c) show similar results. It is worth mentioning that DVC performs better and wins more cases with configurations β and γ. This observation indicates that DVC adapts and performs well with larger-scale systems and heterogeneous environments.


Figure 6.7: Comparisons of DVC and TPDR with NC = 16. (a) Configuration α. (b) Configuration β. (c) Configuration γ.

The second part of DVC provides a set of suggested voltage values for the cores. The suggestions are based on the schedules given by the first part. The R_m of each message in the schedules given by DVC is derived to adjust the voltages of the cores and to evaluate how much power consumption can be saved in each step. DVC then takes the communication time of each step as the cost of that step and derives the improved power consumption of a complete schedule. Figure 6.8 shows the improvement in power consumption that DVC achieves on various numbers of nodes. In Figure 6.8 (a), (b), (c) and (d), the “Original” plots represent the power consumption before the suggested voltages are applied, while the “Improved” plots represent the power consumption after they are applied. Figure 6.8 (a) shows that the original power consumption of the schedules given by DVC for 1000 cases on 16 nodes with α, β and γ is 27,589,932*VI, 27,929,874*VI and 28,765,748*VI, respectively. The improved power consumption is 10,076,707*VI, 10,412,863*VI and 10,880,162*VI, respectively. On 16 nodes with NC = 8, DVC thus shows over 62% improvement in power consumption. Figure 6.8 (b) shows the power consumption on 32 nodes with the same NC, where DVC improves power consumption by over 60% across the configurations. Figure 6.8 (c) and (d) show over 51.3% improvement in power consumption on 64 and 128 nodes, respectively.
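The improvement reported for 16 nodes can be checked directly from the totals quoted for Figure 6.8 (a). The following sketch computes the relative saving, 1 - improved/original, for each configuration; the dictionary and function names are illustrative, and the numbers are those given in the text, in units of V*I.

```python
# Totals over 1000 cases on 16 nodes with NC = 8, in units of V*I.
original = {"alpha": 27_589_932, "beta": 27_929_874, "gamma": 28_765_748}
improved = {"alpha": 10_076_707, "beta": 10_412_863, "gamma": 10_880_162}


def improvement(before, after):
    """Relative saving achieved by applying the suggested voltages."""
    return 1 - after / before


savings = {name: improvement(original[name], improved[name])
           for name in original}

for name, value in savings.items():
    print(f"{name}: {value:.1%}")
```

All three savings exceed 62%, which agrees with the figure stated for 16 nodes.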


Figure 6.8: Improvement of DVC on power consumption with NC = 8. (a) 16 nodes. (b) 32 nodes. (c) 64 nodes. (d) 128 nodes.

Figure 6.9 shows the improvement in power consumption given by DVC on 16, 32, 64 and 128 nodes when NC is 16. DVC improves power consumption by over 53%, 64%, 62.8% and 59.4% on 16, 32, 64 and 128 nodes in Figure 6.9 (a), (b), (c) and (d), respectively. As in Figure 6.8, DVC successfully decreases power consumption by adjusting the voltages of the cores. Because there are no Δ₄ messages, the improvement of DVC drops slightly with configuration α in Figure 6.9 (a). With more nodes, DVC provides more improvement in power consumption because the first level of the first part successfully avoids possible influence on scheduling Δ₄ messages.


Figure 6.9: Improvement of DVC on power consumption with NC = 16. (a) 16 nodes. (b) 32 nodes. (c) 64 nodes. (d) 128 nodes.

DVC outperforms TPDR in most cases across the various numbers of nodes, values of NC and configurations. The dynamic voltage mechanism of DVC successfully decreases the power consumption of GEN_BLOCK redistribution. From the above performance analyses and simulation results, we make the following remarks:

Remark 1: DVC avoids separating Δ₄ messages by gathering local messages together in one step, which reduces the unexpected influence on scheduling Δ₄ messages.

Remark 2: DVC adapts and performs better with larger-scale systems and heterogeneous environments.

Remark 3: DVC successfully decreases the power consumption of GEN_BLOCK redistribution.

Remark 4: DVC provides more improvement in power consumption with more nodes because the first level of the first part successfully avoids possible influence on scheduling Δ₄ messages.

From the document: Chung Hua University (pages 84-92)