Simplifying the Derivation in PACE by Lagrange Multiplier Procedure

3. Optimal Speed Schedule for Ideal CPUs

3.2 Simplifying the Derivation in PACE by Lagrange Multiplier Procedure

Because it is infeasible to change CPU frequency/voltage continuously, we

assign a set of n possible execution cycles sorted from minimum to maximum, {C1, C2, …,Cn}, and only change CPU frequency/voltage after a task executes Ci cycles, wherei=0,...,n−1 andC₀ =0. In other words, the CPU frequency between Ci-1 and C_i is constant. We denote the constant frequency assigned in partition [C_i₋₁,C_i) asf_i and rewrite equation (4) in discrete form:

) Our goal is to find an optimal speed schedule with minimum energy consumption of objective function (5) under the time constraint

f d

It is a constrained nonlinear programming problem to find the minimum value of the objective function (5) with constrain (6). We use the Lagrange multiplier procedure to solve it [24]. First we relax the constrained model into an unconstrained form by weighting constraints in the objective function with Lagrange multiplier v.

The result is a Lagrangian function

Solving the stationary point of the Lagrangian function

We obtain the optimal solution f_i*

10 same as that in [2]. Now we try to solve the stationary point in the Lagrangian

function (7). First, the derivative of the Lagrangian function (7) with respect to v is set to 0:

To verify the computed stationary point is indeed a global minimum, the Hessian matrix of the Lagrangian function L is defined as

⎟⎟ function L is an unconstrained local minimum and is optimal for the objective

function (5) [24].

Chapter 4 Proposed Optimal Schedule in Realistic CPUs (OSRC)

4.1 The Problem of the PACE in Realistic CPUs

The main problem of the optimal speed schedule from PACE [2] is that for realistic CPUs, like Intel XScale [15][16], AMD Duron [17] and Transmeta Cursoe [18], only a limited set of voltage levels for corresponding frequency ranges is supported. This means the optimal speed schedule from PACE is not applicable to these realistic CPUs. Another problem is as follows. When calculating the optimal speed schedule, if we only consider the dynamic power based on equation (1) and neglect the static and leakage powers, it may actually result in less energy saving for the optimal scheme. To obtain the optimal speed schedule, it is more reasonable to use the CPU power consumption data from measurements.

Table 1 and Table 2 show the valid CPU frequency, corresponding voltage and power consumption in Intel PXA255 and PXA270 CPUs, respectively. The power consumption values are different from the theoretical values calculated from equation (1), because most of DVS researches did not consider the short circuit and leakage power consumptions [1]. So it is more reasonable to adopt realistic power consumption values instead of those from equation (1) in DVS algorithms. In this

thesis, we used the power consumption values directly from Table 1 and Table 2 for evaluation.

Table 1：Power consumption specification for Intel PXA255.

Frequency (MHz) Voltage (V) Power (mW)

33 (idle) 1.0 45

200 1.0 178

300 1.1 283

400 1.3 411

Table 2：Power consumption specification for Intel PXA270.

Frequency (MHz) Voltage (V) Power (mW)

13 (idle) 0.85 44.2

104 0.9 115

208 1.15 279

312 1.25 390

416 1.35 570

520 1.45 747

624 1.55 925

For illustration, we use PACE and our proposed OSRC, respectively, to find optimal schedules in realistic CPUs for two example tasks. Table 3 gives the specifications of these two tasks.

Table 3：Specifications of two example tasks

Task C_i (Mc) F^c(C_i) deadline (ms)

Task 1 {5, 15} {1, 0.2} 50

Task 2 {5, 10, 15} {1, 0.3, 0.1} 50

4.2 PACE Approach: Rounding the Frequencies Obtain from PACE to the Nearest Available Frequencies

First, we take task 1 running at Intel PAX255 as an example. From equation (8), if the start-up speed of task 1 is c and task 1 takes 5 million cycles (Mc) and keeps running, the CPU speed should be raised to 1.71c. By equation (8), we found the start-up speed c is 217 MHz, and the speed after 10 Mc is 370 MHz. Because Intel PXA255 does not support these speed settings, it is reasonable to choose the upper nearest available frequencies to avoid deadline missing. After rounding the ideal frequencies, the speed schedule is: 300 MHz at start-up and 400 MHz after 5 Mc executed. We use function p(f) which is C_eff ⋅ f ³to describe the power consumption under speed f and p(f) can be obtained from Table 1. Now we rewrite equation (5) as

∑

= From equation (9), the expected energy consumption is 7.15 mJ if the energy consumption during idea time is included.

Now we take task 2 running at Intel PXA255 as another example. By a similar way, based on PACE, we obtained an optimal schedule of {213 MHz, 319 MHz, 460 MHz}. Because the maximum supported frequency in Intel PXA255 is 400 MHz, we

could not find a rounding up frequency for 460 MHz.

4.3 OSRC Approach

Denoting the available frequencies by a linear combination was often used [22][25][26][27][28]. By rewriting the objective function (5), the stochastic DVS model is formulated to MKP. If a CPU has a limited set of m speeds, {s1, s2,…,sm}, it is better to formulate the original constrained nonlinear programming problem (equations (9) and (6)) as MKP. We denote f_i, the CPU frequency after a task executes Ci-1 cycles and is static until the task executes Ci cycles, as a linear combination ofs_i

Using the same concept of the linear combination of frequencies, the expected energy consumption (E(fi)) by executing (C_i−C_i₋₁) cycles under static CPU frequency

From equation (5), the expected energy consumption based on the intra task DVS stochastic model is the sum of the expected energy consumption in partition[C_i₋₁,C_i). Therefore, the expected energy consumption under a limited set of frequencies combinations is

16 of frequencies combinations is given by

d (“≦”) relation. This is because under a limited set of frequencies, the sum of

j i j

i s x

t ( )⋅ _, is hard to fit the deadline exactly. Now the problem is MKP. x_i_,_jare the only variables that we should solve

Minimizing ⁿ _i_j

In an intra-task DVS schedule, the maximum number of frequency changes during a task running is the number of CPU frequency levels minus one. Because the frequency levels of realistic CPUs [15][16][17][18] are only a few, we use the Dynamic Programming method to solve MKP [31]. From section 3.2, a task consists of n partition, {p₁, p₂, ..., p_n}={C₁,C₂−C₁,...,C_n−C_n₋₁}, and the best feasible solution of r partitions {pi … pn}, wherer=n−i+1, is denoted as Sr, which is a set of {xi,j … xn,j}, wherej∈{1,... ,m}. Using the recursion to solve Sr, we have

⎪⎪

Because the deadline of Sr-1 varies with a selected speed sj in partition pn-r+1, the energy consumption of Sr-1 is given by

∑

⁻⁺

In Sr , under the following two conditions, deadline miss will occur:

1) Sr-1 is null

Procedure Discrete-Optimal-Speed(Sr) if (r == 1) then /*S1means the last partition.*/ temp_S[j] = Discrete-Optimal-Speed(S_r-1);

end for

for (each temp_S[j]) do

Find j where Ej(Sr-1) + F^c(Cn-r)*e(sj) is minimum;

best_j = j;

end for

if (found best_j) then

return {xn-r+1, best_j}∪temp_S[best_j];

elsereturn null;

end if end if

Fig. 1. OSRC procedure.

The proposed OSRC procedure is shown in Fig. 1. Although a recursive procedure is used in the OSRC procedure, the computation will not take too much time due to a limited set of available CPU frequencies. Because voltage scaling is computationally expensive and hampers the possible energy saving [32], the size of n (number of a task’s possible execution cycles) should be small. In addition, the main idea of stochastic DVS is if a periodic task’s actual execution cycles (AEC) follows a distribution, the optimal speed schedule can save the most energy. Since the distribution can be obtained offline or online, the optimal speed schedule needs to be calculated only once and will be used for a long time. Based on the above, the computation time of stochastic intra-task DVS will not be an issue.

To demonstrate the merit of our OSRC over PACE, let’s return to example task 1, as shown in Table 3, which consists of two partitions{p₁, p₂}={C₁,C₂ −C₁}=

Mc}

10 Mc, 5

{ , and the corresponding tail distribution functions are 1 and 0.2. Fig. 2 shows a recursion graph for task 1. Note that each line is labeled with a selected frequency, the dotted lines present possible paths and the solid lines present the optimal paths. Each node is labeled with two values: the upper value is energy consumption and the lower value is time consumed in each partition. Our goal is to find a path which has a minimum sum of energy consumption, and the sum of time consumed must be less or equal to the deadline. Therefore, by using OSRC, the optimal speed schedule is {200 MHz, 400 MHz}, and the energy consumption is 6.51 mJ, which reduce 18% more power than that by using PACE (7.15 mJ from section 4.2). Similarly, by using OSRC, the optimal speed schedule of example task 2 is {200 MHz, 400 MHz, 400 MHz}, which couldn’t be solved by PACE, as shown in section 4.2.

Fig. 2. The recursion graph for Task 1.

Chapter 5 Evaluation and Discussion

5.1 Simulation Model

First, we examined our optimal speed procedure in CPUs with different frequency/voltage levels: Intel PXA255 and PXA270. The energy consumption with respect to each frequency/voltage was according to Table 1 and Table 2, and the energy consumption in idle state was also considered. A single task’s WCEC was set to the worst case execution time(WCET)×f_max; the WCET equals to 50 ms and

fmaxmeans the maximum CPU frequency. α∈{0.2,0.5,0.8}, which is the ratio of best case execution cycles (BCEC)/WCEC, means the variation of a task’s AEC which is between BCEC and WCEC according to normal distribution [5][32]. The mean and standard deviation were set to (WCEC+BCEC) 2 and (WCEC−BCEC) 6, meaning that 99.7 percent of the execution cycles falls in the interval [BCEC, WCEC]

[13]. Because the speed schedule varies with the CPU utilization, we simulated the task’s allowed execution time (AET) between[WCEC f_max,WCEC f_min]. All simulation parameters are list in Table 4.

Table 4：Simulation parameters

Figure CPU WCEC(Mc) BCEC(Mc) AET(ms) N(µ,σ²)

Fig. 3 PXA255 20 4 50~100 (12, 2.7)

Fig. 4 PXA270 31.2 6.24 50~300 (18.7, 4.2)

Fig. 5 PXA255 20 10 50~100 (15,1.7)

Fig. 6 PXA255 20 16 50~100 (18, 1.3)

Fig. 7 PXA270 31.2 15.6 50~300 (23.4, 2.6) Fig. 8 PXA270 31.2 24.96 50~300 (28.1, 1.0)

We have implemented four schemes, including OSRC for performance evaluation:

․ WCE-stretch [5]: The speed schedule assumes that the task will exhibit its worst case behavior, and choose the minimum static frequency.

․ PACE [2] : The optimal speed schedule is calculated by the theoretical value in ideal CPU as described in section 4, and the unavailable frequencies are rounding up to the nearest available ones.

․ OSRC : The speed schedule is calculated by the proposed OSRC schedule in Fig. 1.

․ LB (low bound) : An oracle algorithm knows the AEC in advance.

Because of a limited set of CPU frequency/voltage levels, the unavailable frequencies were replaced by linear combinations of their two immediate frequencies in low bound schemes for maximum energy saving. Unlike other stochastic-related papers [2][5], the performance comparison is based on the expected energy consumption.

And all schemes are normalized with respect to the WCE-stretch.

5.2 Impact of CPU Levels

Fig. 3 and Fig. 4 show the expected energy consumption comparison for all

schemes in CPUs with different voltage/frequency levels: Intel PXA255 and PXA270.

α is set to 0.2. In Fig. 3, the sudden transition between AET/WCET 1.2 and 1.4 in the low bound curve results in the WCE-stretch speed schedule drops the speed from 400 MHz to 300 MHz. When AET/WCET ≧ 2, the curve will up to 1 by the same reason.

In 3 levels CPU, Intel PXA255, OSRC reduces CPU energy consumption between 0%

and 10.2% with an average of 6.5%; PACE reduces CPU energy consumption between -1.2% and 9.9% with an average of 2.0%; the low bound scheme reduces CPU energy consumption between 0% and 18.5% with an average of 10.8%. In 6 levels CPU, Intel PXA270, OSRC reduces CPU energy consumption between 0% and 24.8% with an average of 15.9%; PACE reduces CPU energy consumption between -1.6% and 22.9% with an average of 5.6%; the low bound scheme reduces CPU energy consumption between 0% and 26% with an average of 19.2%. The results show that the more CPU levels, the more energy saving.

0.7 0.8 0.9 1 1.1

1 1.2 1.4 1.6 1.8 2

AET/WCET

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 3. The impact of CPU levels on expected energy consumption in Intel PXA255.

0.7 0.8 0.9 1 1.1

1 2 3 4 5 6

AET/WCET

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 4. The impact of CPU levels on expected energy consumption in Intel PXA270.

5.3 Impact of BCEC/WCEC(α) Ratio

We set α to 0.5 and 0.8, and repeated simulations for these two types of CPUs. In Intel PXA255, as show in Fig. 5 and Fig. 6, OSRC can reduce 5.7% and 2.9% energy consumption in average (upper bound: 11.5% and 8.9%), respectively; the values in PACE are 2.1% and 1.0%. In Intel PXA270, as shown in Fig. 7 and Fig. 8, OSRC can reduce 13.4% and 6.7% energy consumption in average (upper bound: 17.3% and 13.3%), respectively; the values in PACE are 4.4% and 2.0%. These results show that when we set α to 0.5, the impacts of energy reduction are small for both types of CPU.

But when we raised α to 0.8, the optimal schedules are close to the WCE-stretch scheme, especially in 3 levels CPU. Because low slack time limits the aggressive frequency/voltage reduction in the optimal schedule, it happens in all offline DVS schedules.

24 0.7

0.8 0.9 1 1.1

1 1.2 1.4 1.6 1.8 2

AET/WCE

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 5. The impact of

α

on expected energy consumption in PXA255 (

α

=0.5).

0.7 0.8 0.9 1 1.1

1 1.2 1.4 1.6 1.8 2

AET/WCET

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 6. The impact of

α

on expected energy consumption in PXA255 (

α

=0.8).

0.7 0.8 0.9 1 1.1

1 2 3 4 5 6

AET/WCE

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 7. The impact of

α

on expected energy consumption in PXA270 (

α

=0.5).

0.7 0.8 0.9 1 1.1

1 2 3 4 5 6

AET/WCE

Expected energy consumption w.r.t. WCE-stretch

OSRC PACE LB

Fig. 8. The impact of

α

on expected energy consumption in PXA270 (

α

=0.8).

The average energy saving percentage with respect to WCE-stretch for each scheme in Fig. 3 through Fig. 8 is denoted as (1－ average of expected energy with respect to WCE-stretch). The results are summarized in Table 5, and the proposed

OSRC is three times in average better than that of PACE for realistic CPUs.

Table 5：Average energy saving percentage with respect to WCE-stretch Figure OSRC PACE LB

Fig. 3 6.5% 2.0% 10.8%

Fig. 4 15.9% 5.6% 19.2%

Fig. 5 5.7% 2.1% 11.5%

Fig. 6 2.9% 1.0% 8.9%

Fig. 7 13.4% 4.4% 17.3%

Fig. 8 6.7% 2.0% 13.3%

Chapter 6 Conclusions and Future Work

6.1 Concluding Remarks

In this thesis, we have derived an optimal speed schedule for ideal CPUs for hard real-time systems by the Lagrange multiplier procedure, in a simple and elegant way, compared to PACE [2]. Because of limited available frequency/voltage levels in realistic CPUs, the optimal speed schedule for ideal CPUs can not be applied to realistic CPUs directly. To find an optimal speed schedule for realistic CPUs, we transform the original nonlinear programming problem into MKP based on the frequency/voltage levels and power consumption of a realistic CPU. With limited CPU frequency/voltage levels, the problem can be solved by the OSRC procedure feasibly. To evaluate the merits of the proposed OSRC, the actual data of Intel PXA255 and PXA270 CPU were used in the analysis. We have the following remarks.

First, the analysis results have shown that the poor energy saving by using PACE in realistic CPUs, which is almost the same as that in WCE-stretch. By using the OSRC for realistic CPUs, the results are very close to the low bound derived from an oracle algorithm. Secondly, we observed that the CPU frequency/voltage levels affect the energy efficiency of the optimal speed schedule in the stochastic DVS model: the more the levels, the more the energy saving. Thirdly, we found that compiler-assisted intra-task DVS algorithms are hard to collaborate with inter-task DVS algorithms if the frequency/voltage scaling code is inserted in the source code. Lastly, under the

stochastic DVS model, our scheme can provide the best solution for realistic CPUs using dynamic programming. Evaluation have shown that the energy saving of OSRC is three times in average better than that of PACE in Realistic CPUs.

6.2 Future Work

In stochastic DVS intra-task DVS algorithms, the speed schedule is only calculated once for different AET so that it is easy to work with most of inter-task DVS algorithms. But there still exists some unresolved issues in OSRC. First, all of the approaches addressed in this thesis neglect the time and energy consumption owing to frequency/voltage transitions. If the time of transitions takes too long, a task may miss the deadline. If the energy consumption due to transitions is too much, it reduces the energy saving efficiency. It may even wastes more energy than a non-DVS speed schedule in the worst case. Secondly, if a task has been preempted and then rescheduled, the pre-calculated optimal speed schedule may fail because the AET of the task may become different. These problems deserve for further investigation.

Bibliography

[1] B. Moyer, “Low-Power Design for Embedded Processors,” in Proc. of the IEEE, vol. 89, no. 11, November 2001, pp. 1576-1587.

[2] J.R. Lorch and A.J. Smith, ”PACE: A New Approach to Dynamic Voltage Scaling,” IEEE Trans. on Computers, vol. 53, no. 7, pp. 856-869, July 2004.

[3] C. M. Krishna and Y. H. Lee, “Voltage-Clock-Scaling Adaptive Scheduling Techniques for Low Power in Hard Real-Time Systems,” IEEE Trans. on Computers, vol. 52, no. 12, pp. 1586-1593, December 2003.

[4] D. Zhu, D. Mosse, and R. Melhem, “Power-Aware Scheduling for AND/OR Graph in Real-Time Systems,” IEEE Trans. on Parallel and Distributed Systems, vol. 15, no. 9, pp. 849-864, September 2004.

[5] F. Gruian, “Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors,” in Proc. of ISLPED, 2001, pp. 46-51.

[6] N. AbouGhazaleh, D. Mosse, B. Childers, R. Melhem, and M.

Craven, ”Collaborative operating system and compiler power management for real-time applications,” in Proc. of IEEE RTAS, 2003, pp. 133-141.

[7] Y. Zhu and F. Mueller, “Feedback EDF Scheduling Exploiting Dynamic Voltage Scaling,” in Proc. of IEEE RTAS, 2004, pp. 84-93.

[8] P. Pillai and K. G. Shin, “Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems,” in Proc. of 18th ACM Symposium on Operating Systems Principles, 2001, pp. 89-102.

[9] E. Chan, K. Govil, and H. Wasserman, “Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU,” in Proc. of First ACM International Conference on Mobile Computing and Networking, pp. 13-25, Nov. 1995.

[10] D. Shin, S. Lee, and J. Kim, “Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications,” IEEE Design and Test of Computers, Mar. 2001, pp. 20-30.

[11] W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, “Performance comparison of dynamic voltage scaling algorithms for hard real-time systems”, in Proc. of IEEE RTAS, 2002, pp. 219-228.

[12] Laptop and Notebook Computers, Toshiba, http://www.toshibadirect.com/td/b2c /toshibanotebook.to

[13] H. Aydin, R. Melhem, D. Mosse and P.M. Alvarez, “Power-Aware Scheduling for Periodic Real-Time Tasks,” IEEE Trans. Computers, vol.53, no. 5, pp. 584-600, May 2004.

[14] J. Seo, T. Kim, and C. Chung, "Profile-based Optimal Intra-task Voltage Scheduling for Hard Real-Time Applications," in Proc. of the 41st annual Conference on Design Automation, pp. 87-92, June 2004.

[15] Intel. PXA255 Processor Electrical, Mechanical, and Thermal Specification, 2004.

[16] Intel. PXA270 Processor Electrical, Mechanical, and Thermal Specification, 2004.

[17] Product Information, http://www.amd.com/us-en/Processors/ProductInformation /0,,30_118,00.html

[18] LongRun2 Technology, http://www.transmeta.com/longrun2/index.html

[19] D. Shin and J. Kim, “A Profile-Based Energy-Efficient Intra-Task Voltage Scheduling Algorithm for Hard Real-Time Applications,” in Proc. of ISLPED, 2001, pp. 271-274.

[20] M. Weiser, B. Welch, A.Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” in Proc. of First Symposium on Operating Systems Design and

Implementation, pp. 13-23, Nov. 1994.

[21] T. Ishihara and H. Yasuura, “Voltage scheduling problem for dynamically variable voltage processor,” in Proc. of International Symposium on Low Power Electronic and Design, 1998, pp. 197-202.

[22] Y. Yu and V. K. Prasanna, “Resource Allocation for Independent Real-Time Tasks in Heterogeneous Systems for Energy Minimization,” Journal of

Information Science and Engineering,” vol. 19, no. 3, May 2003, pp. 433-449.

[23] T.A. Feo and M.G.C. Resende, “Greedy Randomized Adaptive Search Procedures,” Journal of Global Optimization, 6, pp. 109-133, 1995.

[24] R. L. Rardin, “Optimization in Operations Research,” Prentice-Hall, 1998.

[25] B. Mochocki, X. Hu, and G. Quan, “A Realistic Variable Voltage Scheduling Model for Real-Time Applications,” in Proc. of the IEEE/ACM International Conference on Computer-aided design, 2002, pp. 726-731.

[26] Y. Zhang, X Hu, and D. Chen, “Task Scheduling and Voltage Selection for Energy Minimization,” in Proc. of the 39^th Conference on Design automation, 2002, pp. 183-188.

[27] A. Andrei, M. Schmitz, P. Eles, Z. Peng, and B. Al-Hashimi,

“Overhead-Conscious Voltage Selection for Dynamic and Leakage Energy Reduction of Time-Constrained Systems,” in Proc. of the Conference on Design, Automation and Test in Europe, 2004, pp. 518-523.

[28] W. Kwon and T. Kim, “Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors,” in Proc. of IEEE DAC’03, 2003, pp. 125-130.

[29] T.L. Matrin and D.P. Siewiorek,” The Impact of Battery Capacity and Memory Bandwidth on CPU Speed-Setting: A Case Study,” in Proc. of ISLPED, Aug, 1999, pp. 200-205.

[30] F. Yao, A. Demers, and S. Shenker, “A Scheduling Model for Reduced CPU Energy,” in Proc. of IEEE FOCS, 1995, pp 374.

[31] R. C. T. Lee, R. C. Chang, S. S. Tseng, and Y. T. Tsai, “Introduction to the Design and Analysis of Algorithms," Unalis corporation, 1999.

[32] Y. Shin and K. Choi, “Power Conscious Fixed Priority Scheduling for Hard

在文檔中真實處理器之高能源效率任務內動態電壓調整策略 (頁 21-0)