
4. Proposed Optimal Schedule in Realistic CPUs (OSRC)

4.3 OSRC Approach

Expressing a frequency as a linear combination of the available frequencies is a common technique [22][25][26][27][28]. By rewriting the objective function (5), the stochastic DVS model can be formulated as MKP. If a CPU has a limited set of m speeds, {s_1, s_2, …, s_m}, it is better to formulate the original constrained nonlinear programming problem (equations (9) and (6)) as MKP. We denote by f_i the CPU frequency that is set after a task has executed C_{i−1} cycles and stays fixed until the task has executed C_i cycles, and we express f_i as a linear combination of the available speeds.

Using the same concept of the linear combination of frequencies, the expected energy consumption E(f_i) of executing (C_i − C_{i−1}) cycles under a static CPU frequency can likewise be expressed in terms of the available speeds.

From equation (5), the expected energy consumption based on the intra-task DVS stochastic model is the sum of the expected energy consumption in each partition [C_{i−1}, C_i). Therefore, the expected energy consumption under a limited set of frequency combinations is a sum over all partitions and speeds, weighted by the variables x_{i,j}.

The deadline constraint uses a less-than-or-equal (“≦”) relation. This is because, under a limited set of frequencies, the sum of t_i(s_j)·x_{i,j} is hard to fit to the deadline exactly. Now the problem is MKP: the x_{i,j} are the only variables that we need to solve for, and the objective is to minimize the expected energy consumption.

In an intra-task DVS schedule, the maximum number of frequency changes while a task is running is the number of CPU frequency levels minus one. Because realistic CPUs [15][16][17][18] have only a few frequency levels, we use dynamic programming to solve MKP [31]. From section 3.2, a task consists of n partitions, {p_1, p_2, ..., p_n} = {C_1, C_2 − C_1, ..., C_n − C_{n−1}}. The best feasible solution for the r partitions {p_i, …, p_n}, where r = n − i + 1, is denoted S_r; it is a set {x_{i,j}, …, x_{n,j}}, where j ∈ {1, ..., m}. S_r is solved recursively: for each speed s_j selected in partition p_{n−r+1}, the sub-problem S_{r−1} is solved for the remaining partitions, and the speed for which E_j(S_{r−1}) + F^c(C_{n−r})·e(s_j) is minimum is kept (see Fig. 1).

Because the deadline of S_{r−1} varies with the speed s_j selected in partition p_{n−r+1}, the energy consumption of S_{r−1} depends on j and is written E_j(S_{r−1}).

In S_r, a deadline miss will occur under the following two conditions:

1) S_{r−1} is null

Procedure Discrete-Optimal-Speed(S_r)
  if (r == 1) then   /* S_1 means the last partition. */
    …
      temp_S[j] = Discrete-Optimal-Speed(S_{r-1});
    end for
    for (each temp_S[j]) do
      Find j where E_j(S_{r-1}) + F^c(C_{n-r})*e(s_j) is minimum;
      best_j = j;
    end for
    if (found best_j) then
      return {x_{n-r+1, best_j}} ∪ temp_S[best_j];
    else
      return null;
    end if
  end if

Fig. 1. OSRC procedure.

The proposed OSRC procedure is shown in Fig. 1. Although the procedure is recursive, its computation does not take much time because the set of available CPU frequencies is limited. Because voltage scaling is computationally expensive and hampers the possible energy saving [32], the size of n (the number of a task’s possible execution cycles) should also be small. In addition, the main idea of stochastic DVS is that, if the actual execution cycles (AEC) of a periodic task follow a distribution, the optimal speed schedule can save the most energy. Since the distribution can be obtained offline or online, the optimal speed schedule needs to be calculated only once and can then be used for a long time. For these reasons, the computation time of stochastic intra-task DVS is not an issue.
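The recursion behind Fig. 1 can be illustrated with a minimal Python sketch. It is not the thesis implementation: it assumes that MKP stands for choosing exactly one speed per partition, that the expected energy of a partition is its tail probability times its cycle count times an energy-per-cycle value, and that the total execution time must not exceed the deadline. The function name `osrc_schedule`, the energy-per-cycle figures and the 62.5 ms deadline are hypothetical and are not taken from Table 1 or Table 3; memoization is omitted for brevity.

```python
from typing import List, Optional, Sequence, Tuple

# (expected energy in mJ, chosen speed per partition in MHz)
Schedule = Tuple[float, List[float]]


def osrc_schedule(
    partitions_mc: Sequence[float],  # cycles per partition, in mega-cycles (Mc)
    tail_probs: Sequence[float],     # probability that each partition is reached
    speeds_mhz: Sequence[float],     # available CPU speeds
    energy_nj: Sequence[float],      # ASSUMED energy per cycle at each speed (nJ)
    deadline_ms: float,
    i: int = 0,
) -> Optional[Schedule]:
    """Try every speed for partition i and recurse on the remaining partitions."""
    if i == len(partitions_mc):
        return 0.0, []
    best: Optional[Schedule] = None
    for speed, e_cycle in zip(speeds_mhz, energy_nj):
        time_ms = partitions_mc[i] * 1000.0 / speed
        if time_ms > deadline_ms + 1e-9:
            continue  # this speed alone already misses the deadline
        rest = osrc_schedule(partitions_mc, tail_probs, speeds_mhz,
                             energy_nj, deadline_ms - time_ms, i + 1)
        if rest is None:
            continue  # no feasible schedule for the remaining partitions
        # Mc * nJ/cycle = mJ, weighted by the probability of reaching partition i
        energy = tail_probs[i] * partitions_mc[i] * e_cycle + rest[0]
        if best is None or energy < best[0]:
            best = (energy, [speed] + rest[1])
    return best


if __name__ == "__main__":
    # Two partitions of 10 Mc and 5 Mc with tail probabilities 1 and 0.2;
    # the energy values and the deadline are illustrative assumptions.
    print(osrc_schedule([10.0, 5.0], [1.0, 0.2],
                        [200.0, 300.0, 400.0], [0.5, 0.8, 1.2], 62.5))
```

The function returns `None` when no combination of speeds meets the deadline, which corresponds to the null result in Fig. 1. With only a few speeds and partitions the exhaustive recursion stays small; a larger n would call for caching sub-results.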

To demonstrate the merit of OSRC over PACE, let us return to example task 1, shown in Table 3, which consists of two partitions {p_1, p_2} = {C_1, C_2 − C_1} = {10 Mc, 5 Mc}; the corresponding tail distribution functions are 1 and 0.2. Fig. 2 shows the recursion graph for task 1. Each line is labeled with a selected frequency; the dotted lines represent possible paths and the solid lines represent the optimal paths. Each node is labeled with two values: the upper value is the energy consumption and the lower value is the time consumed in each partition. Our goal is to find a path with the minimum total energy consumption whose total time is less than or equal to the deadline. By using OSRC, the optimal speed schedule is {200 MHz, 400 MHz} and the energy consumption is 6.51 mJ, which saves 18% more energy than PACE (7.15 mJ, from section 4.2). Similarly, by using OSRC, the optimal speed schedule of example task 2 is {200 MHz, 400 MHz, 400 MHz}, a case that PACE could not solve, as shown in section 4.2.

Fig. 2. The recursion graph for Task 1.

Chapter 5 Evaluation and Discussion

5.1 Simulation Model

First, we examined our optimal speed procedure on CPUs with different frequency/voltage levels: Intel PXA255 and PXA270. The energy consumption at each frequency/voltage level was taken from Table 1 and Table 2, and the energy consumption in the idle state was also considered. A single task’s WCEC was set to the worst-case execution time (WCET) × f_max, where the WCET equals 50 ms and f_max is the maximum CPU frequency. The ratio α = BCEC/WCEC, where BCEC is the best-case execution cycles, was chosen from {0.2, 0.5, 0.8}; it determines the variation of a task’s AEC, which lies between BCEC and WCEC and follows a normal distribution [5][32]. The mean and standard deviation were set to (WCEC + BCEC)/2 and (WCEC − BCEC)/6, meaning that 99.7 percent of the execution cycles fall in the interval [BCEC, WCEC] [13]. Because the speed schedule varies with the CPU utilization, we simulated the task’s allowed execution time (AET) in the range [WCEC/f_max, WCEC/f_min]. All simulation parameters are listed in Table 4.

Table 4: Simulation parameters

Figure   CPU      WCEC (Mc)   BCEC (Mc)   AET (ms)   N(µ, σ)
Fig. 3   PXA255   20          4           50~100     (12, 2.7)
Fig. 4   PXA270   31.2        6.24        50~300     (18.7, 4.2)
Fig. 5   PXA255   20          10          50~100     (15, 1.7)
Fig. 6   PXA255   20          16          50~100     (18, 1.3)
Fig. 7   PXA270   31.2        15.6        50~300     (23.4, 2.6)
Fig. 8   PXA270   31.2        24.96       50~300     (28.1, 1.0)
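As a quick check of how the simulation parameters follow from the definitions in section 5.1, the sketch below recomputes WCEC, BCEC, the mean, the standard deviation and the AET range. The function name is hypothetical, and the frequency limits (400/200 MHz for PXA255, 624/104 MHz for PXA270) are assumptions inferred from the WCEC and AET columns of Table 4 rather than values quoted from Table 1 or Table 2.

```python
from typing import Dict, Tuple, Union

Params = Dict[str, Union[float, Tuple[float, float]]]


def simulation_parameters(wcet_ms: float, f_max_mhz: float,
                          f_min_mhz: float, alpha: float) -> Params:
    """Derive the per-task simulation parameters (cycles in Mc, times in ms)."""
    wcec = wcet_ms * f_max_mhz / 1000.0   # WCEC = WCET x f_max
    bcec = alpha * wcec                   # alpha = BCEC / WCEC
    return {
        "wcec_mc": wcec,
        "bcec_mc": bcec,
        "mean_mc": (wcec + bcec) / 2.0,   # centre of [BCEC, WCEC]
        "std_mc": (wcec - bcec) / 6.0,    # +/- 3 sigma spans [BCEC, WCEC]
        "aet_ms": (wcec * 1000.0 / f_max_mhz, wcec * 1000.0 / f_min_mhz),
    }


if __name__ == "__main__":
    # Assumed frequency limits, inferred from Table 4.
    print("PXA255:", simulation_parameters(50.0, 400.0, 200.0, 0.2))
    print("PXA270:", simulation_parameters(50.0, 624.0, 104.0, 0.2))
```

Under these assumptions, the α = 0.2 case gives the WCEC, BCEC, AET range and rounded distribution parameters listed for Fig. 3 and Fig. 4.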

We have implemented four schemes, including OSRC, for performance evaluation:

․ WCE-stretch [5]: The speed schedule assumes that the task will exhibit its worst-case behavior and chooses the minimum static frequency that meets the deadline.

․ PACE [2]: The optimal speed schedule is calculated from the theoretical values for an ideal CPU, as described in section 4, and unavailable frequencies are rounded up to the nearest available ones.

․ OSRC: The speed schedule is calculated by the proposed OSRC procedure in Fig. 1.

․ LB (lower bound): An oracle algorithm that knows the AEC in advance.
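The frequency selection used by WCE-stretch, and the round-up step applied to the PACE schedule, can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes the three PXA255 levels are 200, 300 and 400 MHz, which are the values mentioned in this thesis.

```python
from typing import Optional, Sequence


def round_up_frequency(required_mhz: float,
                       available_mhz: Sequence[float]) -> Optional[float]:
    """Return the lowest available frequency >= required_mhz, or None if there is none."""
    candidates = [f for f in sorted(available_mhz) if f >= required_mhz]
    return candidates[0] if candidates else None


def wce_stretch_frequency(wcec_mc: float, aet_ms: float,
                          available_mhz: Sequence[float]) -> Optional[float]:
    """Minimum static frequency that finishes WCEC cycles within the allowed time."""
    required_mhz = wcec_mc * 1000.0 / aet_ms  # Mc / ms -> MHz
    return round_up_frequency(required_mhz, available_mhz)


if __name__ == "__main__":
    levels = [200.0, 300.0, 400.0]  # assumed PXA255 frequency levels
    for aet in (60.0, 70.0, 100.0):
        print(aet, wce_stretch_frequency(20.0, aet, levels))
```

With WCEC = 20 Mc and WCET = 50 ms, an AET of 60 ms (AET/WCET = 1.2) still requires 400 MHz, while 70 ms (AET/WCET = 1.4) allows 300 MHz and 100 ms allows 200 MHz.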

Because the set of CPU frequency/voltage levels is limited, each unavailable frequency in the lower bound scheme was replaced by a linear combination of its two neighboring available frequencies for maximum energy saving. Unlike other stochastic-related papers [2][5], the performance comparison is based on the expected energy consumption, and all schemes are normalized with respect to WCE-stretch.
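Replacing an unavailable frequency by a linear combination of its two neighboring frequencies amounts to splitting the cycles between the two so that the total time is unchanged. The sketch below shows one way to compute that split; the function name and the example numbers are hypothetical.

```python
from typing import Tuple


def split_cycles(cycles_mc: float, target_mhz: float,
                 low_mhz: float, high_mhz: float) -> Tuple[float, float]:
    """Return (cycles at low_mhz, cycles at high_mhz) taking the same time as target_mhz."""
    if not low_mhz <= target_mhz <= high_mhz:
        raise ValueError("target frequency must lie between the two available frequencies")
    if low_mhz == high_mhz:
        return cycles_mc, 0.0
    # Solve c_low + c_high = C and c_low/low + c_high/high = C/target for c_high.
    high_cycles = (cycles_mc * (1.0 / low_mhz - 1.0 / target_mhz)
                   / (1.0 / low_mhz - 1.0 / high_mhz))
    return cycles_mc - high_cycles, high_cycles


if __name__ == "__main__":
    # 12 Mc at an ideal 240 MHz, emulated with 200 MHz and 300 MHz.
    print(split_cycles(12.0, 240.0, 200.0, 300.0))
```

In this example half of the cycles run at each frequency, and the total time (6/200 + 6/300 = 0.05 s) equals 12/240 s.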

5.2 Impact of CPU Levels

Fig. 3 and Fig. 4 show the expected energy consumption of all schemes on CPUs with different voltage/frequency levels: Intel PXA255 and PXA270. α is set to 0.2. In Fig. 3, the sudden transition of the lower bound curve between AET/WCET = 1.2 and 1.4 occurs because the WCE-stretch speed schedule drops its speed from 400 MHz to 300 MHz. When AET/WCET ≧ 2, the curve rises to 1 for the same reason. On the 3-level CPU, Intel PXA255, OSRC reduces CPU energy consumption by between 0% and 10.2%, with an average of 6.5%; PACE reduces it by between −1.2% and 9.9%, with an average of 2.0%; and the lower bound scheme reduces it by between 0% and 18.5%, with an average of 10.8%. On the 6-level CPU, Intel PXA270, OSRC reduces CPU energy consumption by between 0% and 24.8%, with an average of 15.9%; PACE reduces it by between −1.6% and 22.9%, with an average of 5.6%; and the lower bound scheme reduces it by between 0% and 26%, with an average of 19.2%. The results show that the more CPU levels are available, the more energy is saved.

Fig. 3. The impact of CPU levels on expected energy consumption in Intel PXA255. (Plot: x-axis AET/WCET from 1 to 2; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

Fig. 4. The impact of CPU levels on expected energy consumption in Intel PXA270. (Plot: x-axis AET/WCET from 1 to 6; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

5.3 Impact of the BCEC/WCEC Ratio (α)

We set α to 0.5 and 0.8 and repeated the simulations for both types of CPU. On Intel PXA255, as shown in Fig. 5 and Fig. 6, OSRC reduces energy consumption by 5.7% and 2.9% on average (upper bound: 11.5% and 8.9%), respectively; the corresponding values for PACE are 2.1% and 1.0%. On Intel PXA270, as shown in Fig. 7 and Fig. 8, OSRC reduces energy consumption by 13.4% and 6.7% on average (upper bound: 17.3% and 13.3%), respectively; the corresponding values for PACE are 4.4% and 2.0%. These results show that setting α to 0.5 has only a small impact on the energy reduction for both types of CPU.

When α is raised to 0.8, however, the optimal schedules become close to the WCE-stretch scheme, especially on the 3-level CPU. This happens in all offline DVS schedules, because low slack time limits aggressive frequency/voltage reduction in the optimal schedule.

Fig. 5. The impact of α on expected energy consumption in PXA255 (α = 0.5). (Plot: x-axis AET/WCET from 1 to 2; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

Fig. 6. The impact of α on expected energy consumption in PXA255 (α = 0.8). (Plot: x-axis AET/WCET from 1 to 2; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

Fig. 7. The impact of α on expected energy consumption in PXA270 (α = 0.5). (Plot: x-axis AET/WCET from 1 to 6; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

Fig. 8. The impact of α on expected energy consumption in PXA270 (α = 0.8). (Plot: x-axis AET/WCET from 1 to 6; y-axis expected energy consumption w.r.t. WCE-stretch from 0.7 to 1.1; curves OSRC, PACE, LB.)

The average energy saving percentage with respect to WCE-stretch for each scheme in Fig. 3 through Fig. 8 is defined as (1 − average expected energy with respect to WCE-stretch). The results are summarized in Table 5; on average, the energy saving of the proposed OSRC is three times that of PACE for realistic CPUs.

Table 5: Average energy saving percentage with respect to WCE-stretch

Figure   OSRC    PACE   LB
Fig. 3   6.5%    2.0%   10.8%
Fig. 4   15.9%   5.6%   19.2%
Fig. 5   5.7%    2.1%   11.5%
Fig. 6   2.9%    1.0%   8.9%
Fig. 7   13.4%   4.4%   17.3%
Fig. 8   6.7%    2.0%   13.3%
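The "three times" figure can be checked directly from the OSRC and PACE columns of Table 5. The short sketch below computes the per-figure ratio of OSRC saving to PACE saving and averages it; the variable names are illustrative only.

```python
from typing import List

# Average energy savings (in percent) for Fig. 3 through Fig. 8, from Table 5.
osrc_saving: List[float] = [6.5, 15.9, 5.7, 2.9, 13.4, 6.7]
pace_saving: List[float] = [2.0, 5.6, 2.1, 1.0, 4.4, 2.0]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Ratio of OSRC saving to PACE saving for each figure, then its average.
ratios = [o / p for o, p in zip(osrc_saving, pace_saving)]
mean_ratio = mean(ratios)

# Alternative view: ratio of the two column averages.
ratio_of_means = mean(osrc_saving) / mean(pace_saving)

if __name__ == "__main__":
    print([round(r, 2) for r in ratios])
    print(round(mean_ratio, 2), round(ratio_of_means, 2))
```

Both ways of averaging give a value close to 3, in line with the statement that OSRC saves about three times as much energy as PACE.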

Chapter 6 Conclusions and Future Work

6.1 Concluding Remarks

In this thesis, we have derived an optimal speed schedule for ideal CPUs in hard real-time systems using the Lagrange multiplier method, which is simpler and more elegant than the derivation in PACE [2]. Because realistic CPUs offer only a limited number of frequency/voltage levels, the optimal speed schedule for ideal CPUs cannot be applied to them directly. To find an optimal speed schedule for realistic CPUs, we transformed the original nonlinear programming problem into MKP based on the frequency/voltage levels and power consumption of a realistic CPU. With limited CPU frequency/voltage levels, the problem can be solved feasibly by the OSRC procedure. To evaluate the merits of the proposed OSRC, actual data for the Intel PXA255 and PXA270 CPUs were used in the analysis. We make the following remarks.

First, the analysis results show that PACE saves little energy on realistic CPUs; its results are almost the same as those of WCE-stretch. With OSRC on realistic CPUs, the results are very close to the lower bound derived from an oracle algorithm. Secondly, we observed that the number of CPU frequency/voltage levels affects the energy efficiency of the optimal speed schedule in the stochastic DVS model: the more levels, the greater the energy saving. Thirdly, we found that compiler-assisted intra-task DVS algorithms are hard to combine with inter-task DVS algorithms if the frequency/voltage scaling code is inserted in the source code. Lastly, under the stochastic DVS model, our scheme provides the best solution for realistic CPUs by using dynamic programming. The evaluation has shown that the energy saving of OSRC is on average three times that of PACE on realistic CPUs.

6.2 Future Work

In stochastic intra-task DVS algorithms, the speed schedule is calculated only once for each AET, so it is easy to combine with most inter-task DVS algorithms. However, some issues in OSRC remain unresolved. First, all of the approaches addressed in this thesis neglect the time and energy consumed by frequency/voltage transitions. If transitions take too long, a task may miss its deadline. If transitions consume too much energy, the energy saving efficiency is reduced; in the worst case, the schedule may even waste more energy than a non-DVS speed schedule. Secondly, if a task is preempted and then rescheduled, the pre-calculated optimal speed schedule may fail because the AET of the task may change. These problems deserve further investigation.

Bibliography

[1] B. Moyer, “Low-Power Design for Embedded Processors,” in Proc. of the IEEE, vol. 89, no. 11, November 2001, pp. 1576-1587.

[2] J.R. Lorch and A.J. Smith, “PACE: A New Approach to Dynamic Voltage Scaling,” IEEE Trans. on Computers, vol. 53, no. 7, pp. 856-869, July 2004.

[3] C. M. Krishna and Y. H. Lee, “Voltage-Clock-Scaling Adaptive Scheduling Techniques for Low Power in Hard Real-Time Systems,” IEEE Trans. on Computers, vol. 52, no. 12, pp. 1586-1593, December 2003.

[4] D. Zhu, D. Mosse, and R. Melhem, “Power-Aware Scheduling for AND/OR Graph in Real-Time Systems,” IEEE Trans. on Parallel and Distributed Systems, vol. 15, no. 9, pp. 849-864, September 2004.

[5] F. Gruian, “Hard Real-Time Scheduling for Low-Energy Using Stochastic Data and DVS Processors,” in Proc. of ISLPED, 2001, pp. 46-51.

[6] N. AbouGhazaleh, D. Mosse, B. Childers, R. Melhem, and M. Craven, “Collaborative operating system and compiler power management for real-time applications,” in Proc. of IEEE RTAS, 2003, pp. 133-141.

[7] Y. Zhu and F. Mueller, “Feedback EDF Scheduling Exploiting Dynamic Voltage Scaling,” in Proc. of IEEE RTAS, 2004, pp. 84-93.

[8] P. Pillai and K. G. Shin, “Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems,” in Proc. of 18th ACM Symposium on Operating Systems Principles, 2001, pp. 89-102.

[9] E. Chan, K. Govil, and H. Wasserman, “Comparing Algorithms for Dynamic Speed-Setting of a Low-Power CPU,” in Proc. of First ACM International Conference on Mobile Computing and Networking, pp. 13-25, Nov. 1995.

[10] D. Shin, S. Lee, and J. Kim, “Intra-Task Voltage Scheduling for Low-Energy Hard Real-Time Applications,” IEEE Design and Test of Computers, Mar. 2001, pp. 20-30.

[11] W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, “Performance comparison of dynamic voltage scaling algorithms for hard real-time systems”, in Proc. of IEEE RTAS, 2002, pp. 219-228.

[12] Laptop and Notebook Computers, Toshiba, http://www.toshibadirect.com/td/b2c/toshibanotebook.to

[13] H. Aydin, R. Melhem, D. Mosse and P.M. Alvarez, “Power-Aware Scheduling for Periodic Real-Time Tasks,” IEEE Trans. Computers, vol.53, no. 5, pp. 584-600, May 2004.

[14] J. Seo, T. Kim, and C. Chung, "Profile-based Optimal Intra-task Voltage Scheduling for Hard Real-Time Applications," in Proc. of the 41st annual Conference on Design Automation, pp. 87-92, June 2004.

[15] Intel. PXA255 Processor Electrical, Mechanical, and Thermal Specification, 2004.

[16] Intel. PXA270 Processor Electrical, Mechanical, and Thermal Specification, 2004.

[17] Product Information, http://www.amd.com/us-en/Processors/ProductInformation/0,,30_118,00.html

[18] LongRun2 Technology, http://www.transmeta.com/longrun2/index.html

[19] D. Shin and J. Kim, “A Profile-Based Energy-Efficient Intra-Task Voltage Scheduling Algorithm for Hard Real-Time Applications,” in Proc. of ISLPED, 2001, pp. 271-274.

[20] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” in Proc. of First Symposium on Operating Systems Design and Implementation, pp. 13-23, Nov. 1994.

[21] T. Ishihara and H. Yasuura, “Voltage scheduling problem for dynamically variable voltage processor,” in Proc. of International Symposium on Low Power Electronic and Design, 1998, pp. 197-202.

[22] Y. Yu and V. K. Prasanna, “Resource Allocation for Independent Real-Time Tasks in Heterogeneous Systems for Energy Minimization,” Journal of Information Science and Engineering, vol. 19, no. 3, May 2003, pp. 433-449.

[23] T.A. Feo and M.G.C. Resende, “Greedy Randomized Adaptive Search Procedures,” Journal of Global Optimization, 6, pp. 109-133, 1995.

[24] R. L. Rardin, “Optimization in Operations Research,” Prentice-Hall, 1998.

[25] B. Mochocki, X. Hu, and G. Quan, “A Realistic Variable Voltage Scheduling Model for Real-Time Applications,” in Proc. of the IEEE/ACM International Conference on Computer-aided design, 2002, pp. 726-731.

[26] Y. Zhang, X. Hu, and D. Chen, “Task Scheduling and Voltage Selection for Energy Minimization,” in Proc. of the 39th Conference on Design Automation, 2002, pp. 183-188.

[27] A. Andrei, M. Schmitz, P. Eles, Z. Peng, and B. Al-Hashimi, “Overhead-Conscious Voltage Selection for Dynamic and Leakage Energy Reduction of Time-Constrained Systems,” in Proc. of the Conference on Design, Automation and Test in Europe, 2004, pp. 518-523.

[28] W. Kwon and T. Kim, “Optimal Voltage Allocation Techniques for Dynamically Variable Voltage Processors,” in Proc. of IEEE DAC’03, 2003, pp. 125-130.

[29] T.L. Martin and D.P. Siewiorek, “The Impact of Battery Capacity and Memory Bandwidth on CPU Speed-Setting: A Case Study,” in Proc. of ISLPED, Aug. 1999, pp. 200-205.

[30] F. Yao, A. Demers, and S. Shenker, “A Scheduling Model for Reduced CPU Energy,” in Proc. of IEEE FOCS, 1995, pp 374.

[31] R. C. T. Lee, R. C. Chang, S. S. Tseng, and Y. T. Tsai, “Introduction to the Design and Analysis of Algorithms," Unalis corporation, 1999.

[32] Y. Shin and K. Choi, “Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems,” in Proc. of 36th Design Automation Conf., pp. 134-139, 1999.

[33] A. Andrei, M. T. Schmitz, P. Eles, Zebo Peng and B. M. Al Hashimi, “Quasi-Static Voltage Scaling for Energy Minimization with Time Constraints,” in Proc. of Design Automation and Test in Europe Conf., pp. 514-519, March 2005.

[34] G. Sudha Anil Kumar and G. Manimaran, “An Intra-task DVS Algorithm Exploiting Path Probabilities for Real-time Systems,” in SIGBED Review, vol. 2, no. 2, April 2005.

[35] R. Armstrong, D. Kung, P. Sinha and A. Zoltners, “A Computational Study of Multiple Choice Knapsack Algorithm,” ACM Trans. on Mathematical Software, vol. 9, no. 2, pp. 184-198, June 1983.
