An approximation algorithm for energy-efficient scheduling on a chip multiprocessor


Chuan-Yue Yang, Jian-Jia Chen, and Tei-Wei Kuo

Department of Computer Science and Information Engineering,

Graduate Institute of Networking and Multimedia,

National Taiwan University, Taipei, Taiwan, ROC.

Email: {r92032, r90079, ktw}@csie.ntu.edu.tw

Abstract

In the past decade, voltage scaling has become an attractive feature in the design of many system components. In this paper, we consider energy-efficient real-time task scheduling on a chip multiprocessor architecture. The objective is to schedule a set of frame-based tasks with the minimum energy consumption, where all tasks are ready at time 0 and share a common deadline. We show that this minimization problem is NP-hard and then propose a 2.371-approximation algorithm. The strength of the proposed algorithm is demonstrated by a series of simulations, in which near-optimal results were obtained.

1 Introduction

With the increasing popularity of, and widespread support for, voltage scaling in electronic circuits, energy efficiency has become a highly important design issue in hardware and software implementations [6, 10, 13, 17]. Electronic circuitry is usually designed such that a higher supply voltage results in a higher execution speed (or a higher frequency). The following example power consumption function [3, 18] gives the power consumption of a processor as a function of the processor speed:

$$P(s) = C_{ef} V_{dd}^2 s, \tag{1}$$

where $s = k_h (V_{dd} - V_t)^2 / V_{dd}$, and $P$, $s$, $C_{ef}$, $V_t$, $V_{dd}$, and $k_h$ denote the power consumption, the processor speed, the effective switch capacitance, the threshold voltage, the supply voltage, and a hardware-design-specific constant, respectively ($V_{dd} \ge V_t \ge 0$, $k_h > 0$, and $C_{ef} > 0$).
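As a small numerical illustration of Formula (1), the sketch below evaluates the speed and the power for a few supply voltages. The function names and the constants are hypothetical and chosen only for illustration. Under the additional assumption $V_t = 0$, Formula (1) gives $s = k_h V_{dd}$, so the power should agree with $\alpha s^3$ for $\alpha = C_{ef} / k_h^2$, which the last printed column checks.

```python
def speed(v_dd, v_t, k_h):
    """Processor speed s = k_h * (V_dd - V_t)^2 / V_dd, as in Formula (1)."""
    return k_h * (v_dd - v_t) ** 2 / v_dd


def power(v_dd, v_t, k_h, c_ef):
    """Power consumption P(s) = C_ef * V_dd^2 * s, as in Formula (1)."""
    return c_ef * v_dd ** 2 * speed(v_dd, v_t, k_h)


# Illustrative constants only; they are not taken from any real processor.
C_EF, K_H = 2.0, 0.5

for v_dd in (1.0, 2.0, 4.0):
    s = speed(v_dd, 0.0, K_H)                 # assumes V_t = 0
    cubic = (C_EF / K_H ** 2) * s ** 3        # alpha * s^3, alpha = C_ef / k_h^2
    print(f"V_dd={v_dd}: s={s}, P={power(v_dd, 0.0, K_H, C_EF)}, alpha*s^3={cubic}")
```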

Modern superscalar processors achieve excellent performance through pipelined instruction execution and concurrent service of independent instruction streams. Further performance improvement is often achieved by increasing the processor frequency, but at the cost of energy consumption. When energy consumption and system performance must be considered at the same time, a multiprocessor architecture is a reasonable choice, especially for multiprogramming environments (e.g., [16]). The chip-multiprocessor architecture was proposed as an attempt to overcome the chip-space constraint and the inter-processor communication delay of the multiprocessor architecture. With a slight increase in die size, multiple processors, i.e., cores, are mounted on a single die to retain many advantages of the multiprocessor architecture with only a comparatively low wire delay. While many excellent research results have been proposed for uniprocessor energy-efficient scheduling, e.g., [2, 5, 12, 14, 19, 20], little work has been done for multiprocessor systems, e.g., [4, 8, 21], even though the multiprocessor architecture has become increasingly popular on various platforms. The strong demand for multiprocessor energy-efficient scheduling comes not only from server systems but also from embedded systems, such as System-on-Chip systems. In particular, power saving on the chip multiprocessor architecture has received much attention recently, especially because of its applications to embedded systems. For example, the adaptive chip-multiprocessor (ACMP) architecture proposed in [16] allows each core to switch its operation mode among RUN, STANDBY, and DORMANT dynamically to reduce the energy consumption.

∗ Supported in part by research grants from the ROC National Science Council (NSC-93-2213-E-002-031).

In this paper, we study real-time task scheduling on a chip multiprocessor with dynamic voltage scaling capability (abbreviated as DVS-CMP). The DVS-CMP architecture explored in this paper has $M$ homogeneous cores, where each core can be dormant independently, and all non-dormant cores must operate at the same supply voltage. The objective of this research is to schedule a set of frame-based tasks with the minimum energy consumption on a DVS-CMP processor, where all tasks are ready at time 0 and share a common deadline. We show that this scheduling problem is NP-hard and propose a 2.371-approximation algorithm. The strength of the proposed algorithm is demonstrated by a series of simulations, in which near-optimal results were obtained.

The rest of this paper is organized as follows: Section 2 presents related work on energy-efficient scheduling. In Section 3, we formally define the problem under consideration. An approximation algorithm for energy-efficient scheduling is proposed in Section 4. Simulation results are shown in Section 5 to evaluate the capability of the proposed algorithm. Section 6 is the conclusion.

2 Related Work

Energy-efficient scheduling has been an active research topic in the past decade. Although many excellent results have been proposed for uniprocessor real-time task scheduling, e.g., [2, 5, 12, 14, 19, 20], little work has been done for multiprocessor systems so far. However, with the strong growth of the market for various multiprocessor architectures and their variations, multiprocessor energy-efficient scheduling has started receiving much attention in recent years, e.g., [1, 4, 8, 9, 15, 21, 22]. In particular, Chen et al. [4] proposed an approximation algorithm for multiprocessor energy-efficient scheduling of a set of independent frame-based tasks, where all tasks share the same deadline. Gruian proposed a simulated annealing (SA) approach and a list-based heuristic algorithm with a dynamic priority assignment policy to handle precedence constraints [8, 9]. Mishra et al. [15] explored scheduling issues related to the communication delay between tasks. Zhu et al. [22] explored on-line scheduling for a set of independent or dependent frame-based tasks. Given an off-line schedule based on the worst-case task execution times, on-line strategies were proposed to reclaim the slack resulting from the early completion of tasks.

Although some research has been done on multiprocessor energy-efficient scheduling, many previous results considered multiprocessor scheduling in which each processor can operate independently at its own processor speed. In this research, we are interested in a DVS-CMP architecture with $M$ given homogeneous cores, where each core can be dormant independently, and all non-dormant cores must operate at the same supply voltage. Our objective is to schedule a set of frame-based tasks with the minimum energy consumption on the given DVS-CMP processor, where all tasks are ready at time 0 and share a common deadline. The work by Anderson and Baruah [1] is related to our research. They proposed algorithms for the synthesis of a multiprocessor hard real-time system with independent periodic tasks and explored the trade-offs between the number of processors in the system and the energy consumption. In contrast to their work, we aim at scheduling for energy consumption minimization on a DVS-CMP processor with a fixed number of cores.

3 Problem Definition

In this paper, we study energy-efficient scheduling on a chip multiprocessor equipped with $M$ homogeneous cores. The power consumption function in [3, 18], i.e., Formula (1), is adopted in this paper, where $V_t = 0$ or $V_{dd} \gg V_t$. The power consumption function can then be rephrased as $P(s) = \alpha s^3$, where $\alpha$ is a constant. The available processor speeds of the DVS-CMP under consideration are assumed to be continuously adjustable, and no upper bound on the processor speed is given (the same as in [11, 19]). Furthermore, let the overheads of switching the supply voltage be negligible. Suppose that any of the cores can be turned into a sleep mode (i.e., $s = 0$) at any time, but all non-sleeping cores must operate at the same processor speed. Let the energy consumed by a core running at the processor speed $s$ for $t$ time units be $P(s)t$, and let the execution of an amount $c$ (in cycles on a core) of computation at the processor speed $s$ take $c/s$ time units. The energy-efficient scheduling problem explored in this paper is defined as follows:

Definition 1 Energy Consumption Minimization for DVS-CMP Scheduling (ECMS):

Consider a set $T$ of independent tasks on a DVS-CMP, where all tasks in $T$ are ready at time 0 and share a common deadline $D$. Let each task $\tau_i \in T$ be associated with an amount $c_i$ (in cycles on a core) of computation requirements. The objective of this problem is to minimize the energy consumption in the scheduling of the tasks in $T$ without violating the common deadline $D$, where task migration between any two cores is not permitted.

A schedule of an input instance of the ECMS problem is a mapping of the executions of the tasks in the set to cores on the DVS-CMP, together with speed assignments and their corresponding time intervals. A schedule is feasible if no task misses the common deadline $D$ and the DVS-CMP constraints are not violated. Let $\Phi(\Psi)$ denote the energy consumption of a schedule $\Psi$. A schedule is optimal if it is feasible and its energy consumption is the minimum among all feasible schedules.

4 The Proposed Algorithm

4.1 Energy Consumption Minimization with Tasks Being Partitioned

In this subsection, we propose a scheduling algorithm that minimizes the energy consumption when tasks are already partitioned. A task assignment is defined as a partition of $T$ into $M$ disjoint subsets $\mathcal{X} \stackrel{\mathrm{def}}{\equiv} \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M\}$. A schedule that is based on $\mathcal{X}$ must have the tasks in $\mathcal{X}_i \in \mathcal{X}$ running on the $i$-th core. An optimal $\mathcal{X}$-based schedule is a schedule that has the minimum energy consumption among all $\mathcal{X}$-based schedules. The load $X_i$ of each $\mathcal{X}_i \in \mathcal{X}$ is defined as the total amount of the computation requirements of the tasks in $\mathcal{X}_i$, and the load distribution $X$ of $\mathcal{X}$ is denoted as $\{X_1, X_2, \ldots, X_M\}$. Without loss of generality, let the subsets $\mathcal{X}_i \in \mathcal{X}$ be sorted in a non-decreasing order of their loads. For simplicity of discussion, let $X_{M+1} = \infty$ and $X_0 = 0$. A schedule $\Psi$ satisfies the deep sleeping property if a core $\mu$ is in the sleep mode at any time $t'$ for $t < t' < D$ whenever $\mu$ is found in the sleep mode at some time $0 \le t < D$.

Lemma 1 For any task assignment $\mathcal{X}$, there exists an optimal $\mathcal{X}$-based schedule that satisfies the deep sleeping property.

Proof. Given a feasible $\mathcal{X}$-based schedule $\psi$ that does not satisfy the deep sleeping property, let the time interval $(0, D]$ of $\psi$ be divided into disjoint time fragments such that the processor speeds of the non-sleeping cores differ between any two consecutive time fragments. (Note that the processor speeds of all non-sleeping cores are the same in each fragment by the definition of the DVS-CMP discussed in this paper.) Let the time fragments that have the same processor speed be merged together such that $(0, D]$ has $k$ fragments, and each fragment $t_i$ has a processor speed $s_i$. Without loss of generality, let $0 < s_1 < s_2 < \cdots < s_k$. Let $y_{i,j}$ denote the number of core cycles needed for the $j$-th core in the $i$-th fragment, and let $|t_i|$ denote the total duration of the $i$-th fragment (note that each fragment might consist of non-consecutive time intervals). A schedule $\hat\psi$ is derived from $\psi$ as follows: The $j$-th core executes tasks at the speed $s_i$ in

$$\Bigl(\sum_{h=1}^{i-1} |t_h|,\; \sum_{h=1}^{i-1} |t_h| + y_{i,j}/s_i\Bigr]$$

for $i = 1, 2, \ldots, k$ and is turned into the sleep mode in

$$\Bigl(\sum_{h=1}^{i-1} |t_h| + y_{i,j}/s_i,\; \sum_{h=1}^{i} |t_h|\Bigr]$$

for $i = 1, 2, \ldots, k$. Schedules $\psi$ and $\hat\psi$ have the same energy consumption.

We then turn the resulting schedule $\hat\psi$ into another schedule $\psi'$ that satisfies the deep sleeping property: Let $\hat t$ be the earliest time moment at which some core $\mu_j$ goes into the sleep mode in $\hat\psi$. Let $n$ be the index that satisfies $\sum_{h=1}^{n-1} |t_h| \le \hat t < \sum_{h=1}^{n} |t_h|$. Suppose that $\mu_j$ is non-sleeping at some time moment $t$, where $\hat t < t < D$. Let $Y$ be the total number of cycles executed after $\hat t$ on the core $\mu_j$ (over all fragments). These $Y$ cycles on $\mu_j$ are then executed in the fragments starting from $t_n$ ($t_{n+1}$, etc.) at their corresponding speeds, from the time moment $\hat t$ until all of the $Y$ cycles are done. The transformation does not increase the amount of consumed energy, due to the convexity of the power consumption function. By repeating the same process for every core, we can always transform a feasible $\mathcal{X}$-based schedule into one that satisfies the deep sleeping property and consumes no more energy than the original one does.

Based on the deep sleeping property, we can derive an optimal schedule for a given task assignment $\mathcal{X}$, as shown in Algorithm 1. Steps 4-7 follow the definition of the power consumption function. The time complexity of Algorithm 1 is $O(|T| + M)$. The optimality is shown as follows.

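Algorithm 1: MES

Input: ($\mathcal{X}$);
Output: A feasible $\mathcal{X}$-based schedule $\Psi$ with the minimum energy consumption;
1: $X_i \leftarrow 0$ for $i = 0$ to $M$;
2: for $i = 1$ to $M$ do
3:     $X_i \leftarrow X_i + c_j$ for $\forall \tau_j \in \mathcal{X}_i$;
4: $L \leftarrow \sum_{i=1}^{M} (X_i - X_{i-1}) \sqrt[3]{M-i+1}$ and $t_0 \leftarrow 0$;
5: for $i = 1$ to $M$ do
6:     $s_i \leftarrow \frac{L}{D \sqrt[3]{M-i+1}}$ and $t_i \leftarrow t_{i-1} + D \, \frac{(X_i - X_{i-1}) \sqrt[3]{M-i+1}}{L}$;
7:     let $\Psi$ turn the $i$-th core into the sleep mode at $t_i$ and set the speed as $s_i$ in $(t_{i-1}, t_i]$ for the non-sleeping cores;
8: return $\Psi$ by executing the tasks assigned to each core in an arbitrary order;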
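The steps of Algorithm MES can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `mes`, the segment tuple layout, and the default `alpha=1.0` are assumptions of this sketch, and the loads are sorted explicitly instead of assuming a pre-sorted task assignment. The speed in each interval is taken as $L / (D\sqrt[3]{M-i+1})$, so that every non-sleeping core executes exactly $X_i - X_{i-1}$ cycles in $(t_{i-1}, t_i]$.

```python
def mes(assignment, deadline, alpha=1.0):
    """Sketch of Algorithm MES.

    assignment: list of M lists; each inner list holds the cycle counts of
                the tasks placed on one core.
    Returns (energy, segments); each segment is
    (start, end, speed, number_of_non_sleeping_cores).
    """
    m = len(assignment)
    # Steps 1-3: compute the loads and sort them in non-decreasing order.
    loads = sorted(float(sum(tasks)) for tasks in assignment)
    # Step 4: L = sum_i (X_i - X_{i-1}) * cbrt(M - i + 1), with X_0 = 0.
    weights, prev = [], 0.0
    for idx, load in enumerate(loads):          # idx = i - 1
        weights.append((load - prev) * (m - idx) ** (1.0 / 3.0))
        prev = load
    big_l = sum(weights)
    if big_l == 0.0:                            # no work at all
        return 0.0, []
    # Steps 5-7: one common speed per interval; the i-th core sleeps at t_i.
    segments, energy, t = [], 0.0, 0.0
    for idx, weight in enumerate(weights):
        active = m - idx                        # cores still awake
        s = big_l / (deadline * active ** (1.0 / 3.0))
        duration = deadline * weight / big_l
        if duration > 0.0:
            segments.append((t, t + duration, s, active))
            energy += active * alpha * s ** 3 * duration
        t += duration
    return energy, segments


energy, segments = mes([[3], [1, 1]], deadline=2.0)
for seg in segments:
    print(seg)
print("energy:", energy)
```

In this example the two cores carry loads 2 and 3, so both cores run until the lighter one finishes, after which only the heavier core stays awake at a higher speed until the deadline.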

Lemma 2 For any given task assignment X, the X-based schedule derived by Algorithm 1 is optimal.

Proof. Let the energy consumption of the schedule derived by Algorithm 1 be $E^*$, where

$$E^* = \frac{\alpha}{D^2} \Bigl( \sum_{i=1}^{M} (X_i - X_{i-1}) \sqrt[3]{M-i+1} \Bigr)^3.$$

We shall show that any $\mathcal{X}$-based schedule $\Psi$ (that satisfies the deep sleeping property) consumes no less energy than $E^*$. Let $z_0$ be the index such that $X_j$ is equal to 0 for all $z_0 \ge j \ge 0$. $z_i$ is recursively defined as the index such that $X_j$ is equal to $X_{z_{i-1}+1}$ for all $z_i \ge j > z_{i-1}$ and $i \ge 1$. Besides, let $k$ be the index such that $z_k = M$. $y_i$ is defined to be $X_{z_i} - X_{z_{i-1}}$ for all $i \ge 1$. Let $\beta_i$ be the time instant when the $(z_{i-1}+1)$-th core is turned into the sleep mode ($\beta_0 = 0$) in $\Psi$. Due to the deep sleeping property, there are $(M - z_{i-1})$ cores that are non-sleeping in $(\beta_{i-1}, \beta_i]$, and $z_{i-1}$ cores are sleeping in $(\beta_{i-1}, \beta_i]$. Note that the $\mathcal{X}_i$'s in $\mathcal{X}$ are sorted in a non-decreasing order of their loads. $\gamma_i$ is defined as $\beta_i - \beta_{i-1}$. Because of the convexity of the power consumption function, executing $y_i$ cycles at the same speed $y_i / \gamma_i$ for $\gamma_i$ time units is the best choice for energy consumption. Therefore,

$$\Phi(\Psi) \ge \alpha \sum_{i=1}^{k} (M - z_{i-1}) \Bigl(\frac{y_i}{\gamma_i}\Bigr)^3 \gamma_i.$$

Furthermore, there must exist at least one core that is not in the sleep mode before $D$ unless there is no load on any core (because the speed lower bound of each core is 0). Let $\hat\Phi(\mathcal{X})$ be the energy consumption of an optimal $\mathcal{X}$-based schedule. We have

$$\hat\Phi(\mathcal{X}) \ge \min_{\sum_{i=1}^{k} \gamma_i = D} \; \alpha \sum_{i=1}^{k} (M - z_{i-1}) \Bigl(\frac{y_i}{\gamma_i}\Bigr)^3 \gamma_i. \tag{2}$$

By adopting the Lagrange multiplier method, the right-hand side of Equation (2) is minimized when

$$\gamma_i = D \cdot \frac{\sqrt[3]{M - z_{i-1}} \; y_i}{\sum_{j=1}^{k} \sqrt[3]{M - z_{j-1}} \; y_j}.$$

We have$^1$

$$\Phi(\Psi) \ge \hat\Phi(\mathcal{X}) \ge \frac{\alpha}{D^2} \Bigl( \sum_{i=1}^{k} y_i \sqrt[3]{M - z_{i-1}} \Bigr)^3 = \frac{\alpha}{D^2} \Bigl( \sum_{i=1}^{M} (X_i - X_{i-1}) \sqrt[3]{M-i+1} \Bigr)^3 = E^* \ge \hat\Phi(\mathcal{X}). \tag{3}$$

Since $E^* = \hat\Phi(\mathcal{X})$, we reach the conclusion.
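The Lagrange-multiplier step can be checked numerically. The sketch below is only a sanity check under assumed inputs: the instance (four cores with loads 2, 3, 3, 6) and all function names are hypothetical. It evaluates the right-hand side of Equation (2) at the interval lengths $\gamma_i$ given by the closed form and compares the value with many random splits of $(0, D]$; none of the random splits should yield a smaller energy.

```python
import random


def interval_energy(gammas, ys, actives, alpha=1.0):
    """alpha * sum_i n_i * (y_i / gamma_i)^3 * gamma_i, the bound in (2)."""
    return alpha * sum(n * (y / g) ** 3 * g
                       for g, y, n in zip(gammas, ys, actives))


def lagrange_gammas(ys, actives, deadline):
    """gamma_i = D * cbrt(n_i) * y_i / sum_j cbrt(n_j) * y_j."""
    w = [n ** (1.0 / 3.0) * y for y, n in zip(ys, actives)]
    total = sum(w)
    return [deadline * wi / total for wi in w]


# Hypothetical instance: M = 4 cores with loads (2, 3, 3, 6), so y = (2, 1, 3)
# and the numbers of non-sleeping cores per interval are (4, 3, 1).
ys, actives, D = [2.0, 1.0, 3.0], [4, 3, 1], 10.0

best = lagrange_gammas(ys, actives, D)
e_best = interval_energy(best, ys, actives)

rng = random.Random(0)
smallest_trial = float("inf")
for _ in range(2000):
    raw = [rng.uniform(0.05, 1.0) for _ in ys]
    trial = [D * r / sum(raw) for r in raw]     # random split summing to D
    smallest_trial = min(smallest_trial, interval_energy(trial, ys, actives))

print("Lagrange split:", best, "energy:", e_best)
print("best random split energy:", smallest_trial)
```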

Let $X$ be any given load distribution of $T$. A schedule is based on $X$ if the number of cycles executed on the $i$-th core is equal to $X_i$. Let the minimum energy consumption among the schedules based on $X$ be $\hat\Phi(X)$. We have the following corollary:

Corollary 1 $\hat\Phi(X) = \frac{\alpha}{D^2} \Bigl( \sum_{i=1}^{M} (X_i - X_{i-1}) \sqrt[3]{M-i+1} \Bigr)^3$.
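Corollary 1 makes the minimum energy of a load distribution a one-line computation. The following sketch (the function name `phi_hat` and the sample loads are assumptions for illustration) evaluates the closed form and compares a balanced distribution with an unbalanced one of the same total load.

```python
def phi_hat(loads, deadline, alpha=1.0):
    """Closed form of Corollary 1 for a load distribution.

    The loads are sorted here, so callers may pass them in any order.
    """
    m = len(loads)
    total, prev = 0.0, 0.0
    for idx, load in enumerate(sorted(loads)):   # idx = i - 1, X_0 = 0
        total += (load - prev) * (m - idx) ** (1.0 / 3.0)
        prev = load
    return alpha * total ** 3 / deadline ** 2


print(phi_hat([2, 2], 1.0))   # balanced: both cores run for the whole frame
print(phi_hat([1, 3], 1.0))   # unbalanced: same total load, more energy
```

The balanced distribution needs less energy because all non-sleeping cores share one speed, so an uneven split forces the heavier core to run alone at a higher speed.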

4.2 A 2.371-Approximation Algorithm

We shall first show the NP-hardness of the ECMS problem and then propose a 2.371-approximation algorithm.

Corollary 2 The ECMS problem is NP-hard.

Proof. Based on Equation (3) in Lemma 2, $\Phi(\Psi)$ is minimum if and only if $X_j = \frac{\sum_{\tau_i \in T} c_i}{M}$ for $j = 1, 2, \ldots, M$. The multiprocessor scheduling problem [SS8] in [7], which is NP-complete, can therefore be reduced to the ECMS problem.

The proposed 2.371-approximation algorithm (Algorithm LTF), as shown in Algorithm 2, adopts the Largest-Task-First strategy to partition $T$ into $M$ disjoint sets. Tasks are considered in a non-increasing order of their computation requirements. For simplicity of discussion, let $T$ be sorted in a non-increasing order of the computation requirements of the tasks (where ties can be broken arbitrarily).

Algorithm 2: LTF

Input: ($T$, $D$, $M$);
Output: A feasible schedule $\Psi^{LTF}$ with minimal energy consumption;
1: sort all tasks in a non-increasing order of their computation requirements;
2: $\mathcal{X}_i \leftarrow \emptyset$ and $X_i \leftarrow 0$ for $i = 1$ to $M$;
3: for $i = 1$ to $|T|$ do
4:     find the smallest $X_m$; (break ties arbitrarily)
5:     $\mathcal{X}_m \leftarrow \mathcal{X}_m \cup \{\tau_i\}$ and $X_m \leftarrow X_m + c_i$;
6: reorder the $\mathcal{X}_i$ in a non-decreasing order of their loads and let $\mathcal{X}^{LTF} \leftarrow \{\mathcal{X}_1, \mathcal{X}_2, \ldots, \mathcal{X}_M\}$;
7: return the resulting schedule $\Psi^{LTF}$ by applying MES($\mathcal{X}^{LTF}$);

Let $T$, $D$, and $M$ denote the task set under discussion, its common deadline, and the number of cores, respectively. Algorithm LTF always assigns a task to the core with the smallest load, where tasks are picked in a non-increasing order of their computation requirements. The core with the smallest load can be found by maintaining a heap data structure. The time complexity of Algorithm LTF is $O(|T|(\log |T| + \log M) + M)$, which is dominated by the cost of task sorting and heap manipulation.

1 The detailed proof is omitted due to space limitations.
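The partitioning part of Algorithm LTF (Steps 1-6) can be sketched with Python's standard `heapq` module, which plays the role of the heap mentioned above. The function name `ltf_partition` and the sample task set are assumptions of this sketch; the returned assignment would then be handed to a MES-style routine to obtain the speeds and sleep times.

```python
import heapq


def ltf_partition(cycles, m):
    """Largest-Task-First partition of the cycle counts onto m cores.

    Returns a list of m lists (the task assignment), reordered in a
    non-decreasing order of their loads.
    """
    # Heap entries are (current load, core index); all cores start empty.
    heap = [(0, core) for core in range(m)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(m)]
    # Step 1: consider tasks in a non-increasing order of their cycles.
    for c in sorted(cycles, reverse=True):
        # Steps 4-5: pop a least-loaded core, give it the task, push it back.
        load, core = heapq.heappop(heap)
        assignment[core].append(c)
        heapq.heappush(heap, (load + c, core))
    # Step 6: reorder the subsets by load.
    assignment.sort(key=sum)
    return assignment


parts = ltf_partition([5, 4, 3, 3, 2, 1], m=2)
print(parts, [sum(p) for p in parts])
```

Sorting costs $O(|T| \log |T|)$ and each of the $|T|$ heap updates costs $O(\log M)$, which matches the stated complexity up to the final reordering of the cores.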

Given a task set $T$ (with $D$ as the common deadline) and the number $M$ of cores, $\Psi^{LTF}$, $\mathcal{X}^{LTF}$, and $X^{LTF}$ denote the schedule, the task assignment, and the load distribution derived by Algorithm LTF, respectively. For simplicity of discussion, let us renumber the cores such that the elements in $\mathcal{X}^{LTF}$ are sorted in a non-decreasing order of their loads. That is, $X^{LTF}_i \le X^{LTF}_{i+1}$ for $1 \le i < M$, where $\mathcal{X}^{LTF}_i$ denotes the $i$-th element in $\mathcal{X}^{LTF}$, and $X^{LTF}_i$ is the load of $\mathcal{X}^{LTF}_i$. For brevity, $X^{LTF}_i$ is also referred to as $p_i$.

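Lemma 3 Given two load distributions $X$ and $X'$ for the same task set, $\hat\Phi(X) < \hat\Phi(X')$ if there exist two indices $i$ and $j$ ($j > i$) such that $X_k = X'_k$ for $k \ne i, j$, and $0 < X_i - X'_i < \min\{X_i - X_{i-1}, X_{j+1} - X_j\}$.

Proof. Based on Corollary 1, $\hat\Phi(X) = \frac{\alpha}{D^2} \bigl( \sum_{k=1}^{M} (X_k - X_{k-1}) \sqrt[3]{M-k+1} \bigr)^3$ and $\hat\Phi(X') = \frac{\alpha}{D^2} \bigl( \sum_{k=1}^{M} (X'_k - X'_{k-1}) \sqrt[3]{M-k+1} \bigr)^3$, respectively. $\hat\Phi(X) < \hat\Phi(X')$ because

$$\begin{aligned}
&\sum_{k=1}^{M} (X_k - X_{k-1}) \sqrt[3]{M-k+1} - \sum_{k=1}^{M} (X'_k - X'_{k-1}) \sqrt[3]{M-k+1} \\
&\quad = \sum_{k=i,j} \bigl( (X_k - X'_k) \sqrt[3]{M-k+1} - (X_k - X'_k) \sqrt[3]{M-k} \bigr) \\
&\quad = (X_i - X'_i) \bigl[ (\sqrt[3]{M-i+1} - \sqrt[3]{M-i}) - (\sqrt[3]{M-j+1} - \sqrt[3]{M-j}) \bigr] < 0.
\end{aligned}$$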
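The identity used in the proof of Lemma 3 can be verified on a small example. In the sketch below, the load distribution, the indices, and the shifted amount `delta` are hypothetical values chosen so that the conditions of the lemma hold; the code compares the difference of the two weighted sums with the bracketed closed form and checks that moving load from core $i$ to a more heavily loaded core $j$ increases the minimum energy.

```python
def weighted_sum(loads):
    """S(X) = sum_k (X_k - X_{k-1}) * cbrt(M - k + 1), with X_0 = 0.

    The loads are expected in a non-decreasing order.
    """
    m = len(loads)
    total, prev = 0.0, 0.0
    for idx, load in enumerate(loads):           # idx = k - 1
        total += (load - prev) * (m - idx) ** (1.0 / 3.0)
        prev = load
    return total


def phi_hat(loads, deadline, alpha=1.0):
    """Closed form of Corollary 1."""
    return alpha * weighted_sum(loads) ** 3 / deadline ** 2


M = 4
X = [1.0, 3.0, 5.0, 9.0]
i, j, delta = 2, 3, 0.5          # 1-based indices; delta = X_i - X'_i
X_prime = list(X)
X_prime[i - 1] -= delta          # X'_i = X_i - delta
X_prime[j - 1] += delta          # X'_j = X_j + delta (total load unchanged)


def gap(k):
    """cbrt(M - k + 1) - cbrt(M - k)."""
    return (M - k + 1) ** (1.0 / 3.0) - (M - k) ** (1.0 / 3.0)


predicted = delta * (gap(i) - gap(j))
print(weighted_sum(X) - weighted_sum(X_prime), predicted)
print(phi_hat(X, 1.0) < phi_hat(X_prime, 1.0))
```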

When $p_1 = 0$, Algorithm LTF always generates an optimal schedule because no core is associated with more than one task. For the rest of this section, suppose that $p_1 \ne 0$. We first derive an upper bound on $\Phi(\Psi^{LTF})$ for any schedule $\Psi^{LTF}$ derived by Algorithm LTF and then a lower bound on the optimal energy consumption for $T$ (regardless of which algorithm is adopted). Let $\hat m_k = |\{i \mid p_i \le k \cdot p_1\}|$ and $P_k = \sum_{i=1}^{\hat m_k} p_i$ for some real $k \ge 1$. $X^{LTF}(k) = \{\hat p_1, \hat p_2, \ldots, \hat p_M\}$ is revised based on $X^{LTF}$ by load redistribution as follows:

1. $\hat p_i \leftarrow p_i$ if $i > \hat m_k$;
2. $\hat p_i \leftarrow p_1$ if $\frac{k \hat m_k p_1 - P_k}{p_1 (k-1)} \ge i \ge 1$;
3. $\hat p_i \leftarrow k \cdot p_1$ if $\hat m_k \ge i \ge \frac{k \hat m_k p_1 - P_k}{p_1 (k-1)} + 1$;
4. $\hat p_i \leftarrow P_k - p_1 \Bigl( \frac{k \hat m_k p_1 - P_k}{p_1 (k-1)} - 1 \Bigr) - k \cdot p_1 \Bigl( \hat m_k - \frac{k \hat m_k p_1 - P_k}{p_1 (k-1)} \Bigr)$ if $i = \frac{k \hat m_k p_1 - P_k}{p_1 (k-1)}$.

Lemma 4 The minimum energy consumption to schedule tasks based on $X^{LTF}(k)$ is no less than that based on $X^{LTF}$ for any real $k \ge 1$.

Proof. Initially, let $X = X^{LTF}$. We repeat the following revision procedure of $X$ until $i \ge j$ (where $i$ and $j$ are the smallest and the largest indices that satisfy $X_i > p_1$ and $X_j < k p_1$, respectively): $X_i \leftarrow X_i - \delta$ and $X_j \leftarrow X_j + \delta$, where $\delta \leftarrow \min\{X_i - p_1, k p_1 - X_j\}$. The final load distribution $X$ is the same as $X^{LTF}(k)$ after the above procedure has been applied a finite number of times. Based on Lemma 3, $\hat\Phi(X^{LTF}(k)) \ge \hat\Phi(X^{LTF})$.

For simplicity of representation, we denote $\frac{k \hat m_k p_1 - P_k}{p_1 (k-1)}$ as $\chi$. We have the following inequality:

$$\hat\Phi(X^{LTF}(k)) \le \frac{\alpha}{D^2} \Bigl[ \sqrt[3]{M} \, \hat p_1 + \sqrt[3]{M - \chi} \, (k-1) \hat p_1 + \sum_{i=\hat m_k + 1}^{M} (\hat p_i - \hat p_{i-1}) \sqrt[3]{M-i+1} \Bigr]^3, \tag{4}$$

where the inequality comes from the following inequality: $\sqrt[3]{M - \chi} \, (\hat p_\chi - \hat p_1) + \sqrt[3]{M - \chi} \, (\hat p_{\hat m_k} - \hat p_\chi) \le \sqrt[3]{M - \chi} \, (k-1) \hat p_1$.

Let $T' = \{\tau_i \mid c_i \ge \frac{\sum_{j=i+1}^{|T|} c_j}{M - i}\}$. We shall show that one core will be selected to service only one task in $T'$, regardless of whether $\mathcal{X}^{LTF}$ or an optimal task assignment $\mathcal{X}^{OPT}$ is considered. When a task $\tau_j \notin T'$ is considered in Algorithm LTF (i.e., Steps 3-5), there must exist a core $m^*$ whose load $p_{m^*}$ is no more than $c_i$ for any $\tau_i \in T'$. Therefore, no other task will be assigned by Algorithm LTF to any core occupied by any $\tau_i \in T'$. Consider any task assignment $\mathcal{X}$ derived by some algorithm. If some task $\tau_i \in T'$ and some other task $\tau_j$ are assigned to the same core, where $c_i \ge c_j$, another task assignment $\mathcal{X}'$ could always be generated by moving $\tau_j$ to another core $m$ where $X_m < c_i$. The optimal $\mathcal{X}'$-based schedule consumes less energy than that based on $\mathcal{X}$. Thus, $\mathcal{X}$ must not be the optimal task assignment. Therefore, one core will be selected to service only one task in $T'$ for an optimal task assignment $\mathcal{X}^{OPT}$. Let $X^{OPT} = \{q_1, q_2, \ldots, q_M\}$ be the load distribution for an optimal solution. Note that $\hat m_k = |\{i \mid p_i \le k \cdot p_1\}|$. We could prove the following lemma.

Lemma 5 If $k = 2$, then $q_{\hat m_k + i} = p_{\hat m_k + i}$ for all $1 \le i \le M - \hat m_k$.

Proof. Let $j$ be the largest index of a core to which a task $\tau_n \in T - T'$ is assigned, and $\tau_n \in \mathcal{X}^{LTF}_j$. Let the last task inserted into $\mathcal{X}^{LTF}_j$ be $\tau_r$. Since $c_n < \frac{\sum_{j=|T'|+1}^{|T|} c_j}{M - |T'|}$, $|\mathcal{X}^{LTF}_j| \ge 2$. It is clear that $c_r \le p_1$ and $p_j - c_r \le p_1$. Therefore, $p_j \le 2 p_1$. Since $X^{LTF}_h > 2 p_1$ for any $h > \hat m_2$, we have $j \le \hat m_2$. Furthermore, we know that $q_{j+i} = p_{j+i}$ for all $1 \le i \le M - j$.

Note that $P_k = \sum_{i=1}^{\hat m_k} p_i$, where $\hat m_k$ is defined by $X^{LTF}(k)$. Similar to the definition of $X^{LTF}(k)$, we define $\bar X^{OPT}(k)$ as an adjusted load distribution according to $X^{OPT}$ by redistributing $\sum_{i=1}^{\hat m_k} q_i$ so that $\hat q_1 = \hat q_2 = \cdots = \hat q_{\hat m_k} = \frac{\sum_{i=1}^{\hat m_k} q_i}{\hat m_k}$. Similar to the proof of Lemma 4, we have $\hat\Phi(X^{OPT}) \ge \hat\Phi(\bar X^{OPT}(k))$ for any $k \ge 1$. We conclude this section by showing the following theorem.

Theorem 1 Algorithm LTF has a 2.371-approximation ratio for the ECMS problem.

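Proof. The approximation ratio $A^{LTF}$ is:

$$\begin{aligned}
A^{LTF} = \frac{\hat\Phi(X^{LTF})}{\hat\Phi(X^{OPT})} &\le \frac{\hat\Phi(X^{LTF}(2))}{\hat\Phi(\bar X^{OPT}(2))} \\
&\le \frac{\bigl( (\sqrt[3]{M} + \sqrt[3]{M - \chi}) \hat p_1 + \sum_{i=\hat m_2+1}^{M} (\hat p_i - \hat p_{i-1}) \sqrt[3]{M-i+1} \bigr)^3}{\bigl( \sqrt[3]{M} \, \hat q_1 + \sqrt[3]{M - \hat m_2} \, (2 \hat p_1 - \hat q_1) + \sum_{i=\hat m_2+1}^{M} (\hat p_i - \hat p_{i-1}) \sqrt[3]{M-i+1} \bigr)^3} \\
&\le \frac{\bigl( (\sqrt[3]{M} + \sqrt[3]{M - \chi}) \hat p_1 \bigr)^3}{\bigl( \sqrt[3]{M} \, \hat q_1 + \sqrt[3]{M - \hat m_2} \, (2 \hat p_1 - \hat q_1) \bigr)^3} \\
&\le \Biggl( \frac{\bigl( \sqrt[3]{M} + \sqrt[3]{\frac{L - M \hat p_1}{\hat p_1}} \bigr) \hat p_1}{\sqrt[3]{M} \, \frac{L}{M}} \Biggr)^3,
\end{aligned} \tag{5}$$

where $L$ is $\hat q_1 \hat m_2 + 2 \hat p_1 (M - \hat m_2)$. Note that $\hat q_1 \hat m_2 = \sum_{i=1}^{\hat m_2} q_i = P_2$. We have $L = \chi \hat p_1 + 2 \hat p_1 (M - \chi)$. Let $f(x)$ be defined as

$$f(x) = \frac{\bigl( \sqrt[3]{M} + \sqrt[3]{\frac{K - M x}{x}} \bigr) x}{\sqrt[3]{M} \, \frac{K}{M}}$$

for any rational number $K$, where $x \cdot a + 2x \cdot (M - a) = K$ for some non-negative rational number $a$. By solving the equation $f'(x) = 0$, where $f''(x) < 0$, we have

$$f(x) \le \frac{4}{3}, \tag{6}$$

where the maximal value is attained when $x = \frac{8K}{9M}$. According to Equations (5) and (6), we have $A^{LTF} \le (\frac{4}{3})^3 < 2.371$.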
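The bound in Equation (6) can be checked numerically. The sketch below evaluates $f(x)$ on a fine grid and at $x = 8K/(9M)$; the values of `K` and `M` are arbitrary illustrative choices, and the grid range $[K/(2M), K/M]$ is an assumption of this sketch that follows from $x \cdot a + 2x \cdot (M - a) = K$ when $0 \le a \le M$.

```python
def f(x, k_total, m):
    """f(x) from the proof of Theorem 1."""
    cbrt_m = m ** (1.0 / 3.0)
    # Guard against a tiny negative value caused by rounding at x = K / M.
    inner = max((k_total - m * x) / x, 0.0)
    return (cbrt_m + inner ** (1.0 / 3.0)) * x / (cbrt_m * k_total / m)


K, M = 18.0, 4                    # arbitrary illustrative values
lo, hi = K / (2 * M), K / M       # range of x for 0 <= a <= M (assumption)
steps = 100000
grid_max = max(f(lo + (hi - lo) * t / steps, K, M) for t in range(steps + 1))
x_star = 8 * K / (9 * M)

print("max of f on the grid:", grid_max)
print("f at x = 8K/(9M):   ", f(x_star, K, M))
print("(4/3)^3 =", (4.0 / 3.0) ** 3)
```

Since $(4/3)^3 = 64/27 \approx 2.3704$, the cube of the maximum stays below the stated ratio of 2.371.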

5 Simulation Results

The purpose of this section is to evaluate the performance of Algorithm LTF. Algorithm RAND was also simulated for reference, where Algorithm RAND greedily assigned each task to a core with the minimum load without sorting the tasks. The relative energy consumption ratio, defined as $\Phi(\Psi^{LTF}) / \Phi(\Psi^{OPT})$, was adopted as the performance metric, where $\Psi^{OPT}$ is an optimal schedule for the ECMS problem. $\Psi^{OPT}$ can be obtained via an exhaustive search with a branch-and-bound strategy. When $|T|$ was large, the relaxed relative energy consumption ratio, defined as $\Phi(\Psi^{LTF}) / \hat\Phi(\bar X^{OPT}(2))$ (please refer to Section 4 for the definition of $\bar X^{OPT}(k)$), was adopted as the performance metric. By definition, $\hat\Phi(\bar X^{OPT}(2))$ can be obtained efficiently$^2$.

$D$ was set to an arbitrary positive rational number in the simulations. The number of cycles $c_i$ of a task $\tau_i$ was generated randomly in the range $(0, D]$. The power consumption function $P(s)$ was $s^3$. 100 independent simulations were run for each parameter configuration. For the average relative energy consumption ratio, the results of these runs were averaged; for the maximum relative energy consumption ratio, the maximum value over these runs was reported. Figure 1(a) and (b) show the average and maximum relative energy consumption ratios of the simulated algorithms when the number of cores ranged from 3 to 8 and the task set size ranged from 10 to 15. Figure 1(c) and (d) show the average and maximum relaxed relative energy consumption ratios of the simulated algorithms when the number of cores ranged from 8 to 32 and the task set size ranged from 50 to 100. The maximum and average relative energy consumption ratios for Algorithm LTF were less than 1.36 and 1.07, respectively. Furthermore, the maximum and average relaxed relative energy consumption ratios for Algorithm LTF were less than 2.00 and 1.44, respectively.

2 Since the problem is NP-hard, the performance metric relaxed relative energy consumption ratio aimed at the providing of an approximate index

[Figure 1: four plots, panels (a) to (d), each comparing RAND and LTF. Axes: Number of Tasks, Number of Cores, and the ratio named in the corresponding panel caption.]

Figure 1. The simulation results of Algorithm LTF and Algorithm RAND: (a) The average relative energy consumption ratio when $|T| = 10 \ldots 15$ and $M = 3 \ldots 8$ (b) The maximum relative energy consumption ratio when $|T| = 10 \ldots 15$ and $M = 3 \ldots 8$ (c) The average relaxed relative energy consumption ratio when $|T| = 50 \ldots 100$ and $M = 8 \ldots 32$ (d) The maximum relaxed relative energy consumption ratio when $|T| = 50 \ldots 100$ and $M = 8 \ldots 32$
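A scaled-down version of such an experiment can be sketched as follows. This is not the authors' simulator: the instance sizes, the seed, the helper names, and the use of a plain exhaustive search (instead of branch and bound) are assumptions made to keep the sketch short. It generates random task sets with cycles in $(0, D]$, partitions them with the sorted greedy rule (LTF) and the unsorted greedy rule (RAND), and reports the average and maximum relative energy consumption ratios against the exhaustive optimum, using the closed form of Corollary 1 with $P(s) = s^3$.

```python
import itertools
import random


def phi_hat(loads, deadline, alpha=1.0):
    """Closed-form minimum energy of a load distribution (Corollary 1)."""
    m = len(loads)
    total, prev = 0.0, 0.0
    for idx, load in enumerate(sorted(loads)):
        total += (load - prev) * (m - idx) ** (1.0 / 3.0)
        prev = load
    return alpha * total ** 3 / deadline ** 2


def greedy_loads(cycles, m):
    """Assign each task, in the given order, to a least-loaded core."""
    loads = [0.0] * m
    for c in cycles:
        loads[loads.index(min(loads))] += c
    return loads


def optimal_energy(cycles, m, deadline):
    """Exhaustive search over all M^|T| assignments (tiny instances only)."""
    best = float("inf")
    for choice in itertools.product(range(m), repeat=len(cycles)):
        loads = [0.0] * m
        for c, core in zip(cycles, choice):
            loads[core] += c
        best = min(best, phi_hat(loads, deadline))
    return best


def run(trials=20, n_tasks=7, m=3, deadline=1.0, seed=1):
    rng = random.Random(seed)
    ltf_ratios, rand_ratios = [], []
    for _ in range(trials):
        # 1 - random() lies in (0, 1], so every cycle count is in (0, D].
        cycles = [deadline * (1.0 - rng.random()) for _ in range(n_tasks)]
        opt = optimal_energy(cycles, m, deadline)
        ltf = phi_hat(greedy_loads(sorted(cycles, reverse=True), m), deadline)
        rnd = phi_hat(greedy_loads(cycles, m), deadline)
        ltf_ratios.append(ltf / opt)
        rand_ratios.append(rnd / opt)
    return ltf_ratios, rand_ratios


ltf_ratios, rand_ratios = run()
print("LTF  avg/max:", sum(ltf_ratios) / len(ltf_ratios), max(ltf_ratios))
print("RAND avg/max:", sum(rand_ratios) / len(rand_ratios), max(rand_ratios))
```

The exhaustive search grows as $M^{|T|}$, which is why the sketch uses only 7 tasks on 3 cores; larger instances would need the relaxed ratio or a pruned search.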

6 Conclusion

In this paper, we explore real-time energy-efficient scheduling on a chip multiprocessor with dynamic voltage scaling. We consider frame-based task sets, in which all tasks are ready at time 0 and share a common deadline. When a task partition is given, we present an optimal scheduling algorithm for the minimization of energy consumption. When both task partitioning and scheduling must be resolved, we first prove the NP-hardness of the problem and then propose a 2.371-approximation algorithm with a time complexity of $O(|T|(\log |T| + \log M) + M)$, where $T$ is a given task set, and $M$ is the number of cores of the chip multiprocessor. A series of simulations was conducted to evaluate the strength of our proposed algorithm, and the results are very encouraging.

References

[1] J. H. Anderson and S. K. Baruah. Energy-efficient synthesis of periodic task systems upon identical multiprocessor platforms. In Proceedings of the 24th International Conference on Distributed Computing Systems, pages 428–435, 2004.

[2] H. Aydin, R. Melhem, D. Mossé, and P. Mejía-Alvarez. Dynamic and aggressive scheduling techniques for power-aware real-time systems. In Proceedings of the 22nd IEEE Real-Time Systems Symposium, pages 95–105, 2001.

[3] A. Chandrakasan, S. Sheng, and R. Brodersen. Low-power CMOS digital design. IEEE Journal of Solid-State Circuits, 27(4):473–484, 1992.

[4] J.-J. Chen, H.-R. Hsu, K.-H. Chuang, C.-L. Yang, A.-C. Pang, and T.-W. Kuo. Multiprocessor energy-efficient scheduling with task migration considerations. In Proceedings of the 16th Euromicro Conference on Real-Time Systems, pages 101–108, 2004.

[5] J.-J. Chen, T.-W. Kuo, and C.-L. Yang. Profit-driven uniprocessor scheduling with energy and timing constraints. In ACM Symposium on Applied Computing, pages 834–840. ACM Press, 2004.

[6] J. Y. Chen, W. B. Jone, J. S. Wang, H.-I. Lu, and T. F. Chen. Segmented bus design for low-power systems. IEEE Transactions on VLSI Systems, 7(1):25–29, 1999.

[7] M. R. Garey and D. S. Johnson. Computers and intractability: A guide to the theory of NP-completeness. W.H. Freeman and Co, 1979.

[8] F. Gruian. System-level design methods for low-energy architectures containing variable voltage processors. In Power-Aware Computing Systems, pages 1–12, 2000.

[9] F. Gruian and K. Kuchcinski. Lenes: Task scheduling for low energy systems using variable supply voltage processors. In Proc. Asia South Pacific Design Automation Conference, pages 449–455, 2001.

[10] V. Gutnik and A. P. Chandrakasan. Embedded power supply for low-power DSP. IEEE Transactions on VLSI Systems, 5(4):425–435, 1997.

[11] S. Irani, S. Shukla, and R. Gupta. Algorithms for power savings. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 37–46. Society for Industrial and Applied Mathematics, 2003.

[12] T. Ishihara and H. Yasuura. Voltage scheduling problems for dynamically variable voltage processors. In Proceedings of the 1998 international symposium on Low power electronics and design, pages 197–202, 1998.

[13] W.-B. Jone, J. S. Wang, H.-I. Lu, I. P. Hsu, and J.-Y. Chen. Design theory and implementation for low-power segmented bus systems. ACM Transactions on Design Automation of Electronic Systems, 8(1):38–54, 2003.

[14] P. Mejía-Alvarez, E. Levner, and D. Mossé. Adaptive scheduling server for power-aware real-time tasks. ACM Transactions on Embedded Computing Systems, 3(2):284–306, 2004.

[15] R. Mishra, N. Rastogi, D. Zhu, D. Mossé, and R. Melhem. Energy aware scheduling for distributed real-time systems. In International Parallel and Distributed Processing Symposium, page 21, 2003.

[16] M. Nikitovic and M. Brorsson. An adaptive chip-multiprocessor architecture for future mobile terminals. In International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pages 43–49, 2002.

[17] M. Pedram and J. M. Rabaey. Power Aware Design Methodologies. Kluwer Academic Publishers, 2002.

[18] M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proceedings of Symposium on Operating Systems Design and Implementation, pages 13–23, 1994.

[19] F. Yao, A. Demers, and S. Shenker. A scheduling model for reduced CPU energy. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science, pages 374–382. IEEE, 1995.

[20] H.-S. Yun and J. Kim. On energy-optimal voltage scheduling for fixed-priority hard real-time systems. ACM Transactions on Embedded Computing Systems, 2(3):393–430, Aug. 2003.

[21] Y. Zhang, X. Hu, and D. Z. Chen. Task scheduling and voltage selection for energy minimization. In Annual ACM IEEE Design Automation Conference, pages 183–188, 2002.

[22] D. Zhu, R. Melhem, and B. Childers. Scheduling with dynamic voltage/speed adjustment using slack reclamation in multi-processor real-time systems. In Proceedings of IEEE 22th Real-Time System