Multiprocessor Energy-Efficient Scheduling with Task Migration Considerations ∗

Jian-Jia Chen, Heng-Ruey Hsu, Kai-Hsiang Chuang,

Chia-Lin Yang, Ai-Chun Pang, and Tei-Wei Kuo

Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan 106, ROC.

E-Mails:

{r90079, b89108, b89109, yangc, acpang, ktw}@csie.ntu.edu.tw

Abstract

This paper targets energy-efficient scheduling of tasks over multiple processors, where tasks share a common deadline. Distinct from many research results on heuristics-based energy-efficient scheduling, we propose approximation algorithms with different approximation bounds for processors with/without constraints on the maximum processor speed, where no task migration is allowed. When there is no constraint on processor speeds, we propose an approximation algorithm for two-processor scheduling to provide trade-offs among the specified error, the running time, the approximation ratio, and the memory space complexity. An approximation algorithm with a 1.13-approximation ratio for M-processor systems is also derived (M > 2). When there is an upper bound on processor speeds, an artificial-bound approach is taken to minimize the energy consumption with a 1.13-approximation ratio. An optimal scheduling algorithm is then proposed for the minimization of the energy consumption when task migration is allowed.

Keywords: Energy-Efficient Scheduling, Real-Time Task Scheduling, Power Management, Real-Time Systems, Multiprocessor Scheduling.

1. Introduction

While energy-efficient design has become a focus in various systems, voltage-scaling CPUs and power-aware subsystems are now adopted in many modern computer systems. CPU circuitry is usually designed such that a higher supply voltage results in a higher execution speed (or higher frequency). An example energy consumption function [1, 14], shown below, gives the energy consumption of a processor as a function of the processor speed:

$$P(s) = C_{ef} V_{dd}^2 s, \tag{1}$$

where $s = k \frac{(V_{dd} - V_t)^2}{V_{dd}}$, and $P$, $s$, $C_{ef}$, $V_t$, $V_{dd}$, and $k$ denote the energy consumption, the processor speed, the effective switch capacitance, the threshold voltage, the supply voltage, and a hardware-design-specific constant, respectively ($V_{dd} \ge V_t \ge 0$, $k > 0$, and $C_{ef} > 0$). The energy consumption function of a processor is usually a convex function of the processor speed, and each specific function is highly dependent on the design of the corresponding processor.¹

∗ Supported in part by research grants from the ROC National Science Council (091, NSC-92-2213-E-002-092, and NSC-92-2220-E-002-013).

Energy-efficient scheduling has been an active research topic in the past decade. In particular, Yao, et al. [15] proposed an off-line scheduling algorithm and an on-line competitive algorithm to minimize the energy consumption of task executions in a uniprocessor environment, where the processor under consideration has an infinite number of continuous processor speeds. In [10], Ishihara and Yasuura showed that an optimal schedule in the minimization of energy consumption can be obtained with only two processor speeds when the processor has only a finite number of discrete processor speeds, and all tasks are ready at time 0 and have a common deadline. Note that these results could only be applied to processors with an energy consumption function equal to Formula (1). For the case in which the energy consumption function could be any convex function, Chen, et al. [3] showed that the result in [10] still holds.

Although many excellent results were proposed for uniprocessor energy-efficient scheduling, little work has been done for multiprocessor environments. In recent years, energy-efficient design has been outlined as a critical issue by the industry in business operations, e.g., [2], where various configurations of server farms are adopted. Unfortunately, multiprocessor energy-efficient scheduling is often NP-hard under various application constraints. Gruian [7] proposed a simulated annealing (SA) approach in multiprocessor energy-efficient scheduling with the considerations of precedence constraints and a predictable execution time for each task. In [8], a power-aware scheduling algorithm based on a list heuristic with a dynamic priority assignment was proposed to determine the amount of time allocated to each task. Zhang, et al. [16] proposed a heuristic algorithm in which each task was first assigned to a proper processor, and the processor speed in executing each task was then chosen without violating the precedence and timing constraints. Mishra, et al. [12] explored scheduling issues on the communication delay of tasks. Zhu, et al. [17] explored on-line scheduling for a set of independent/dependent frame-based tasks, where all tasks in a frame-based task set are ready at time 0 and share a common deadline. Given an off-line schedule with worst-case task execution times, on-line strategies were proposed to reclaim the slack resulting from the early completion of tasks observed at run time. Although some work has been done on multiprocessor energy-efficient scheduling, many previous results are mainly on heuristics-based energy-efficient scheduling.

1 $f(x)$ is a convex function if $f(\alpha x + (1 - \alpha) y) \le \alpha f(x) + (1 - \alpha) f(y)$ for any $\alpha \in (0, 1)$ and any $x, y$ [5].

Distinct from the past work, the objective of this paper is to propose approximation algorithms with different approximation bounds for processors with/without constraints on the maximum processor speed, where task migration is considered. We first show that there does not exist any polynomial-time approximation algorithm with an approximation bound $(1 + \epsilon)$ in the minimization of energy consumption for multiprocessor scheduling over processors with an upper bound on the processor speed, where $\epsilon$ could be any positive real. When there is no constraint on processor speeds, we propose an approximation algorithm for two-processor scheduling to provide trade-offs among the specified error, the running time, the approximation ratio, and the memory space complexity. An approximation algorithm with a 1.13-approximation ratio for M-processor systems is also derived (M > 2). When there is an upper bound on processor speeds, an artificial-bound approach is taken to minimize the energy consumption with a 1.13-approximation ratio.² An optimal scheduling algorithm is then proposed for the minimization of the energy consumption when task migration is allowed.

2 Such a constraint violation study was first introduced in [11].

The rest of this paper is organized as follows: Section 2 formally defines the multiprocessor energy-efficient scheduling problems and shows their hardness. Section 3 presents approximation algorithms for multiprocessor energy-efficient scheduling for the two-processor case and general cases, where no task migration is allowed. In Section 4, an optimal polynomial-time scheduling algorithm is proposed for the minimization of the energy consumption when task migration is allowed. Section 5 is the conclusion.

2. Problem Definitions and NP-Hardness

2.1. Problem Deﬁnitions

This paper is interested in multiprocessor energy-efficient scheduling with/without constraints on the maximum processor speed and with task migration considerations. We assume a homogeneous multiprocessor environment, where each of the $M$ identical processors has the same energy consumption function $P(s)$ of a given processor speed $s$. In this paper, $P(s)$ is assumed to be a convex and increasing function. An example energy consumption function in [1, 14] is

$$P(s) = C_{ef}\, s \left( \left(\frac{s}{k} + 2V_t\right) \frac{2V_t + \frac{s}{k} + \sqrt{\frac{4 V_t s}{k} + \frac{s^2}{k^2}}}{2} - V_t^2 \right),$$

which is a reformulation of Formula (1). Let $U_{max}$ denote the maximum available processor speed for processors under consideration such that tasks could be executed at any processor speed in $[0, U_{max}]$. When $V_t = 0$, $P(s) = \alpha s^3$, where $\alpha = \frac{C_{ef}}{k^2}$. It is reasonable to consider only cases for a given energy consumption function $P(s)$ where $P(s_1) > P(s_2)$ for $s_1 > s_2$. Let the energy consumed by a processor in the execution of tasks at the processor speed $s$ for $t$ time units be $P(s)t$. We assume that the number of CPU cycles executed in a time interval is linearly proportional to the processor speed. That is, the amount of CPU cycles completed for a task running at a speed $s$ for $t$ time units is the product of $s$ and $t$.
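The relation between Formula (1) and its reformulation in terms of the speed can be sketched numerically. The following is a minimal illustration, not part of the original paper: the function names `supply_voltage`, `power`, and `energy` are hypothetical, and the code assumes the larger root of the quadratic obtained from $s = k (V_{dd} - V_t)^2 / V_{dd}$, which is the root with $V_{dd} \ge V_t$.

```python
import math

def supply_voltage(s, k, v_t):
    """Supply voltage V_dd >= V_t that yields speed s under
    s = k * (V_dd - V_t)**2 / V_dd (larger root of the quadratic in V_dd)."""
    return v_t + s / (2 * k) + math.sqrt(v_t * s / k + s * s / (4 * k * k))

def power(s, c_ef, k, v_t):
    """P(s) = C_ef * V_dd**2 * s, with V_dd expressed through the speed s."""
    v_dd = supply_voltage(s, k, v_t)
    return c_ef * v_dd ** 2 * s

def energy(s, t, c_ef, k, v_t):
    """Energy of running at speed s for t time units: P(s) * t."""
    return power(s, c_ef, k, v_t) * t
```

With `v_t = 0` the function `power` reduces to `c_ef * s**3 / k**2`, which matches the cubic special case stated above.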

Task migration might or might not be allowed in the exploration of energy-efficient scheduling in this paper. When task migration is allowed, the migration cost is assumed to be negligible. No task can execute simultaneously on more than one processor. For the rest of Section 2, we first formally define the multiprocessor scheduling problems with the minimization of energy consumption with/without task migration in this paper. We then show the NP-hardness of the problems.

Definition 1 Multiprocessor Scheduling with the Minimization of Energy Consumption with Task Migration (MMEM):

Consider a set $T$ of independent tasks over $M$ identical processors with an energy consumption function $P(s)$, where all tasks in $T$ are ready at time 0 and share a common deadline $D$. Each task $\tau_i \in T$ is associated with a computation requirement equal to $c_i$ CPU cycles. The problem is to minimize the energy consumption in the scheduling of tasks in $T$ without missing the common deadline $D$, where task migration is allowed.

A variation of the MMEM problem without task migration could be deﬁned similarly as follows:

Definition 2 Multiprocessor Scheduling with the Minimization of Energy Consumption without Task Migration (MME):

The input and output of the MME problem are the same as their counterparts of the MMEM problem, except that no task migration is allowed.

A schedule of a task set is a mapping of the executions of the tasks in the set to processors in the system with an assignment of processor speeds for the corresponding time intervals of the tasks. A schedule is feasible if all processor speeds assigned for its time intervals are valid, no task misses the deadline $D$, and the given task migration constraint is satisfied. The energy consumption of a schedule $SC$ is denoted as $\Phi(SC)$ (please see the first paragraph of this section for the definition of the energy consumption). A schedule is optimal if it is feasible, and its energy consumption is equal to the minimum energy consumption of all feasible schedules. If there does not exist any feasible schedule for an input instance, then the minimum energy consumption is denoted as $\infty$.

2.2. Hardness of the MME Problem

We shall show the NP-hardness of the MME problem in this section and then propose an optimal algorithm for the MMEM problem in a later section:

Lemma 1 (Chen, Kuo, and Yang [3]) There exists an optimal schedule for any task set $T$ executing on a single processor at the single processor speed $\frac{\sum_{\tau_i \in T} c_i}{D}$, where the processor under consideration has an infinite number of continuous processor speeds, and all tasks in $T$ are ready at time 0 and have a common deadline $D$.

Although multiprocessor scheduling is NP-hard [4] when no task migration is allowed, this does not imply the NP-hardness of the MME problem directly. For example, when $P()$ is a linear function ($P(s) \propto s$), any feasible schedule is an optimal solution.

Theorem 1 The MME problem is NP-hard when $M \ge 2$.

Proof: The NP-hardness is proved by a reduction from the 3-PARTITION problem [4], where $P()$ is a strictly convex and increasing function; the correctness of the reduction follows from Lemma 1.³

A polynomial-time $(1 + \epsilon)$-approximation algorithm must have a time complexity polynomial in the input size and derive a solution with a bound $(1 + \epsilon)$ on the given objective function [13]. That is, when $E(OPT)$ represents the value of an optimal solution for the objective function, any solution derived from a $(1 + \epsilon)$-approximation algorithm should have a value of the objective function no more than $(1 + \epsilon) E(OPT)$ (for minimization problems).

Theorem 2 There does not exist a polynomial-time $(1 + \epsilon)$-approximation algorithm for the MME problem ($\epsilon > 0$) when $U_{max} \ne \infty$, unless $P = NP$.

Proof: This theorem can be proved by contradiction. Suppose that there exists a polynomial-time $(1 + \epsilon)$-approximation algorithm for the MME problem, called ALG. Given an instance of the PARTITION problem [4] (which is NP-complete), the problem is to find a subset $A'$ of a given set $A$ such that $\sum_{a_i \in A'} w(a_i) = \sum_{a_i \in A - A'} w(a_i)$, where each element $a_i$ in $A$ is associated with a size $w(a_i) \in Z^+$. Let $U_{max}$ be an arbitrary positive real with $U_{max} \ne \infty$. The instance of the PARTITION problem could be reduced to an instance of the MME problem such that a unique task $\tau_i$ is created for each element $a_i \in A$, and the required CPU cycles for $\tau_i$ is $w(a_i) \cdot U_{max}$. All tasks are ready at time 0, and the common deadline is set as $D = \frac{\sum_{a_i \in A} w(a_i)}{2}$. Let the number $M$ of processors be 2. By applying the approximation algorithm ALG to the resulting instance of the MME problem, the energy consumption of the derived schedule would be bounded by the product of $(1 + \epsilon)$ and the energy consumption of an optimal schedule if there exists any feasible schedule. However, any feasible schedule could not execute tasks at a speed over $U_{max}$. In other words, if there exists a feasible schedule, then ALG must already identify a subset $A'$ of $A$ such that $\sum_{a_i \in A'} w(a_i) = \sum_{a_i \in A - A'} w(a_i)$. If there does not exist a feasible schedule, then ALG would report the failure by returning $\infty$. Since ALG is a polynomial-time algorithm, such a conclusion contradicts the NP-completeness of the PARTITION problem (unless P = NP).

3 $f()$ is strictly convex if $f(\alpha x + (1 - \alpha) y) < \alpha f(x) + (1 - \alpha) f(y)$ for any $\alpha \in (0, 1)$ and any $x \ne y$. For example, $P(s) \propto s^3$ when $s \ge 0$.

Theorem 2 implies that there does not exist any polynomial-time approximation algorithm for the MME problem when $U_{max} \ne \infty$ unless $P = NP$, since $\epsilon$ could be any positive real.
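The reduction used in the proof of Theorem 2 can be written down directly. The sketch below is only an illustration under stated assumptions: the names `partition_to_mme` and `has_feasible_schedule` and the dictionary layout are hypothetical, and the feasibility check is an exponential brute force intended for tiny instances, not a practical algorithm.

```python
def partition_to_mme(weights, u_max):
    """Build the two-processor MME instance of the reduction:
    task i needs w(a_i) * U_max cycles and the deadline is sum(w) / 2."""
    total = sum(weights)
    return {
        "cycles": [w * u_max for w in weights],
        "deadline": total / 2,
        "processors": 2,
        "u_max": u_max,
    }

def has_feasible_schedule(instance):
    """Brute-force check: is there a split of the tasks over the two
    processors such that neither load exceeds U_max * D?"""
    cap = instance["u_max"] * instance["deadline"]
    cycles = instance["cycles"]
    total = sum(cycles)
    for mask in range(1 << len(cycles)):
        load = sum(c for i, c in enumerate(cycles) if (mask >> i) & 1)
        if load <= cap and total - load <= cap:
            return True
    return False
```

Because the two loads sum to exactly $2 U_{max} D$, a feasible split exists only when both loads equal $U_{max} D$, that is, only when the original PARTITION instance has a solution.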

3. Multiprocessor Scheduling without Task Migration

In this section, we present approximation algorithms for the MME problem. We first consider the case in which $U_{max} = \infty$ for two and an arbitrary number of processors, respectively. We then show that the proposed algorithms can be proved to bound the maximum processor speed by constant factors when $U_{max} \ne \infty$. Note that, since all tasks are ready at time 0 and share a common deadline, the tasks assigned to a processor can be executed in any order. That is, the execution order for the tasks assigned to a processor does not affect the feasibility and the energy consumption for any feasible schedule. The following formula, which results from the convexity of the energy consumption function, is used in this section:

$$P\left(\frac{c_1}{D}\right) D + P\left(\frac{c_2}{D}\right) D \ge P\left(\frac{c_3}{D}\right) D + P\left(\frac{c_4}{D}\right) D, \tag{2}$$

when $c_1 + c_2 = c_3 + c_4$ and $0 \le c_1 < c_3 < c_4 < c_2$. Based on Formula (2) and Lemma 1, it is clear that executing each task $\tau_i$ on the processor $i$ from time 0 to $D$ at the speed $\frac{c_i}{D}$ results in an optimal schedule when $|T| \le M$. In the rest of this section, only non-trivial cases, $|T| > M$, are considered.
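Formula (2) says that, for a fixed total amount of work, a more balanced split over two processors never costs more energy. A small numeric sketch, assuming the cubic function $P(s) = \alpha s^3$ as the default (the helper names `cubic_power` and `two_processor_energy` are hypothetical):

```python
def cubic_power(s, alpha=1.0):
    """Example convex, increasing energy consumption function P(s) = alpha * s**3."""
    return alpha * s ** 3

def two_processor_energy(load_a, load_b, deadline, power=cubic_power):
    """Energy when each processor runs its load at constant speed load / D
    for the whole interval [0, D]."""
    return (power(load_a / deadline) + power(load_b / deadline)) * deadline
```

For instance, with $D = 2$ the loads (1, 9) and (4, 6) have the same sum, and the second, more balanced split yields the smaller value, as Formula (2) predicts.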

3.1. Multiprocessor Scheduling over Two Identical Processors When $U_{max} = \infty$

We shall show how to obtain a fully polynomial-time approximation scheme (FPTAS) for two identical processors by a reduction to the Maximum Subset Sum problem [4].⁴ Given a set $A$ of positive numbers $a_1, a_2, \cdots, a_{|A|}$ and an arbitrary number $W$, the Maximum Subset Sum problem [4] (which is NP-hard) is to find a subset $A'$ of $A$ such that $\sum_{a_i \in A'} a_i \le W$ and $\sum_{a_i \in A'} a_i$ is maximized.

Lemma 2 (Ibarra and Kim [9]) The Maximum Subset Sum problem admits a fully polynomial-time $\frac{1}{1 - \delta}$-approximation algorithm subset() for any $0 < \delta < 1$, where the time complexity is $O(|A| (\frac{3}{\delta})^2)$ and the space complexity is $O(|A| + (\frac{3}{\delta})^3)$.

4 An algorithm $A$ for a minimization problem is an FPTAS if $A$ is executed in polynomial time in the size of the input and $\frac{1}{\epsilon}$, and the approximation ratio of algorithm $A$ is $1 + \epsilon$ [13], where $0 < \epsilon$ is a user input parameter. Note that the approximation ratio is $\frac{1}{1 - \epsilon}$ for a maximization problem, where $0 < \epsilon < 1$.

Due to Lemma 1 and non-migration of tasks, there must exist an optimal schedule for the MME problem with two processors which assigns two subsets $T_1$ and $T_2$ ($T_1 \cup T_2 = T$) of tasks to the processors 1 and 2 at the speeds $\frac{\sum_{\tau_i \in T_1} c_i}{D}$ and $\frac{\sum_{\tau_i \in T_2} c_i}{D}$, respectively. Without loss of generality, let $\sum_{\tau_i \in T_1} c_i \le \sum_{\tau_i \in T_2} c_i$. Because of the convexity of the energy consumption functions in Formula (2), achieving the optimal schedule for the MME problem is to generate a subset $T_1$ of $T$ such that $\sum_{\tau_i \in T_1} c_i \le \frac{1}{2} \sum_{\tau_i \in T} c_i$ and $\sum_{\tau_i \in T_1} c_i$ is maximized. We develop Algorithm basic for the MME problem with two processors by applying the subset routine in Lemma 2 with a proper $\delta$. The input parameter $\epsilon$ in Algorithm basic is the amount of error that users are willing to tolerate, which is a necessary requirement for an FPTAS. Since every task is assigned to exactly one processor and completes by $D$, the correctness of Algorithm basic is guaranteed. In the following theorems, we show that setting $\delta = \sqrt{\epsilon / 2}$ leads Algorithm basic to be a fully polynomial-time $(1 + \epsilon)$-approximation algorithm for the MME problem when the energy consumption function satisfies Formula (1).

Algorithm 1: basic

Input: $(T, D, \epsilon)$;

Output: A feasible schedule $SC$ with minimal energy consumption;

1: let $W = \frac{\sum_{\tau_i \in T} c_i}{2}$;

2: $C$ = subset$(c_1, c_2, \cdots, c_{|T|}, W, \delta)$ with $\delta = \sqrt{\epsilon / 2}$; let $T_1$ be the corresponding task set of $C$;

3: output the schedule $SC$ which executes all tasks in $T_1$ at the speed $\frac{\sum_{\tau_i \in T_1} c_i}{D}$ on the processor 1 and all tasks in $T - T_1$ at the speed $\frac{\sum_{\tau_i \in T - T_1} c_i}{D}$ on the processor 2;
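The structure of Algorithm basic can be sketched as follows. This is a hedged illustration, not the routine of Lemma 2: the approximate subset() routine of Ibarra and Kim is replaced here by an exact pseudo-polynomial dynamic program (`best_subset_at_most`, a hypothetical helper) that assumes positive integer cycle counts, so the parameter $\epsilon$ does not appear. The names and the output layout are assumptions made for this example.

```python
def best_subset_at_most(values, limit):
    """Exact stand-in for subset(): indices of a subset of `values` whose
    sum is the largest one not exceeding `limit` (positive integers assumed)."""
    reachable = {0: []}  # achievable sum -> indices realising it
    for idx, v in enumerate(values):
        # Iterate over a snapshot so that each value is used at most once.
        for s, chosen in list(reachable.items()):
            new_sum = s + v
            if new_sum <= limit and new_sum not in reachable:
                reachable[new_sum] = chosen + [idx]
    return reachable[max(reachable)]

def basic_two_processors(cycles, deadline):
    """Steps 1 to 3 of Algorithm basic with the exact subset routine above."""
    half = sum(cycles) / 2                      # step 1: W
    first = set(best_subset_at_most(cycles, half))  # step 2: T_1
    load1 = sum(cycles[i] for i in first)
    load2 = sum(cycles) - load1
    return {                                    # step 3: speeds per processor
        1: {"tasks": sorted(first), "speed": load1 / deadline},
        2: {"tasks": [i for i in range(len(cycles)) if i not in first],
            "speed": load2 / deadline},
    }
```

With an approximate subset routine in place of the exact one, the load of processor 1 would only be guaranteed to be within a factor $(1 - \delta)$ of the best achievable value, which is the situation analysed in the theorems that follow.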

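Lemma 3 $f(x, \gamma) = \frac{(\gamma x)^3 + (1 - \gamma x)^3}{x^3 + (1 - x)^3} \le 2\gamma^2 - 4\gamma + 3$ for any fixed $\gamma$, where $1 \ge \gamma > 0$ and $\frac{1}{2} \ge x > 0$.

Proof: It is solved when $\frac{\partial f(x, \gamma)}{\partial x} = 0$.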

Theorem 3 Algorithm basic is a $(1 + \epsilon)$-approximation algorithm for the MME problem for any $0 < \epsilon < 2$ when $P(s) \propto s^3$, i.e., $V_t = 0$ in Formula (1).

Proof: Let $OPT$ denote a subset of $T$, where $\sum_{\tau_i \in OPT} c_i \le W$ and $\sum_{\tau_i \in OPT} c_i$ is maximized. For simplicity of representation, we use $C(X)$ to denote $\sum_{\tau_i \in X} c_i$ for any subset $X$ of tasks. Let $SC_{opt}$ be the schedule which executes the tasks in $OPT$ at the speed $\frac{C(OPT)}{D}$ on the processor 1 and the tasks in $T - OPT$ at the speed $\frac{C(T - OPT)}{D}$ on the processor 2. $SC_{opt}$ is an optimal solution for the MME problem and

$$\Phi(SC_{opt}) = \left( P\left(\frac{C(OPT)}{D}\right) + P\left(\frac{C(T) - C(OPT)}{D}\right) \right) D.$$

Without loss of generality, let $C(OPT) = C(T) \cdot x$ and $C(T - OPT) = C(T) \cdot (1 - x)$. Since $C(OPT) \le C(T - OPT)$, we have $0 < x \le \frac{1}{2}$. We know $C(T_1) = \gamma \cdot C(OPT)$, where $1 \ge \gamma \ge 1 - \delta$ due to the approximation ratio of Algorithm subset. The ratio of the energy consumption of $SC$ to that of $SC_{opt}$ is defined as a function $f()$:

$$f(x, \gamma) = \frac{\Phi(SC)}{\Phi(SC_{opt})} = \frac{(\gamma x)^3 + (1 - \gamma x)^3}{x^3 + (1 - x)^3} \le 2\gamma^2 - 4\gamma + 3,$$

where the inequality comes from Lemma 3. Note that both $\gamma$ and $x$ are unknown during the calculation. We have $f(x, 1 - \sqrt{\epsilon / 2}) \le 1 + \epsilon$ by solving $1 + \epsilon = 2\gamma^2 - 4\gamma + 3$. Since $2\gamma^2 - 4\gamma + 3$ is a decreasing function of $\gamma$ for any $0 < \gamma \le 1$, $f(x, \gamma) \le 1 + \epsilon$ if $\delta = \sqrt{\epsilon / 2}$.

Therefore, by setting $\delta = \sqrt{\epsilon / 2}$, we conclude that Algorithm basic is a $(1 + \epsilon)$-approximation algorithm for the MME problem. The time complexity of Algorithm basic is $O(|T| \frac{18}{\epsilon})$, and the space complexity is $O(|T| + (\frac{18}{\epsilon})^{1.5})$.

We can also prove that Algorithm basic is an FPTAS even when $V_t \ne 0$ in Formula (1) in the following theorem.

Theorem 4 Algorithm basic is a $(1 + \epsilon)$-approximation algorithm for the MME problem for any $0 < \epsilon < 2$ when $P(s) = C_{ef} \left( \frac{s^3}{2k^2} + \frac{2 V_t s^2}{k} + s V_t^2 + \frac{s^2}{k} \sqrt{\frac{V_t s}{k} + \frac{s^2}{4k^2}} + V_t s \sqrt{\frac{4 V_t s}{k} + \frac{s^2}{k^2}} \right)$, i.e., $V_t \ne 0$ in Formula (1).

3.2. Multiprocessor Scheduling over an Arbitrary Number of Processors When $U_{max} = \infty$

In this section, we present a scheduling algorithm with a 1.13-approximation ratio for the MME problem when the maximum available processor speed is infinite. Our proposed algorithm shown in Algorithm 2 (Algorithm ltf) adopts the Largest-Task-First strategy. That is, tasks are considered in a non-increasing order of their computation requirements.

Let $p_m$ denote the load on the processor $m$. The load of a processor is defined as the total amount of the computation requirements of the tasks assigned to that processor. Let $T_m$ denote the set of the tasks assigned to the processor $m$. Note that the task set $T$ is sorted in a non-increasing order of the computation requirement of each task, i.e., $c_i \ge c_j$ if $i < j$. Algorithm ltf assigns each task, following the task order in $T$, to the processor with the smallest load. To achieve the minimal energy consumption, based on Lemma 1, each task on the processor $m$ should be executed at the speed $\frac{\sum_{\tau_i \in T_m} c_i}{D}$. The time complexity of Algorithm ltf is $O(|T| \log |T|)$, which is dominated by the sorting of the tasks. Since each task is assigned to one processor without missing the common deadline, the correctness of Algorithm ltf is guaranteed. For simplicity of representation, the schedule derived from Algorithm ltf is denoted as $SC_{T,LTF}$.

Algorithm 2: ltf

Input: $(T, D, M)$;

Output: A feasible schedule $SC_{T,LTF}$ with minimal energy consumption;

1: sort all tasks in a non-increasing order of the computation requirement of each task;

2: set $p_1, p_2, \cdots, p_M$ to 0, and $T_1, T_2, \cdots, T_M$ to $\emptyset$;

3: for $i = 1$ to $|T|$ do

4: find the smallest $p_m$; (break ties arbitrarily)

5: $T_m \leftarrow T_m \cup \{\tau_i\}$ and $p_m \leftarrow p_m + c_i$;

6: return the schedule $SC_{T,LTF}$ which executes all of the tasks in $T_m$ ($1 \le m \le M$) at the speed $\frac{p_m}{D}$ on the processor $m$;
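Algorithm ltf translates almost line by line into code. The following sketch assumes a binary heap for "find the smallest $p_m$" and zero-based task and processor indices; the function name `ltf` and the return format are choices made for this illustration only.

```python
import heapq

def ltf(cycles, deadline, m):
    """Largest-Task-First: returns (assigned, speeds), where assigned[p] lists
    the task indices on processor p and speeds[p] is its constant speed."""
    # Step 1: consider tasks in non-increasing order of computation requirement.
    order = sorted(range(len(cycles)), key=lambda i: cycles[i], reverse=True)
    # Step 2: all loads start at 0; the heap holds (load, processor) pairs.
    heap = [(0, p) for p in range(m)]
    heapq.heapify(heap)
    assigned = [[] for _ in range(m)]
    # Steps 3 to 5: give each task to the currently least-loaded processor.
    for i in order:
        load, p = heapq.heappop(heap)
        assigned[p].append(i)
        heapq.heappush(heap, (load + cycles[i], p))
    # Step 6: each processor runs at load / D over the whole interval [0, D].
    speeds = [sum(cycles[i] for i in tasks) / deadline for tasks in assigned]
    return assigned, speeds
```

As a usage example, `ltf([5, 4, 3, 3, 3], 1, 2)` produces the loads 8 and 10, whereas the split (5 + 4, 3 + 3 + 3) with loads 9 and 9 would be optimal; under $P(s) = s^3$ the energy ratio is about 1.04, within the 1.13 bound derived below.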

Lemma 4 Algorithm ltf is an optimal algorithm if $|T| \le 2M$ and $c_{i+M} \ge \frac{1}{2} c_{M-i+1}$ for all $1 \le i \le |T| - M$.

Proof: It can be proved by transforming any feasible solution into $SC_{T,LTF}$ without increasing the energy consumption.

The next step is to derive the lower bound of the MME problem by relaxing the problem constraint. Let $k$ be the largest index satisfying $M \le k \le 2M$ and $c_{i+M} \ge \frac{1}{2} c_{M-i+1}$ for all $1 \le i \le k - M$. $T'$ represents the set of the first $k$ tasks of $T$. Note that, if $|T'| < 2M$ and $T - T' \ne \emptyset$, we know $c_{|T'|+1} < \frac{1}{2} c_{2M-|T'|}$. We relax the constraint of the MME problem so that any task in $T - T'$ could be executed on more than one processor simultaneously. Below, we describe the scheduling method for the relaxed MME problem. We assign the tasks in $T'$ according to Algorithm ltf. Let $p_m$ denote the load of the processor $m$ after performing the task assignment. There exists a positive value $P_{min}$ that satisfies the following equation:

$$\sum_{m=1}^{M} (P_{min} - p_m) \delta_m = \sum_{\tau_i \in T - T'} c_i, \tag{3}$$

where $\delta_m$ is 1 if $P_{min} > p_m$ and 0 otherwise. Since task migration and simultaneous execution of a task on multiple processors are allowed for the tasks in $T - T'$, we can distribute the computation of these tasks among the processors. If $P_{min} > p_m$, $(P_{min} - p_m)$ CPU cycles of $T - T'$ are distributed on the processor $m$. Each processor $m$ then performs computation at the speed $\frac{p_m}{D}$ if $p_m > P_{min}$ and $\frac{P_{min}}{D}$ otherwise. Let $SC'_{T,LTF}$ denote the resulting schedule. Figure 1 illustrates the loads of processors in $SC'_{T,LTF}$ for the relaxed MME problem. Due to the optimality provided in Lemma 4, it is clear that $SC'_{T,LTF}$ consumes no more energy than $SC_{T,opt}$, where $SC_{T,opt}$ is an optimal schedule for the MME problem.

[Figure 1: loads $p_1, \ldots, p_8$ of eight processors holding the tasks $\tau_1, \ldots, \tau_{12}$, with the level $P_{min}$ and the work of $T - T'$ marked.]

Figure 1. The task assignment of $SC'_{T,LTF}$ for $M = 8$ and $|T'| = 12$. The computation requirements of the tasks in $T - T'$ are distributed over the processors 3, 4, and 7 (the patterned regions).

Lemma 5 $\frac{\Phi(SC_{T,LTF})}{\Phi(SC_{T,opt})} \le \frac{\Phi(SC_{T,LTF})}{\Phi(SC'_{T,LTF})} \le R^*$, where $R^* = \max \left\{ \frac{\sum_{l_i \in L} P(\frac{l_i}{D})}{M \cdot P(\frac{S}{MD})} \right\}$ for any positive integer $M$, positive reals $S$ and $D$, and any set $L$ of $M$ positive reals that satisfy $\sum_{l_i \in L} l_i = S$ and $\max_{l_i \in L} l_i \le \frac{3}{2} \min_{l_i \in L} l_i$.

Proof: Since $\Phi(SC'_{T,LTF}) \le \Phi(SC_{T,opt})$, the first inequality is proved. Let $o_1, o_2, \cdots, o_M$ denote the load on each processor for the task assignment generated by Algorithm ltf for $T'$ and $p_1, p_2, \cdots, p_M$ for $T$. $max_p$, $min_p$, and $min_o$ are the values $\max\{p_i\}$, $\min\{p_i\}$, and $\min\{o_i\}$, respectively. It is clear that $max_p \ge P_{min}$, $min_p \le P_{min}$, and $\Phi(SC'_{T,LTF}) \ge MD \cdot P(\frac{\sum_{\tau_i \in T} c_i}{MD})$. If $max_p = P_{min}$, then $min_p = P_{min}$ and Algorithm ltf generates an optimal solution. Therefore, we only consider the condition where $max_p > P_{min} \ge min_p$ and $T - T' \ne \emptyset$. We now prove the second inequality. We first consider the case where $o_m \le P_{min}$ for each processor $m$. Let $m^*$ be the processor with the largest load in $SC_{T,LTF}$, i.e., $p_{m^*} = max_p$, and $\tau_k$ be the last task added into $T_{m^*}$ in $SC_{T,LTF}$. Once a processor $m$ satisfies $p_m \ge P_{min}$, Algorithm ltf will not assign any more tasks to the processor $m$. Therefore, we have $\sum_{\tau_i \in T_{m^*} - \{\tau_k\}} c_i < P_{min}$. It is clear that $min_p \ge \sum_{\tau_i \in T_{m^*} - \{\tau_k\}} c_i$; otherwise, Algorithm ltf would not assign task $\tau_k$ to the processor $m^*$. Therefore, $max_p - min_p \le c_k$. Because $o_m \le P_{min}$ for each processor $m$, we have $c_k \le c_{|T'|+1}$. We know $min_o \le min_p$ since Algorithm ltf adds the tasks in $T - T'$ to the processor with the minimal load. Due to the definition of $T'$, $c_{|T'|+1} \le \frac{1}{2} min_o$.⁵ Combining all inequality relations mentioned above, we have

$$max_p - min_p \le c_k \le c_{|T'|+1} \le \frac{1}{2} min_o \le \frac{1}{2} min_p.$$

Therefore, $max_p \le \frac{3}{2} min_p$, which proves the second inequality.

We now consider the case where some $o_m > P_{min}$. Let the set of these processors be $J$, where $|J| \ge 1$. For each processor $m$ in $J$, $SC_{T,LTF}$ and $SC'_{T,LTF}$ assign the same tasks to the processor $m$, e.g., the processors 1, 2, 5, 6, and 8 in Figure 1. By assuming $\frac{\Phi(SC_{T,LTF})}{\Phi(SC'_{T,LTF})} > R^*$, we conclude that $\frac{\Phi(SC_{T,LTF}) - \sum_{j \in J} P(o_j / D) D}{\Phi(SC'_{T,LTF}) - \sum_{j \in J} P(o_j / D) D} > R^*$. This contradicts the first case applied to $M - |J|$ processors and the task set $T - \cup_{j \in J} T_j$. Therefore, the approximation ratio for Algorithm ltf is $R^*$.
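Equation (3) defines $P_{min}$ only implicitly. One way to compute it is a water-filling pass over the sorted loads, sketched below under the assumption that the loads after assigning $T'$ and the total work of $T - T'$ are given as plain numbers; the names `water_level` and `relaxed_speeds` are hypothetical.

```python
def water_level(loads, extra):
    """Level P_min with sum(max(P_min - p, 0) for p in loads) == extra,
    assuming extra > 0 (Equation (3))."""
    ordered = sorted(loads)
    remaining = extra
    for j in range(len(ordered)):
        # Raising the j + 1 lowest processors up to the next higher load
        # absorbs `room` cycles; the last step has unbounded room.
        nxt = ordered[j + 1] if j + 1 < len(ordered) else float("inf")
        room = (nxt - ordered[j]) * (j + 1)
        if remaining <= room:
            return ordered[j] + remaining / (j + 1)
        remaining -= room

def relaxed_speeds(loads, extra, deadline):
    """Speeds in the relaxed schedule: p_m / D if p_m > P_min, else P_min / D."""
    level = water_level(loads, extra)
    return [max(p, level) / deadline for p in loads]
```

For the loads (4, 7, 10) and 5 extra cycles, the level is 8: the first processor receives 4 cycles, the second receives 1, and the third is left untouched.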

We now derive the value of $R^*$ when the energy consumption function satisfies Formula (1).

Theorem 5 Algorithm ltf is a 1.13-approximation algorithm for the MME problem when $P(s) \propto s^3$, i.e., $V_t = 0$ in Formula (1).

Proof: We prove this theorem by showing that $R^* \le 1.13$. By the definition of $R^*$ in Lemma 5, there must exist at least one real number $x$ for a set $L$, where $2x \le l_i \le 3x$ for all $l_i \in L$. It is clear that $2xM \le S \le 3xM$. For each element $l_i \in L$, we know $P(\frac{l_i}{D}) \le \frac{3x - l_i}{x} P(\frac{2x}{D}) + \frac{l_i - 2x}{x} P(\frac{3x}{D})$.⁶ Therefore, we have $\sum_{l_i \in L} P(\frac{l_i}{D}) \le k P(\frac{3x}{D}) + (M - k) P(\frac{2x}{D})$ for a real $k$ satisfying $k = \frac{S - 2Mx}{x}$ (a rephrasing of $3x \cdot k + 2x \cdot (M - k) = S$). $R^*$ is obtained by finding the proper $x$ which maximizes the function $f(x) = k P(\frac{3x}{D}) + (M - k) P(\frac{2x}{D})$. Without loss of generality, let $P(s) = \alpha s^3$, where $\alpha$ is a constant. By solving $f'(x) = 0$ and showing $f''(x) < 0$ for all $x$ satisfying $\frac{S}{2M} \ge x \ge \frac{S}{3M}$, $f(x)$ is maximized when $x = \frac{19S}{45M}$ and the maximum value of $f(x)$ is $\frac{\alpha 19^3 S^3 / M^2}{3 \cdot 45^2 D^3} \approx 1.13 \alpha \frac{S^3}{M^2 D^3}$. Therefore, $R^* \le 1.13$.
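The constant in Theorem 5 can be checked numerically. Substituting $P(s) = \alpha s^3$ and $k = (S - 2Mx)/x$ gives $f(x) = \frac{\alpha}{D^3}(19 S x^2 - 30 M x^3)$, whose maximum over $[\frac{S}{3M}, \frac{S}{2M}]$ is attained at $x = \frac{19S}{45M}$. The sketch below evaluates the same expression on a grid; the function names and the grid size are assumptions made for this illustration, and $\alpha$ and $D$ are normalised to 1 since they cancel in the ratio.

```python
def relaxed_bound(x, s, m):
    """f(x) for P(s) = s**3 and D = 1: k * (3x)**3 + (m - k) * (2x)**3
    with k = (s - 2*m*x) / x."""
    k = (s - 2 * m * x) / x
    return k * (3 * x) ** 3 + (m - k) * (2 * x) ** 3

def worst_case_ratio(s=1.0, m=1.0, steps=100000):
    """Grid search for max f(x) over [s/(3m), s/(2m)], divided by the
    lower bound m * P(s / m) = s**3 / m**2."""
    lo, hi = s / (3 * m), s / (2 * m)
    best = max(relaxed_bound(lo + (hi - lo) * i / steps, s, m)
               for i in range(steps + 1))
    return best / (s ** 3 / m ** 2)
```

The grid maximum agrees with the closed form $\frac{19^3}{3 \cdot 45^2} = \frac{6859}{6075}$, which is slightly below 1.13.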

Corollary 1 Algorithm ltf is a 1.13-approximation algorithm for the MME problem when $P(s) = C_{ef} \left( \frac{s^3}{2k^2} + \frac{2 V_t s^2}{k} + s V_t^2 + \frac{s^2}{k} \sqrt{\frac{V_t s}{k} + \frac{s^2}{4k^2}} + V_t s \sqrt{\frac{4 V_t s}{k} + \frac{s^2}{k^2}} \right)$.

5 There are two cases: 1. If $min_o = c_{2M-|T'|}$, then $c_{|T'|+1} < \frac{1}{2} c_{2M-|T'|} = \frac{1}{2} min_o$; 2. If $min_o = c_i + c_j$ for some $i, j > 2M - |T'|$, then the relations $c_{|T'|+1} \le c_i$ and $c_{|T'|+1} \le c_j$ result in $c_{|T'|+1} \le \frac{1}{2} min_o$.

6 The inequality comes from the fact that $P(\alpha x + (1 - \alpha) y) \le \alpha P(x) + (1 - \alpha) P(y)$ for any $\alpha \in (0, 1)$ and any $x, y$. The coefficients of $P(\frac{2x}{D})$ and $P(\frac{3x}{D})$ are obtained by solving $a, b$ in the following

3.3. Multiprocessor Scheduling When $U_{max} \ne \infty$

By adopting the constraint-violation approach [11], we propose an artificial-bound approach by first setting an artificial upper bound on the processor speed and then deriving feasible schedules in the minimization of energy consumption. We show that Algorithms basic and ltf bound the maximum processor speed by the factors of $(1 + \sqrt{\epsilon / 2})$ and $(\frac{4}{3} - \frac{1}{3M})$, respectively. For simplicity of representation, we assume that tasks in $T$ are sorted in a non-increasing order of their computation requirements. We first prove that Algorithm ltf could derive a schedule with a 1.13-approximation ratio without violating the maximum processor speed for certain input instances (please see Theorem 6):

Theorem 6 Algorithm ltf is a 1.13-approximation algorithm if the given input instance satisfies $\frac{\sum_{\tau_i \in T} c_i}{M} + c_{M+1} \le U_{max} D$ and $c_1 \le U_{max} D$.

Proof: This theorem is proved by contradiction. We assume that there exists a processor $m$ in $SC_{T,LTF}$, where $\sum_{\tau_i \in T_m} c_i > U_{max} D$. Let $\tau_k$ be the last task added into $T_m$ in Algorithm ltf. Two cases are considered. If $k \le M$, then $\tau_k$ is the only task on the processor $m$, and we know $c_1 \ge c_k > U_{max} D$. This contradicts the assumption. If $k > M$, we have $\sum_{\tau_i \in T_m - \{\tau_k\}} c_i > U_{max} D - c_k \ge U_{max} D - c_{M+1}$. In Algorithm ltf, once $p_m \ge \frac{\sum_{\tau_i \in T} c_i}{M}$, no more tasks can be assigned to the processor $m$. Therefore, $\frac{\sum_{\tau_i \in T} c_i}{M} > \sum_{\tau_i \in T_m - \{\tau_k\}} c_i$. Based on the above inequalities, we have $\frac{\sum_{\tau_i \in T} c_i}{M} + c_{M+1} > U_{max} D$. This contradicts our assumption.
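The sufficient condition of Theorem 6 is easy to test before running Algorithm ltf. A minimal sketch follows; the function name is hypothetical, and treating $c_{M+1}$ as 0 when there are at most $M$ tasks is an assumption of this example rather than something stated in the paper.

```python
def ltf_respects_speed_limit(cycles, deadline, m, u_max):
    """Check the condition of Theorem 6:
    sum(c_i) / M + c_{M+1} <= U_max * D  and  c_1 <= U_max * D."""
    ordered = sorted(cycles, reverse=True)
    if not ordered:
        return True
    # c_{M+1} in 1-based indexing is ordered[m]; assumed 0 if it does not exist.
    next_largest = ordered[m] if len(ordered) > m else 0
    budget = u_max * deadline
    return sum(ordered) / m + next_largest <= budget and ordered[0] <= budget
```

The condition is sufficient but not necessary: for the cycles (5, 4, 3, 3, 3) on two processors with $D = 1$, it holds for $U_{max} = 12$ and fails for $U_{max} = 11$, although the largest load produced by Algorithm ltf on this instance is only 10.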

We now show that Algorithms basic and ltf bound the maximum processor speed by the factors of $(1 + \sqrt{\epsilon / 2})$ and $(\frac{4}{3} - \frac{1}{3M})$, respectively.

Theorem 7 Given an input instance with a feasible schedule for the MME problem, no schedule derived from Algorithm ltf uses any processor speed larger than $(\frac{4}{3} - \frac{1}{3M}) U_{max}$.

Proof: Let $O$ be a feasible schedule for the input instance. Without loss of generality, $O$ partitions $T$ into $M$ disjoint subsets of tasks. We assume that $m$ is the processor with the largest load in $O$. Since $O$ is a feasible schedule, the load on the processor $m$, say $p_m$, must be no more than $U_{max} D$. Let $n$ be the processor with the largest load in $SC_{T,LTF}$. By rephrasing the processing time into computation requirement in the Makespan problem, we know that $p_n \le (\frac{4}{3} - \frac{1}{3M}) p_m \le (\frac{4}{3} - \frac{1}{3M}) U_{max} D$ since the Longest-Processing-Time-First algorithm was proved to be a $(\frac{4}{3} - \frac{1}{3M})$-approximation algorithm for the Makespan problem in [6].⁷ This completes the proof.

Theorem 8 Given an input instance with a feasible schedule for the MME problem over two processors, no schedule derived from Algorithm basic uses any processor speed larger than $(1 + \sqrt{\epsilon / 2}) U_{max}$.

Proof: It comes from the setting of $\delta$ in Algorithm basic.

4. Task Migration: An Optimal Algorithm

Algorithm 3: ltf-m

Input: $(T, D, M)$;

Output: An optimal schedule with minimum energy consumption;

1: sort $T$ in a non-increasing order of the computation requirement of each task; let $C \leftarrow \sum_{\tau_i \in T} c_i$;

2: if $\frac{C}{MD} > U_{max}$ or $\exists \tau_i \in T$ such that $\frac{c_i}{D} > U_{max}$ then

3: return non-existence of any feasible schedule;

4: let $i \leftarrow 1$;

5: while $i \le |T|$ do

6: if $c_i > \frac{C}{M}$ then

7: schedule $\tau_i$ to be executed at the speed $\frac{c_i}{D}$ on the processor $M$ from time 0 to $D$;

8: $C \leftarrow C - c_i$, $i \leftarrow i + 1$, and $M \leftarrow M - 1$;

9: else

10: break;

11: let $S \leftarrow \frac{C}{MD}$ and $t \leftarrow 0$;

12: while $i \le |T|$ do

13: if $t + \frac{c_i}{S} > D$ then

14: schedule $\tau_i$ to be executed at the speed $S$ on the processor $M - 1$ from time 0 to $t + \frac{c_i}{S} - D$ and on the processor $M$ from time $t$ to $D$; $M \leftarrow M - 1$;

15: else

16: schedule $\tau_i$ to be executed on the processor $M$ at the speed $S$ from time $t$ to $t + \frac{c_i}{S}$;

17: $i \leftarrow i + 1$ and $t \leftarrow (t + \frac{c_i}{S}) \bmod D$;

18: return the schedule of all tasks;

In this section, an efficient optimal algorithm is proposed for the MMEM problem, where task migration is allowed. If $|T| \le M$, based on Formula (2) shown in Section 3, it is clear that executing each task $\tau_i$ on the processor $i$ from time 0 to $D$ at the speed $\frac{c_i}{D}$ results in an optimal schedule. Our proposed Algorithm ltf-m (Algorithm 3) adopts the Largest-Task-First strategy again, and the time complexity $O(|T| \log |T|)$ comes from the sorting of $T$ in line 1. We can prove the following two lemmas.

7 The Makespan problem is as follows: Given processing times for $n$ tasks, find an assignment of the tasks to $M$ identical processors so that the completion time for these tasks is minimized.

Lemma 6 If $c_1 > SD$ and $|T| \ge M$, then there exists an optimal schedule which executes only $\tau_1$ on a processor at the speed $\frac{c_1}{D}$ from 0 to $D$, where $S = \frac{\sum_{\tau_i \in T} c_i}{MD}$.

Lemma 7 If $c_1 \le SD$ and $|T| \ge M$, then there exists an optimal schedule which executes each task in $T$ on at most two processors at the speed $S$, where $S = \frac{\sum_{\tau_i \in T} c_i}{MD}$.

Theorem 9 Any schedule derived from Algorithm

ltf-m is an optimal schedule.

Proof: If $c_1 > SD$, where $S = \frac{\sum_{\tau_i \in T} c_i}{MD}$, then Algorithm ltf-m executes $\tau_1$ on processor $M$ at the speed $\frac{c_1}{D}$, and the remaining tasks $T - \{\tau_1\}$ and $M - 1$ processors form a subproblem of the MMEM problem; otherwise, Algorithm ltf-m executes each task in $T$ over at most two processors at the speed $S$. Based on Lemmas 6 and 7, we conclude this proof by repeating the above procedure in solving the MMEM subproblems.
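
To make the recursion in the proof concrete, the following worked example traces Algorithm ltf-m on a hypothetical instance that does not appear in the paper: computation requirements $(c_1, c_2, c_3, c_4) = (10, 3, 3, 2)$, $M = 3$, and $D = 4$, with $U_{max}$ assumed to be at least 2.5 so that the feasibility test passes.

```latex
\begin{align*}
C &= 18, \quad C/M = 6, \quad c_1 = 10 > 6
  &&\Rightarrow \tau_1 \text{ alone on processor 3 at speed } 10/4 = 2.5 \\
C &= 8, \quad M = 2, \quad C/M = 4, \quad c_2 = 3 \le 4
  &&\Rightarrow S = \frac{8}{2 \cdot 4} = 1 \\
\tau_2 &: \text{processor 2, } [0, 3] \\
\tau_3 &: \text{processor 2, } [3, 4] \text{ and processor 1, } [0, 2] \\
\tau_4 &: \text{processor 1, } [2, 4]
\end{align*}
```

The first row is the case of Lemma 6, which leaves a subproblem with two processors; the remaining rows are the case of Lemma 7, in which only $\tau_3$ migrates and its two pieces do not overlap in time.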

5. Conclusion

This paper targets energy-efficient scheduling of tasks over multiple processors, where tasks share a common deadline. Distinct from the past work, this paper proposes approximation algorithms with different approximation bounds for processors with/without constraints on the maximum processor speed. We show the non-existence of polynomial-time approximation algorithms in the minimization of energy consumption for multiprocessor scheduling over processors with an upper bound on the processor speed, unless P = NP. When there is no constraint on processor speeds, we propose an approximation algorithm for two-processor scheduling to provide trade-offs among the specified error, the running time, the approximation ratio, and the memory space complexity. An approximation algorithm with a 1.13-approximation ratio for M-processor systems is also derived (M > 2). When there is an upper bound on processor speeds, an artificial-bound approach is taken to minimize the energy consumption with a 1.13-approximation ratio. Furthermore, an optimal polynomial-time scheduling algorithm is proposed for the minimization of the energy consumption when task migration is allowed.

For future research, we shall explore multiprocessor energy-efficient scheduling for task sets with arbitrary deadlines and arrival times.

References

[1] A. Chandrakasan, S. Sheng, and R. Broderson. Lower-Power CMOS digital design. IEEE Journal of of Solid-State Circuit, 27(4):473–484, 1992.

[2] J. S. Chase, D. C. Anderson, P. N. Thakar, A. Vahdat, and R. P. Doyle. Managing energy and server resources in hosting centres. In Symposium on Operating Systems Principles, pages 103–116. ACM Press, 2001.

[3] J.-J. Chen, T.-W. Kuo, and C.-L. Yang. Proﬁt-driven uniprocessor scheduling with timing and energy con-straints. In ACM Symposium on Applied Computing, pages 834–840. ACM Press, 2004.

[4] M. R. Garey and D. S. Johnson. Computers and in-tractability: A guide to the theory of NP-completeness. W.H. Freeman and Co, 1979.

[5] G. Golub and J. Ortega. Scientiﬁc Computing and Dif-ferential Equations. Academic Press, 1992.

[6] R. Graham. Bounds on multiprocessing timing anoma-lies. SIAM Journal on Applied Mathematics, 17:263– 269, 1969.

[7] F. Gruian. System-level design methods for low-energy architectures containing variable voltage processors. In Power-Aware Computing Systems, pages 1–12, 2000. [8] F. Gruian and K. Kuchcinski. Lenes: Task scheduling

for low energy systems using variable supply voltage pro-cessors. In Proceedings of Asia South Paciﬁc Design Au-tomation Conference, pages 449–455, 2001.

[9] O. H. Ibarra and C. E. Kim. Fast approximation algo-rithms for the knapsack and sum of subsets problems. Journal of the ACM, 22(4):463–468, 1975.

[10] T. Ishihara and H. Yasuura. Voltage scheduling prob-lems for dynamically variable voltage processors. In Pro-ceedings of the International Symposium on Low Power Electroncs and Design, pages 197–202, 1998.

[11] J.-H. Lin and J. S. Vitter. -approximations with min-imum packing constraint violation. In Symposium on Theory of Computing, pages 771–782. ACM Press, 1992. [12] R. Mishra, N. Rastogi, D. Zhu, D. Mosse, and R. Mel-hem. Energy aware scheduling for distributed real-time systems. In International Parallel and Distributed Pro-cessing Symposium, page 21, 2003.

[13] V. V. Vazirani. Approximation Algorithms. Springer, 2001.

[14] M. Weiser, B. Welch, A. Demers, and S. Shenker. Scheduling for reduced CPU energy. In Proceedings of Symposium on Operating Systems Design and Imple-mentation, pages 13–23, 1994.

[15] F. Yao, A. Demers, and S. Shankar. A scheduling model for reduced CPU energy. In Proceedings of the 36th An-nual Symposium on Foundations of Computer Science, pages 374–382. IEEE, 1995.

[16] Y. Zhang, X. Hu, and D. Z. Chen. Task scheduling and voltage selection for energy minimization. In Annual ACM IEEE Design Automation Conference, pages 183– 188, 2002.

[17] D. Zhu, R. Melhem, and B. Childers. Scheduling with dynamic voltage/speed adjustment using slack reclama-tion in multi-processor real-time systems. In Proceed-ings of IEEE 22th Real-Time System Symposium, pages 84–94, 2001.
