Branch-and-bound task allocation with task clustering-based pruning

Yung-Cheng Ma, Tien-Fu Chen, Chung-Ping Chung

Department of Computer Science and Information Engineering, National Chiao-Tung University, 1001 Ta Hsueh Road, Hsinchu 30050, Taiwan

Received 8 June 2000; received in revised form 5 July 2004

Abstract

We propose a task allocation algorithm that aims to find an optimal task assignment for any parallel program on a given machine configuration. The approach traverses a state–space tree that enumerates all possible task assignments. Its efficiency comes from pruning rules applied to each traversed state, which use a dominance relation and task clustering heuristics to decide whether the sub-tree below that state needs to be traversed. The pruning rules try to eliminate partial assignments that violate the clustering of tasks, while still keeping some optimal assignment in the future search space. In contrast to previous state–space searching methods for task allocation, the proposed pruning rules significantly reduce the time and space required to obtain an optimal assignment and lead the traversal to a near-optimal assignment within a small number of states. Experimental evaluation shows that the pruning rules make the state–space searching approach feasible for practical use.

© 2004 Published by Elsevier Inc.

Keywords: Task allocation; Branch-and-bound; Pruning rule; Dominance relation; State–space searching

1. Introduction

Advances in hardware and software technologies have led to the use of parallel and distributed computing systems. To execute a parallel program efficiently, the mapping of program tasks to processors should both balance the load and reduce communication overhead. This paper studies this task allocation problem.

Much research has addressed the task allocation problem. Although the task allocation problem has been shown to be NP-complete [3], a number of heuristics have been proposed [4,8,9,11,14,15,19,23]. A drawback of these heuristics is the poor quality of the assignments found [5]. On the other hand, [1,2,7,12,13,16–18,20] proposed state–space searching methods, with differences in the problem formulation for various applications and machine configurations. The state–space searching approach finds an optimal assignment at the cost of intractable time and space complexity. Ahmad and Kwok [1] proposed pruning rules and a parallelization method to reduce the time to find an optimal solution when assigning precedence-constrained graphs. In this paper, we follow the task graph model of [18], which models a set of parallel processes without precedence constraints, and propose pruning rules to improve the efficiency of the state–space searching method.

Corresponding author. Fax: +886-3-5724176. E-mail address: ycma@csie.nctu.edu.tw (Y.-C. Ma). 0743-7315/$ - see front matter © 2004 Published by Elsevier Inc. doi:10.1016/j.jpdc.2004.08.002

The key idea of the proposed pruning rule is to detect task clustering in the task graph. We observe that tasks can be grouped such that each group is a set of heavily communicating tasks and inter-group communication weights are relatively small. While traversing the state–space, our algorithm detects task clustering from the traversal history and tries to prune partial assignments that violate the detected clustering. We prove that the proposed pruning rule preserves some optimal assignment in the future search space, which guarantees the optimality of the solution found. Moreover, our experiments show that the proposed algorithm traverses only a low-order polynomial number of states to reach a near-optimal assignment. Hence, when time and space are limited, a near-optimal assignment can still be obtained. This makes the proposed algorithm feasible for practical use.

This paper is organized as follows. Section 2 models the task allocation problem as a state–space searching problem. Section 3 describes the basic idea of the proposed pruning rule. Section 4 describes the dominance relation, which is the basis for deriving our pruning rule. Section 5 describes the proposed pruning rule. Section 6 describes the proposed task allocation algorithm and the space management policy. Section 7 presents the experiments that show the effectiveness of the proposed pruning rules. Finally, a conclusion is given in Section 8.

2. Modeling task allocation problem

In this section, we present how the task allocation problem is formulated and transformed into a state–space searching problem. This section defines the terminology used in this paper and gives the framework of our proposed task allocation algorithm.

2.1. Formulating task allocation problem

We follow [4,9,18] to formulate the task allocation problem. This formulation assumes that there are few or no precedence relationships and synchronization requirements, so that processor idleness is negligible. Contention on communication links is also ignored.

The optimization problem is formulated as follows. The input to a task allocation algorithm is a task graph $G$ and a machine configuration $M$. The output, called a complete assignment, is a mapping from the set of tasks $T$ to the set of processors $P$. An optimal assignment is a complete assignment with minimum cost. The cost of an assignment is the turn-around time of the last processor to finish its execution. To find an optimal assignment, the branch-and-bound algorithm goes through several partial assignments, in which only a subset of the tasks has been assigned. We define these terms formally below.

A parallel program is represented as a task graph $G(T, E, e, c)$. The vertex set of the task graph is the set of tasks $T = \{t_0, t_1, \ldots, t_{n-1}\}$. Each task $t_i \in T$ represents a program module. The edge set $E$ represents communication between tasks: two tasks $t_i$ and $t_j$ are connected by an edge if $t_i$ communicates with $t_j$. Each task $t_i \in T$ has a weight $e(t_i)$ representing its execution time. Each edge $(t_i, t_j) \in E$ has a weight $c(t_i, t_j)$ representing the amount of data transferred between $t_i$ and $t_j$.

An example task graph is depicted in Fig. 1. Each vertex is a task, and the number on each task $t_i$ is its execution weight $e(t_i)$. The number on each edge $(t_i, t_j)$ is the communication weight $c(t_i, t_j)$. Throughout this article, we use this task graph to demonstrate the idea behind our algorithm.

Fig. 1. Example of a task graph. (Figure: a graph over tasks $t_0, \ldots, t_{12}$ with an execution weight on each vertex and a communication weight on each edge.)

The machine configuration is represented as $M(P, d)$, where $P = \{p_0, p_1, \ldots, p_{m-1}\}$ is the set of all processors. For each pair of processors $p_k, p_l \in P$ with $k \neq l$, a distance $d(p_k, p_l)$ represents the latency of transferring one unit of data between $p_k$ and $p_l$. If two tasks $t_i$ and $t_j$ are assigned to different processors $p_k$ and $p_l$, respectively, the time required for $t_i$ to communicate with $t_j$ is estimated to be $c(t_i, t_j)\,d(p_k, p_l)$. The communication time between two tasks on the same processor is assumed to be zero.

A machine configuration example is depicted in Fig. 2. We take a hierarchical architecture as an example. The machine consists of two subnets. It takes 5 units of time to transfer a unit of data between two processors in the same subnet and 20 units between two processors in different subnets. Throughout this paper, we use this hierarchical architecture to demonstrate the idea of our task allocation algorithm. However, the proposed algorithm can also be applied to other machine configurations with non-uniform distances between processors.
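As a small illustration of the communication-time estimate under this configuration, suppose two tasks exchange $c(t_i, t_j) = 100$ units of data. The weight 100 is a hypothetical value chosen only for this example; the distances 0, 5 and 20 are the ones stated in the text.

$$
c(t_i, t_j)\,d(p_k, p_l) =
\begin{cases}
100 \times 0 = 0 & \text{same processor},\\
100 \times 5 = 500 & \text{different processors in the same subnet},\\
100 \times 20 = 2000 & \text{processors in different subnets}.
\end{cases}
$$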

A complete assignment $A_c$ is a mapping from the set of tasks $T$ to the set of processors $P$. To find a complete assignment, our task allocation algorithm examines several partial assignments. A partial assignment $A$ is a mapping from $Q$, a proper subset of $T$, to the set of processors $P$.

The turn-around time of processor $p_k$ under a partial or complete assignment $A$, denoted $\mathrm{TA}_k(A)$, is defined as the time to execute all tasks assigned to $p_k$ plus the time these tasks spend communicating with tasks not assigned to $p_k$. That is,

$$\mathrm{TA}_k(A) = \sum_{t_i : A(t_i) = p_k} e(t_i) + \sum_{t_i : A(t_i) = p_k} \; \sum_{t_j : A(t_j) \neq p_k} c(t_i, t_j)\, d(p_k, A(t_j)). \tag{1}$$

The cost of a partial or complete assignment is the turn-around time of the last processor to finish its execution:

$$\mathrm{cost}(A) = \max_{p_k \in P} \mathrm{TA}_k(A). \tag{2}$$

Fig. 2. Example of a machine configuration: (a) the clustered architecture and (b) the distance matrix $d(p_k, p_l)$.

(a) Four processors in two clusters, $\{p_0, p_1\}$ and $\{p_2, p_3\}$, joined by an interconnection.

(b) Distance matrix:

d(pk, pl)    p0    p1    p2    p3
p0            0     5    20    20
p1            5     0    20    20
p2           20    20     0     5
p3           20    20     5     0

Fig. 3. State–space tree. (Figure: below the root, each level branches on the processor chosen for the next task, e.g. $t_0 \to p_0$, $t_0 \to p_1$, $t_0 \to p_2$ at the first level; internal nodes are partial assignments and leaves are complete assignments, i.e. goal nodes.)

An optimal assignment $A_{\mathrm{opt}}$ is a complete assignment with minimum cost:

$$\mathrm{cost}(A_{\mathrm{opt}}) = \min\{\mathrm{cost}(A_c) \mid A_c \text{ is a complete assignment}\}. \tag{3}$$
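The definitions of turn-around time, cost and optimal assignment can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary-based encoding of assignments, and the exhaustive search are assumptions made here. The distance matrix is the one of Fig. 2; the three task weights and edge weights are assumed values for a toy instance.

```python
from itertools import product

def turnaround(assign, e, c, d, k):
    """TA_k(A) from Eq. (1): execution on p_k plus communication leaving p_k."""
    total = 0
    for ti, pi in assign.items():
        if pi != k:
            continue
        total += e[ti]
        for tj, pj in assign.items():
            if pj != k:
                total += c.get(frozenset((ti, tj)), 0) * d[k][pj]
    return total

def cost(assign, e, c, d):
    """cost(A) from Eq. (2): the largest turn-around time over all processors."""
    return max(turnaround(assign, e, c, d, k) for k in range(len(d)))

def brute_force_optimal(e, c, d):
    """Eq. (3) by exhaustive enumeration of all m**n complete assignments."""
    tasks = sorted(e)
    best_cost, best_assign = None, None
    for procs in product(range(len(d)), repeat=len(tasks)):
        candidate = dict(zip(tasks, procs))
        value = cost(candidate, e, c, d)
        if best_cost is None or value < best_cost:
            best_cost, best_assign = value, candidate
    return best_cost, best_assign

# Distance matrix of Fig. 2 (processor index k stands for p_k).
D = [[0, 5, 20, 20], [5, 0, 20, 20], [20, 20, 0, 5], [20, 20, 5, 0]]
# Toy instance: execution and communication weights are assumptions for illustration.
E = {0: 600, 1: 400, 2: 300}
C = {frozenset((0, 1)): 500, frozenset((0, 2)): 400, frozenset((1, 2)): 150}
```

Calling `brute_force_optimal(E, C, D)` enumerates all $4^3$ complete assignments of the toy instance. The exponential enumeration is exactly what the state–space search with pruning is meant to avoid.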

2.2. Transforming to the state–space searching problem: the A∗-algorithm

We solve the task allocation problem by state–space searching with pruning rules. Shen and Tsai [18] proposed a state–space search algorithm without pruning to solve the task allocation problem. This state–space search method is known as the A∗-algorithm [6], which has been proven to guarantee the optimality of the solution obtained. Based on the A∗-algorithm, we add a pruning rule to reduce the search space to be traversed. In our experiments, the A∗-algorithm is used as a baseline for comparison with our branch-and-bound algorithm.

As illustrated in Fig. 3, the state–space tree represents all possible task assignments. We use an $(n + 1)$-level $m$-ary tree to enumerate all possibilities of assigning $n$ tasks to $m$ processors. In the branch-and-bound literature, a node in the state–space tree is called a branching state. In this study, a branching state represents either a partial or a complete assignment, depending on whether it is an internal node or a leaf node of the state–space tree. In the remainder of this article, we use the terms branching state and partial/complete assignment interchangeably.

The traversal proceeds as follows. During the traversal, an active set [10] (also called the open set in some literature [6]), denoted ActiveSet, keeps track of all partial/complete assignments that have been explored but not yet visited. In each iteration of the traversal, the following operations are performed:

Step 1: Remove a partial/complete assignment $A_v$ from ActiveSet and visit $A_v$.

Step 2: If $A_v$ is a complete assignment, terminate the traversal and return $A_v$ as the output.

Step 3: Use the pruning rule to check whether the sub-tree below $A_v$ needs further traversal.

Step 4: If the sub-tree below $A_v$ needs further traversal, put each child node of $A_v$ in the state–space tree into ActiveSet.

For simplicity, we use $\mathit{ActiveSet}^{(k)}$ to denote the contents of ActiveSet at the beginning of the $k$th iteration, and $A_v^{(k)}$ to denote the partial/complete assignment visited in the $k$th iteration.

We follow the approach of Shen and Tsai [18] to determine the traversal order. For each partial/complete assignment $A$, we estimate a lower bound, denoted $L(A)$, on the cost of all complete assignments extended from $A$ (or of $A$ itself if $A$ is a complete assignment). In each iteration of the traversal, the partial/complete assignment $A_v$ with minimum $L(\cdot)$ is removed from ActiveSet and visited. $L(A)$ is computed from the additional cost of assigning the tasks not yet assigned in $A$.

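Given a partial assignment $A$ in which $Q \subseteq T$ has been assigned, we define $\mathrm{AC}_k(t_j \to p_l, A)$ to reflect the additional cost on processor $p_k$ if task $t_j$ is assigned to processor $p_l$:

$$\mathrm{AC}_k(t_j \to p_k, A) = e(t_j) + \sum_{t_i : A(t_i) \neq p_k} c(t_i, t_j)\, d(p_k, A(t_i)) \quad \text{if } p_k = p_l, \tag{4}$$

$$\mathrm{AC}_k(t_j \to p_l, A) = \sum_{t_i : A(t_i) = p_k} c(t_i, t_j)\, d(p_k, p_l) \quad \text{if } p_k \neq p_l. \tag{5}$$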

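For a partial assignment $A$, the cost lower bound $L(A)$ for all complete assignments extended from $A$ is estimated to be

$$L(A) \equiv \max_{\text{processor } p_k} \left[ \mathrm{TA}_k(A) + \sum_{t_i \text{ not assigned in } A} \left( \min_{\text{processor } p_l} \mathrm{AC}_k(t_i \to p_l, A) \right) \right]. \tag{6}$$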

Without pruning rules, the method presented so far is the A∗-algorithm [6], as originally proposed by Shen and Tsai [18] for task allocation. The A∗-algorithm traverses all partial assignments with $L(\cdot)$ less than the optimal cost. We propose a pruning rule to reduce the size of the state–space to be traversed.
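The traversal of Steps 1 to 4, ordered by the lower bound of Eq. (6), can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation of [18] or of this paper: the function names, the use of Python's `heapq` as ActiveSet, the tie-breaking counter, and the toy instance are all choices made here, and no pruning rule is applied in Step 3.

```python
import heapq
from itertools import count

def comm(c, ti, tj):
    """Communication weight c(t_i, t_j); 0 when there is no edge."""
    return c.get(frozenset((ti, tj)), 0)

def turnaround(a, e, c, d, k):
    """TA_k(A), Eq. (1), counting only tasks already assigned in A."""
    return sum(e[ti] + sum(comm(c, ti, tj) * d[k][pj] for tj, pj in a.items() if pj != k)
               for ti, pi in a.items() if pi == k)

def additional_cost(k, tj, l, a, e, c, d):
    """AC_k(t_j -> p_l, A), Eqs. (4) and (5)."""
    if l == k:
        return e[tj] + sum(comm(c, ti, tj) * d[k][pi] for ti, pi in a.items() if pi != k)
    return sum(comm(c, ti, tj) * d[k][l] for ti, pi in a.items() if pi == k)

def lower_bound(a, tasks, e, c, d):
    """L(A), Eq. (6); equals cost(A) when A is complete."""
    m = len(d)
    free = [t for t in tasks if t not in a]
    return max(turnaround(a, e, c, d, k)
               + sum(min(additional_cost(k, t, l, a, e, c, d) for l in range(m)) for t in free)
               for k in range(m))

def a_star(tasks, e, c, d):
    """Best-first traversal without pruning; tasks are assigned in list order."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    active = [(lower_bound({}, tasks, e, c, d), next(tie), {})]
    visited = 0
    while active:
        bound, _, a = heapq.heappop(active)   # Step 1: visit the state with minimum L
        visited += 1
        if len(a) == len(tasks):              # Step 2: first complete assignment visited
            return bound, a, visited
        nxt = tasks[len(a)]                   # Steps 3-4: no pruning, expand every child
        for p in range(len(d)):
            child = {**a, nxt: p}
            heapq.heappush(active, (lower_bound(child, tasks, e, c, d), next(tie), child))

# Toy instance with two processors at distance 5; all weights are assumptions.
TASKS = [0, 1, 2, 3]
E = {0: 600, 1: 400, 2: 300, 3: 800}
C = {frozenset((0, 1)): 500, frozenset((0, 2)): 400,
     frozenset((1, 2)): 150, frozenset((0, 3)): 40}
D = [[0, 5], [5, 0]]
```

`a_star(TASKS, E, C, D)` returns the cost, the assignment, and the number of visited states. A pruning rule would plug in at the point marked Steps 3-4, by skipping the expansion of states whose sub-tree needs no further traversal.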

3. Basic idea of the proposed pruning rule

The development of the pruning rule is based on the clustering of tasks. As shown in Fig. 4, tasks are grouped such that each group contains heavily communicating tasks. The key observation is that a group may contain a set of tasks suitable to be placed on the same processor, or a set of tasks suitable to be placed in the same subnet of the hierarchical architecture. While traversing the state–space tree, our branch-and-bound algorithm detects the clustering of tasks and tries to prune those partial assignments that violate the clustering heuristic. The effectiveness of the pruning rule thus depends on whether the tasks can be clearly clustered into groups.

Fig. 4. Sample clustering of tasks according to communication weights. (Figure: the task graph of Fig. 1, marking groups of tasks suitable to be placed on the same processor and groups suitable to be placed in the same subnet.)

The development of the pruning rule consists of two phases. In Section 4, we first develop a dominance relation. This dominance relation is effective only when a small cut is met. In Section 5, we integrate the detection of task clustering with the dominance relation to form an enhanced pruning rule.

4. Pruning search space by dominance relation

We first develop a dominance relation to serve as the basis for the pruning rule. We pick two partial assignments $A_1$ and $A_2$ in which the same set of tasks has been assigned. Suppose $\mathrm{cost}(A_1) \le \mathrm{cost}(A_2)$. We call $A_1$ the winner and $A_2$ the loser. Let $A_{1\text{-best}}$ and $A_{2\text{-best}}$ be the complete assignments with minimum cost in the sub-trees below $A_1$ and $A_2$, respectively. We want to check whether the winner–loser relationship can be reversed, that is, whether $\mathrm{cost}(A_{1\text{-best}}) > \mathrm{cost}(A_{2\text{-best}})$ is possible. Our dominance relation rests on the observation that what may reverse the winner–loser relationship is the weight of the edges between assigned and unassigned tasks in the task graph. The dominance relation is therefore effective in pruning the search space when the weights between assigned and unassigned tasks are small.

4.1. Formalization of dominance relation

Definition 1 (Dominance relation). Let $A_1$ and $A_2$ be two partial assignments. $A_1$ dominates $A_2$ if we can guarantee that $\mathrm{cost}(A_{1\text{-best}}) \le \mathrm{cost}(A_{2\text{-best}})$, where $A_{1\text{-best}}$ and $A_{2\text{-best}}$ are complete assignments with minimum cost extended from $A_1$ and $A_2$, respectively.

Fig. 5. Idea behind deriving the dominance relation: (a) selection of partial/complete assignments and (b) classifications on tasks. (Figure: (a) in the state–space tree, $A_1$ and $A_2$ assign the task set $Q$, and their extensions $A'_1$ and $A'_2$ satisfy $A'_1(t_i) = A'_2(t_i)$ for $t_i \in S$; (b) under an assignment $A_a$ and its extension $A'_a$, the tasks are split into $Q_k$ and $S_k$ (tasks in $p_k$) and $\bar{Q}_k$ and $\bar{S}_k$ (tasks not in $p_k$).)

The inference rule we use to derive a dominance relation is as follows. We omit the proof since it is a direct consequence of Definition 1.

Corollary 1 (Inference rule for deriving the dominance relation). Let $A_1$ and $A_2$ be two partial assignments. $A_1$ dominates $A_2$ if for any complete assignment $A'_2$ extended from $A_2$, there exists a complete assignment $A'_1$ extended from $A_1$ such that $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) \ge 0$ for each processor $p_k$.

The idea for deriving a dominance relation is depicted in Fig. 5. The assignments $A_1$, $A_2$, $A'_1$ and $A'_2$ concerned in Corollary 1 are shown in Fig. 5(a), where $S = T - Q$. $A'_1$ and $A'_2$ are chosen such that $A_1$ and $A_2$ have the same future extension. We rewrite the turn-around time equation according to the task classification shown in Fig. 5(b). In addition to $\mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1)$, the communication time between assigned and to-be-assigned tasks in $A'_1$ ($A'_2$) also contributes to $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1)$. This gives a lower-bound estimate of $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1)$. The proposed dominance relation checks whether $A_2$ can be pruned according to this estimated lower bound on the turn-around time difference.

We introduce the following notations:

• $\mathrm{Execution}(R) = \sum_{t_i \in R} e(t_i)$, where $R$ is a set of tasks.

• $\mathrm{Communication}(R_1, R_2) = \sum_{t_i \in R_1} \sum_{t_j \in R_2} c(t_i, t_j)\, d(A'_a(t_i), A'_a(t_j))$, where $R_1$ and $R_2$ are sets of tasks.

Following the classification of tasks shown in Fig. 5(b), we rewrite the turn-around time equation in the following lemma. The proof is omitted since it is a direct computation from the turn-around time formula.

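Lemma 1 (Reformulating the turn-around time). Let $A_a$ be a partial assignment and $A'_a$ be a complete assignment extended from $A_a$. $Q$ is the set of tasks assigned in $A_a$ and $S$ is the set of tasks not assigned in $A_a$. Then

$$\begin{aligned}
\mathrm{TA}_k(A'_a) = {} & \mathrm{TA}_k(A_a) + \mathrm{Execution}(S_k(A'_a)) \\
& + \mathrm{Communication}(Q_k(A_a), \bar{S}_k(A'_a)) \\
& + \mathrm{Communication}(\bar{Q}_k(A_a), S_k(A'_a)) \\
& + \mathrm{Communication}(S_k(A'_a), \bar{S}_k(A'_a)),
\end{aligned} \tag{7}$$

where

• $Q_k(A_a) = \{t_i \in Q \mid A_a(t_i) = p_k\}$ and $\bar{Q}_k(A_a) = Q - Q_k(A_a)$,

• $S_k(A'_a) = \{t_i \in S \mid A'_a(t_i) = p_k\}$ and $\bar{S}_k(A'_a) = S - S_k(A'_a)$.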

Before stating the dominance relation, we define the turn-around time difference lower bound $\mathrm{TADL}_k(A_1, A_2)$. Let $A_1$ and $A_2$ be two partial assignments with the same set of tasks $Q$ being assigned, and let $S = T - Q$. $\mathrm{TADL}_k(A_1, A_2)$ is a lower bound on $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1)$, where $A'_1$ and $A'_2$ are arbitrary complete assignments extended from $A_1$ and $A_2$, respectively, such that $A'_1(t_i) = A'_2(t_i)$ for each task $t_i \in S$. $\mathrm{TADL}_k(A_1, A_2)$ is estimated to be

$$\mathrm{TADL}_k(A_1, A_2) \equiv \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S} \left( \min_{p_l \in P} \bigl( \mathrm{AC}_k(t_i \to p_l, A_2) - \mathrm{AC}_k(t_i \to p_l, A_1) \bigr) \right). \tag{8}$$

We then check whether $A_2$ can be pruned by computing $\mathrm{TADL}_k(A_1, A_2)$ for each processor $p_k$. If $\mathrm{TADL}_k(A_1, A_2)$ is greater than or equal to zero for each processor $p_k$, then $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) \ge 0$ for each processor $p_k$, and hence we can prune $A_2$. This is stated in the following theorem.

Theorem 1 (Dominance relation for space pruning). Let $A_1$ and $A_2$ be two partial assignments containing the same set of tasks. If $\mathrm{TADL}_k(A_1, A_2) \ge 0$ for each processor $p_k$, then $A_1$ dominates $A_2$.

Fig. 6. Example to illustrate the dominance relation: (a) partial assignments in consideration, (b) the task graph and (c) effects on $p_0$ for all possible extensions.

(a) $A_1$: $t_0 \to p_0$, $t_1 \to p_0$, $t_2 \to p_0$, with $\mathrm{TA}_0(A_1) = 1300$ and $\mathrm{TA}_1(A_1) = \mathrm{TA}_2(A_1) = \mathrm{TA}_3(A_1) = 0$. $A_2$: $t_0 \to p_0$, $t_1 \to p_0$, $t_2 \to p_1$, with $\mathrm{TA}_0(A_2) = 3750$, $\mathrm{TA}_1(A_2) = 3050$ and $\mathrm{TA}_2(A_2) = \mathrm{TA}_3(A_2) = 0$.

(b) The task graph of Fig. 1 with the assigned set $Q$ marked; the edges between $Q$ and the remaining tasks are the edges that may affect the winner–loser relationship.

(c) Additional costs on $p_0$:

         AC0(ti → pl, A1)            AC0(ti → pl, A2)
ti       p0     p1     p2     p3     p0     p1     p2     p3
t3      800    200    800    800    800    200    800    800
t4      700    150    600    600    850      0      0      0
t5      750      0      0      0    750      0      0      0
t6     1000      0      0      0   1000      0      0      0
t7     1200      0      0      0   1200      0      0      0
t8     1000    100    400    400   1000    100    400    400
t9     1000      0      0      0   1000      0      0      0
t10     450     50    200    200    500      0      0      0
t11     600      0      0      0    600      0      0      0
t12     800      0      0      0    800      0      0      0

$\mathrm{TA}_0(A'_2) - \mathrm{TA}_0(A'_1) \ge 3750 - 1300 + (-600) + (-200) \ge 0$, where $-600$ is due to $t_4$ and $-200$ is due to $t_{10}$.

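Proof. To draw a dominance relation by Corollary 1, for any complete assignment $A'_2$ extended from $A_2$ we pick the complete assignment $A'_1$ extended from $A_1$ such that $A'_1(t_i) = A'_2(t_i)$ for each $t_i \in S$. The pattern is depicted in Fig. 5(a). We want to show that $\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) \ge 0$ for each $p_k$.

We decompose both $\mathrm{TA}_k(A'_2)$ and $\mathrm{TA}_k(A'_1)$ as stated in Lemma 1. Since $A'_1(t_i) = A'_2(t_i)$ for each $t_i \in S$, we have

• $\mathrm{Execution}(S_k(A'_2)) - \mathrm{Execution}(S_k(A'_1)) = 0$, and

• $\mathrm{Communication}(S_k(A'_2), \bar{S}_k(A'_2)) - \mathrm{Communication}(S_k(A'_1), \bar{S}_k(A'_1)) = 0$.

Hence, we have

$$\begin{aligned}
\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) = {} & \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) \\
& + \bigl( \mathrm{Communication}(S_k(A'_2), \bar{Q}_k(A_2)) - \mathrm{Communication}(S_k(A'_1), \bar{Q}_k(A_1)) \bigr) \\
& + \bigl( \mathrm{Communication}(\bar{S}_k(A'_2), Q_k(A_2)) - \mathrm{Communication}(\bar{S}_k(A'_1), Q_k(A_1)) \bigr) \\
= {} & \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S} \bigl( \mathrm{AC}_k(t_i \to A'_2(t_i), A_2) - \mathrm{AC}_k(t_i \to A'_2(t_i), A_1) \bigr).
\end{aligned} \tag{9}$$

Taking a lower bound on the turn-around time difference, we have

$$\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) \ge \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S} \min_{p_l \in P} \bigl( \mathrm{AC}_k(t_i \to p_l, A_2) - \mathrm{AC}_k(t_i \to p_l, A_1) \bigr).$$

The right-hand side of the above inequality is the $\mathrm{TADL}_k(A_1, A_2)$ defined previously. Hence, if $\mathrm{TADL}_k(A_1, A_2) \ge 0$ for each $p_k$, then $A_1$ dominates $A_2$. □

4.2. Example of the dominance relation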

We use the task graph in Fig. 1 and the machine configuration in Fig. 2 to illustrate the idea of the dominance relation given in Theorem 1. The partial assignments concerned are $A_1$ and $A_2$ shown in Fig. 6(a). $A_1$ is the winner and $A_2$ is the loser in this comparison. We apply Theorem 1 to guarantee that the winner–loser relationship will not be reversed.

We use the example in Fig. 6 to explain the key idea of exploiting task clustering. In the task graph in Fig. 6(b), $\{t_0, t_1, t_2\}$ is a group of heavily communicating tasks that should be assigned to the same processor. In Fig. 6(a), $A_1$ is a partial assignment that obeys the task clustering and $A_2$ is a partial assignment that violates it. The dominance relation examines the "cut", that is, the edges between the assigned tasks $\{t_0, t_1, t_2\}$ and the remaining tasks (bold edges in Fig. 6(b)), to test whether $A_2$ can be pruned. The examination finds that the edges from assigned tasks to $t_4$ and $t_{10}$ are the only possible causes for $A_2$ to win back what it has lost (cf. Fig. 6(c)). The edge weights in the cut are relatively small, and hence non-negative $\mathrm{TADL}_k(A_1, A_2)$ values are obtained. As a result, $A_2$ is pruned. Enumerating heavily communicating tasks in consecutive order ensures that a cut with light-weight edges can be met, which improves the pruning efficiency of the dominance relation.

5. Pruning search space by task clustering

The dominance relation proposed in Section 4 is effective only when a small cut can be found. To relax this constraint, we develop a further pruning rule that combines the detection of task clustering with the dominance relation.

How well the pruning rule works depends on the task enumeration order. We assume that tasks are enumerated in an order such that heavily communicating tasks are enumerated first. Section 6 shows how such an enumeration order is obtained. Under this assumption, task assignments have the following properties:

• A complete assignment obtained by a greedy search policy reflects the clustering of tasks.

• The first visited partial assignment that assigns a sub-graph reflects the clustering of tasks in that sub-graph.

With these properties, we obtain (1) a partial assignment $A_k$, called the killer, which reflects the clustering of tasks, and (2) a complete assignment $A_u$, which serves as an upper bound on the optimal cost, to test whether a candidate partial assignment $A$ can be pruned. These are the inputs to our pruning rule.

We use the task graph in Fig. 1 and the machine configuration in Fig. 2 to illustrate how the pruning rule works, as depicted in Fig. 7. The killer $A_k$ is a partial assignment with more tasks assigned than the candidate $A$. In the Fig. 7 example, $A_k$ reflects the clustering of tasks by showing that $\{t_0, t_1, t_2\}$ should be placed on the same processor and $\{t_0, t_1, t_2, t_3, t_4\}$ should be placed in the same subnet. We are thus given guidelines for extending $A$: (i) $t_2$ should be assigned to $p_0$, and (ii) $t_3$ and $t_4$ should each be assigned to either $p_0$ or $p_1$.

Complete assignments extended from $A$ can be classified into two categories: extensions that follow the guidelines and extensions that violate them. For extensions violating the guidelines, we estimate a cost lower bound and exclude those extensions whose costs are guaranteed to be greater than or equal to $\mathrm{cost}(A_u)$. For extensions following the guidelines, we find a dominator $A_d$ from the killer $A_k$ that dominates these extensions. These observations lead us to propose the pruning rule, whose criteria for pruning the search space are stated as follows.

Pruning criteria: Let $A_d$ and $A$ be two partial assignments in which the same set of tasks has been determined, and let $A_u$ be a complete assignment. We prune $A$ if for any complete assignment $A'$ extended from $A$, either (i) $\mathrm{cost}(A') \ge \mathrm{cost}(A_u)$ or (ii) there exists a complete assignment $A'_d$ extended from $A_d$ such that $\mathrm{cost}(A'_d) \le \mathrm{cost}(A')$.

5.1. Predicting clustering of tasks

Fig. 8 presents the procedure Compute_PA($A$, $A_k$), which predicts the clustering of tasks. The result of this prediction is a set of possible assignments, denoted $\mathrm{PA}_i$, for each task $t_i$ not assigned in $A$. Each $\mathrm{PA}_i$ is a set of processors to which task $t_i$ may be assigned. The $\mathrm{PA}_i$'s are determined according to a killer $A_k$; that is, the killer should reflect the clustering of tasks in the task graph. How such a killer can be obtained will be explained in Section 5.4.

To generate guidelines for extending $A$, we sketch a distance hierarchy on the processors, centered at the "central processor" $p_c$, and map the tasks to this distance hierarchy. Let $t_a$ be the last task assigned in $A$. We take $p_c$ to be the processor to which $t_a$ is assigned in $A_k$ (cf. Step 1 in Fig. 8). For each task $t_i$ assigned in $A_k$ but not in $A$, we let $\mathrm{PA}_i$ be the set of all processors whose distance from $p_c$ is less than or equal to $d(p_c, A_k(t_i))$ (cf. Step 2 in Fig. 8). If $t_i$ is not assigned in $A_k$, no prediction is made and $\mathrm{PA}_i$ is set to the set of all processors.

5.2. Examining partial assignment using pruning rule

Fig. 9 presents the procedure PruneTest, which tests whether a partial assignment can be pruned. PruneTest calls Compute_PA to predict the guidelines for extending the candidate $A$. The remaining work is to use the pruning rule to examine whether the sub-tree below $A$ needs further traversal.

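We first test the correctness of the prediction outcome, the $\mathrm{PA}_i$'s. The test is performed by estimating a turn-around time lower bound for extensions violating the guidelines, denoted $\mathrm{TAL}_k(A, \text{violate } \mathrm{PA}_i)$, defined as follows:

$$\mathrm{TAL}_k(A, \text{violate } \mathrm{PA}_i) \equiv \mathrm{TA}_k(A) + \sum_{\substack{t_j \text{ not assigned in } A \\ t_j \neq t_i}} \left( \min_{\text{processor } p_l} \mathrm{AC}_k(t_j \to p_l, A) \right) + \min_{p_l \notin \mathrm{PA}_i} \mathrm{AC}_k(t_i \to p_l, A). \tag{10}$$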

Fig. 7. Pruning based on task clustering. (Figure: below the root and $t_0 \to p_0$, the candidate $A$ assigns $t_1 \to p_1$ and the dominator $A_d$ assigns $t_1 \to p_0$. The killer $A_k$ extends $A_d$ with $t_2 \to p_0$, $t_3 \to p_1$, $t_4 \to p_1$ and is used to predict the restrictions on extending $A$: $t_2 \to \{p_0\}$ and $t_3, t_4 \to \{p_0, p_1\}$. Extensions $A'$ of $A$ that obey the guidelines, e.g. $t_2 \to p_0$, $t_3 \to p_1$, $t_4 \to p_0$, are dominated by $A_d$; extensions that violate them, e.g. $t_2 \to p_1$, $t_3 \to p_2$, $t_4 \to p_3$, satisfy $\mathrm{cost}(A') \ge \mathrm{cost}(A_u)$.)

Algorithm Compute_PA(A, Ak)
• input:
  – A, Ak: partial assignments, number of tasks assigned in Ak ≥ number of tasks assigned in A
• output:
  – PAi ⊆ P for each task ti not assigned in A (P is the set of all processors)
• method:
  1) pc ← Ak(ta), where ta is the last task assigned in A
  2) for each task ti not assigned in A do
       if ti is assigned in Ak then PAi ← { processor pk | d(pk, pc) ≤ d(Ak(ti), pc) }
       else PAi ← P

Fig. 8. Algorithm to predict the clustering of tasks.
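The procedure of Fig. 8 translates almost line by line into Python. The sketch below is one possible rendering, not the authors' implementation: the function name `compute_pa`, the dictionary encoding of assignments, and the explicit `tasks` list (the enumeration order, used here to find the last task assigned in $A$) are assumptions. It also assumes that $A$ is non-empty and that its tasks were assigned in enumeration order. The inputs in the example are the candidate, killer and distance matrix of Figs. 7 and 2, plus one extra task $t_5$ that the killer has not assigned.

```python
def compute_pa(a, a_k, tasks, d):
    """Predict PA_i for every task not assigned in A, following Fig. 8.

    a, a_k: dicts mapping task -> processor index (A and the killer A_k).
    tasks:  tasks in enumeration order; d: distance matrix d[k][l].
    """
    procs = range(len(d))
    t_a = [t for t in tasks if t in a][-1]   # last task assigned in A (A must be non-empty)
    p_c = a_k[t_a]                           # Step 1: central processor
    pa = {}
    for t in tasks:                          # Step 2
        if t in a:
            continue
        if t in a_k:
            radius = d[a_k[t]][p_c]
            pa[t] = {p for p in procs if d[p][p_c] <= radius}
        else:
            pa[t] = set(procs)               # no prediction: every processor allowed
    return pa

# Distance matrix of Fig. 2 and the assignments of Fig. 7.
D = [[0, 5, 20, 20], [5, 0, 20, 20], [20, 20, 0, 5], [20, 20, 5, 0]]
TASKS = [0, 1, 2, 3, 4, 5]                   # t5 is added to show the "no prediction" branch
A = {0: 0, 1: 1}                             # candidate: t0 -> p0, t1 -> p1
A_K = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}         # killer
PA = compute_pa(A, A_K, TASKS, D)
```

For these inputs the result restricts $t_2$ to $\{p_0\}$ and $t_3$, $t_4$ to $\{p_0, p_1\}$, which are the restrictions listed in Fig. 7, while $t_5$ may go to any processor.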

Algorithm PruneTest(A, Ak, Au)
• input:
  – A, Ak: partial assignments, depth(Ak) ≥ depth(A)
  – Au: a complete assignment
• output:
  – prune = True if A can be pruned, otherwise prune = False
• method:
  1) perform Compute_PA(A, Ak) to determine PAi for each task ti not assigned in A
  2) /* exclude extensions violating PA */
     for each task ti not assigned in A do
       2.1) success ← False
       2.2) for each processor pk do
              if TALk(A, violate PAi) ≥ cost(Au) then
                success ← True
                break
       2.3) if success = False then PAi ← P
  3) Ad ← the ancestor of Ak in the same level with A
  4) prune ← True
  5) /* dominate extensions obeying PA */
     for each processor pk do
       if TADLk(Ad, A, PA) < 0 then
         prune ← False
         break
  6) return prune

Lemma 2. Let $A$ be a partial assignment and $A'$ be a complete assignment extended from $A$. If there exists a task $t_i$ not assigned in $A$ such that $A'(t_i) \notin \mathrm{PA}_i$, then $\mathrm{TA}_k(A') \ge \mathrm{TAL}_k(A, \text{violate } \mathrm{PA}_i)$ for each processor $p_k$.

Proof. The proof is similar to the estimation of the cost lower bound $L(\cdot)$ in [18]. The only difference is that, when taking the minimum over the additional costs to obtain a lower bound on $\mathrm{TA}_k(A')$, the possibilities of assigning $t_i$ to processors in $\mathrm{PA}_i$ are excluded. □

After excluding extensions violating the guidelines, we check the dominance imposed on the remaining extensions. The dominator $A_d$ is the ancestor of $A_k$ in the state–space tree at the same level as $A$. Similar to the procedure in Section 4, we estimate a turn-around time difference lower bound between $A_d$ and $A$, denoted $\mathrm{TADL}_k(A_d, A, \mathrm{PA})$, assuming that $A_d$ and $A$ have the same future extensions and that these extensions follow the guidelines for each task $t_i$ not assigned in $A$ ($A_d$). We estimate $\mathrm{TADL}_k(A_d, A, \mathrm{PA})$ as follows:

$$\mathrm{TADL}_k(A_d, A, \mathrm{PA}) = \mathrm{TA}_k(A) - \mathrm{TA}_k(A_d) + \sum_{t_i \text{ not assigned}} \left( \min_{p_l \in \mathrm{PA}_i} \bigl( \mathrm{AC}_k(t_i \to p_l, A) - \mathrm{AC}_k(t_i \to p_l, A_d) \bigr) \right). \tag{11}$$

Compared with the TADL_k(A_d, A) defined in Section 4, the two quantities are estimated in similar ways. The difference is that, in estimating TADL_k(A_d, A, PA), the future extensions of A_d and A are restricted to the sets PA_i. Moreover, TADL_k(A_d, A) = TADL_k(A_d, A, PA) if each PA_i contains all of the processors.
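As a small numerical illustration of Eq. (11), the sketch below evaluates the bound for one processor p_k. The turn-around times and additional-cost tables are made-up numbers, and the function name `tadl_k` and the dictionary layout `ac[task][processor]` are assumptions of this sketch, not part of the paper.

```python
def tadl_k(ta_a, ta_ad, ac_a, ac_ad, pa):
    """Lower bound on the turn-around time difference on one processor p_k.

    ta_a, ta_ad : TA_k(A) and TA_k(A_d)
    ac_a, ac_ad : ac[t][p] = AC_k(t -> p, A) and AC_k(t -> p, A_d)
    pa          : pa[t] = set of processors allowed for unassigned task t
    """
    total = ta_a - ta_ad
    for t, allowed in pa.items():
        # most favourable extension for A relative to A_d, within the guideline
        total += min(ac_a[t][p] - ac_ad[t][p] for p in allowed)
    return total


# Hypothetical data: one unassigned task t2, three processors.
ac_a = {"t2": {0: 3, 1: 5, 2: 1}}
ac_ad = {"t2": {0: 4, 1: 2, 2: 6}}

restricted = tadl_k(10, 8, ac_a, ac_ad, {"t2": {0, 1}})
unrestricted = tadl_k(10, 8, ac_a, ac_ad, {"t2": {0, 1, 2}})
print(restricted, unrestricted)
```

With these numbers the restricted bound is non-negative while the unrestricted one is negative, which illustrates how narrowing PA_i can let hypothesis (ii) of the pruning rule hold where the plain dominance test would fail.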

Theorem 2 (Pruning rule). Let A_d and A be two partial assignments in which the same set of tasks has been determined, and let A_u be a complete assignment. The PA_i's are guidelines for extending A for each task t_i not assigned in A. If

(i) for each task t_i not assigned in A, there exists a processor p_k such that TAL_k(A, violate PA_i) ≥ cost(A_u), and

(ii) TADL_k(A_d, A, PA) ≥ 0 for each processor p_k,

then the pruning criterion is satisfied and A can be pruned.

Proof. By Lemma 2, hypothesis (i) implies that complete assignments extended from A that violate the guidelines PA_i have a cost greater than or equal to cost(A_u). The remainder of the proof is to estimate a lower bound on TA_k(A) − TA_k(A_d). This is similar to Theorem 1, but the possibilities of extending A to an assignment that violates the guidelines PA_i are ignored. The lower bound of TA_k(A) − TA_k(A_d) is thus estimated to be TADL_k(A_d, A, PA) as defined before. This proves the theorem. □

The procedure PruneTest uses Theorem 2 to test whether A can be pruned. Hypothesis (i) of Theorem 2 is guaranteed by Step 2, and Step 5 checks whether hypothesis (ii) holds. The procedure then returns the result indicating whether A can be pruned.

The advantage of using the pruning rule in Theorem 2 instead of the dominance relation in Theorem 1 is that the space can be pruned earlier during the traversal. For the example given in Fig. 7, this advantage is shown in Fig. 10. If we use the dominance relation given in Theorem 1 as the pruning rule, the bolded partial assignments will be traversed. The reduction in search space is an exponential function of the depth of the task clustering that we can detect.

5.3. Obtaining an upper bound on the optimal cost

To check whether a partial assignment A can be pruned, the procedure PruneTest uses two additional inputs: (1) a complete assignment A_u that serves as an upper bound on the optimal cost and (2) a killer A_k reflecting the clustering of tasks. Another use of A_u is to serve as an “imperfect solution” when the “perfect solution” cannot be found. The task allocation problem is well known to be NP-complete [2]. If the optimal assignment cannot be found within the time and space constraints, an “imperfect solution”, that is, a complete assignment that may not be optimal, is returned as the output. In this section, we describe how such an A_u can be obtained.

We use a greedy search approach to obtain a complete assignment A_u. A pointer p is used to indicate the status of the greedy search. At the beginning, p points at the starting node (the partial assignment currently visited) in the state–space tree. In each step, we move p down to the child with the minimum cost. The procedure terminates when (1) p points at a partial assignment with a cost greater than that of the present A_u, or (2) p points at a complete assignment. A_u is then updated if a better complete assignment is found. We use greedy search not only because of its simplicity but also because a low-cost complete assignment can be obtained if a careful task enumeration order is applied. Assume the tasks are enumerated in an order such that heavily communicating tasks are enumerated consecutively. The complete assignment obtained will then reflect the clustering of tasks and is likely to have a low cost.
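The greedy descent described above can be sketched as follows. The cost model here (largest processor load plus the weight of communication edges cut between processors) is a toy stand-in chosen only to make the example runnable; it is not the turn-around-time cost used in the paper, and the names `greedy_search`, `EXEC`, and `COMM` are hypothetical.

```python
EXEC = {"t0": 2, "t1": 2, "t2": 2, "t3": 2}                      # toy execution times
COMM = {("t0", "t1"): 10, ("t1", "t2"): 1, ("t2", "t3"): 10}     # toy edge weights


def cost(assign):
    """Toy cost of a (partial) assignment: max load + weight of cut edges."""
    load = {}
    for t, p in assign.items():
        load[p] = load.get(p, 0) + EXEC[t]
    cut = sum(w for (u, v), w in COMM.items()
              if u in assign and v in assign and assign[u] != assign[v])
    return max(load.values(), default=0) + cut


def greedy_search(start, order, n_procs, bound=float("inf")):
    """Walk down from `start`, always taking the cheapest child.

    Returns a complete assignment, or None if the walk reaches a partial
    assignment costlier than `bound` (the cost of the current A_u).
    """
    current = dict(start)
    for t in order:
        if t in current:
            continue
        if cost(current) > bound:          # termination condition (1)
            return None
        best_p = min(range(n_procs), key=lambda p: cost({**current, t: p}))
        current[t] = best_p
    return current                          # termination condition (2)


order = ["t0", "t1", "t2", "t3"]
result = greedy_search({"t0": 0}, order, n_procs=2)
cut_off = greedy_search({"t0": 0}, order, n_procs=2, bound=1)
print(result, cost(result), cut_off)
```

Because the heavily communicating pairs are adjacent in the enumeration order, the greedy walk keeps each pair together on one processor in this example.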

To illustrate the idea, we take the task graph in Fig. 1 and the machine configuration in Fig. 2 as an example. Suppose the greedy search starts from the partial assignment {t0 → p0, t1 → p0}. Part of the greedy search path is shown in Fig. 11. The greedy search will assign t2 to p0 next since it is the child of {t0 → p0, t1 → p0} with the lowest cost. This selection indicates that t0, t1, and t2 may need to be placed on the same processor. Similarly, t3 will be assigned to p1 following the parent partial assignment {t0 → p0, t1 → p0, t2 → p0}, also reflecting the clustering of tasks. Following the same procedure, we obtain a complete assignment that obeys the task clustering guideline.

[Figure: state–space tree with nodes A_d, A_k, and A. Labels: "t2 → p1 or p2 or p3", "t3 → p0 or p1", "excluded due to the cost ≥ cost(A_u)", "dominated by A_d", "to predict the guideline on extending A". The legend marks the partial assignments saved.]

Fig. 10. Space saved by the pruning criteria.

[Figure: state–space tree below {t0 → p0, t1 → p0}, with children t2 → p0, ..., p3, followed by t3 → p0, ..., p3 and t4 → p0, ..., p3. The nodes selected in the greedy search path are marked.]

Fig. 11. Greedy search on the state–space tree.

5.4. Obtaining killers reflecting clustering of tasks

In addition to the complete assignment A_u, a partial assignment A_k reflecting the clustering of tasks is also helpful to enhance the pruning rule. To increase the possibility of pruning a partial assignment, we may find multiple killers to form a KillerSet, instead of only one killer. The procedure PruneTest is then performed for each killer in the KillerSet to test whether a partial assignment can be pruned.

Partial assignments reflecting the clustering of tasks can be obtained through the proposed task enumeration order and the state–space tree traversal order. A partial assignment covers a sub-graph of the task graph. With the assumption that heavily communicating tasks are enumerated consecutively, we can capture part of the clustering of tasks in the sub-graph. Since we traverse the task graph in minimum-L(•)-first order, the first visited partial assignment containing the sub-graph is the one with minimum L(•) among all partial assignments containing the sub-graph. The first visited partial assignment containing a sub-graph indicates the clustering of tasks; otherwise it would have a large L(•).

We follow the principle that the first partial assignment indicates the clustering of tasks to obtain killers. We expect that a candidate partial assignment A will be pruned if it violates the clustering of tasks somewhere on the path from the root to its branching state in the state–space tree. Partial assignments that have taken advantage of the clustering of the tasks assigned by A are those that (1) have a common ancestor with A in the state–space tree, (2) are visited earlier than A, and (3) are deeper than A in the state–space tree, such that the sub-graph contained in A is also contained in them. This leads to the design of our heuristic scheme to obtain the killers.

To realize the scheme, a link to the deepest descendant node is associated with each visited partial assignment. For each partial assignment A_a, we associate a pointer deep(A_a) pointing at the deepest partial assignment visited in the sub-tree of A_a. If two or more partial assignments at the same level of the state–space tree are visited, deep(A_a) points at the first one visited, which has the smallest cost lower bound estimate (L(•)) on all its extensions. The KillerSet is the set of all deep(A_a) for each ancestor A_a of A, along with the complete assignment A_u:

$$\mathrm{KillerSet}(A) = \{\, \mathrm{deep}(A_a) \mid A_a \text{ is an ancestor of } A \,\} \cup \{A_u\}.$$

The determination of the KillerSet is depicted in Fig. 12. The number in each node is the L(•) of the partial assignment represented by the node. For each visited node A_a, the dashed link represents the deepest link deep(A_a). When a partial assignment A is visited, we follow the deepest links of all ancestors of A to obtain the KillerSet. In this example, the KillerSet to be used for pruning A is {A_6, A_4} plus A_u. That is, for each sub-tree (of the state–space tree) containing A, we pick the best branching state visited in the sub-tree to try to prune A.

[Figure: part of the state–space tree with nodes A, A_0, ..., A_6, each labeled with its L(•) value. The legend distinguishes partial assignments that have been traversed from those that have not, and marks the deepest links. Deep(A_0) = A_4, Deep(A_1) = A_4, Deep(A_2) = A_6, KillerSet(A) = {A_6, A_4}.]

Fig. 12. Deepest link to determine the KillerSet.
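The deepest-link bookkeeping can be sketched with parent pointers, as below. The `Node` class, the helper names, and the tree shape are assumptions of this sketch; the tree is a hypothetical one constructed so that the deepest links match the values reported for Fig. 12 (the node B is an extra intermediate node invented for the example).

```python
class Node:
    """A visited state in the state-space tree (hypothetical representation)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.depth = 0 if parent is None else parent.depth + 1
        self.deep = None  # deepest visited node in this node's sub-tree


def ancestors(node):
    """Yield the ancestors of `node`, nearest first."""
    while node.parent is not None:
        node = node.parent
        yield node


def visit(node):
    """Update deepest links; strict '>' keeps the first node visited at a depth."""
    node.deep = node
    for anc in ancestors(node):
        if anc.deep is None or node.depth > anc.deep.depth:
            anc.deep = node


def killer_set(node, a_u):
    """deep(A_a) for every ancestor A_a of `node`, plus the upper-bound A_u."""
    killers = []
    for anc in ancestors(node):
        if anc.deep is not None and anc.deep not in killers:
            killers.append(anc.deep)
    return killers + [a_u]


a0 = Node("A0")
a1 = Node("A1", a0)
a2, a3 = Node("A2", a1), Node("A3", a1)
b = Node("B", a3)
a4 = Node("A4", b)
a5 = Node("A5", a2)
a6 = Node("A6", a5)
a = Node("A", a2)            # candidate to be tested for pruning
a_u = Node("Au")             # stands in for the complete assignment A_u

for n in (a0, a1, a3, b, a4, a2, a5, a6):   # assumed visiting order
    visit(n)

names = [k.name for k in killer_set(a, a_u)]
print(names)
```

The strict comparison in `visit` is what implements the rule that, among nodes at the same level, the first one visited keeps the link.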

6. Branch-and-bound task allocation with preprocessing

We now present the task allocation algorithm using the pruning rules. Section 6.1 describes how a good enumeration order is obtained. Section 6.2 presents the branch-and-bound algorithm along with its correctness proof.

6.1. Preprocessing to determine the task enumeration order

We have seen the importance of the task enumeration order in previous sections. For the following reasons, tasks should be enumerated in such an order that tasks with high communication are enumerated first:

• To arrive at a small cut to exploit the dominance relation before the space overflow.

• To obtain killers that take advantage of the clustering of tasks.

• To obtain a low cost complete assignment serving as an upper bound on the optimal cost.

The task enumeration order is determined by applying the max-flow min-cut algorithm recursively to partition the task graph. Each time the max-flow min-cut procedure is applied, the set of tasks is decomposed into two partitions connected by a minimum cut. We repeat the partitioning recursively until each partition contains only one task. The partitioning process can be represented by a tree. Each leaf in the tree represents a group containing only one task. The enumeration order is thus the order of all leaf nodes in a depth-first traversal. For instance, the partitioning process for the task graph in Fig. 1 is depicted in Fig. 13. Following this result, we obtain the enumeration order that has been used for illustration in the previous discussion.
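The recursive partitioning can be sketched as follows. To keep the example short and self-contained, the minimum cut is found by exhaustive search over bipartitions, which is only a stand-in for the max-flow min-cut procedure and is practical only for very small graphs. The function names and the example weights are assumptions of this sketch.

```python
from itertools import combinations


def min_cut_bipartition(tasks, weight):
    """Exhaustively find the bipartition of `tasks` with minimum cut weight."""
    first, rest = tasks[0], tasks[1:]
    best = None
    for r in range(len(rest)):                 # right side is never empty
        for extra in combinations(rest, r):
            left = [first, *extra]
            right = [t for t in rest if t not in extra]
            cut = sum(weight.get((u, v), 0) + weight.get((v, u), 0)
                      for u in left for v in right)
            if best is None or cut < best[0]:
                best = (cut, left, right)
    return best[1], best[2]


def enumeration_order(tasks, weight):
    """Leaves of the recursive min-cut partition tree in depth-first order."""
    if len(tasks) == 1:
        return list(tasks)
    left, right = min_cut_bipartition(tasks, weight)
    return enumeration_order(left, weight) + enumeration_order(right, weight)


# Hypothetical task graph: a-b and c-d communicate heavily, b-c only lightly.
weight = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10}
order = enumeration_order(["a", "c", "b", "d"], weight)
print(order)
```

Even though the input lists the tasks in a scrambled order, the resulting enumeration places each heavily communicating pair next to each other. Which side of a cut is visited first is an arbitrary choice in this sketch.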

6.2. The optimal branch-and-bound algorithm

The branch-and-bound algorithm is shown in Fig. 14. It is based on the A∗ traversal scheme with the addition of the pruning rules and the related implementation code presented in Section 5. We now show that an optimal assignment can be obtained by the proposed algorithm if neither time-out nor overflow of the ActiveSet occurs.
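The core of such a best-first (A∗-style) traversal can be sketched as below. This is a simplified skeleton, not the algorithm of Fig. 14: it omits the greedy update of A_u, the deepest links, and the time-out, and the pruning test is a pluggable hook that never prunes by default. The toy cost (largest load plus cut communication weight) never decreases when a task is added, so it can double as the lower bound L(•) in this example. All names and data are hypothetical.

```python
import heapq
from itertools import count

EXEC = {"t0": 2, "t1": 2, "t2": 2, "t3": 2}                      # toy execution times
COMM = {("t0", "t1"): 10, ("t1", "t2"): 1, ("t2", "t3"): 10}     # toy edge weights


def cost(assign):
    """Toy cost: max processor load + weight of edges cut between processors."""
    load = {}
    for t, p in assign.items():
        load[p] = load.get(p, 0) + EXEC[t]
    cut = sum(w for (u, v), w in COMM.items()
              if u in assign and v in assign and assign[u] != assign[v])
    return max(load.values(), default=0) + cut


def bb_alloc(order, n_procs, lower_bound, prune=lambda a: False):
    """Best-first search: always expand the state with minimum lower bound."""
    tie = count()                                   # tie-breaker for equal bounds
    active = [(lower_bound({}), next(tie), {})]     # the ActiveSet, as a min-heap
    while active:
        _, _, a = heapq.heappop(active)
        if len(a) == len(order):                    # first complete state popped
            return a
        if prune(a):                                # hook for a pruning test
            continue
        t = order[len(a)]                           # next task in enumeration order
        for p in range(n_procs):
            child = {**a, t: p}
            heapq.heappush(active, (lower_bound(child), next(tie), child))
    return None


best = bb_alloc(["t0", "t1", "t2", "t3"], 2, lower_bound=cost)
print(best, cost(best))
```

Since the bound of a partial state never exceeds the cost of any of its completions, the first complete assignment removed from the heap is optimal for this toy cost, which mirrors the termination argument used for the full algorithm.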

For convenience, we introduce some terminology and notation. A complete assignment A_c is said to be in the future search space of ActiveSet(k) if either A_c ∈ ActiveSet(k) or there exists a partial assignment A_a ∈ ActiveSet(k) such that A_c can be derived from A_a. On the other hand, we say A_c is lost from ActiveSet(k) if A_c is not in the future search space of ActiveSet(k). The depth of a partial/complete assignment A, denoted depth(A), is the length of the path from the root to the branching state representing A in the state–space tree.

The difficulty in showing the correctness of the algorithm is that the pruning rules may remove some partial assignments that can lead to optimal assignments. Fortunately, it can be guaranteed that other optimal assignments exist in the future search space after pruning. When an optimal assignment is pruned, we can always find another optimal assignment that survives in the future search space, as shown in Fig. 15. Provided that some optimal assignments survive in the future search space, we show that the termination condition implies the optimality of the solution obtained.

Lemma 3. Assume that no overflow in the ActiveSet occurs. Then, during the traversal, there are always some optimal assignments that survive in the future search space.

Proof. We prove this by induction on the number of iterations i. The induction hypothesis is that

• for any optimal assignment A_opt-0 not in the future search space, there exists another optimal assignment A_opt-k surviving in the future search space such that depth(A_k) ≥ depth(A_0), where A_0 and A_k are the last visited ancestors of A_opt-0 and A_opt-k, respectively.

Lemma 3 holds in the beginning since no optimal assignment is lost at initialization. Assume that the induction hypothesis holds at the beginning of a certain iteration. Suppose that a partial assignment A_0 is pruned in this iteration and A_0 can be extended to some optimal assignment A_opt-0. The proof is to find the A_opt-k and A_k described in the induction hypothesis.

In this case, A_0 must have been pruned by some dominator A_1, which can also be extended to an optimal assignment A_opt-1 (otherwise the pruning criterion is violated). Let A′_1 be the last visited ancestor of A_opt-1. By the pruning rule, part of the sub-tree below A_1 must have been traversed and hence depth(A′_1) ≥ depth(A_1) = depth(A_0). If A_1 is not

{t0, t1, ..., t12}
  {t0, t1, ..., t7}
    {t0, t1, t2, t3, t4}
      {t0, t1, t2}
        {t0, t1}
          {t0}
          {t1}
        {t2}
      {t3, t4}
        {t3}
        {t4}
    {t5, t6, t7}
      {t5}
      {t6, t7}
        {t6}
        {t7}
  {t8, t9, ..., t12}
    {t8, t9}
      {t8}
      {t9}
    {t10, t11, t12}
      {t10}
      {t11, t12}
        {t11}
        {t12}

Fig. 13. Determining the task enumeration order.

Algorithm BB-Alloc(G, M)

• /* initialization phase */
  – L(root of the state-space tree) ← 0
  – ActiveSet ← {root of the state-space tree}
  – Obtain A_u by performing greedy search starting at the root of the state-space tree

• while not time-out do /* traversal phase */

  1) remove a partial/complete assignment A_v with minimum L(•) from ActiveSet and perform the following to visit A_v:
     1.1) /* update deepest link for all ancestors of A_v */
          deep(A_v) ← A_v
          for each A_a: ancestor of A_v in the state-space tree do
            if depth(A_v) > depth(deep(A_a)) then deep(A_a) ← A_v
     1.2) /* try to improve A_u */
          perform greedy search starting from A_v to obtain a complete assignment A_c
          if cost(A_c) < cost(A_u) then A_u ← A_c

  2) if A_v is a complete assignment then A_u ← A_v and terminate the traversal by return A_u

  3) /* check if the sub-tree of A_v needs further traversal */
     KillerSet ← {deep(A_a) | A_a is an ancestor of A_v in the state-space tree} ∪ {A_u}
     prune ← False
     for each A_k ∈ KillerSet do
       prune ← PruneTest(A_v, A_k, A_u)
       if prune = True then break

  4) /* exploit children of A_v if the sub-tree of A_v needs further traversal */
     if prune = False then
       for each child A′_v of A_v in the state-space tree do
         compute L(A′_v) and insert A′_v into ActiveSet

Fig. 14. The branch-and-bound algorithm for task allocation.

hence the induction hypothesis holds for the next iteration (cf. Fig. 15(a)). In case A_opt-1 is lost, the induction hypothesis states that there exists a surviving optimal assignment A_opt-k with last visited ancestor A_k such that depth(A_k) ≥ depth(A′_1) ≥ depth(A_1) = depth(A_0) (cf. Fig. 15(b)). Hence we obtain the required A_opt-k and A_k for A_opt-0 and A_0. This proves the lemma. □

Theorem 3 (Correctness of our proposed algorithm). Our proposed branch-and-bound algorithm ends with an optimal assignment if neither space overflow in the ActiveSet nor time-out occurs.

Proof. If no time-out occurs, some complete assignment A_c will be removed from the ActiveSet in the last iteration of the traversal. The complete assignment returned is this A_c. We want to show that A_c is optimal.

We prove this by contradiction. Suppose A_c is not optimal. Consider the contents of ActiveSet(j) for the last iteration j. Lemma 3 states the existence of an optimal assignment A_opt in the future search space of ActiveSet(j). Thus, we have cost(A_c) > cost(A_opt) since A_opt is optimal. Let A_a be the ancestor of A_opt (or A_opt itself) in ActiveSet(j). By the definition of L(•), L(A_a) ≤ cost(A_opt), and hence L(A_a) ≤ cost(A_opt) < cost(A_c) = L(A_c). However, A_c is

[Figure: two sketches of the state–space tree, with the tasks traversed marked along the depth of the tree; A_{i+1} is the dominator that prunes A_i. (a) nodes A_0 and A_1 with optimal extensions A_opt-0 and A_opt-1. (b) nodes A_0, A_1, A_2, ..., A_k with optimal extensions A_opt-0, A_opt-1, A_opt-2, ..., A_opt-k.]

Fig. 15. Finding an optimal assignment survived in the future search space.

[Figure: a task graph over tasks t0 to t12 with weighted nodes and edges, shown twice with different subsets of tasks mapped to processors p0 and p1. Annotation: L(A_1) < L(A_2), but A_2 can be extended to an optimal assignment.]

Fig. 16. Unfair comparison: assigning different sets of tasks: (a) partial assignment A_1 and (b) partial assignment A_2.

L(A_c) ≤ L(A_a). This produces a contradiction and hence proves the theorem. □

6.3. Space-efficient ActiveSet organization

The remaining problem in designing the task allocation algorithm is the design of the ActiveSet such that (1) the partial/complete assignment with minimum L(•) can be easily removed, and (2) a near-optimal assignment can be obtained once overflow occurs. A simple solution is to implement the ActiveSet as a heap and drop the partial/complete assignment with maximum L(•) when overflow occurs, because such an assignment is unlikely to be extended to an optimal assignment. However, this scheme has certain drawbacks. We identify two situations that reduce the effectiveness of the victim selection scheme:

• Unfair comparisons between partial assignments containing different sets of tasks.

• Unfair comparisons between partial assignments using different numbers of processors.

Fig. 16 depicts an example of unfair comparison between partial assignments assigning different sets of tasks. Consider mapping the task graph in Fig. 1 to the machine configuration in Fig. 2. Fig. 16 depicts two partial assignments A_1 and A_2 containing different sub-graphs, with L(A_1) < L(A_2). However, A_2 can be extended to an optimal assignment but A_1 cannot. A partial assignment containing fewer tasks usually has a lower cost and L(•), but this does not mean it has a better future extension. Our solution is to keep partial assignments that assign different numbers of tasks in different heaps.

Fig. 17 depicts an example of unfair comparison between partial assignments using different numbers of processors. We have two partial assignments A_1 and A_2 with L(A_1) < L(A_2). A_1 is the best assignment for the sub-graph containing tasks {t0, t1, t2, t3, t4}. However, A_2 can be extended to an optimal assignment but A_1 cannot. The assignment lacks knowledge of the future load to be assigned and hence A_1 uses too many processors for tasks

