Branch-and-bound task allocation with task-clustering-based pruning
Yung-Cheng Ma∗, Tien-Fu Chen, Chung-Ping Chung
Department of Computer Science and Information Engineering, National Chiao-Tung University, 1001 Ta Hsueh Road, Hsinchu 30050, Taiwan
Received 8 June 2000; received in revised form 5 July 2004
Abstract
We propose a task allocation algorithm that aims at finding an optimal task assignment for any parallel program on a given machine configuration. The approach traverses a state–space tree that enumerates all possible task assignments. The efficiency of the algorithm comes from applying a pruning rule at each traversed state to decide, by exploiting a dominance relation and task-clustering heuristics, whether traversal of a given sub-tree is required. The pruning rules eliminate partial assignments that violate the clustering of tasks while still keeping some optimal assignment in the future search space. In contrast to previous state–space searching methods for task allocation, the proposed pruning rules significantly reduce the time and space required to obtain an optimal assignment and lead the traversal to a near-optimal assignment within a small number of states. Experimental evaluation shows that the pruning rules make the state–space searching approach feasible for practical use.
© 2004 Published by Elsevier Inc.
Keywords: Task allocation; Branch-and-bound; Pruning rule; Dominance relation; State–space searching
1. Introduction
Advances in hardware and software technologies have led to the use of parallel and distributed computing systems. To execute a parallel program efficiently, the mapping of program tasks to processors should consider both load balancing and reducing communication overhead. This paper studies such a task allocation problem.
Several research works address the task allocation problem. Although the problem has been shown to be NP-complete [3], a number of heuristics have been proposed [4,8,9,11,14,15,19,23]. A drawback of these heuristics is the poor quality of the assignments they find [5]. On the other hand, [1,2,7,12,13,16–18,20] proposed state–space searching methods that differ in problem formulation for various applications and machine configurations. The state–space searching approach finds an optimal assignment at the cost of intractable time and space complexity. Ahmad and Kwok [1] proposed pruning rules and a parallelization method to reduce the time to find an optimal solution for assigning precedence-constrained graphs. In this paper, we follow the task graph model of [18], which models a set of parallel processes without precedence constraints, and propose pruning rules to improve the efficiency of the state–space searching method.

∗ Corresponding author. Fax: +886-3-5724176. E-mail address: ycma@csie.nctu.edu.tw (Y.-C. Ma).
0743-7315/$ - see front matter © 2004 Published by Elsevier Inc. doi:10.1016/j.jpdc.2004.08.002
The key idea of the proposed pruning rule is to detect task clustering in the task graph. We observe that tasks can be grouped such that each group is a set of heavily communicating tasks while inter-group communication weights are relatively small. While traversing the state–space, the proposed algorithm detects task clustering from the traversal history and tries to prune partial assignments that violate the detected clustering. We prove that the proposed pruning rule preserves some optimal assignment in the future search space, which guarantees the optimality of the solution found. Moreover, our experiments show that the proposed algorithm traverses only a low-order-polynomial number of states to reach a near-optimal assignment. Hence, when time and space are limited, a near-optimal assignment can still be obtained. This makes the proposed algorithm feasible for practical use.
This paper is organized as follows. Section 2 models the task allocation problem as a state–space searching problem. Section 3 describes the basic idea of the proposed pruning rule. Section 4 describes the dominance relation, which is the basis for deriving our pruning rule. Section 5 describes the proposed pruning rule. Section 6 describes the proposed task allocation algorithm and the space management policy. Section 7 presents experiments showing the effectiveness of the proposed pruning rules. Finally, a conclusion is given in Section 8.
2. Modeling task allocation problem
In this section, we present how the task allocation problem is formulated and transformed into a state–space searching problem. This section defines the terminology used in this paper and gives the framework of our proposed task allocation algorithm.
2.1. Formulating task allocation problem
We follow [4,9,18] to formulate the task allocation problem. This formulation assumes that there are few or no precedence relationships and synchronization requirements, so that processor idleness is negligible. Contentions on communication links are also ignored.
The optimization problem is formulated as follows. The input to a task allocation algorithm is a task graph G and a machine configuration M. The output, called a complete assignment, is a mapping from the set of tasks T to the set of processors P. An optimal assignment is a complete assignment with minimum cost. The cost of an assignment is the turn-around time of the last processor to finish its execution. To find an optimal assignment, the branch-and-bound algorithm goes through several partial assignments, in which only a subset of the tasks has been assigned. We now define the terminology needed to formulate the task allocation problem.
A parallel program is represented as a task graph G(T, E, e, c). The vertex set of the task graph is the set of tasks T = {t0, t1, . . . , tn−1}. Each task ti ∈ T represents a program module. The edge set E of the task graph represents communication between tasks: two tasks ti and tj are connected by an edge if ti communicates with tj. Each task ti ∈ T is associated with a weight e(ti) representing the execution time of the task. Each edge (ti, tj) ∈ E is given a weight c(ti, tj) representing the amount of data transferred between tasks ti and tj.

An example task graph is depicted in Fig. 1. Each vertex is a task; the number on each task is the execution weight e(ti) of task ti, and the number on edge (ti, tj) is the communication weight c(ti, tj). Throughout this article, we will use this task graph to demonstrate the idea behind our algorithm.
Fig. 1. Example of a task graph (tasks t0–t12 with execution weights on vertices and communication weights on edges).
The machine configuration is represented as M(P, d). P = {p0, p1, . . . , pm−1} is the set of all processors. Each pair of processors pk, pl ∈ P, k ≠ l, is associated with a distance d(pk, pl) representing the latency of transferring one unit of data between pk and pl. If two tasks ti and tj are assigned to different processors pk and pl, respectively, the time required for task ti to communicate with tj is estimated to be c(ti, tj) · d(pk, pl). The communication time between two tasks within the same processor is assumed to be zero.
A machine configuration example is depicted in Fig. 2. We take a hierarchical architecture as an example. The machine consists of two subnets: it takes 5 units of time to transfer a unit of data between two processors in the same subnet and 20 units between two processors in different subnets. Throughout this paper, we will use this hierarchical architecture to demonstrate the idea of our task allocation algorithm. However, the proposed algorithm can also be applied to other machine configurations with non-uniform distances between processors.
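The hierarchical configuration above can be sketched as follows; the helper `make_distance` and the subnet labels are our own naming, not the paper's.

```python
# Sketch (ours) of the machine configuration M(P, d) of Fig. 2:
# four processors in two subnets, distance 5 within a subnet, 20 across.
def make_distance(subnet_of, intra=5, inter=20):
    d = {}
    for pk in subnet_of:
        for pl in subnet_of:
            if pk == pl:
                d[(pk, pl)] = 0          # intra-processor communication is free
            elif subnet_of[pk] == subnet_of[pl]:
                d[(pk, pl)] = intra      # same subnet
            else:
                d[(pk, pl)] = inter      # different subnets
    return d

subnet_of = {"p0": 0, "p1": 0, "p2": 1, "p3": 1}
d = make_distance(subnet_of)
```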
A complete assignment Ac is a mapping that maps the set of tasks T to the set of processors P. To find a complete assignment, our task allocation algorithm examines several partial assignments. A partial assignment A is a mapping that maps Q, a proper subset of T, to the set of processors P.
The turn-around time of processor pk under a partial/complete assignment A, denoted TAk(A), is defined as the time to execute all tasks assigned to pk plus the time these tasks spend communicating with tasks not assigned to pk. That is,

$$\mathrm{TA}_k(A) = \sum_{t_i:\,A(t_i)=p_k} e(t_i) + \sum_{t_i:\,A(t_i)=p_k}\ \sum_{t_j:\,A(t_j)\neq p_k} c(t_i,t_j)\, d(p_k, A(t_j)). \qquad (1)$$

The cost of a partial/complete assignment is the turn-around time of the last processor to finish its execution:

$$\mathrm{cost}(A) = \max_{p_k \in P} \mathrm{TA}_k(A). \qquad (2)$$
Fig. 2. Example of a machine configuration: (a) the clustered architecture and (b) the distance matrix d(pk, pl):

        p0   p1   p2   p3
  p0     0    5   20   20
  p1     5    0   20   20
  p2    20   20    0    5
  p3    20   20    5    0
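Eqs. (1) and (2) translate directly into code. The sketch below is ours, not the paper's: an assignment is a dict from task to processor, and e, c, d are plain dicts of execution weights, communication weights, and distances. The edge weights c(t0, t2) = 150 and c(t1, t2) = 400 are illustrative values chosen so that the turn-around times TA0(A2) = 3750 and TA1(A2) = 3050 quoted in the Section 4.2 example are reproduced.

```python
# Sketch (ours) of Eq. (1) and Eq. (2) for a partial/complete assignment A.
def TA(pk, A, e, c, d):
    # Execution time of tasks assigned to pk ...
    execution = sum(e[ti] for ti in A if A[ti] == pk)
    # ... plus communication with assigned tasks on other processors.
    communication = sum(c.get((ti, tj), 0) * d[(pk, A[tj])]
                        for ti in A if A[ti] == pk
                        for tj in A if A[tj] != pk)
    return execution + communication

def cost(A, e, c, d, procs):
    # Eq. (2): turn-around time of the last processor to finish.
    return max(TA(pk, A, e, c, d) for pk in procs)

e = {"t0": 600, "t1": 400, "t2": 300}
c = {("t0", "t1"): 500, ("t1", "t0"): 500,
     ("t1", "t2"): 400, ("t2", "t1"): 400,
     ("t0", "t2"): 150, ("t2", "t0"): 150}
d = {(pk, pl): 0 if pk == pl else 5          # p0, p1 in the same subnet
     for pk in ("p0", "p1") for pl in ("p0", "p1")}

A2 = {"t0": "p0", "t1": "p0", "t2": "p1"}    # the assignment A2 of Fig. 6
```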
Fig. 3. State–space tree: the root is the empty assignment, level i branches on the processor of task ti (ti→p0, ti→p1, . . .), internal nodes are partial assignments, and leaves are complete assignments (goal nodes).
An optimal assignment Aopt is a complete assignment with minimum cost:

$$\mathrm{cost}(A_{\mathrm{opt}}) = \min\{\mathrm{cost}(A_c) \mid A_c \text{ is a complete assignment}\}. \qquad (3)$$
2.2. Transforming to the state–space searching problem—A∗-algorithm
We solve the task allocation problem by state–space searching with pruning rules. Shen and Tsai [18] proposed a state–space search algorithm without pruning to solve the task allocation problem. This state–space search method is known as the A∗-algorithm [6], which has been proven to guarantee the optimality of the solution obtained. Based on the A∗-algorithm, we add a pruning rule to reduce the search space to be traversed. In our experiments, this A∗-algorithm serves as the baseline for comparison with our branch-and-bound algorithm.
As illustrated in Fig. 3, the state–space tree represents all possible task assignments. We use an (n + 1)-level m-ary tree to enumerate all possibilities of assigning n tasks to m processors. In the branch-and-bound literature, a node in the state–space tree is called a branching state. In this study, a branching state represents either a partial or a complete assignment, depending on whether it is an internal node or a leaf of the state–space tree. In the remainder of this article, we use the terms branching state and partial/complete assignment interchangeably.

The traversal proceeds as follows. During the traversal, an active set [10] (also called the open set in some literature [6]), denoted ActiveSet, keeps track of all partial/complete assignments that have been explored but not visited. In each iteration of the traversal, the following operations are performed:
Step 1: Remove a partial/complete assignment Av from ActiveSet and visit Av.
Step 2: If Av is a complete assignment, terminate the traversal and return Av as the output.
Step 3: Check whether the sub-trees derived from Av need further traversal by using the pruning rule.
Step 4: If the sub-tree of Av needs further traversal, put each child node of Av in the state–space tree into ActiveSet.
For simplicity, we use ActiveSet(k) to denote the contents of ActiveSet at the beginning of the kth iteration, and Av(k) to denote the partial/complete assignment visited in the kth iteration.
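The four steps above amount to a best-first search in which ActiveSet is a priority queue keyed by the lower bound L(·). A minimal sketch (ours; the callbacks `lower_bound`, `children`, `is_complete`, and `prune` are placeholders for the quantities defined in the text):

```python
# Best-first sketch (ours) of the four-step traversal loop.
import heapq

def traverse(root, lower_bound, children, is_complete, prune):
    active = [(lower_bound(root), 0, root)]   # the ActiveSet, keyed by L(.)
    counter = 1                               # tie-breaker for equal bounds
    while active:
        _, _, Av = heapq.heappop(active)      # Step 1: visit best state
        if is_complete(Av):                   # Step 2: first goal reached
            return Av                         # is returned as the output
        if prune(Av):                         # Step 3: apply the pruning rule
            continue
        for child in children(Av):            # Step 4: expand the sub-tree
            heapq.heappush(active, (lower_bound(child), counter, child))
            counter += 1
    return None
```

A toy run on two binary-choice tasks with `sum` as the bound returns the all-zero assignment first, illustrating that the first goal popped has minimum bound.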
We follow the approach of Shen and Tsai [18] to determine the traversal order. For each partial/complete assignment A, a lower bound, denoted L(A), on all complete assignments extended from A (or on A itself, in case A is a complete assignment) is estimated. In each iteration of the traversal, the partial/complete assignment Av with minimum L(·) is removed from ActiveSet and visited. L(A) is computed according to the additional cost of assigning the tasks not yet assigned in A.
Given a partial assignment A in which Q ⊆ T has been assigned, we define ACk(tj → pl, A) to reflect the additional cost on processor pk if task tj is assigned to processor pl:

$$\mathrm{AC}_k(t_j \to p_l, A) = e(t_j) + \sum_{t_i:\,A(t_i)\neq p_k} c(t_i,t_j)\, d(p_k, A(t_i)) \quad \text{if } p_l = p_k, \qquad (4)$$

$$\mathrm{AC}_k(t_j \to p_l, A) = \sum_{t_i:\,A(t_i)=p_k} c(t_i,t_j)\, d(p_k, p_l) \quad \text{if } p_l \neq p_k. \qquad (5)$$
For a partial assignment A, the cost lower bound L(A) for all complete assignments extended from A is estimated to be

$$L(A) \equiv \max_{p_k \in P}\Bigl[\mathrm{TA}_k(A) + \sum_{t_i \text{ not assigned in } A}\ \min_{p_l \in P} \mathrm{AC}_k(t_i \to p_l, A)\Bigr]. \qquad (6)$$
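A sketch (ours) of the additional-cost terms of Eqs. (4) and (5) and the lower bound of Eq. (6); TA restates Eq. (1) so the snippet is self-contained, and the two-task instance at the end is a toy example.

```python
def TA(pk, A, e, c, d):
    # Eq. (1): turn-around time of pk under (partial) assignment A.
    return (sum(e[ti] for ti in A if A[ti] == pk) +
            sum(c.get((ti, tj), 0) * d[(pk, A[tj])]
                for ti in A if A[ti] == pk
                for tj in A if A[tj] != pk))

def AC(pk, tj, pl, A, e, c, d):
    # Eqs. (4)/(5): additional cost on pk if unassigned tj goes to pl.
    if pl == pk:
        return e[tj] + sum(c.get((ti, tj), 0) * d[(pk, A[ti])]
                           for ti in A if A[ti] != pk)
    return sum(c.get((ti, tj), 0) * d[(pk, pl)]
               for ti in A if A[ti] == pk)

def L(A, tasks, procs, e, c, d):
    # Eq. (6): lower bound on the cost of every completion of A.
    return max(TA(pk, A, e, c, d) +
               sum(min(AC(pk, tj, pl, A, e, c, d) for pl in procs)
                   for tj in tasks if tj not in A)
               for pk in procs)

# Toy instance (illustrative weights).
e = {"t0": 10, "t1": 20}
c = {("t0", "t1"): 2, ("t1", "t0"): 2}
d = {(a, b): 0 if a == b else 5 for a in ("p0", "p1") for b in ("p0", "p1")}
A = {"t0": "p0"}   # partial assignment: t1 still unassigned
```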
Without pruning rules, the method presented so far is the A∗-algorithm [6], which was first applied to task allocation by Shen and Tsai [18]. The A∗-algorithm traverses all partial assignments with L(·) less than the optimal cost. We propose a pruning rule to reduce the state–space to be traversed.
3. Basic idea of the proposed pruning rule
The development of the pruning rule is based on the clustering of tasks. As shown in Fig. 4, tasks are grouped such that each group contains heavily communicating tasks. The key observation is that a group may contain a set of tasks suitable to be placed on the same processor, or a set of tasks suitable to be placed in the same subnet of the hierarchical architecture. While traversing the state–space tree, our branch-and-bound algorithm detects the clustering of tasks and tries to prune those partial assignments that violate the clustering heuristic. The effectiveness of the pruning rule thus depends on whether the tasks can be clearly clustered into groups.

Fig. 4. Sample clustering of tasks according to communication weights: groups of tasks suitable to be placed in the same processor, and larger groups suitable to be placed in the same subnet.
The development of the pruning rule consists of two phases. In Section 4, we first develop a dominance relation. This dominance relation is effective only when a small cut is met. In Section 5, we further integrate the detection of task clustering with the dominance relation to form an enhanced pruning rule.
4. Pruning search space by dominance relation
We first develop a dominance relation to serve as the basis for the pruning rule. Consider two partial assignments A1 and A2 in which the same set of tasks has been assigned. Suppose cost(A1) ≤ cost(A2). We call A1 the winner and A2 the loser. Let A1-best and A2-best be the complete assignments with minimum cost in the sub-trees below A1 and A2, respectively. We want to be able to check whether the winner–loser relationship can be reversed, that is, whether cost(A1-best) > cost(A2-best) is possible. Our proposed dominance relation claims that what may reverse the winner–loser relationship is the weights of the edges between assigned and unassigned tasks in the task graph. The dominance relation is effective in pruning the search space when these weights are small.
4.1. Formalization of dominance relation
Definition 1 (Dominance relation). Let A1 and A2 be two partial assignments in which the same set of tasks has been assigned. A1 dominates A2 if we can guarantee that cost(A1-best) ≤ cost(A2-best), where A1-best and A2-best are the complete assignments with minimum cost extended from A1 and A2, respectively.

Fig. 5. Idea behind deriving the dominance relation: (a) selection of partial/complete assignments (A′1 and A′2 with A′1(ti) = A′2(ti) for ti ∈ S) and (b) classification of tasks into those on pk and those not on pk (Qk, Q̄k, Sk, S̄k).
The inference rule we use to derive a dominance relation is as follows. We omit the proof since it is a direct consequence of Definition 1.

Corollary 1 (Inference rule for deriving the dominance relation). Let A1 and A2 be two partial assignments. A1 dominates A2 if, for any complete assignment A′2 extended from A2, there exists a complete assignment A′1 extended from A1 such that TAk(A′2) − TAk(A′1) ≥ 0 for each processor pk.
The idea behind deriving the dominance relation is depicted in Fig. 5. The assignments A1, A2, A′1, and A′2 concerned in Corollary 1 are shown in Fig. 5(a), where S = T − Q. A′1 and A′2 are chosen to have the same future extension. We rewrite the turn-around time equation according to the task classification shown in Fig. 5(b). In addition to TAk(A2) − TAk(A1), the communication time between assigned and to-be-assigned tasks in A1 (A2) also contributes to TAk(A′2) − TAk(A′1). This gives a lower-bound estimate on TAk(A′2) − TAk(A′1). The proposed dominance relation checks whether A2 can be pruned according to this estimated lower bound on the turn-around time difference.
We introduce the following notation:
• Execution(R) = Σ_{ti∈R} e(ti), where R is a set of tasks.
• Communication(R1, R2) = Σ_{ti∈R1} Σ_{tj∈R2} c(ti, tj) · d(A′a(ti), A′a(tj)), where R1 and R2 are sets of tasks.
Following the classification of tasks shown in Fig. 5(b), we rewrite the turn-around time equation in the following lemma. The proof is omitted since it is a direct computation from the turn-around time formula.

Lemma 1 (Reformulating the turn-around time). Let Aa be a partial assignment and A′a a complete assignment extended from Aa. Let Q be the set of tasks assigned in Aa and S the set of tasks not assigned in Aa. Then

$$\mathrm{TA}_k(A'_a) = \mathrm{TA}_k(A_a) + \mathrm{Execution}(S_k(A'_a)) + \mathrm{Communication}(Q_k(A_a), \overline{S}_k(A'_a)) + \mathrm{Communication}(\overline{Q}_k(A_a), S_k(A'_a)) + \mathrm{Communication}(S_k(A'_a), \overline{S}_k(A'_a)), \qquad (7)$$

where
• Qk(Aa) = {ti ∈ Q | Aa(ti) = pk} and Q̄k(Aa) = Q − Qk(Aa),
• Sk(A′a) = {ti ∈ S | A′a(ti) = pk} and S̄k(A′a) = S − Sk(A′a).
Before stating the dominance relation, we define the turn-around time difference lower bound TADLk(A1, A2). Let A1 and A2 be two partial assignments with the same set of tasks Q assigned, and let S = T − Q. TADLk(A1, A2) is a lower bound on TAk(A′2) − TAk(A′1), where A′1 and A′2 are arbitrary complete assignments extended from A1 and A2, respectively, such that A′1(ti) = A′2(ti) for each task ti ∈ S. TADLk(A1, A2) is estimated as

$$\mathrm{TADL}_k(A_1, A_2) \equiv \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S}\ \min_{p_l \in P}\bigl(\mathrm{AC}_k(t_i \to p_l, A_2) - \mathrm{AC}_k(t_i \to p_l, A_1)\bigr). \qquad (8)$$
We then check whether A2 can be pruned by computing TADLk(A1, A2) for each processor pk. If TADLk(A1, A2) is greater than or equal to zero for each processor pk, then TAk(A′2) − TAk(A′1) ≥ 0 for each processor pk, and hence we can prune A2. This is stated in the following theorem.
Theorem 1 (Dominance relation for space pruning). Let A1 and A2 be two partial assignments in which the same set of tasks has been assigned. If TADLk(A1, A2) ≥ 0 for each processor pk, then A1 dominates A2.
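The test of Theorem 1 can be sketched as follows (our code, with TA and AC restating Eqs. (1), (4), and (5) so the snippet is self-contained; the three-task instance at the end is a toy example in which keeping the heavily communicating pair {t0, t1} together dominates splitting it).

```python
def TA(pk, A, e, c, d):
    # Eq. (1): turn-around time of pk under (partial) assignment A.
    return (sum(e[ti] for ti in A if A[ti] == pk) +
            sum(c.get((ti, tj), 0) * d[(pk, A[tj])]
                for ti in A if A[ti] == pk
                for tj in A if A[tj] != pk))

def AC(pk, tj, pl, A, e, c, d):
    # Eqs. (4)/(5): additional cost on pk if unassigned tj goes to pl.
    if pl == pk:
        return e[tj] + sum(c.get((ti, tj), 0) * d[(pk, A[ti])]
                           for ti in A if A[ti] != pk)
    return sum(c.get((ti, tj), 0) * d[(pk, pl)]
               for ti in A if A[ti] == pk)

def TADL(pk, A1, A2, S, procs, e, c, d):
    # Eq. (8): lower bound on TA_k(A2') - TA_k(A1') over all pairs of
    # extensions that assign the unassigned tasks S identically.
    return (TA(pk, A2, e, c, d) - TA(pk, A1, e, c, d) +
            sum(min(AC(pk, ti, pl, A2, e, c, d) -
                    AC(pk, ti, pl, A1, e, c, d) for pl in procs)
                for ti in S))

def dominates(A1, A2, S, procs, e, c, d):
    # Theorem 1: A1 dominates A2 if TADL_k(A1, A2) >= 0 for every pk.
    return all(TADL(pk, A1, A2, S, procs, e, c, d) >= 0 for pk in procs)

# Toy instance (illustrative weights): t0 and t1 communicate heavily.
e = {"t0": 3, "t1": 3, "t2": 3}
c = {("t0", "t1"): 10, ("t1", "t0"): 10,
     ("t1", "t2"): 1, ("t2", "t1"): 1}
d = {(a, b): 0 if a == b else 5 for a in ("p0", "p1") for b in ("p0", "p1")}
A1 = {"t0": "p0", "t1": "p0"}   # obeys the clustering of {t0, t1}
A2 = {"t0": "p0", "t1": "p1"}   # violates it
```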
Fig. 6. Example to illustrate the dominance relation: (a) the partial assignments in consideration, A1: t0→p0, t1→p0, t2→p0 (TA0(A1) = 1300) and A2: t0→p0, t1→p0, t2→p1 (TA0(A2) = 3750, TA1(A2) = 3050); (b) the task graph, with the cut edges that may affect the winner–loser relationship in bold; and (c) the effects on p0 for all possible extensions, giving TA0(A′2) − TA0(A′1) ≥ (3750 − 1300) + (−600) + (−200) ≥ 0, where −600 is due to t4 and −200 to t10.
Proof. To derive the dominance relation via Corollary 1, we pick the complete assignment A′1 extended from A1 such that A′1(ti) = A′2(ti) for each ti ∈ S. The pattern is depicted in Fig. 5(a). We want to show that TAk(A′2) − TAk(A′1) ≥ 0 for each pk.

We decompose both TAk(A′2) and TAk(A′1) as stated in Lemma 1. Since A′1(ti) = A′2(ti) for each ti ∈ S, we have
• Execution(Sk(A′2)) − Execution(Sk(A′1)) = 0, and
• Communication(Sk(A′2), S̄k(A′2)) − Communication(Sk(A′1), S̄k(A′1)) = 0.

Hence,

$$\begin{aligned} \mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) &= \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1)\\ &\quad + \bigl(\mathrm{Communication}(S_k(A'_2), \overline{Q}_k(A_2)) - \mathrm{Communication}(S_k(A'_1), \overline{Q}_k(A_1))\bigr)\\ &\quad + \bigl(\mathrm{Communication}(\overline{S}_k(A'_2), Q_k(A_2)) - \mathrm{Communication}(\overline{S}_k(A'_1), Q_k(A_1))\bigr)\\ &= \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S}\bigl(\mathrm{AC}_k(t_i \to A'_2(t_i), A_2) - \mathrm{AC}_k(t_i \to A'_2(t_i), A_1)\bigr). \end{aligned} \qquad (9)$$

Taking a lower bound on the turn-around time difference, we have

$$\mathrm{TA}_k(A'_2) - \mathrm{TA}_k(A'_1) \geq \mathrm{TA}_k(A_2) - \mathrm{TA}_k(A_1) + \sum_{t_i \in S}\ \min_{p_l \in P}\bigl(\mathrm{AC}_k(t_i \to p_l, A_2) - \mathrm{AC}_k(t_i \to p_l, A_1)\bigr).$$

The right-hand side of this inequality is TADLk(A1, A2) as defined previously. Hence, if TADLk(A1, A2) ≥ 0 for each pk, then A1 dominates A2. □

4.2. Example of the dominance relation
We use the task graph in Fig. 1 and the machine configuration in Fig. 2 to illustrate the idea of the dominance relation given in Theorem 1. The partial assignments concerned are A1 and A2 shown in Fig. 6(a). A1 is the winner and A2 is the loser in this comparison. We apply Theorem 1 to guarantee that the winner–loser relationship will not be reversed.
We use the example in Fig. 6 to explain the key idea of exploiting task clustering. In the task graph in Fig. 6(b), {t0, t1, t2} is a group of heavily communicating tasks that should be assigned to the same processor. In Fig. 6(a), A1 is a partial assignment obeying the task clustering and A2 is a partial assignment that violates it. The dominance relation examines the "cut", i.e., the edges between the assigned tasks {t0, t1, t2} and the remaining tasks (bolded edges in Fig. 6(b)), to test whether A2 can be pruned. The examination finds that the edges from assigned tasks to t4 and t10 are the only possible causes for A2 to win back what it has lost (cf. Fig. 6(c)). The edge weights in the cut are relatively small, and hence positive TADLk(A1, A2) values are obtained. This results in A2 being pruned. Enumerating heavily communicating tasks in consecutive order ensures that a cut with light-weight edges can be met, which improves the pruning efficiency of the dominance relation.
5. Pruning search space by task clustering
The dominance relation proposed in Section 4 is effective only when a small cut can be found. To relax this constraint, we develop a further pruning rule that combines the detection of task clustering with the dominance relation.
How well the pruning rule works depends on the task enumeration order. We assume that tasks are enumerated in an order such that heavily communicating tasks are enumerated first. We will see how such an enumeration order is obtained in Section 6. With this assumption, a task assignment has the following properties:
• A complete assignment obtained by a greedy search policy reflects the clustering of tasks.
• The first visited partial assignment that assigns a sub-graph reflects the clustering of tasks in that sub-graph.

With these properties, we obtain (1) a partial assignment Ak, called the killer, reflecting the clustering of tasks, and (2) a complete assignment Au serving as an upper bound on the optimal cost, used to test whether a candidate partial assignment A can be pruned. These are the inputs to our pruning rule.
We use the task graph in Fig. 1 and the machine configuration in Fig. 2 to illustrate how the pruning rule works, as depicted in Fig. 7. The killer Ak is a partial assignment with more tasks assigned than the candidate A has. In the example of Fig. 7, Ak reflects the clustering of tasks by showing that {t0, t1, t2} should be placed on the same processor and {t0, t1, t2, t3, t4} should be placed in the same subnet. We are thus given guidelines for extending A: (i) t2 should be assigned to p0, and (ii) t3 and t4 should be assigned to either p0 or p1.
Complete assignments extended from A can be classified into two categories: extensions following the guidelines and extensions violating them. For extensions violating the guidelines, we estimate the cost lower bound and exclude those extensions whose costs are guaranteed to be greater than or equal to cost(Au). For extensions following the guidelines, we find a dominator Ad from the killer Ak that dominates these extensions. These observations lead us to propose the pruning rule, whose criteria for pruning the search space are stated as follows.
Pruning criteria: Let Ad and A be two partial assignments in which the same set of tasks has been determined, and let Au be a complete assignment. We prune A if, for any complete assignment A′ extended from A, either (i) cost(A′) ≥ cost(Au) or (ii) there exists a complete assignment A′d extended from Ad such that cost(A′d) ≤ cost(A′).
5.1. Predicting clustering of tasks
Fig. 8 presents the procedure Compute_PA(A, Ak) for predicting the clustering of tasks. The result of this prediction is a set of possible assignments, denoted PAi, for each task ti not assigned in A. Each PAi is a set of processors to which we may assign task ti. The PAi's are determined according to a killer Ak; that is, the killer should reflect the clustering of tasks in the task graph. How such a killer is obtained will be explained in Section 5.4.
To generate a guideline for extending A, we sketch a distance hierarchy on processors centered at the "central processor" pc and map the tasks onto the distance hierarchy. Let ta be the last task assigned in A. We take pc to be the processor ta is assigned to in Ak (cf. Step 1 in Fig. 8). For each task ti assigned in Ak but not in A, we let PAi be the set of all processors with distance to pc less than or equal to d(pc, Ak(ti)) (cf. Step 2 in Fig. 8). If ti is not assigned in Ak, no prediction is made and PAi is set to the set of all processors.
5.2. Examining partial assignment using pruning rule
Fig. 9 presents the procedure PruneTest, which tests whether a partial assignment can be pruned. PruneTest calls Compute_PA to predict the guidelines for extending the candidate A. From there, the remaining work is to examine whether the sub-tree of A needs further traversal using the pruning rule.
We first test the correctness of the prediction outcome PAi. The test estimates a turn-around time lower bound for extensions violating the guidelines, denoted TALk(A, violate PAi), as follows:

$$\mathrm{TAL}_k(A, \text{violate } \mathrm{PA}_i) \equiv \mathrm{TA}_k(A) + \sum_{\substack{t_j \text{ not assigned in } A\\ t_j \neq t_i}}\ \min_{p_l \in P} \mathrm{AC}_k(t_j \to p_l, A) + \min_{p_l \notin \mathrm{PA}_i} \mathrm{AC}_k(t_i \to p_l, A). \qquad (10)$$
Fig. 7. Pruning based on task clustering. The killer Ak (t0→p0, t1→p0, t2→p0, t3→p1, t4→p1) predicts the restrictions on extending the candidate A (t0→p0, t1→p1): t2→{p0} and t3, t4→{p0, p1}. Extensions that obey the guidelines (e.g., t2→p0, t3→p1, t4→p0) are dominated by the dominator Ad; extensions that violate the guidelines (e.g., t2→p1, t3→p2, t4→p3) satisfy cost(A′) ≥ cost(Au).
Algorithm Compute_PA(A, Ak)
• input:
  – A, Ak: partial assignments; the number of tasks assigned in Ak ≥ the number of tasks assigned in A
• output:
  – PAi ⊆ P for each task ti not assigned in A (P is the set of all processors)
• method:
  1) pc ← Ak(ta), where ta is the last task assigned in A
  2) for each task ti not assigned in A do
       if ti is assigned in Ak then PAi ← { processor pk | d(pk, pc) ≤ d(Ak(ti), pc) }
       else PAi ← P

Fig. 8. Algorithm to predict the clustering of tasks.
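A transcription of the Fig. 8 procedure into Python (our sketch; partial assignments are dicts from task to processor, and the extra `task_order` argument, an assumption of this sketch, supplies the enumeration order needed to recover the last task assigned in A). On the Fig. 7 scenario it reproduces the restrictions t2→{p0} and t3, t4→{p0, p1}.

```python
def compute_PA(A, Ak, task_order, procs, d):
    # Step 1: the "central processor" pc is where the killer Ak places
    # the last task assigned in the candidate A.
    ta = [t for t in task_order if t in A][-1]
    pc = Ak[ta]
    # Step 2: a task assigned in Ak is restricted to processors no
    # farther from pc than its placement in Ak; others are unrestricted.
    PA = {}
    for ti in task_order:
        if ti in A:
            continue
        if ti in Ak:
            PA[ti] = {pk for pk in procs if d[(pk, pc)] <= d[(Ak[ti], pc)]}
        else:
            PA[ti] = set(procs)
    return PA

# The Fig. 7 scenario, with the hierarchical distances of Fig. 2.
subnet = {"p0": 0, "p1": 0, "p2": 1, "p3": 1}
d = {(a, b): 0 if a == b else (5 if subnet[a] == subnet[b] else 20)
     for a in subnet for b in subnet}
A = {"t0": "p0", "t1": "p1"}                 # candidate
Ak = {"t0": "p0", "t1": "p0", "t2": "p0",    # killer
      "t3": "p1", "t4": "p1"}
PA = compute_PA(A, Ak, ["t0", "t1", "t2", "t3", "t4"],
                ["p0", "p1", "p2", "p3"], d)
```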
Algorithm PruneTest(A, Ak, Au)
• input:
  – A, Ak: partial assignments, depth(Ak) ≥ depth(A)
  – Au: a complete assignment
• output:
  – prune = True if A can be pruned, otherwise prune = False
• method:
  1) perform Compute_PA(A, Ak) to determine PAi for each task ti not assigned in A
  2) /* exclude extensions violating PA */
     2.1) success ← False
     2.2) for each processor pk do
            if TALk(A, violate PA) ≥ cost(Au) then
              success ← True
              break
     2.3) if success = False then PAi ← P
  3) Ad ← the ancestor of Ak at the same level as A
  4) prune ← True
  5) /* dominate extensions obeying PA */
     for each processor pk do
       if TADLk(Ad, A, PA) < 0 then
         prune ← False
         break
  6) return prune

Fig. 9. Algorithm to test whether a partial assignment can be pruned.
Lemma 2. Let A be a partial assignment and A′ a complete assignment extended from A. If there exists a task ti not assigned in A such that A′(ti) ∉ PAi, then TAk(A′) ≥ TALk(A, violate PAi) for each processor pk.
Proof. The proof is similar to the estimation of the cost lower bound L(·) in [18]. The only difference is that, when taking the minimum over the additional costs to obtain a lower bound on TAk(A′), the possibilities of assigning ti to processors in PAi are excluded. □
After excluding extensions that violate the guidelines, we check the dominance imposed on the remaining extensions. The dominator Ad is the ancestor of Ak in the state–space tree at the same level as A. Similar to the procedure in Section 4, we estimate a turn-around time difference lower bound between Ad and A, denoted TADLk(Ad, A, PA), assuming that Ad and A have the same future extensions and that these extensions follow the guidelines for each task ti not assigned in A (Ad). We estimate TADLk(Ad, A, PA) as follows:

$$\mathrm{TADL}_k(A_d, A, \mathrm{PA}) = \mathrm{TA}_k(A) - \mathrm{TA}_k(A_d) + \sum_{t_i \text{ not assigned}}\ \min_{p_l \in \mathrm{PA}_i}\bigl(\mathrm{AC}_k(t_i \to p_l, A) - \mathrm{AC}_k(t_i \to p_l, A_d)\bigr). \qquad (11)$$
Compared to the TADLk(Ad, A) defined in Section 4, the two quantities are estimated in similar ways. The difference is that the future extensions of Ad and A are restricted to the PAi's when estimating TADLk(Ad, A, PA); TADLk(Ad, A) = TADLk(Ad, A, PA) if each PAi contains all of the processors.
Theorem 2 (Pruning rule). Let Ad and A be two partial assignments in which the same set of tasks has been determined, and let Au be a complete assignment. The PAi's are guidelines for extending A, one for each task ti not assigned in A. If

(i) for each task ti not assigned in A, there exists a processor pk such that TALk(A, violate PAi) ≥ cost(Au), and
(ii) TADLk(Ad, A, PA) ≥ 0 for each processor pk,

then the pruning criteria are satisfied and A can be pruned.
Proof. By Lemma 2, hypothesis (i) implies that complete assignments extended from A that violate the guidelines PAi have a cost greater than or equal to cost(Au). The remainder of the proof is to estimate a lower bound on TAk(A) − TAk(Ad). This is similar to Theorem 1, except that the possibilities of extending A to an assignment that violates the guidelines PAi are ignored. The lower bound on TAk(A) − TAk(Ad) is thus estimated to be TADLk(Ad, A, PA) as defined above. This proves the theorem.
The procedure PruneTest uses Theorem 2 to test whether A can be pruned. Hypothesis (i) of Theorem 2 is guaranteed by Step 2, and Step 5 of the procedure checks whether hypothesis (ii) holds. The procedure then returns a result indicating whether A can be pruned.
The advantage of using the pruning rule in Theorem 2 instead of the dominance relation in Theorem 1 is that the space can be pruned earlier during the traversal. For the example given in Fig. 7, this advantage is shown in Fig. 10. If we used the dominance relation of Theorem 1 as the pruning rule, the bolded partial assignments would also be traversed. The reduction in search space is an exponential function of the depth of the clustering of tasks that we can detect.
5.3. Obtaining an upper bound on the optimal cost
To check whether a partial assignment A can be pruned, the procedure PruneTest uses two additional inputs: (1) a complete assignment Au serving as an upper bound on the optimal cost and (2) a killer Ak reflecting the clustering of tasks. Another use of such an Au is to serve as an "imperfect solution" when a "perfect solution" cannot be found. The task allocation problem is well known to be NP-complete [2]. If the optimal assignment cannot be found subject to the time and space constraints, an "imperfect solution", that is, a complete assignment that may not be optimal, is returned as the output. In this section, we describe how such an Au can be obtained.
We use a greedy search approach to obtain a complete assignment Au. A pointer p is used to indicate the status of the greedy search. At the beginning, p points at the starting node (the partial assignment currently visited) in the state–space tree. In each step, we move p down to the child with the minimum cost. The procedure terminates when (1) p points at a partial assignment with a cost greater than that of the present Au, or (2) p points at a complete assignment. Au is then updated if a better complete assignment is found. We use greedy search not only for its simplicity but also because a low-cost complete assignment can be obtained if a careful task enumeration order is applied. Assume the tasks are enumerated in an order such that heavily communicating tasks are enumerated consecutively. The complete assignment obtained will then reflect the clustering of tasks and is likely to have a low cost.
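The greedy descent can be sketched as follows; children, is_complete, and cost are problem-specific callables with illustrative names:

```python
def greedy_complete(A, children, is_complete, cost, Au):
    """Greedy descent from partial assignment A: repeatedly move to
    the cheapest child; stop on a complete assignment or when the
    cost already exceeds cost(Au)."""
    p = A
    while not is_complete(p):
        p = min(children(p), key=cost)  # child with minimum cost
        if cost(p) > cost(Au):
            return Au                   # cannot improve the upper bound
    return p if cost(p) < cost(Au) else Au
```

The caller keeps whichever of the returned assignment and the old Au is cheaper, so the upper bound never worsens.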
To illustrate the idea, we take the task graph in Fig. 1 and the machine configuration in Fig. 2 as an example. Consider a greedy search starting from the partial assignment {t0 → p0, t1 → p0}. Part of the greedy search path is shown in Fig. 11. The greedy search will assign t2 to p0 next, since it is the child of {t0 → p0, t1 → p0} with the lowest cost. This selection indicates that t0, t1, and t2 may need to be placed on the same processor. Similarly, t3 will be assigned to p1 following
(Fig. 10 shows the state–space tree for the example: extensions t2 → p1, p2, or p3 are excluded because their cost is at least cost(Au), the sub-tree below A is dominated by Ad, and the killer Ak predicts the guideline for extending A; saved partial assignments are marked.)
Fig. 10. Space savedby the pruning criteria.
(Fig. 11 shows the children of each node on the greedy search path: from {t0 → p0, t1 → p0}, the assignments t2 → p0, t3 → p1, and t4 → p1 are selected in turn.)
Fig. 11. Greedy search on the state–space tree.
the parent partial assignment {t0 → p0, t1 → p0, t2 → p0}, also reflecting the clustering of tasks. Following the same procedure, we obtain a complete assignment that obeys the task clustering guideline.
5.4. Obtaining killers reflecting clustering of tasks
In addition to the complete assignment Au, a partial assignment Ak reflecting the clustering of tasks is also helpful to enhance the pruning rule. To increase the possibility of pruning a partial assignment, we may find multiple killers to form a KillerSet, instead of only one killer. The procedure PruneTest is then performed for each killer in the KillerSet to test whether a partial assignment can be pruned.
Partial assignments reflecting the clustering of tasks can be obtained from the proposed task enumeration order and the state–space tree traversal order. A partial assignment covers a sub-graph of the task graph. With the assumption that heavily communicating tasks are enumerated consecutively, we can capture part of the clustering of tasks in the sub-graph.
Since we traverse the state–space tree in minimum-L(•)-first order, the first visited partial assignment containing a given sub-graph is the one with minimum L(•) among all partial assignments containing that sub-graph. The first visited partial assignment containing a sub-graph therefore indicates the clustering of tasks; otherwise it would have a large L(•).
We follow the principle that the first visited partial assignment indicates the clustering of tasks to obtain killers. We expect that a candidate partial assignment A will be pruned if it violates the clustering of tasks somewhere on the path from the root to its branching state in the state–space tree. Partial assignments that have taken advantage of the clustering of the tasks assigned by A are those that (1) have a common ancestor with A in the state–space tree, (2) are visited earlier than A, and (3) are deeper than A in the state–space tree, so that the sub-graph contained in A is also contained in them. This leads to the design of our heuristic scheme for obtaining killers.
To realize the scheme, a link to the deepest descendant node is associated with each visited partial assignment. For each partial assignment Aa, we associate a pointer deep(Aa) pointing at the deepest partial assignment visited in the sub-tree of Aa. If two or more partial assignments at the same level of the state–space tree are visited, deep(Aa) points at the first one visited, which has the smallest cost lower-bound estimate L(•) over all its extensions. The KillerSet is the set of deep(Aa) over all ancestors Aa of A, together with the complete assignment Au:
KillerSet(A) = {deep(Aa) | Aa is an ancestor of A} ∪ {Au}.
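The deepest-link bookkeeping and the KillerSet construction can be sketched as follows, assuming the traversal maintains parent and depth maps for visited states (dictionary-based, illustrative names):

```python
def update_deep_links(Av, parent, deep, depth):
    """On visiting Av, make deep(Aa) point at the deepest visited
    node in Aa's sub-tree, for every ancestor Aa of Av; ties at the
    same depth keep the first-visited node (strict comparison)."""
    deep[Av] = Av
    Aa = parent.get(Av)
    while Aa is not None:
        if depth[Av] > depth[deep[Aa]]:
            deep[Aa] = Av
        Aa = parent.get(Aa)

def killer_set(A, parent, deep, Au):
    """KillerSet(A) = {deep(Aa) | Aa ancestor of A} union {Au}."""
    killers = {Au}
    Aa = parent.get(A)
    while Aa is not None:
        killers.add(deep[Aa])
        Aa = parent.get(Aa)
    return killers
```

Because ties keep the first-visited node, and the traversal is minimum-L(•)-first, each deep link points at the best branching state seen in that sub-tree so far.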
The determination of the KillerSet is depicted in Fig. 12. The number in each node is the L(•) of the partial assignment represented by the node. For each visited node Aa, the dashed link represents the deepest link deep(Aa). When a partial assignment A is visited, we follow the deepest links along all ancestors of A to obtain the KillerSet. In this example, the KillerSet used for pruning A is {A6, A4} plus Au. That is, for each sub-tree (of the state–space tree) containing A, we pick the best branching state visited in that sub-tree to try to prune A.
(Fig. 12 shows part of the state–space tree; traversed nodes carry their L(•) values, and dashed arrows are deepest links: deep(A0) = A4, deep(A1) = A4, deep(A2) = A6, so KillerSet(A) = {A6, A4}.)
Fig. 12. Deepest link to determine the KillerSet.
6. Branch-and-bound task allocation with preprocessing

We now present the task allocation algorithm using the pruning rules. We show how a good enumeration order is obtained in Section 6.1. In Section 6.2, the branch-and-bound algorithm is presented along with its correctness proof.
6.1. Preprocessing to determine the task enumeration order
We have seen the importance of the task enumeration order in previous sections. For the following reasons, tasks should be enumerated in such an order that tasks with high communication are enumerated first:
• To arrive at a small cut to exploit the dominance relation before space overflow occurs.
• To obtain killers that take advantage of the clustering of tasks.
• To obtain a low-cost complete assignment serving as an upper bound on the optimal cost.
The task enumeration order is determined by applying the max-flow min-cut algorithm recursively to partition the task graph. Each time the max-flow min-cut procedure is applied, the set of tasks is decomposed into two partitions connected by a minimum cut. We repeat the partitioning recursively until each partition contains only one task. The partitioning process can be represented by a tree in which each leaf represents a group containing only one task. The enumeration order is then the order of the leaf nodes in a depth-first traversal. For instance, the partitioning process for the task graph in Fig. 1 is depicted in Fig. 13. Following this result, we obtain the enumeration order that has been used for illustration in the previous discussion.
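The recursive bipartitioning can be sketched as follows. For brevity, a brute-force minimum cut replaces the max-flow min-cut procedure the paper uses; this is adequate only for small groups and is purely illustrative:

```python
from itertools import combinations

def min_cut_bipartition(tasks, comm):
    """Brute-force minimum-cut bipartition over a communication map
    {(task_a, task_b): weight}; a stand-in for max-flow min-cut."""
    tasks = sorted(tasks)
    best = None
    for r in range(1, len(tasks) // 2 + 1):
        for left in combinations(tasks, r):
            left = set(left)
            # total weight of edges crossing the cut
            cut = sum(w for (a, b), w in comm.items()
                      if (a in left) != (b in left))
            if best is None or cut < best[0]:
                best = (cut, left, set(tasks) - left)
    return best[1], best[2]

def enumeration_order(tasks, comm):
    """Depth-first recursive bipartitioning; the leaf order is the
    task enumeration order."""
    tasks = sorted(tasks)
    if len(tasks) == 1:
        return tasks
    left, right = min_cut_bipartition(tasks, comm)
    return enumeration_order(left, comm) + enumeration_order(right, comm)
```

Heavily communicating tasks end up in the same partition at every level, so they appear consecutively in the resulting order.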
6.2. The optimal branch-and-bound algorithm
The branch-and-bound algorithm is shown in Fig. 14. It is based on the A∗ traversal scheme, with the addition of the pruning rules and the related implementation code presented in Section 5. We now show that an optimal assignment is obtained by the proposed algorithm if neither time-out nor overflow of the ActiveSet occurs.
For convenience, we introduce some terminology and notation. A complete assignment Ac is said to be in the future search space of ActiveSet(k) if either Ac ∈ ActiveSet(k) or there exists a partial assignment Aa ∈ ActiveSet(k) such that Ac can be derived from Aa. On the other hand, we say Ac is lost from ActiveSet(k) if Ac is not in the future search space of ActiveSet(k). The depth of a partial/complete assignment A, denoted depth(A), is the length of the path from the root to the branching state representing A in the state–space tree.
The difficulty in showing the correctness of the algorithm is that the pruning rules may remove some partial assignments that can lead to optimal assignments. Fortunately, it can be guaranteed that other optimal assignments remain in the future search space after pruning. When an optimal assignment is pruned, we can always find another optimal assignment that survives in the future search space, as shown in Fig. 15. Provided that some optimal assignments survive in the future search space, we show that the termination condition implies the optimality of the solution obtained.

Lemma 3. Assume that no overflow of the ActiveSet occurs. Then, during the traversal, some optimal assignments always survive in the future search space.
Proof. We prove this by induction on the number of iterations i. The induction hypothesis is that
• for any optimal assignment Aopt-0 not in the future search space, there exists another optimal assignment Aopt-k surviving in the future search space such that depth(Ak) ≥ depth(A0), where A0 and Ak are the last visited ancestors of Aopt-0 and Aopt-k, respectively.
The hypothesis holds at the beginning, since no optimal assignment is lost at initialization. Assume the induction hypothesis holds at the beginning of a certain iteration, and suppose a partial assignment A0 is pruned in this iteration while A0 can be extended to some optimal assignment Aopt-0. The proof is to find the Aopt-k and Ak described in the induction hypothesis.

In this case, A0 must have been pruned by some dominator A1, which can also be extended to an optimal assignment Aopt-1 (otherwise the pruning criterion would be violated). Let A′1 be the last visited ancestor of Aopt-1. By the pruning rule, part of the sub-tree below A1 must be traversed, and hence depth(A′1) ≥ depth(A1) = depth(A0). If Aopt-1 is not
(Fig. 13 shows the recursive bipartitioning: {t0, t1, ..., t12} is split into {t0, ..., t7} and {t8, ..., t12}, and each group is split recursively until every leaf contains a single task.)
Fig. 13. Determining the task enumeration order.
Algorithm BB-Alloc(G, M)
• /* initialization phase */
  – L(root of the state–space tree) ← 0
  – ActiveSet ← {root of the state–space tree}
  – Obtain Au by performing greedy search starting at the root of the state–space tree
• while not time-out do /* traversal phase */
  1) remove a partial/complete assignment Av with minimum L(•) from ActiveSet and perform the following to visit Av
     1.1) /* update deepest link for all ancestors of Av */
          deep(Av) ← Av
          for each Aa: ancestor of Av in the state–space tree do
            if depth(Av) > depth(deep(Aa)) then deep(Aa) ← Av
     1.2) /* try to improve Au */
          perform greedy search starting from Av to obtain a complete assignment Ac
          if cost(Ac) < cost(Au) then Au ← Ac
  2) if Av is a complete assignment then Au ← Av and terminate the traversal by returning Au
  3) /* check whether the sub-tree of Av needs further traversal */
     KillerSet ← {deep(Aa) | Aa is an ancestor of Av in the state–space tree} ∪ {Au}
     prune ← False
     for each Ak ∈ KillerSet do
       prune ← PruneTest(Ak, Au, Av)
       if prune = True then break
  4) /* expand children of Av if the sub-tree of Av needs further traversal */
     if prune = False then
       for each child A′v of Av in the state–space tree do
         compute L(A′v) and insert A′v into ActiveSet

Fig. 14. The branch-and-bound algorithm for task allocation.
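The traversal phase can be condensed into the following Python skeleton. The deep-link and KillerSet bookkeeping of Steps 1.1 and 3 is folded into the prune_test callable for brevity, and all parameters are problem-specific stand-ins, not the paper's implementation:

```python
import heapq

def bb_alloc(root, L, children, is_complete, cost,
             greedy, prune_test, max_iters=10**6):
    """Skeleton of the BB-Alloc traversal (cf. Fig. 14)."""
    Au = greedy(root)                    # initial upper bound
    active = [(L(root), root)]           # min-heap ordered by L(.)
    for _ in range(max_iters):           # stands in for "while not time-out"
        if not active:
            break
        _, Av = heapq.heappop(active)    # minimum-L(.) state
        if is_complete(Av):
            return Av                    # optimal if no overflow occurred
        Ac = greedy(Av)                  # try to improve Au
        if cost(Ac) < cost(Au):
            Au = Ac
        if not prune_test(Av, Au):       # does the sub-tree need traversal?
            for child in children(Av):
                heapq.heappush(active, (L(child), child))
    return Au                            # imperfect solution on time-out
```

Because L(•) is a lower bound on all extensions, the first complete assignment popped from the heap is optimal, matching Theorem 3 below.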
lost, then Aopt-1 itself survives in the future search space, and hence the induction hypothesis holds for the next iteration (cf. Fig. 15(a)). In case Aopt-1 is lost, the induction hypothesis states that there exists a surviving optimal assignment Aopt-k with last visited ancestor Ak such that depth(Ak) ≥ depth(A′1) ≥ depth(A1) = depth(A0) (cf. Fig. 15(b)). Hence we obtain the required Aopt-k and Ak for Aopt-0 and A0. This proves the lemma.
Theorem 3 (Correctness of the proposed algorithm). The proposed branch-and-bound algorithm ends up with an optimal assignment if neither space overflow of the ActiveSet nor time-out occurs.
Proof. If not timed out, some complete assignment Ac is removed from the ActiveSet in the last iteration of the traversal, and this Ac is the complete assignment returned. We want to show that Ac is optimal.

We prove this by contradiction. Suppose Ac is not optimal. Consider the contents of ActiveSet(j) for the last iteration j. Lemma 3 states the existence of an optimal assignment Aopt in the future search space of ActiveSet(j). Thus, we have cost(Ac) > cost(Aopt) since Aopt is optimal. Let Aa be the ancestor of Aopt (or Aopt itself) in ActiveSet(j). By the definition of L(•), L(Aa) ≤ cost(Aopt), and hence L(Aa) ≤ cost(Aopt) < cost(Ac) = L(Ac). However, Ac is
(Fig. 15 illustrates the argument: in (a), A0 is pruned by its dominator A1, whose extension Aopt-1 survives; in (b), a chain of dominators A1, A2, ..., Ak leads to a surviving optimal assignment Aopt-k.)
Fig. 15. Finding an optimal assignment surviving in the future search space.
(Fig. 16 shows the task graph of Fig. 1 mapped onto processors p0 and p1 in two different ways, with the node and edge weights of Fig. 1.)
Fig. 16. Unfair comparison when assigning different sets of tasks: (a) partial assignment A1 and (b) partial assignment A2.
removed from the ActiveSet before Aa under the minimum-L(•)-first order, which requires L(Ac) ≤ L(Aa). This produces a contradiction and hence proves the theorem.
6.3. Space-efficient ActiveSet organization
The remaining problem in designing the task allocation algorithm is the design of the ActiveSet such that (1) the partial/complete assignment with minimum L(•) can be removed easily, and (2) a near-optimal assignment can be obtained once overflow occurs. A simple solution is to implement the ActiveSet as a heap and drop the partial/complete assignment with maximum L(•) when overflow occurs, because such an assignment is unlikely to be extended to an optimal assignment. However, this scheme has certain drawbacks. We identify two situations that reduce the effectiveness of this victim selection scheme:
• Unfair comparisons between partial assignments containing different sets of tasks.
• Unfair comparisons between partial assignments using different numbers of processors.
Fig. 16 depicts an example of an unfair comparison between partial assignments assigning different sets of tasks. Consider mapping the task graph in Fig. 1 to the machine configuration in Fig. 2. Fig. 16 depicts two partial assignments A1 and A2 containing different sub-graphs with L(A1) < L(A2). However, A2 can be extended to an optimal assignment while A1 cannot. A partial assignment containing a smaller number of tasks usually has a lower cost and L(•), but this does not mean it has a better future extension. Our solution is to keep partial assignments assigning different numbers of tasks in different heaps.
Fig. 17 depicts an example of an unfair comparison between partial assignments using different numbers of processors. We have two partial assignments A1 and A2 with L(A1) < L(A2). A1 is the best assignment of the sub-graph containing tasks {t0, t1, t2, t3, t4}. However, A2 can be extended to an optimal assignment while A1 cannot. A partial assignment lacks knowledge of the future load to be assigned, and hence A1 uses too many processors for tasks