Related Works - A Study of Exploiting Efficiency of the Duplication-based Algorithm in Distributed Heterogeneous Computing Systems

In this chapter, five well-known duplication-based scheduling algorithms are described: the Task Duplication-based Scheduling (TDS) algorithm, the Scalable Task Duplication-based Scheduling (STDS) algorithm, the Task Duplication-based Scheduling Algorithm for Network of Heterogeneous Systems (TANH), the Heterogeneous N-Predecessor Duplication (HNPD) algorithm, and the Heterogeneous Critical Parents with Fast Duplicator (HCPFD) algorithm. These algorithms are discussed in the following sections.

2.1 The TDS Algorithm and The STDS Algorithm

The TDS algorithm (Ranaweera & Agrawal, 2000) is a duplication-based scheduling algorithm that consists of three steps. The first step calculates est, ect, fpred, fproc1 to fprocn, and level for each task, from the DAG’s entry node to the exit node in a top-down fashion. The level of a node is the largest sum of computation costs along any path from the node to the exit node. A queue is then built from all the tasks in ascending order of level. The second step calculates lact and last for each task in a bottom-up fashion. The third step clusters tasks to generate an initial set of clusters that uses fewer processors. It performs the following operations repeatedly. First, it finds the first unexamined task, in the task’s fproc order, from which to cluster tasks. If the fpred of the task is unexamined or is on the critical path, the algorithm adds the fpred to the cluster. If there is more than one predecessor, the algorithm selects the predecessor with the lowest computation cost on the processor. Finally, task duplication is exploited to avoid communication. If the predecessor of the task is not the favorite one, the previous tasks are replaced by the favorite tasks in order of the path lengths in the clusters, and the replaced tasks are scheduled on a new processor. In short, the TDS algorithm duplicates tasks when the favorite predecessor is not in the same cluster. However, replacing all the tasks that are scheduled before the favorite task in the new cluster might increase the inter-processor communication overhead.
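The level computation and the level-ordered queue described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names compute_levels, succ, and cost are hypothetical, the DAG is given as successor lists, and each task has a single computation cost.

```python
from functools import lru_cache

def compute_levels(succ, cost):
    """Level of a node: largest sum of computation costs on any path to an exit node."""
    @lru_cache(maxsize=None)
    def level(node):
        children = succ.get(node, [])
        # An exit node contributes only its own computation cost.
        return cost[node] + (max(level(c) for c in children) if children else 0)
    return {node: level(node) for node in cost}

# Hypothetical four-task DAG: 1 -> 2 -> 4 and 1 -> 3 -> 4.
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
cost = {1: 2, 2: 3, 3: 5, 4: 1}

levels = compute_levels(succ, cost)
# Queue of tasks in ascending order of level, as in the first step.
queue = sorted(cost, key=lambda n: levels[n])
```

With these example costs the exit task has the smallest level and the entry task the largest, so the queue starts at the exit task.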

The STDS algorithm (Ranaweera & Agrawal, 2000) is similar to the TDS algorithm and performs task duplication in the same way. However, the STDS algorithm handles the case in which the number of required processors is larger than the number of available processors: it merges clusters to match the actual number of processors.

The STDS algorithm works well in an environment with an unbounded number of processors. In general, however, the number of processors is bounded, and the merging approach could be improved to obtain better performance.

2.2 The TANH Algorithm

The TANH algorithm (Bajaj & Agrawal, 2004) combines the cluster-based and duplication-based approaches. It extends the STDS algorithm (Ranaweera & Agrawal, 2000) and is modified to obtain the optimal solution for DAGs.

The TANH algorithm starts by calculating each node’s est, ect, favorite processors, and level values. It obtains est and ect by using the mean computation costs over all PEs.

Assuming there are n processor elements, the TANH algorithm determines the favorite processors fp1 to fpn of a node by sorting the node’s completion times on all processors in ascending order. In other words, fp1 of a node is the processor element on which the node obtains its minimal completion time. The ect and est values of the node are calculated on the favorite processor. The level value of a node is the length of the longest path from the node to the exit node.

The level value is obtained from the computation costs of the nodes only; it excludes the communication costs between nodes.
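The ordering of favorite processors can be illustrated with a short sketch. The function name favorite_processors and the list-based input are assumptions made for illustration; the source states only that completion times are sorted in ascending order.

```python
def favorite_processors(completion_time):
    """Return PE indices ordered from fp1 (earliest completion) to fpn.

    completion_time[p] is the completion time of one node on PE p.
    """
    return sorted(range(len(completion_time)), key=lambda p: completion_time[p])

# Hypothetical completion times of one node on three PEs.
fps = favorite_processors([7.0, 4.0, 9.0])
fp1 = fps[0]  # the PE that gives the minimal completion time
```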

The TANH algorithm finds the exit node’s ect value and sets the exit node’s lact to that value. It then calculates the lact and last values for each node in a bottom-up fashion. The lact is the latest allowable completion time of the node, and the last is the latest allowable start time of the node. The fpred is the favorite predecessor, defined as the predecessor with the largest sum of earliest completion time and communication cost among all the predecessors. The TANH algorithm sets the entry node’s fpred value to null. This first step is complete once the table of these variables has been built.
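The definition of fpred translates directly into a small selection function. The sketch below assumes the ect values are already known; the names favorite_predecessor, ect, and comm are hypothetical, and comm[p] stands for the communication cost from predecessor p to the node in question.

```python
def favorite_predecessor(preds, ect, comm):
    """Return the predecessor with the largest ect + communication cost.

    Returns None for an entry node, which has no predecessors.
    """
    if not preds:
        return None
    return max(preds, key=lambda p: ect[p] + comm[p])

# Hypothetical node with two predecessors.
ect = {1: 5, 2: 4}
comm = {1: 1, 2: 3}
fpred = favorite_predecessor([1, 2], ect, comm)
entry_fpred = favorite_predecessor([], ect, comm)
```

Predecessor 2 is chosen here because its data arrives last (4 + 3 = 7 against 5 + 1 = 6), so placing it in the same cluster removes the most delay.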

In the second step, the TANH algorithm generates the initial clusters. Tasks are pushed into a queue in ascending order of the level values of all nodes, and each clustering procedure starts from the first unexamined node in the queue. After the clustering phase is finished, the algorithm assigns the top task in the cluster to its first available favorite processor. Then, if the task’s predecessor is not a critical node or has already been assigned, the algorithm assigns that predecessor to the current cluster and marks it as examined, continuing recursively until the entry node is reached. If the fpred has already been selected, the algorithm selects an unexamined predecessor in the queue instead and marks it as examined.

In the third step, if the number of available processors is less than the number of required processors, the TANH algorithm performs the processor reduction procedure, which merges clusters until the number of clusters equals the number of available processors.

The disadvantage of the TANH algorithm is that it generally requires more processors than are available, and its performance is degraded when they are not available. In that case the algorithm performs the processor reduction procedure, which merges two clusters chosen by the largest and the least sums of the computation costs of the tasks on a processor. This procedure seriously degrades performance.
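One reading of the processor reduction procedure is to merge the cluster with the least total computation cost into the cluster with the largest, repeating until the clusters fit the available processors. The sketch below follows that reading only; the pairing rule is an interpretation of the description above, the names reduce_clusters and cluster_load are hypothetical, and the reordering of tasks inside a merged cluster is not modeled.

```python
def cluster_load(cluster, cost):
    """Sum of the computation costs of the tasks in one cluster."""
    return sum(cost[t] for t in cluster)

def reduce_clusters(clusters, cost, available):
    """Merge lightest into heaviest cluster until len(clusters) <= available."""
    clusters = [list(c) for c in clusters]  # do not mutate the caller's lists
    while len(clusters) > max(available, 1):
        clusters.sort(key=lambda c: cluster_load(c, cost))
        lightest = clusters.pop(0)
        clusters[-1].extend(lightest)  # heaviest cluster absorbs the lightest
    return clusters

# Hypothetical clusters and costs: four clusters, two available processors.
cost = {1: 4, 2: 4, 3: 1, 4: 3, 5: 2, 6: 2}
merged = reduce_clusters([[1, 2], [3], [4, 5], [6]], cost, available=2)
```

The example shows why the text considers this step costly: the already heaviest cluster keeps growing, which lengthens the schedule on that processor.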

2.3 The HNPD Algorithm

The HNPD algorithm (Baskiyar & Dickinson, 2005) extends the Decisive Path Scheduling (DPS) algorithm (Park, Shirazi, Marquis, & Choo, 1997). The DPS algorithm has been shown to be efficient in homogeneous environments. The HNPD algorithm adapts the DPS algorithm to heterogeneous environments and changes the assign phase from a list-based approach to a duplication-based approach.

The DPS algorithm starts by transforming the input DAG into a new DAG that has exactly one entry node and one exit node. The transformation adds a pseudo entry node and a pseudo exit node whose computation costs are zero. The DPS algorithm can then identify the decisive paths to all the nodes of the new DAG.

The DPS algorithm first calculates the top distance and the bottom distance of each node using the mean computation cost. The top distance is the longest distance between the entry node and the node, excluding the computation cost of the node. The bottom distance is the longest distance between the node and the exit node, including the computation cost of the node.
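The transformation to a single-entry, single-exit DAG can be sketched as below. The function name add_pseudo_nodes and the labels ENTRY and EXIT are hypothetical, every node is assumed to appear as a key of succ, and the new edges are assumed to carry no communication cost, which the source does not state.

```python
def add_pseudo_nodes(succ, cost, entry="ENTRY", exit_="EXIT"):
    """Return copies of succ and cost with zero-cost pseudo entry and exit nodes."""
    succ = {n: list(s) for n, s in succ.items()}
    cost = dict(cost)
    has_pred = {c for children in succ.values() for c in children}
    entries = [n for n in succ if n not in has_pred]  # nodes without predecessors
    exits = [n for n in succ if not succ[n]]          # nodes without successors
    succ[entry] = entries
    for n in exits:
        succ[n].append(exit_)
    succ[exit_] = []
    cost[entry] = cost[exit_] = 0
    return succ, cost

# Hypothetical DAG with two entry nodes (a, b) and two exit nodes (c, d).
new_succ, new_cost = add_pseudo_nodes(
    {"a": ["c"], "b": ["c", "d"], "c": [], "d": []},
    {"a": 2, "b": 3, "c": 1, "d": 4},
)
```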

The length of each node’s decisive path (DP) is the sum of the top distance and the bottom distance.
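The top distance, bottom distance, and DP length can be computed with two passes over a topological order. The sketch below is illustrative: the name decisive_path_lengths is hypothetical, and the inclusion of edge communication costs in the distances (the comm argument, defaulting to zero for missing edges) is an assumption, since the text specifies only how the node's own computation cost is treated.

```python
def decisive_path_lengths(succ, cost, comm, order):
    """Return DP length (top + bottom distance) per node.

    order must list the nodes in topological order.
    """
    preds = {n: [] for n in succ}
    for u in succ:
        for v in succ[u]:
            preds[v].append(u)
    top, bottom = {}, {}
    for n in order:  # top-down: excludes the node's own computation cost
        top[n] = max((top[p] + cost[p] + comm.get((p, n), 0) for p in preds[n]),
                     default=0)
    for n in reversed(order):  # bottom-up: includes the node's own computation cost
        bottom[n] = cost[n] + max((comm.get((n, s), 0) + bottom[s] for s in succ[n]),
                                  default=0)
    return {n: top[n] + bottom[n] for n in order}

# Hypothetical DAG with mean computation and communication costs.
succ = {1: [2, 3], 2: [4], 3: [4], 4: []}
cost = {1: 2, 2: 3, 3: 5, 4: 1}
comm = {(1, 2): 1, (1, 3): 1, (2, 4): 2, (3, 4): 1}
dp = decisive_path_lengths(succ, cost, comm, order=[1, 2, 3, 4])
```

In this example nodes 1, 3, and 4 share the largest DP value and therefore form the critical path.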

After building the DP for each node, the DPS algorithm creates a task_queue whose sequence is the order in which the DAG is scheduled. The DPS algorithm builds the task_queue starting with the DAG’s entry node and traversing the critical path (CP) to the exit node. The CP is the set of nodes that have the largest DP from the entry node, and a critical path node (CPN) is a node on the CP. A node can be added to the task_queue only after all of its predecessors have been added.

If the node has at least one predecessor that is not in the task_queue, DPS first attempts to add all of the node’s predecessors to the task_queue in a top-down fashion.
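The construction of the task_queue can be sketched as a recursive traversal along the critical path. The name build_task_queue is hypothetical, and the rule that missing predecessors are visited in descending order of DP is an assumption made for this sketch; the text does not state the order among predecessors.

```python
def build_task_queue(cp_nodes, preds, dp):
    """Return a schedule order in which every node follows all its predecessors.

    cp_nodes lists the critical path nodes from the entry node to the exit node.
    """
    task_queue, added = [], set()

    def add(node):
        if node in added:
            return
        # Assumption: predecessors with a larger DP are handled first.
        for p in sorted(preds.get(node, []), key=lambda n: -dp[n]):
            add(p)
        added.add(node)
        task_queue.append(node)

    for node in cp_nodes:
        add(node)
    return task_queue

# Hypothetical DAG whose critical path is 1 -> 3 -> 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
dp = {1: 10, 2: 9, 3: 10, 4: 10}
order = build_task_queue([1, 3, 4], preds, dp)
```

Node 2 is not on the critical path, so it enters the queue only when node 4 needs it.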

In the assign phase, the HNPD algorithm is insertion-based: it uses an insertion policy to find idle time slots on the processors. The HNPD algorithm selects PEs by calculating the earliest completion time of each node. Once the node has been assigned to a processor, the HNPD algorithm attempts to duplicate the predecessors of the node to reduce the actual completion time of the task. The algorithm duplicates a predecessor if the idle slot between the recently assigned task and the preceding task on the selected PE is large enough to hold the favorite predecessor. The duplication procedure is repeated for each predecessor, in order from the most favorite to the least favorite. After duplicating the predecessors of the node, the HNPD algorithm recursively attempts to duplicate the predecessors of the duplicated tasks until no further duplication is possible.

The advantage of the HNPD algorithm is that both the duplication procedure and the assignment procedure are insertion-based. Its duplication procedure duplicates as many tasks as possible to reduce the execution time of the DAG. However, the HNPD algorithm does not consider that the critical path might change while tasks are assigned to and duplicated on processors. It also cannot obtain further information once the schedule order has been determined.
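An insertion policy of this kind is commonly implemented as a scan over the busy intervals of a processor. The sketch below shows that generic technique, not the HNPD code itself; the name earliest_slot is hypothetical, and busy is assumed to be a sorted list of non-overlapping (start, finish) intervals on one PE.

```python
def earliest_slot(busy, ready, duration):
    """Return the earliest start time >= ready at which a task of the given
    duration fits into an idle slot on the PE, or after the last task."""
    start = ready
    for slot_start, slot_finish in busy:
        if start + duration <= slot_start:
            break  # the task fits in the gap before this busy interval
        start = max(start, slot_finish)
    return start

# Hypothetical PE that is busy during [0, 3) and [5, 9).
busy = [(0, 3), (5, 9)]
fits_in_gap = earliest_slot(busy, ready=1, duration=2)
too_long = earliest_slot(busy, ready=1, duration=3)
```

The same search serves both purposes described above: placing a task on a PE and testing whether a predecessor can be duplicated in the gap before it.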

2.4 The HCPFD Algorithm

The HCPFD algorithm (Hagras & Janecek, 2004) is based on the list-based CNPT algorithm (Hagras & Janecek, 2003) and extends it. The HCPFD algorithm improves the assign phase of the CNPT algorithm by changing it from a list-based approach to a duplication-based approach.

The CNPT algorithm has two phases. In the first phase, the listing phase, the CNPT algorithm determines the schedule order of all tasks. In the second phase, the assign phase, it follows the schedule order from the first phase and assigns each task to the PE that gives the minimal completion time.

In the listing phase, the CNPT algorithm separates all the tasks of a DAG into a set of parent-trees. The root node of each parent-tree is a node on the critical path, called a critical node (CN). The CNPT algorithm uses an initially empty queue and an auxiliary stack during scheduling. It starts by pushing the CNs onto the auxiliary stack according to their ALSTs. If the node on top of the stack has one or more unscheduled parents, those parent nodes are pushed onto the stack. Otherwise, the top node is popped from the stack and appended to the queue that holds the schedule order.

In the assign phase, the CNPT algorithm follows the schedule order produced by the listing phase and assigns each task to the PE that gives the minimal completion time.
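The stack-based listing phase can be sketched as follows. Several details are assumptions made for this sketch and are not stated in the text: the CNs are pushed in decreasing order of ALST so that the node with the smallest ALST ends up on top, pending parents are pushed in the same order, and the names cnpt_listing, parents, and alst are hypothetical.

```python
def cnpt_listing(parents, critical_nodes, alst):
    """Return a schedule order in which every task follows all of its parents."""
    # Assumption: the CN with the smallest ALST ends up on top of the stack.
    stack = sorted(critical_nodes, key=lambda n: alst[n], reverse=True)
    order, queued = [], set()
    while stack:
        top = stack[-1]
        if top in queued:  # duplicate stack entry, already in the queue
            stack.pop()
            continue
        pending = [p for p in parents.get(top, []) if p not in queued]
        if pending:
            # Push the unscheduled parents; the top node waits beneath them.
            stack.extend(sorted(pending, key=lambda n: alst[n], reverse=True))
        else:
            stack.pop()
            queued.add(top)
            order.append(top)
    return order

# Hypothetical DAG: critical path 1 -> 2 -> 4, with node 3 off the path.
parents = {1: [], 2: [1], 3: [1], 4: [2, 3]}
alst = {1: 0, 2: 2, 3: 3, 4: 6}
schedule_order = cnpt_listing(parents, [1, 2, 4], alst)
```

Node 3 is not a critical node, so it is queued only when the critical node 4 reaches the top of the stack and still has it as an unscheduled parent.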

The HCPFD algorithm extends the assign phase of the CNPT algorithm to reduce the makespan. At each assignment step, the HCPFD algorithm assigns the candidate task to the PE that gives the minimal completion time, and then duplicates its critical parent node in the time slot between the candidate task and the previous task on the same PE, if this slot is large enough. This duplication can reduce the candidate task’s completion time on the PE.

The HCPFD algorithm reduces the makespan effectively by duplicating critical parent nodes. However, it is based only on list scheduling and performs task duplication in a simple way: it neither considers that the critical path could change during the assign phase nor checks the time slot between the data arrival time of the duplicated task and the candidate task.

In the document: A Study of Exploiting Efficiency of the Duplication-based Algorithm in Distributed Heterogeneous Computing Systems (pages 22-30)