Objectives - A Study of Exploiting Efficiency of the Duplication-based Algorithm in Distributed

Chapter 1. Introduction

1.3 Objectives

In a distributed heterogeneous computing environment, an application can be partitioned into a set of tasks. These tasks are then assigned to a set of processing elements (PEs) and ordered for execution.

The task scheduling problem in a homogeneous computing system differs from that in a heterogeneous computing system. For example, scheduling algorithms for distributed homogeneous systems usually distribute the workload evenly across the computing components; in a heterogeneous system, however, an evenly distributed workload leads to different completion times on components with different computing capabilities. The components with higher processing capability must wait for those with lower processing capability to finish executing their tasks, so the application's completion time is bounded by the slower components. This study proposes a novel scheduling algorithm that accommodates system heterogeneity. To that end, the following issues should be considered:

a. A task that can be issued earlier does not necessarily obtain the earliest completion time.

b. A parallel application contains various forms of parallelism.

c. The algorithm should promote the utilization of the distributed heterogeneous computing system in order to achieve system scalability.

For these reasons, this study proposes a novel algorithm that addresses the issues above and aims for the shortest execution time at a suitable level of system utilization.
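The effect of heterogeneity on an evenly distributed workload can be illustrated with a small calculation. The sketch below is not part of the proposed algorithm; the PE speeds and the amount of work are made-up values chosen only to show why the slowest PE bounds the completion time.

```python
# Hypothetical example: 12 equal-sized work units on three PEs of different speeds.
speeds = [4.0, 2.0, 1.0]   # work units per time unit (assumed values)
total_units = 12

def completion_times(allocation, speeds):
    """Time each PE needs to finish its share of the work."""
    return [units / speed for units, speed in zip(allocation, speeds)]

# Even distribution, as a scheduler for a homogeneous system would do.
even = [total_units / len(speeds)] * len(speeds)
even_times = completion_times(even, speeds)

# Distribution proportional to each PE's processing capability.
proportional = [total_units * s / sum(speeds) for s in speeds]
prop_times = completion_times(proportional, speeds)

print("even:        ", even_times, "makespan =", max(even_times))
print("proportional:", prop_times, "makespan =", max(prop_times))
```

With the even split, the fastest PE finishes at time 1 and then idles until the slowest PE finishes at time 4. With the proportional split, all three PEs finish at about time 1.71.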

In general, a parallel application can be modeled as a directed acyclic graph (DAG). In a DAG, a node represents a task, and an edge represents the inter-task communication and the data dependency between two tasks. The weight of a node represents the processing volume of the task, and the weight of an edge represents the communication volume between the two tasks it connects. In this study, we assume that the tasks of a DAG are non-preemptive; in other words, the tasks assigned to a PE execute sequentially, and a task starts only after the previously scheduled tasks on that PE have run to completion.
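As an illustration of this model, the following minimal sketch stores a weighted DAG in plain dictionaries and derives one execution order that respects all data dependencies. The task names and weights are hypothetical, and the representation is only one of several reasonable choices.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical four-task application (all weights are made-up values).
node_weight = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}      # processing volume
edge_weight = {("t1", "t2"): 5, ("t1", "t3"): 1,        # communication volume
               ("t2", "t4"): 2, ("t3", "t4"): 6}

# Map each task to the set of its predecessors (parents).
parents = {t: set() for t in node_weight}
for (src, dst) in edge_weight:
    parents[dst].add(src)

# A valid execution order places every parent before its children.
order = list(TopologicalSorter(parents).static_order())
print(order)
```

The entry task t1 always comes first and the exit task t4 last; the relative order of t2 and t3 is not fixed by the dependencies, which is exactly the freedom a scheduler exploits.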

This study proposes a duplication-based algorithm that follows the static scheduling strategy. When the algorithm assigns tasks to a system and schedules them for execution, it should consider diverse resource requirements in order to achieve the minimum completion time (Braun, Siegel, Beck, Boloni, Maheswaran, Reuther, Robertson, Theys, Yao, Hensgen, & Freund, 1999; Kwok, & Ahmad, 1999). To exploit heterogeneous computational resources, many studies have focused on the NP-complete problem (Ullman, 1975; Garey, & Johnson, 1979) of efficiently scheduling tasks onto distributed heterogeneous computing (DHC) systems, seeking near-optimal solutions within an acceptable time complexity.

Generally, static scheduling algorithms can be broadly classified into several categories, such as list-based, clustering-based, and duplication-based algorithms. The list-based scheduling algorithm (Ranaweera, & Agrawal, 2000; Sih, & Lee, 1993; Park, Shirazi, Marquis, & Choo, 1997; Hagras, & Janecek, 2003; Hagras, & Janecek, 2004; Kwok, & Ahmad, 2000) is a classical scheduling heuristic. In the first step, the algorithm assigns a priority to every task; the priority can be calculated with either a static or a dynamic approach. The algorithm then arranges the schedule order according to the task priorities. In the final step, the algorithm repeatedly selects the next task and allocates it to a suitable PE.

The quality of the schedule is largely determined by the selection step. The list-based scheduling algorithm is generally an attractive approach because of its low complexity and high performance. Its main drawback is that each task is scheduled without sufficient information about subsequent tasks; hence, the priority assignment does not always lead to the optimal task schedule.
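The three steps of a list-based scheduler can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm proposed in this study: the priority is a static rank (average cost plus the longest path to an exit task), the PE is chosen by earliest finish time, and all costs are hypothetical.

```python
from functools import lru_cache

# Hypothetical inputs: cost[t][p] is the run time of task t on PE p, and
# comm[(u, v)] is the transfer time when u and v run on different PEs.
cost = {"t1": [2, 3], "t2": [3, 2], "t3": [4, 2], "t4": [1, 2]}
comm = {("t1", "t2"): 4, ("t1", "t3"): 1, ("t2", "t4"): 2, ("t3", "t4"): 3}
num_pes = 2

succs = {t: [v for (u, v) in comm if u == t] for t in cost}
preds = {t: [u for (u, v) in comm if v == t] for t in cost}

@lru_cache(maxsize=None)
def rank(t):
    """Static priority: average cost plus the longest path to an exit task."""
    avg = sum(cost[t]) / num_pes
    return avg + max((comm[(t, s)] + rank(s) for s in succs[t]), default=0)

# Steps 1 and 2: compute priorities, then order tasks by decreasing priority.
# With positive costs, a parent always ranks higher than its children.
order = sorted(cost, key=rank, reverse=True)

# Step 3: place each task on the PE that gives the earliest finish time.
pe_free = [0] * num_pes          # time at which each PE becomes idle
placed = {}                      # task -> (pe, start, finish)
for t in order:
    best = None
    for p in range(num_pes):
        # Data from a parent on another PE arrives after the transfer delay.
        ready = max((placed[u][2] + (0 if placed[u][0] == p else comm[(u, t)])
                     for u in preds[t]), default=0)
        start = max(ready, pe_free[p])
        finish = start + cost[t][p]
        if best is None or finish < best[2]:
            best = (p, start, finish)
    placed[t] = best
    pe_free[best[0]] = best[2]

makespan = max(finish for (_, _, finish) in placed.values())
print(order, placed, makespan)
```

Each placement decision looks only at the tasks already scheduled, which reflects the drawback noted above: the task t4 ends up waiting for a transfer that an earlier decision made unavoidable.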

The clustering-based approach (Gerasoulis, & Yang, 1992; Palis, Liou, & Wei, 1996; Pande, Agrawal, & Mauney, 1994; Pande, Agrawal, & Mauney, 1995) is generally known as a two-phase scheduling approach. Its principal idea is to reduce the inter-task communication cost in order to obtain the minimal completion time. For this purpose, the clustering-based algorithm allocates heavily communicating tasks onto the same PE to reduce the overall communication cost. The algorithm consists of two phases: the first phase groups tasks into an unbounded number of clusters by clustering heuristics, and the second phase allocates these clusters onto the PEs by load-balancing heuristics. If there are fewer available PEs than clusters, the algorithm must merge clusters appropriately to fit the number of available PEs (Ranaweera, & Agrawal, 2000). Although the complexity of clustering-based algorithms is generally lower than that of list-based ones, their scheduling performance is generally worse.
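The two phases can be illustrated with a deliberately simplified sketch. It does not reproduce the heuristics of the cited works: phase one merely merges tasks joined by an edge whose communication volume reaches an assumed threshold, phase two places the heaviest cluster first on the least-loaded PE, and task ordering within a PE is ignored. All names and values are hypothetical.

```python
# Hypothetical task graph: work per task and communication volume per edge.
work = {"a": 2, "b": 3, "c": 1, "d": 4, "e": 2}
comm = {("a", "b"): 9, ("a", "c"): 1, ("c", "d"): 8, ("b", "e"): 2}
threshold = 5      # assumed cut-off for "heavily communicating"
num_pes = 2

# Phase 1: merge the endpoints of every heavy edge into one cluster (union-find).
leader = {t: t for t in work}

def find(t):
    """Return the representative task of the cluster containing t."""
    while leader[t] != t:
        leader[t] = leader[leader[t]]   # path halving
        t = leader[t]
    return t

for (u, v), volume in comm.items():
    if volume >= threshold:
        leader[find(u)] = find(v)

clusters = {}
for t in work:
    clusters.setdefault(find(t), []).append(t)

# Phase 2: map clusters to PEs, heaviest first, always onto the least-loaded PE.
load = [0] * num_pes
assignment = {}
for members in sorted(clusters.values(), key=lambda m: -sum(work[t] for t in m)):
    pe = load.index(min(load))
    load[pe] += sum(work[t] for t in members)
    for t in members:
        assignment[t] = pe

print(clusters, assignment, load)
```

Here phase one produces three clusters for two PEs, so phase two necessarily co-locates two of them, which corresponds to the cluster merging described above.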

The duplication-based approach (Ahmad, & Kwok, 1998; Bajaj, & Agrawal, 2004; Hagras, & Janecek, 2004; Chung, & Ranka, 1992) is another attractive way to reduce the schedule length. Its main idea is to exploit schedule holes by duplicating the parents of a candidate node on more than one PE. Because the communication overhead decreases, the candidate's start and finish times can be reduced. The drawbacks of the duplication-based algorithm are redundant resource consumption and high complexity, and the redundant resource consumption may affect the final makespan. In general, duplication-based approaches outperform list-based and clustering-based ones. The duplication-based approach proposed in this study improves on the list-based algorithm, aiming for lower complexity and higher performance.
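The benefit of duplicating a parent can be shown with a small worked example. The scenario and the function name `child_finish` are hypothetical: a parent P runs on PE0 from time 0, its child C runs on PE1, and PE1 is assumed to be idle until C's input is available.

```python
def child_finish(parent_cost, child_cost, comm_delay, duplicate):
    """Finish time of child C on PE1 when parent P runs on PE0 from time 0.

    If duplicate is True, a second copy of P also runs on PE1 from time 0,
    occupying a slot that would otherwise be an idle schedule hole.
    """
    remote_arrival = parent_cost + comm_delay      # P's data sent from PE0
    if duplicate:
        local_arrival = parent_cost                # copy of P finishes on PE1
        data_ready = min(remote_arrival, local_arrival)
    else:
        data_ready = remote_arrival
    return data_ready + child_cost

print(child_finish(2, 3, 5, duplicate=False))   # 10
print(child_finish(2, 3, 5, duplicate=True))    # 5
```

The duplicate removes the transfer delay of 5 time units from C's critical path, at the price of PE1 spending 2 extra time units re-executing P. That extra work is the redundant resource consumption mentioned above.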

There are several ways to raise performance in a DHC system, such as improving the performance of the PEs, increasing network bandwidth, and exploiting computing resources more efficiently. This study focuses on exploiting computing resources efficiently in the DHC environment. It applies task duplication to reduce the inter-task communication overhead and corrects task priorities dynamically to obtain a more suitable schedule order. In this way, the study extends the classical duplication-based algorithm with dynamic correction of task priorities in the DHC environment.

The remainder of this thesis is organized as follows. Chapter 2 introduces recent static duplication-based heuristics. Chapter 3 presents the proposed algorithm. Chapter 4 describes the simulation environment. Experimental results and performance analyses are provided in Chapter 5. Concluding remarks and future work are offered in Chapter 6.

In the document: A Study of Exploiting Efficiency of the Duplication-based Algorithm in Distributed Heterogeneous Computing Systems (pages 13-18)