Motivation - Strategies for Effectively Exploiting Computational Resources in Heterogeneous Computing Systems

In recent decades, distributed computing systems have rapidly become a standard platform for high-performance and large-scale computing. They offer low cost and high performance for the computational requirements of a large number of parallel applications, such as weather prediction, genome exploration, distributed database systems, item response theory (IRT), and fluid dynamics (Skordos, 1995).

Such a platform usually consists of a geographically distributed suite of machines with different computing capabilities, interconnected by diverse high-speed links in various network topologies. These distributed computational resources are used in a coordinated way.

However, as the computational requirements of these parallel applications increase, existing systems are usually expanded, for the sake of convenience, with off-the-shelf hardware components. As a consequence, the heterogeneity of these platforms becomes more pronounced. In general, such systems are heterogeneous both in the computing capabilities of their machines and in the communication capabilities between geographically distributed machines.

We therefore refer to these heterogeneous systems as distributed heterogeneous computing (DHC) systems.

In these DHC environments, machines communicate with each other solely by message passing. The message-passing overhead is quite large, typically in excess of 500 instruction cycles (Dally, 1990). As a result, parallel applications that contain fine-grained tasks can incur a noticeable penalty.

The popularity of DHC systems derives from their ability to integrate geographically distributed, diverse machines and to exploit their heterogeneous computing capabilities efficiently.

Even though these systems meet the computational requirements of a large number of parallel applications, they also raise a series of problems that are not encountered when such applications are scheduled on a single machine: designing parallel algorithms for the applications, partitioning the applications into tasks, synchronizing the tasks, and scheduling the tasks onto the DHC system. A great number of research efforts addressing these problems have been reported in the literature. Task scheduling is an extremely important topic, since an improper schedule can fail to exploit the computational resources of a DHC system efficiently. Efficient task scheduling in such systems has therefore become increasingly important.

Generally speaking, scheduling problems fall into two categories: dynamic scheduling and static scheduling. In dynamic scheduling, the parallel programs are usually deadline-critical and must be scheduled at run time. If the deadline constraints are not satisfied, the consequences can be very serious. Such deadline-critical parallel programs arise in areas such as real-time multimedia processing, real-time image processing (RTIP), and geographic information systems (GIS).

Parallel applications with fixed, well-defined structures, such as LU Decomposition (LU), Gaussian Elimination (GE), the Fast Fourier Transformation (FFT), and the Laplace Partial-Difference (Cosnard, Marrakchi, Robert & Trystram, 1988; Golub & Van Loan, 1996; Lord, Kowalik & Kumar, 1983; Skordos, 1995; Wu & Gajski, 1990), appear in various practical problems. Because their structure is known in advance, these applications can be modeled as task graphs and scheduled at compile time, and the resulting schedules can be reused at run time to reduce the total execution times of these practical problems. In this study, we focus on the static scheduling problem.

To exploit the diverse resources of a DHC system efficiently, a parallel program is usually modeled as a weighted directed acyclic graph (DAG) (Ahmad & Kwok, 1994; Chung & Ranka, 1992; Park, Shirazi & Marquis, 1997; Topcuoglu, Hariri & Wu, 1999), in which each node represents an indivisible task and each edge represents an inter-task data dependency. With this model, a task with a longer remaining time from itself to the exit task of the task graph (defined in the next chapter) can be allocated to its best-suited machine as early as possible. The optimal solution to this problem would be achieved if we could accurately estimate the remaining times of all tasks and then assign the tasks, in decreasing order of remaining time, to their individual best-suited machines in the DHC system.
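
The remaining-time ordering described above can be illustrated with a minimal Python sketch. It assumes that the remaining time of a task is the longest path from that task to the exit task, computed from the mean computation cost of each task over all machines plus the communication costs on the edges; the actual definition is given in the next chapter and may differ. The example graph, the cost values, and the names `succ`, `cost`, `remaining_times`, and `priority_order` are hypothetical.

```python
from functools import lru_cache

# Hypothetical task graph: task -> {successor: communication cost}
succ = {
    "t1": {"t2": 4, "t3": 2},
    "t2": {"t4": 3},
    "t3": {"t4": 1},
    "t4": {},
}
# Hypothetical computation cost of each task on two heterogeneous machines
cost = {"t1": [6, 8], "t2": [11, 5], "t3": [4, 4], "t4": [3, 5]}


def remaining_times(succ, cost):
    """Estimate, for every task, the longest path to the exit task.

    Assumption: each task is weighted by its mean cost over all machines.
    """
    mean = {t: sum(c) / len(c) for t, c in cost.items()}

    @lru_cache(maxsize=None)
    def remaining(task):
        # Longest continuation through any successor (0 for the exit task)
        tail = max(
            (comm + remaining(s) for s, comm in succ[task].items()),
            default=0.0,
        )
        return mean[task] + tail

    return {t: remaining(t) for t in succ}


def priority_order(succ, cost):
    """Return the tasks in decreasing order of estimated remaining time."""
    rt = remaining_times(succ, cost)
    return sorted(rt, key=lambda t: -rt[t])


print(priority_order(succ, cost))
```

Because all costs in this sketch are positive, a task always has a longer remaining time than any of its successors, so the decreasing order also respects the precedence constraints.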

Formally, our objective in the static scheduling problem is to find a scheduling sequence of all tasks that satisfies the task precedence constraints of a parallel application, together with a mapping of all tasks onto a set of machines, such that the schedule length is minimized (Beaumont, Boudet & Robert, 2002; Chung & Ranka, 1992; Park, Shirazi & Marquis, 1997; Topcuoglu, Hariri & Wu, 1999; Zhao & Sakellariou, 2003).
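
As a small illustration of this objective, the following Python sketch checks a candidate schedule against the precedence constraints and reports its schedule length. It makes two simplifying assumptions that are not taken from the text: communication between tasks on the same machine costs nothing, and each machine executes one task at a time. The graph, the schedule, and the function name `check_schedule` are hypothetical.

```python
# Hypothetical task graph: task -> {successor: communication cost}
succ = {
    "t1": {"t2": 4, "t3": 2},
    "t2": {"t4": 3},
    "t3": {"t4": 1},
    "t4": {},
}
# Hypothetical candidate schedule: task -> (machine, start time, finish time)
schedule = {
    "t1": (0, 0, 6),
    "t3": (0, 6, 10),
    "t2": (1, 10, 15),
    "t4": (1, 15, 20),
}


def check_schedule(schedule, succ):
    """Return the schedule length, or raise ValueError on a violated constraint."""
    # Precedence: a task may start only after all of its inputs have arrived.
    for u, children in succ.items():
        mu, _, fu = schedule[u]
        for v, comm in children.items():
            mv, sv, _ = schedule[v]
            ready = fu + (0 if mu == mv else comm)
            if sv < ready:
                raise ValueError(f"{v} starts at {sv}, but input from {u} arrives at {ready}")
    # Exclusivity: tasks mapped to the same machine must not overlap in time.
    by_machine = {}
    for task, (m, s, f) in schedule.items():
        by_machine.setdefault(m, []).append((s, f, task))
    for m, slots in by_machine.items():
        slots.sort()
        for (_, f1, a), (s2, _, b) in zip(slots, slots[1:]):
            if s2 < f1:
                raise ValueError(f"{a} and {b} overlap on machine {m}")
    # Schedule length: the latest finish time over all tasks.
    return max(f for _, _, f in schedule.values())


print(check_schedule(schedule, succ))
```

A scheduling algorithm searches over such mappings and sequences for one that passes these checks with the smallest returned length.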

Since the task-scheduling problem has been shown to be NP-complete (Coffman, 1976; Garey & Johnson, 1990; Graham, Lawler, Lenstra & Rinnooy Kan, 1979; Lewis & El-Rewini, 1992; Sethi, 1976; Ullman, 1975), researchers have proposed numerous scheduling algorithms based on heuristics. These algorithms are usually classified into several categories, such as priority-based list scheduling algorithms (Beaumont, Boudet & Robert, 2002; Oh & Ha, 1996; Maheswaran & Siegel, 1998; Radulescu, van Gemund & Lin, 1999; Sih & Lee, 1993; Topcuoglu, Hariri & Wu, 1999), clustering algorithms (Liou & Palis, 1997; Palis, Liou & Wei, 1996; Sarkar, 1987; Yang & Gerasoulis, 1994; Zomaya & Chan, 2004), task duplication algorithms (Ahmad & Kwok, 1994; Chung & Ranka, 1992; Dogan & Ozguner, 2002; Hagras & Janecek, 2004; Ilavarasan & Thambidurai, 2005; Park, Shirazi & Marquis, 1997), and genetic algorithms (Wang, Siegel & Roychowdhury, 1996). Of these, priority-based list scheduling is the most widely accepted, since it pairs low complexity with good results (Kwok & Ahmad, 1998; Radulescu & van Gemund, 1999). Various task-prioritizing schemes, surveyed in (Kwok & Ahmad, 1999; Zhao & Sakellariou, 2004), have also been proposed to obtain better scheduling performance.

Several priority-based list scheduling algorithms have been proposed to solve the task-scheduling problem in DHC systems, such as the Modified Critical Path (MCP) algorithm (Wu & Gajski, 1990), the Fast Critical Path (FCP) algorithm (Radulescu & van Gemund, 1999), the Best Imaginary Level (BIL) algorithm (Oh & Ha, 1996), the Iso-Level Heterogeneous Allocation (ILHA) algorithm (Beaumont, Boudet & Robert, 2002), the Partial Completion Time (PCT) algorithm (Maheswaran & Siegel, 1998), the generalized Dynamic Level Scheduling (DLS) algorithm (Sih & Lee, 1993), the Heterogeneous Earliest Finish Time (HEFT) algorithm (Topcuoglu, Hariri & Wu, 1999), and the Critical Path on a Processor (CPOP) algorithm (Topcuoglu, Hariri & Wu, 1999).

Among the above-mentioned algorithms for heterogeneous machines, the Heterogeneous Earliest Finish Time (HEFT) algorithm (Topcuoglu, Hariri & Wu, 2002), a natural extension of the traditional priority-based list scheduling algorithms for homogeneous machines that deals with system heterogeneity, has been shown to produce shorter schedule lengths more frequently than other comparable algorithms (Beaumont, Boudet & Robert, 2002; Topcuoglu, Hariri & Wu, 1999). However, alternative schemes for calculating task priorities have not been examined for the HEFT algorithm. Additionally, to simplify its design, the HEFT algorithm assumes that there is no communication contention. Such an assumption is not realistic in DHC systems.
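
To make the list-scheduling idea concrete, here is a simplified Python sketch in the spirit of HEFT, not a faithful reimplementation of the published algorithm. Tasks are prioritized by a longest-path rank based on mean computation costs, and each task is then placed on the machine that gives it the earliest finish time. The sketch assumes contention-free communication (the assumption criticized above), a single communication cost per edge that is waived when both tasks share a machine, and no insertion of tasks into idle time slots. All names and values are hypothetical.

```python
# Hypothetical task graph: task -> {successor: communication cost}
succ = {
    "t1": {"t2": 4, "t3": 2},
    "t2": {"t4": 3},
    "t3": {"t4": 1},
    "t4": {},
}
# Hypothetical computation cost of each task on two heterogeneous machines
cost = {"t1": [6, 8], "t2": [11, 5], "t3": [4, 4], "t4": [3, 5]}


def heft_like(succ, cost):
    """Return task -> (machine, start, finish) using earliest-finish-time placement."""
    n_machines = len(next(iter(cost.values())))
    mean = {t: sum(c) / len(c) for t, c in cost.items()}

    # Invert the graph so that each task knows its predecessors.
    pred = {t: {} for t in succ}
    for u, children in succ.items():
        for v, comm in children.items():
            pred[v][u] = comm

    rank = {}

    def upward_rank(t):
        # Longest path from t to the exit task, memoized in `rank`.
        if t not in rank:
            tail = max((c + upward_rank(s) for s, c in succ[t].items()), default=0.0)
            rank[t] = mean[t] + tail
        return rank[t]

    # With positive costs, decreasing rank is a valid topological order.
    order = sorted(succ, key=lambda t: -upward_rank(t))

    machine_free = [0.0] * n_machines  # time at which each machine becomes idle
    schedule = {}
    for t in order:
        best = None
        for m in range(n_machines):
            # Data from a predecessor on another machine arrives after the edge cost.
            ready = max(
                (schedule[p][2] + (0 if schedule[p][0] == m else c)
                 for p, c in pred[t].items()),
                default=0.0,
            )
            start = max(ready, machine_free[m])
            finish = start + cost[t][m]
            if best is None or finish < best[2]:
                best = (m, start, finish)
        schedule[t] = best
        machine_free[best[0]] = best[2]
    return schedule


result = heft_like(succ, cost)
print(result, max(f for _, _, f in result.values()))
```

Both the priority function and the placement loop are natural points of variation: a different task-prioritizing scheme changes only `upward_rank`, while modeling contention or reusing idle slots changes only the computation of `start`.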

In this study, we employ an alternative task-prioritizing scheme instead of the one used by the HEFT algorithm. For convenience, we name the HEFT algorithm with the alternative task-prioritizing scheme the Strict Bound Completion Time (SBCT) algorithm. To further improve the scheduling performance of the SBCT algorithm, we propose another priority-based list scheduling algorithm with a dynamic task-prioritizing scheme, named the Enhanced Strict Bound Completion Time (ESBCT) algorithm. Additionally, the ESBCT algorithm exploits available idle time slots in every scheduling round. As a result, the two proposed algorithms can obtain better scheduling performance than existing algorithms.

In terms of solution quality, the ESBCT algorithm is usually superior to the SBCT algorithm. However, the ESBCT algorithm has a higher time complexity, O(v³ × q), for a task graph with v tasks and a DHC system with q machines.

The main source of this time complexity is the method used to tune the priorities of the unscheduled ready tasks on every machine of a DHC system.

To show that the SBCT and ESBCT algorithms outperform previous approaches, we use benchmarks (e.g., LU Decomposition (LU), Gaussian Elimination (GE), the Fast Fourier Transformation (FFT), and randomly generated task graphs) to evaluate the chosen performance criterion, minimum schedule length.

Finally, two existing algorithms from the literature are reviewed and compared with the two proposed algorithms. The experimental results are presented to validate the performance claims made above.
