Scheduling task graphs, also known as workflows, on parallel computing platforms has long been an important research topic and is well known to be an NP-complete problem [1, 2].
With the advancement of technology and the emergence of grid and cloud computing, many large-scale scientific and engineering applications are now constructed as workflows because they involve large amounts of interrelated computation and communication; their structure can be represented by traditional parallel task graphs [4]. Many open-source workflow management systems, such as ASKALON [22], DAGMan [23], Gridbus [24], and Pegasus [25], have been developed to support workflow applications in parallel and distributed systems.
In most research, workflows are represented by Directed Acyclic Graphs (DAGs) that describe the inter-task precedence constraints [3]. In practice, most common workflow applications have more regular structures than the random synthetic DAGs usually studied in previous work. As discussed in [4], fork-join control structures are a common underlying structure of many workflow applications. Therefore, this thesis focuses on scheduling workflows that contain fork-join DAG structures. Figure 1.1 shows an example of such a workflow. Each node represents a task, which can be realized by a specific software program and allocated to a processor for execution. The number next to each node indicates the required execution time of the task. The edges represent the dependences between tasks, and the number next to an edge is the inter-task data transmission cost. Hereafter in this thesis, nodes like node B are called fork nodes and nodes like node F are called join nodes. A workflow scheduler has to schedule and allocate each task according to the dependences specified in the workflow definition. Languages and middleware, such as BPEL [26] and Xavantes [27], have been developed for programming such workflow applications.
Figure 1.1: A workflow example of fork-join DAG structure
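The node and edge annotations described above can be captured in a very small data structure. The following is a minimal sketch, not the representation used in this thesis: the task names, execution times, and transmission costs are made-up values (they are not the values in Figure 1.1), and it assumes that a fork node is a task with more than one successor and a join node is a task with more than one predecessor.

```python
from collections import defaultdict

# Hypothetical workflow. All numbers are illustrative assumptions,
# not the values shown in Figure 1.1.
exec_time = {"t1": 2, "t2": 3, "t3": 4, "t4": 1, "t5": 2}
comm_cost = {
    ("t1", "t2"): 1,
    ("t2", "t3"): 2,  # t2 forks into t3 and t4
    ("t2", "t4"): 3,
    ("t3", "t5"): 1,  # t3 and t4 join at t5
    ("t4", "t5"): 2,
}


def build_adjacency(edges):
    """Return successor and predecessor lists for every edge endpoint."""
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    return succ, pred


def fork_and_join_nodes(tasks, edges):
    """Assumed definitions: fork = out-degree > 1, join = in-degree > 1."""
    succ, pred = build_adjacency(edges)
    forks = sorted(t for t in tasks if len(succ[t]) > 1)
    joins = sorted(t for t in tasks if len(pred[t]) > 1)
    return forks, joins


forks, joins = fork_and_join_nodes(exec_time, comm_cost)
print("fork nodes:", forks)
print("join nodes:", joins)
```

Keeping execution times on the nodes and transmission costs on the edges mirrors the way the figure is annotated, so the same two dictionaries can feed any ranking or allocation routine.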
Many approaches have been proposed in the literature to deal with the challenging workflow scheduling problem [10, 28, 29, 30, 31, 32, 33, 34, 35]. Clustering-based methods are one of the major types of workflow scheduling approaches and have the advantage of minimizing inter-task communication costs, which makes them superior to other kinds of methods in many cases. Because of the problem's complexity, most previous workflow scheduling research focused on scheduling a single workflow on parallel systems [10, 28, 29, 30, 31, 32]. However, as modern high-performance computing platforms, such as grids and clouds, become prevalent, many users run their workflow applications simultaneously on the same platform. Scheduling multiple concurrent workflows efficiently therefore becomes an unavoidable issue. In addition, although most previous research on clustering-based workflow scheduling focused on task clustering, recent research [18] showed that task allocation utilizing idle time gaps between scheduled tasks is a promising direction for efficient multiple workflow scheduling.
In general, workflow scheduling consists of two major steps: task ranking and task allocation. In the task-ranking step, the scheduler assigns a rank value to each task according to a specific mechanism, e.g., the bottom level (rank) of each task [12]. In the task-allocation step, the scheduler repeatedly allocates each task or task group, according to its priority, to appropriate resources for execution using a specific resource selection mechanism, e.g., the Earliest Finish Time (EFT) principle [10, 12]. Scheduling heuristics differ in these two steps and therefore lead to different schedules. Clustering-based approaches add a step between these two, called task clustering, which groups several inter-related tasks before allocation in order to minimize the inter-task communication costs. In the first part of this thesis, we propose and evaluate new task-ranking methods for clustering-based workflow scheduling.
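The two steps can be illustrated with a generic list-scheduling sketch. It is not one of the methods proposed in this thesis. It assumes the common definition of bottom level (a task's execution time plus the longest path of communication and execution costs down to an exit task), identical processors, zero communication cost between tasks on the same processor, and no insertion into idle gaps. The workflow data and all function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical workflow; the numbers are illustrative assumptions only.
exec_time = {"t1": 2, "t2": 3, "t3": 4, "t4": 1, "t5": 2}
comm_cost = {
    ("t1", "t2"): 1,
    ("t2", "t3"): 2,
    ("t2", "t4"): 3,
    ("t3", "t5"): 1,
    ("t4", "t5"): 2,
}


def bottom_levels(exec_time, comm_cost):
    """Task ranking: bottom level of every task."""
    succ = {t: [] for t in exec_time}
    for (u, v), c in comm_cost.items():
        succ[u].append((v, c))

    @lru_cache(maxsize=None)
    def blevel(t):
        # Longest remaining path below t; 0 for an exit task.
        tail = max((c + blevel(v) for v, c in succ[t]), default=0)
        return exec_time[t] + tail

    return {t: blevel(t) for t in exec_time}


def eft_schedule(exec_time, comm_cost, num_procs):
    """Task allocation: place tasks in rank order on the EFT processor."""
    rank = bottom_levels(exec_time, comm_cost)
    pred = {t: [] for t in exec_time}
    for (u, v), c in comm_cost.items():
        pred[v].append((u, c))

    proc_ready = [0] * num_procs  # time at which each processor becomes free
    placement, finish = {}, {}
    # With positive execution times, decreasing bottom level is a valid
    # topological order, so predecessors are always placed first.
    for t in sorted(exec_time, key=lambda name: -rank[name]):
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor pays the edge cost.
            data_ready = max(
                (finish[u] + (0 if placement[u] == p else c) for u, c in pred[t]),
                default=0,
            )
            end = max(proc_ready[p], data_ready) + exec_time[t]
            if best is None or end < best[0]:
                best = (end, p)
        finish[t], placement[t] = best
        proc_ready[placement[t]] = finish[t]
    return rank, placement, finish


rank, placement, finish = eft_schedule(exec_time, comm_cost, num_procs=2)
print("ranks:", rank)
print("placement:", placement)
print("makespan:", max(finish.values()))
```

A clustering-based scheduler would insert its grouping step between `bottom_levels` and the allocation loop, and would then place whole task groups rather than single tasks.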
Path Clustering Heuristic (PCH) is a typical clustering-based workflow scheduling approach that was developed specifically for fork-join based workflows and has been shown to be effective in [4, 5]. As a clustering-based method, PCH first partitions a workflow into several task groups and then allocates these task groups onto processors for execution. The former part of clustering-based methods concerns task clustering and has received much research attention. In contrast, little attention has been paid to the latter part, which deals with task group allocation. In the second part of this thesis, we present a new task group allocation method for further improving the performance of clustering-based workflow scheduling.
Most previous task allocation approaches adopted simple heuristics that focused on a single principle, e.g., best resource fitness or Earliest Finish Time (EFT). In the third part of this thesis, we study the issue of task group allocation for clustering-based multi-workflow scheduling. Our contributions include an efficient dual-criteria task group allocation method and an analysis of the relative advantages of the best-fit and EFT principles across different workload conditions and workflow properties. Our method considers both resource fitness and tasks’ EFT when allocating task groups, and it can adjust the weights of the two principles to adapt to different situations. In addition, our method adopts an adaptive task group rearrangement mechanism. Together, these two mechanisms enable our method to improve the overall multi-workflow execution performance effectively.
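To make the idea of weighting two allocation principles concrete, the sketch below blends a best-fit score and a finish-time score with a single adjustable weight. It is only an illustration and not the method of this thesis, which is presented in Chapter 3. The `Gap` type, the `pick_gap` function, the min-max normalization, and the definition of fitness as the idle time left over in a gap are all assumptions made for this example.

```python
from typing import List, NamedTuple, Optional


class Gap(NamedTuple):
    """Hypothetical idle time gap on a processor."""
    proc: str
    start: float
    length: float


def normalize(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; all zeros when every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def pick_gap(gaps: List[Gap], duration: float, fit_weight: float) -> Optional[Gap]:
    """Pick the gap with the lowest blended score.

    fit_weight = 1.0 behaves like pure best-fit (least leftover idle time);
    fit_weight = 0.0 behaves like pure EFT (earliest finish).
    """
    feasible = [g for g in gaps if g.length >= duration]
    if not feasible:
        return None
    leftover = normalize([g.length - duration for g in feasible])
    finish = normalize([g.start + duration for g in feasible])
    scores = [
        fit_weight * l + (1.0 - fit_weight) * f
        for l, f in zip(leftover, finish)
    ]
    return feasible[scores.index(min(scores))]


# Made-up candidate gaps for a task group that needs 4 time units.
gaps = [Gap("p0", 0, 10), Gap("p1", 4, 5), Gap("p2", 8, 4)]
for w in (0.0, 0.5, 1.0):
    print(w, pick_gap(gaps, duration=4, fit_weight=w))
```

In this made-up example the chosen gap changes as the weight changes, which is the kind of trade-off an adjustable weight is meant to expose.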
The rest of this thesis is organized as follows. Chapter 2 discusses related work on workflow scheduling. Chapter 3 presents our ranking method, our task group allocation method for a single workflow, and our task group allocation method for multi-workflow scheduling. Chapter 4 presents the simulation experiments and discusses the results of the performance evaluation. Chapter 5 concludes this thesis.