Chapter 6. Experiments and Performance Evaluation

6.1 Experimental Setting

This section describes the experimental settings of our simulation studies, including DAG generation and the performance metric. The simulation experiments were conducted on a PC equipped with a 2.6 GHz AMD Athlon(tm) dual-core processor and 1.87 GB of RAM.

We implemented a DAG generator that randomly generates workflows with either a fork-join DAG structure or a general DAG structure for the following simulation experiments. Fork-join DAGs are generated as follows:

1. Each DAG has one entry node and one exit node.

2. Each DAG contains one to four fork-join structures, chosen at random.

3. Each fork operation produces two to ten branches, chosen at random.

4. Each branch contains two to six nodes, chosen at random.

5. DAGs can be generated with different communication-to-computation ratio (CCR) values: 0.1, 1, and 10.
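The generation rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the generator used in the experiments: the function name `generate_fork_join_dag` is hypothetical, and the sketch assumes that the fork-join structures are chained in series, with the entry node acting as the first fork and the last join acting as the exit node. Cost assignment (rule 5) is left out.

```python
import random

def generate_fork_join_dag(rng=None):
    """Return (nodes, edges) for a random fork-join DAG.

    Assumption: fork-join structures are chained in series; node 0 is the
    entry node and first fork, and the last join is the exit node.
    """
    rng = rng or random.Random()
    edges = []
    fork = 0                 # node 0 is the entry node
    next_id = 1
    for _ in range(rng.randint(1, 4)):           # one to four fork-join structures
        branch_tails = []
        for _ in range(rng.randint(2, 10)):      # two to ten branches per fork
            prev = fork
            for _ in range(rng.randint(2, 6)):   # two to six nodes per branch
                edges.append((prev, next_id))
                prev = next_id
                next_id += 1
            branch_tails.append(prev)
        join = next_id                           # all branches meet at the join node
        next_id += 1
        edges.extend((tail, join) for tail in branch_tails)
        fork = join                              # the join feeds the next structure
    return list(range(next_id)), edges
```

Because node identifiers are handed out in creation order, every edge points from a lower to a higher identifier, so the result is acyclic by construction.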

General DAGs are generated as follows:

1. Each DAG has four to five levels of tasks, chosen at random.

2. Each task level contains two to five nodes, chosen at random.

3. Each task is connected to a random subset of the tasks at the next level.
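A minimal sketch of the level-based rules above follows. The function name `generate_general_dag` is hypothetical, and the sketch assumes that "some of the tasks" means a non-empty random subset of the next level; the original generator may choose successors differently.

```python
import random

def generate_general_dag(rng=None):
    """Return (levels, edges) for a random layered DAG.

    Assumption: every task in a non-final level gets at least one successor
    in the next level.
    """
    rng = rng or random.Random()
    levels, next_id = [], 0
    for _ in range(rng.randint(4, 5)):            # four to five task levels
        width = rng.randint(2, 5)                 # two to five nodes per level
        levels.append(list(range(next_id, next_id + width)))
        next_id += width
    edges = []
    for upper, lower in zip(levels, levels[1:]):
        for task in upper:
            k = rng.randint(1, len(lower))        # size of the successor subset
            edges.extend((task, succ) for succ in rng.sample(lower, k))
    return levels, edges
```

Note that this sketch does not guarantee that every task below the first level has a predecessor; whether the original generator enforces that is not stated here.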

The following settings apply to all experiments:

1. To simulate a speed-heterogeneous system, the resources in the system have three different computing speeds, with the ratio 1:2:3.

2. Each node has a computation cost ranging from 1 to 30 seconds.

3. Each edge is assigned a communication cost based on the computation costs and the CCR of the entire workflow.

4. Each experiment was repeated 30 times and the average performance value was calculated.
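One way to realize settings 1 to 3 is sketched below. Several details are assumptions made only for illustration: CCR is taken to be the mean edge communication cost divided by the mean node computation cost, computation costs are drawn uniformly, and a task's running time is its cost divided by the relative speed of the resource. The helper names `assign_costs` and `execution_time` are hypothetical.

```python
import random

def assign_costs(nodes, edges, ccr, rng=None):
    """Draw computation costs in [1, 30] s and scale edge costs to match `ccr`.

    Assumption: CCR = mean communication cost / mean computation cost.
    """
    rng = rng or random.Random()
    comp = {n: rng.uniform(1, 30) for n in nodes}      # seconds
    raw = {e: rng.uniform(0.5, 1.5) for e in edges}    # unscaled edge weights
    mean_comp = sum(comp.values()) / len(comp)
    mean_raw = sum(raw.values()) / len(raw)
    scale = ccr * mean_comp / mean_raw                 # forces the target ratio
    comm = {e: w * scale for e, w in raw.items()}
    return comp, comm

def execution_time(cost, speed):
    """Running time of a task on a resource with relative speed 1, 2, or 3."""
    return cost / speed
```

Rescaling the edge weights after drawing them keeps the relative variation between edges while making the workflow-wide ratio equal to the requested CCR.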

We use the average makespan of all workflows as the performance metric in the following experiments. The makespan is defined as the time between the submission and the completion of a workflow, including both execution time and waiting time.
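The metric defined above can be written down directly. The function names and the representation of a run as a `(submit_time, finish_time)` pair are assumptions for illustration only.

```python
def makespan(submit_time, finish_time):
    """Time from submission to completion, so waiting time is included."""
    return finish_time - submit_time

def average_makespan(runs):
    """Average makespan over an iterable of (submit_time, finish_time) pairs."""
    spans = [makespan(s, f) for s, f in runs]
    return sum(spans) / len(spans)
```

For example, two workflows submitted at times 0 and 5 and completed at times 10 and 25 have makespans of 10 and 20, giving an average makespan of 15.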

6.2 Task Ranking and Allocation in List-Based Workflow Scheduling

This section evaluates the proposed task ranking and allocation methods for list-based workflow scheduling. The proposed methods are compared with HEFT [16] and the lookahead variant of HEFT [5]. Figures 6.1 and 6.2 show the experimental results of different task ranking methods on three resources, for general DAGs and fork-join DAGs, respectively.

To simulate speed heterogeneity, different computation costs are generated randomly for each task on different resources. Figure 6.1 indicates that the proposed top + bottom ranking method outperforms the bottom ranking approach used in HEFT [16]. Figure 6.2 shows two points. First, the top + bottom task ranking method does not work well with fork-join DAGs, as illustrated in Chapter 3. Second, our bottom-amount task ranking method is superior to the bottom ranking approach used in HEFT [16] for fork-join DAGs.

Figure 6.1: Different task ranking methods for general DAGs (HEFT)

Figure 6.2: Different task ranking methods for fork-join DAGs (HEFT)
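For readers who want to see the ranks being compared, the sketch below computes the bottom rank (the upward rank used by HEFT) and the top rank (HEFT's downward rank) from mean computation and communication costs. The proposed methods themselves are defined in Chapter 3 and are not reproduced in this section, so reading "top + bottom" as the sum of the two ranks is only an assumption here, and the bottom-amount method is not sketched. The function name `ranks` and the cost values in the usage note are hypothetical.

```python
from functools import lru_cache

def ranks(comp, comm):
    """Return {task: (top_rank, bottom_rank)}.

    comp: {task: mean computation cost}
    comm: {(u, v): mean communication cost of edge u -> v}
    """
    succ = {t: [] for t in comp}
    pred = {t: [] for t in comp}
    for u, v in comm:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def bottom(t):
        # Longest path from t to the exit, including t's own cost.
        return comp[t] + max((comm[t, s] + bottom(s) for s in succ[t]), default=0)

    @lru_cache(maxsize=None)
    def top(t):
        # Longest path from the entry to t, excluding t's own cost.
        return max((top(p) + comp[p] + comm[p, t] for p in pred[t]), default=0)

    return {t: (top(t), bottom(t)) for t in comp}
```

With `comp = {1: 2, 2: 3, 3: 5, 4: 1}` and `comm = {(1, 2): 1, (1, 3): 2, (2, 4): 1, (3, 4): 1}`, the bottom ranks are 11, 5, 7, and 1, so ordering by decreasing bottom rank gives tasks 1, 3, 2, 4. The sum of top and bottom rank is 11 for tasks 1, 3, and 4 (the longest path) and 8 for task 2.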

Figures 6.3 and 6.4 compare the performance of different task allocation methods on three resources. The experimental results show that our FST achieves better performance than the EFT principle used in HEFT [16] for both general and fork-join DAGs. Moreover, the performance improvement from FST is more significant for fork-join DAGs: 74% of the randomly generated fork-join DAGs achieve better performance with FST than with EFT, while only 51% of general DAGs benefit from FST.

Figure 6.3: FST vs. EFT for general DAGs

Figure 6.4: FST vs. EFT for fork-join DAGs

However, Figure 6.5 shows an example indicating that FST may be less effective in a more lightly loaded system than in the experiments above. Figure 6.5 contains four schedules, obtained by scheduling the example fork-join workflow onto three or six resources using the EFT and FST allocation methods. With three resources, FST clearly leads to a better schedule than EFT. For the tasks on the critical path, {1, 2, 9, 13, 14, 15, 16, 17}, indicated by the red line, the schedules produced by EFT and FST both incur two communication costs, but the FST schedule lets those tasks run on the fastest resources, leading to a shorter makespan. The situation changes with six resources. Because FST does not consider communication costs when making allocation decisions, it is more likely to allocate tasks onto different resources, which incurs communication costs. In Figures 6.5 (d) and (e), the tasks on the critical path incur three communication costs in the EFT schedule but five in the FST schedule. The higher communication costs offset FST’s benefit of allocating tasks onto the fastest resources, resulting in a worse makespan.

In a shared parallel computing environment, such as a grid or cloud, each user or application can acquire only an uncertain portion of the resources, depending on the system load and resource status at that time. The above observation shows that the best task allocation choice may depend on the system load and the number of resources acquired. Task allocation for workflow scheduling therefore becomes even more challenging in such shared parallel computing environments and requires further research.
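The comparison above counts how many edges on the critical path cross from one resource to another. A small helper makes that count explicit. The function name `communication_hops` is hypothetical, and the allocation in the usage note is invented for illustration; it is not the allocation shown in Figure 6.5.

```python
def communication_hops(path, allocation):
    """Count edges on `path` whose two endpoint tasks run on different resources."""
    return sum(allocation[u] != allocation[v] for u, v in zip(path, path[1:]))
```

For the critical path `[1, 2, 9, 13, 14, 15, 16, 17]` and a made-up allocation that places tasks 1, 2, 16, and 17 on R1 and tasks 9, 13, 14, and 15 on R2, the helper reports two communication costs (the edges from task 2 to task 9 and from task 15 to task 16).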

[Figure 6.5, panels (a), (b), and (c)]

Figure 6.6 evaluates the effect of the number of branches in fork-join DAGs on the performance of FST. The experimental results show that the performance improvement achieved by FST increases as the number of branches grows. Figure 6.7 evaluates the effect of branch length in fork-join DAGs on the performance of FST. The experimental results indicate that the performance improvement is more significant with longer branches.

Figure 6.8 is an example illustrating the effect of the number of branches in fork-join DAGs. The tables in the figure show the computation costs of tasks on different resources.

Figure 6.8 (a) shows the schedules of a fork-join workflow with two branches, produced by the EFT and FST task allocation methods, respectively, while Figure 6.8 (b) is a comparative example with a fork-join workflow of three branches. In Figure 6.8 (a), EFT clearly tends to produce a schedule with a lower degree of concurrency, with all tasks allocated to R2. In contrast, FST allocates tasks 2 and 3, which are on different branches, to different resources, resulting in a higher degree of concurrency and thus a shorter makespan. In Figure 6.8 (b), as the number of branches increases, FST produces a schedule with an even higher degree of concurrency than that in Figure 6.8 (a). Since a higher degree of concurrency has the potential to achieve a shorter makespan, this explains why FST generally leads to a larger performance improvement as the number of branches increases.

Figure 6.9 is an example illustrating the effect of branch length in fork-join DAGs. For simplicity, the example uses a single branch rather than an entire fork-join workflow. In the example workflow in Figure 6.9, task 1 runs fastest on resource R1, while the other tasks run fastest on resource R2. Both task allocation methods, EFT and FST, allocate task 1 to R1. Since the communication costs in this case are larger than the differences in computation costs among resources, EFT tends to allocate all the other tasks to the same resource, R1, to minimize the combined effect of communication and computation costs. However, this leaves every task except task 1 running on a slower resource, resulting in a longer makespan. FST, on the other hand, simply allocates each task to the fastest resource for it, allowing tasks 2, 3, and 4 to run on their fastest resource and leading to a shorter makespan.

Since the number of affected tasks is proportional to the branch length, this explains why FST generally achieves a larger performance improvement as the branch length increases.
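The single-branch argument above can be checked with a small simulation. This is a sketch under stated assumptions, not the scheduler used in the experiments: FST is modeled only as described in this section (each task goes to the resource with the smallest computation cost for it, ignoring communication), EFT picks the resource with the earliest finish time, the resources are otherwise idle, and the cost values in the usage note are invented rather than taken from Figure 6.9. The names `schedule_chain`, `eft_rule`, and `fst_rule` are hypothetical.

```python
def schedule_chain(costs, comm, choose):
    """Schedule a chain of tasks in order; return (makespan, allocation).

    costs:  one {resource: computation cost} dict per task in the chain
    comm:   communication cost paid when consecutive tasks change resource
    choose: rule that picks a resource from (cost table, finish-time table)
    """
    finish, prev_res, allocation = 0, None, []
    for cost in costs:
        # Finish time on each resource, paying `comm` only on a resource change.
        eft = {r: finish + (comm if prev_res not in (None, r) else 0) + c
               for r, c in cost.items()}
        res = choose(cost, eft)
        finish, prev_res = eft[res], res
        allocation.append(res)
    return finish, allocation

def eft_rule(cost, eft):
    """EFT: pick the resource with the earliest finish time."""
    return min(eft, key=eft.get)

def fst_rule(cost, eft):
    """FST as described in the text: pick the fastest resource for the task."""
    return min(cost, key=cost.get)
```

With `costs = [{"R1": 2, "R2": 4}] + [{"R1": 6, "R2": 4}] * 3` and `comm = 3`, EFT keeps all four tasks on R1 and finishes at time 20, while FST moves tasks 2, 3, and 4 to R2 and finishes at time 17. Truncating the same chain to two tasks reverses the outcome (8 for EFT against 9 for FST), which is consistent with the observation that the benefit grows with branch length.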

Figure 6.6: Effects of different numbers of branches in fork-join DAGs

Figure 6.7: Effects of branch length in fork-join DAGs

Figure 6.8: Effects of different numbers of branches in fork-join DAGs

Figure 6.9: Effects of branch length in fork-join DAGs

Figures 6.10, 6.11, 6.12, and 6.13 evaluate the total performance improvement achieved by integrating the proposed task ranking and allocation methods. The integrated approach is compared with HEFT [16] and the lookahead variant of HEFT [5]. Figures 6.10 and 6.11 present the performance evaluation for general DAGs and fork-join DAGs, respectively. The experimental results show that our approach outperforms both existing methods, and that the performance improvement is more significant for fork-join DAGs.

Figures 6.12 and 6.13 evaluate our approach with two well-known real-world workflow applications, Montage [7] and LIGO [2], respectively. The workflow structures of these two applications are shown in Figures 6.14 and 6.15, respectively. The experimental results indicate that our integrated approach achieves better performance than existing approaches for real-world workflow applications. In summary, our integrated approach achieves a performance improvement of up to 11.8%.

Figure 6.10: Evaluation of the integrated approach for general DAGs

Figure 6.11: Evaluation of the integrated approach for fork-join DAGs

Figure 6.12: Evaluation of the integrated approach with Montage

Figure 6.13: Evaluation of the integrated approach with LIGO

Figure 6.14: Montage

Figure 6.15: LIGO

6.3 Task Group Allocation in Clustering-Based Multiple
