This chapter evaluates the proposed methods in our MOWS and compares them with the approaches used in OWM [12]. Section 5-1 introduces the setup for the following experiments and the metrics used in the performance analysis. Section 5-2 presents the experimental results of the proposed methods in MOWS.
5-1 Experimental Setup and Performance Metrics
5-1-1. Algorithms under Evaluation
In addition to the overall effect of MOWS, we evaluated the effectiveness of each proposed method separately in the following experiments. To this end, we implemented several online workflow scheduling approaches that differ from each other in the methods used in the four scheduling phases. The implemented approaches and the methods they use in the four scheduling phases are as follows:
OWM: adopting CPWS, RANK_HYBD, FCFS, and AA in the four scheduling phases, respectively.
OWM(SWF): replacing RANK_HYBD with SWF in the phase of waiting queue scheduling, used to evaluate the effectiveness of the SWF strategy through comparing it with OWM.
OWM(backfilling): replacing FCFS with priority-based backfilling in the phase of task rearrangement, used to evaluate the effectiveness of the priority-based backfilling strategy through comparing it with OWM.
OWM(preemptive): replacing CPWS with SWS in the phase of task prioritizing and adding preemptive task execution into the phase of task allocation, used to evaluate the effectiveness of the preemptive task execution strategy through comparing it with OWM.
OWM(All-EFT): replacing AA with All-EFT in the phase of task allocation, used to evaluate the effectiveness of the All-EFT strategy through comparing it with OWM.
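MOWS: adopting SWS in the phase of task prioritizing, SWF in the phase of waiting queue scheduling, priority-based backfilling in the phase of task rearrangement, and both preemptive task execution and All-EFT in the phase of task allocation, used to evaluate the overall effect of MOWS.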
5-1-2. Simulation Setup
In a real HPC environment, the workload may consist of workflows with various characteristics. To generate realistic workloads for the simulation experiments, we use the following parameters to generate different types of workflows. Chapter 4 has described how these parameters were used to generate DAGs. The following presents the ranges of values assigned to the parameters for DAG generation in the simulation experiments.
Node: the number of nodes in a DAG. It is randomly chosen from the set {20, 40, 60, 80, 100}.
Shape: a number controlling the shape of a DAG. A higher shape value results in a shorter DAG with a high parallelism degree. Otherwise, a longer DAG with a low parallelism degree is generated. Shape is randomly selected from the set {0.5, 1.0, 2.0}.
OutDegree: the maximum number of immediate descendants of a task. OutDegree is randomly selected from the set {1, 2, 3, 4, 5}.
CCR: the Communication-to-Computation Ratio of a DAG. The CCR of a workflow is defined as its average communication cost divided by its average computation cost among all tasks on all resources. A data-intensive application has a higher CCR, while a compute-intensive one has a lower CCR. For general workflows, CCR is randomly chosen from the set {0.1, 0.5, 1.0, 1.5, 2.0}. For data-intensive workflows, CCR is selected from the set {1.5, 2.0}, and for compute-intensive workflows, CCR is selected from the set {0.1, 0.5}.
BRange: the distribution range of the computation costs of tasks on different clusters. It is the heterogeneity factor for cluster speeds. A large range indicates significant differences in tasks' computation costs on different clusters. BRange is randomly selected from the set {0.1, 0.25, 0.5, 0.75, 1.0}.
WDAG: the average computation cost of a DAG. WDAG is randomly chosen from the range [100, 1000]. The average computation cost of each task on all clusters is randomly generated from a uniform distribution within the range [1, 2 * WDAG].
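The parameter ranges above can be sketched as a small sampler. This is only an illustration, not the generator described in Chapter 4: the names `DagParams`, `sample_dag_params`, and `sample_task_costs` are hypothetical, and drawing WDAG uniformly and continuously from [100, 1000] is an assumption, since the text only says it is randomly chosen from that range.

```python
import random
from dataclasses import dataclass

# Value sets quoted from the parameter list above.
NODE_CHOICES = [20, 40, 60, 80, 100]
SHAPE_CHOICES = [0.5, 1.0, 2.0]
OUT_DEGREE_CHOICES = [1, 2, 3, 4, 5]
CCR_CHOICES = {
    "general": [0.1, 0.5, 1.0, 1.5, 2.0],
    "data": [1.5, 2.0],      # data-intensive workflows
    "compute": [0.1, 0.5],   # compute-intensive workflows
}
BRANGE_CHOICES = [0.1, 0.25, 0.5, 0.75, 1.0]
WDAG_RANGE = (100, 1000)


@dataclass
class DagParams:
    """Hypothetical container for one workflow's generation parameters."""
    node: int
    shape: float
    out_degree: int
    ccr: float
    brange: float
    wdag: float


def sample_dag_params(rng, kind="general"):
    """Draw one parameter set; `kind` selects which CCR set is used."""
    return DagParams(
        node=rng.choice(NODE_CHOICES),
        shape=rng.choice(SHAPE_CHOICES),
        out_degree=rng.choice(OUT_DEGREE_CHOICES),
        ccr=rng.choice(CCR_CHOICES[kind]),
        brange=rng.choice(BRANGE_CHOICES),
        # Assumption: uniform, continuous draw from the stated range.
        wdag=rng.uniform(*WDAG_RANGE),
    )


def sample_task_costs(rng, params):
    """Average computation cost of each task, uniform in [1, 2 * WDAG]."""
    return [rng.uniform(1, 2 * params.wdag) for _ in range(params.node)]
```

Passing a seeded `random.Random` instance as `rng` keeps the generated workloads reproducible across the repeated runs of an experiment.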
The submission interval between two consecutive workflows is assumed to conform to the Poisson distribution. Each experiment consists of 20 runs, each of which simulates 100 online workflows in a multi-cluster environment composed of 5 clusters, each containing 50 to 70 processors.
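The workload and environment setup can be sketched as follows. The function names are hypothetical. The sketch interprets the Poisson assumption as a Poisson arrival process, in which the gaps between consecutive submissions are exponentially distributed with the given mean; this interpretation, and the uniform choice of the processor count per cluster, are assumptions rather than details stated in the text.

```python
import random


def make_clusters(rng, n_clusters=5, min_procs=50, max_procs=70):
    """Return the processor count of each cluster (assumed uniform in 50..70)."""
    return [rng.randint(min_procs, max_procs) for _ in range(n_clusters)]


def arrival_times(rng, n_workflows=100, mean_interval=100.0):
    """Submission times of the workflows in one run.

    Assumption: arrivals form a Poisson process, so each gap between two
    consecutive submissions is exponential with mean `mean_interval`.
    """
    t = 0.0
    times = []
    for _ in range(n_workflows):
        t += rng.expovariate(1.0 / mean_interval)
        times.append(t)
    return times


# One experiment: 20 independent runs, each with its own seed.
runs = []
for seed in range(20):
    rng = random.Random(seed)
    runs.append((make_clusters(rng), arrival_times(rng)))
```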
5-1-3. Metrics
The performance metrics used in the experiments are described below. In each experiment, the average values of these metrics over all workflows are used to evaluate the proposed methods.
makespan: the total execution time for a workflow application from workflow submission to workflow completion, including waiting time and execution time. It is used to measure the performance of a scheduling algorithm from the perspective of workflow applications. However, makespan usually varies widely among workflows with different sizes and other properties.
Schedule Length Ratio (SLR): the ratio of a workflow's makespan to its best possible schedule length. SLR measures the performance of scheduling algorithms independently of the variation in workflow sizes and is defined by
\[ \mathrm{SLR} = \frac{\mathit{makespan}}{\mathit{CPL}} \]
where CPL represents the Critical Path Length of a workflow.
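A minimal Python sketch of the SLR metric follows. It assumes that CPL is the longest path through the DAG when both computation and communication costs are summed; the text does not specify which cost estimates enter CPL, so this choice, the input format, and the function names are assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path_length(cost, edges):
    """Longest path through a DAG.

    cost:  {task: computation cost}
    edges: {(u, v): communication cost from task u to task v}
    """
    preds = {t: set() for t in cost}
    for (u, v) in edges:
        preds[v].add(u)

    finish = {}
    # static_order() yields every task after all of its predecessors.
    for t in TopologicalSorter(preds).static_order():
        start = max((finish[u] + edges[(u, t)] for u in preds[t]), default=0.0)
        finish[t] = start + cost[t]
    return max(finish.values())


def slr(makespan, cpl):
    """Schedule Length Ratio: makespan divided by critical path length."""
    return makespan / cpl


# Small diamond-shaped workflow with hypothetical costs.
cost = {"a": 2, "b": 3, "c": 1, "d": 4}
edges = {("a", "b"): 1, ("a", "c"): 5, ("b", "d"): 1, ("c", "d"): 0}
cpl = critical_path_length(cost, edges)  # path a -> c -> d
```

Because makespan includes waiting time while CPL does not, SLR under this definition is at least 1 whenever the workflow cannot finish faster than its critical path.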
5-2 Experimental Results
To evaluate the effectiveness of the proposed methods, we compare them with the approaches in OWM [12]. We vary the computation intensity and the arrival interval of workflows to investigate their influence on the performance of the proposed approaches. The last subsection examines the effects of inaccurate execution time estimates.
5-2-1. Shortest-Workflow-First Strategy
Figure 5-1 and Figure 5-2 show the performance results of OWM and OWM(SWF) under different mean arrival intervals of workflows in terms of average makespan and average SLR, respectively. OWM(SWF) clearly performs better than OWM in terms of average makespan. Figure 5-3 and Figure 5-4 present the performance of OWM and OWM(SWF) with different levels of computation intensity. In this experiment, the arrival interval of workflows is set to conform to the Poisson distribution with a mean value of 100. Under this setting, several workflows may be running in the system simultaneously. The results indicate that OWM(SWF) outperforms OWM significantly for both computation- and communication-intensive workflows in terms of average makespan.
However, in terms of SLR, the performance of OWM(SWF) in the above experiments is either quite close to or even worse than that of OWM, as shown in Figure 5-4 and Figure 5-2, respectively. This is because SLR divides the makespan of a workflow by its critical path length. Workflows with a high degree of parallelism but a short critical path are treated as large workflows by our SWF approach, according to the calculation of estimated remaining execution time described in Figure 3-2, and are thus assigned low priority values. This arrangement lengthens the makespans of those workflows and in turn leads to a drastic increase in their SLR values because of their short critical path lengths. Therefore, depending on whether users are more concerned with makespan or SLR, the scheduling system can choose between OWM's CPWS and our SWF approach.
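The sensitivity of SLR to a short critical path can be illustrated with a small calculation. The numbers are hypothetical and are not taken from the experiments; the sketch also assumes, for simplicity, that once a workflow starts it finishes in exactly its critical path length.

```python
def slr_after_wait(cpl, wait):
    """SLR of a workflow that waits `wait` time units and then runs for `cpl`.

    Simplifying assumption: execution time equals the critical path length.
    """
    return (wait + cpl) / cpl


# Two hypothetical workflows that are postponed by the same amount of time.
wide_short = slr_after_wait(cpl=50.0, wait=200.0)    # highly parallel, short critical path
narrow_long = slr_after_wait(cpl=500.0, wait=200.0)  # long critical path

print(f"short critical path: SLR = {wide_short:.1f}")
print(f"long critical path:  SLR = {narrow_long:.1f}")
```

The same delay adds 200 time units to both makespans, but it raises the SLR of the workflow with the short critical path far more, which is why postponing such workflows penalizes the average SLR even when the average makespan improves.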
Figure 5-1 Makespan performance of SWF with different mean arrival intervals
Figure 5-2 SLR performance of SWF with different mean arrival intervals
Figure 5-3 Makespan performance of SWF with different computation intensities
Figure 5-4 SLR performance of SWF with different computation intensities
5-2-2. Priority-based Backfilling
Figure 5-5 and Figure 5-6 show the performance of OWM and OWM(backfilling) under different mean arrival intervals of workflows. Figure 5-8 and Figure 5-9 evaluate the performance of OWM and OWM(backfilling) with workflows of different computation intensities. The experiments show that OWM(backfilling) outperforms OWM in terms of both average makespan and average SLR. In terms of average makespan, the performance improvement of OWM(backfilling) over OWM increases from 7% to 10% as the arrival interval grows.
Figure 5-7 shows the number of backfilling occurrences in the experiments. Backfilling is more likely to occur when the system is more crowded, since in that situation the tasks in the queue are more likely to be blocked by insufficient available resources. However, a comparison of Figure 5-5 and Figure 5-7 shows that more backfilling occurrences do not necessarily lead to a larger performance improvement. This is because executing some tasks of a workflow earlier does not reduce its makespan if the start times of the tasks on the critical path remain unchanged. With respect to computation intensity, OWM(backfilling) outperforms OWM for both computation- and communication-intensive workflows. These results indicate that task rearrangement can effectively improve the scheduling performance for mixed-parallel online workflows.
Figure 5-5 Makespan performance of backfilling with different mean arrival intervals
Figure 5-6 SLR performance of backfilling with different mean arrival intervals
Figure 5-7 Number of backfilling occurrences vs. mean arrival interval
Figure 5-8 Makespan performance of backfilling with different computation intensities
Figure 5-9 SLR performance of backfilling with different computation intensities
5-2-3. Preemptive Task Execution
Figure 5-10 and Figure 5-11 evaluate the performance of preemptive task execution under different mean arrival intervals of workflows in terms of average makespan and average SLR, respectively. The average makespan produced by OWM(preemptive) is about 2% less than that produced by OWM. This improvement comes from preemptive task execution, as illustrated in Figure 5-12, which shows the number of preemptions occurring in the experiments. Figure 5-12 indicates that preemption is more likely to occur when the system is less crowded, since in that situation low-priority tasks in the queue have more chances to start execution first and are preempted later when high-priority tasks enter the queue. The trend of preemption occurrences also explains the results in Figure 5-10 and Figure 5-11, where the performance improvement increases noticeably as the arrival interval grows.
Figure 5-13 and Figure 5-14 show the performance for workflows of different computation intensities. Again, the performance of OWM(preemptive) is better than that of OWM for both computation- and communication-intensive workflows.
Figure 5-10 Makespan performance of preemptive task execution with different mean arrival intervals
Figure 5-11 SLR performance of preemptive task execution with different mean arrival intervals
Figure 5-12 Number of preemption occurrences vs. mean arrival interval
Figure 5-13 Makespan performance of preemptive task execution with different computation intensities
Figure 5-14 SLR performance of preemptive task execution with different computation intensities
5-2-4. All-EFT Task Allocation
Figure 5-15 and Figure 5-16 compare the performance of OWM(All-EFT) and OWM under different mean arrival intervals of workflows in terms of average makespan and average SLR, respectively. Figure 5-17 and Figure 5-18 evaluate the performance of OWM(All-EFT) and OWM for workflows of different computation intensities. The results indicate that OWM(All-EFT) performs slightly better than OWM.
Figure 5-15 Makespan performance of All-EFT with different mean arrival intervals
Figure 5-16 SLR performance of All-EFT with different mean arrival intervals
Figure 5-17 Makespan performance of All-EFT with different computation intensities
Figure 5-18 SLR performance of All-EFT with different computation intensities
5-2-5. Overall Improvement Made by MOWS
This section presents the overall performance improvement made by MOWS, compared to OWM [12]. The performance results of different mean arrival intervals in terms of average makespan and average SLR are shown in Figure 5-19 and Figure 5-20, respectively. The results indicate that MOWS outperforms OWM significantly.
On average, the performance improvement of MOWS over OWM is approximately 16%. The average makespan of both MOWS and OWM decreases as the mean arrival interval of workflows grows. Figure 5-21 and Figure 5-22 show the performance at different levels of computation intensity. MOWS outperforms OWM for both computation- and communication-intensive workflows.
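The text does not state how the improvement percentage is computed. One common reading, assumed here, is the relative reduction of the average makespan with respect to the baseline. The sketch below uses hypothetical function names and placeholder inputs that are not experimental data.

```python
from statistics import mean


def improvement_ratio(baseline, candidate):
    """Relative reduction of `candidate` with respect to `baseline`.

    Assumption: improvement = (baseline - candidate) / baseline.
    """
    return (baseline - candidate) / baseline


def mean_improvement(baseline_runs, candidate_runs):
    """Improvement computed on the averages over the repeated runs."""
    return improvement_ratio(mean(baseline_runs), mean(candidate_runs))


# Placeholder average makespans for three runs (not measured values).
example = mean_improvement([210.0, 190.0, 200.0], [160.0, 140.0, 150.0])
print(f"improvement: {example:.0%}")
```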
Figure 5-19 Makespan performance of MOWS with different mean arrival intervals
Figure 5-20 SLR performance of MOWS with different mean arrival intervals
Figure 5-21 Makespan performance of MOWS with different computation intensities
Figure 5-22 SLR performance of MOWS with different computation intensities
5-2-6. Influence of Inaccurate Execution Time Estimate
The execution time of each task in a workflow is necessary information for the proposed workflow scheduling algorithms. However, for some applications the exact execution time of a task may be difficult to know before its execution completes. Therefore, users have to provide an execution time estimate for each task when submitting a workflow. This section presents experiments conducted to evaluate the effects of inaccurate execution time estimates on the performance of the proposed workflow scheduling approach. Figure 5-23 and Figure 5-24 show the performance results under different degrees of inaccuracy in terms of average makespan and average SLR, respectively. In this experiment, the arrival interval of workflows is set to 100 seconds. As in [12], the simulator picks the actual execution time of a task randomly from the range:
, where et is the estimated execution time of the task. For example, when the uncertainty is 300% and et of a task is 100, the actual execution time of the task is randomly picked from the range [1, 700]. MOWS outperforms the other approaches at all uncertainty levels from 100% to 500%. On average, the performance improvement of MOWS over OWM is approximately 13%. The performance of all evaluated algorithms except OWM(SWF) degrades at the same rate as the uncertainty level increases. The performance of OWM(SWF) degrades faster because OWM(SWF) depends heavily on the estimated execution times of tasks to prioritize workflows in the scheduling process.
Figure 5-23 Results of inaccurate execution estimates for average makespan
Figure 5-24 Results of inaccurate execution estimates for average SLR