Chapter 3. Task Ranking and Task Group Allocation
3.3 Task Group Allocation for Online Multi-workflow Scheduling
This section presents our task group allocation approach for clustering-based multi-workflow scheduling, featuring two mechanisms, adjustable idle time gap selection and adaptive task group rearrangement, for improving overall multi-workflow execution performance.
Most previous clustering-based workflow scheduling approaches focused on how to cluster the tasks in a workflow into different task groups [4]. Although these task groups have to be allocated onto computing resources for execution after the clustering phase, few studies discuss the task group allocation issue. When scheduling workflows onto computing resources, because of inter-task dependency and data communication costs, there are idle time gaps formed between scheduled tasks on each resource. In [16] Stavrinides and Karatza proposed an approach to efficient utilization of the idle time gaps through bin packing techniques. Although, in their approach, the list scheduling heuristic is applied to determine the allocation sequence, the idea can be applied to other kinds of workflow scheduling approaches, too [17]. In the experiments in [16], the Best Fit (BF) principle was shown to achieve the best overall performance.
Although Best Fit allocation has the potential to improve resource utilization, it may delay a task's start time, and in turn degrade the performance of the entire workflow, because it skips earlier available time gaps in search of the best-fitting one. Therefore, our task group allocation method adopts an adjustable dual-criteria idle time gap selection mechanism that further improves multi-workflow scheduling performance by balancing the task group's finish time against the fitness of an idle time gap. The dual-criteria mechanism defines a score function for evaluating each idle time gap that is large enough to accommodate the task group to be allocated. The score of each time gap is calculated by summing the Earliest Finish Time (EFT) of the task group, if allocated to that gap, and the difference between the length of the time gap and that of the task group. The task group is allocated to the time gap with the smallest score.
Since the effectiveness of a task group allocation method may be influenced by several factors, e.g. workflow characteristics and workload conditions, we include an adjustable parameter in the score function to control the relative weights of the two criteria, which makes the dual-criteria mechanism adaptable to different conditions. The score function is defined as follows, where σ is an adjustable parameter ranging between 0 and 1, f is the time gap fitness, calculated by subtracting the required computation time of the entire task group tx from the length of the candidate time gap, and the function EFT( ) calculates the earliest finish time of tx if allocated to the candidate time gap.
score(gap) = σ × f + (1 − σ) × EFT(tx)    (1)
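The dual-criteria selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments. The `Gap` class and the function names are hypothetical. The sketch assumes that σ weights the fitness term and (1 − σ) the EFT term, and that the task group starts at the beginning of the gap, so the waiting time for predecessor data is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gap:
    """An idle time gap on one resource (hypothetical representation)."""
    start: float
    end: float

    @property
    def length(self) -> float:
        return self.end - self.start


def dual_criteria_score(gap: Gap, group_size: float, sigma: float) -> float:
    """Weighted sum of the gap fitness f and the group's EFT in this gap.

    Assumption: sigma weights the fitness term and (1 - sigma) the EFT term.
    Assumption: the task group starts at the beginning of the gap.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    fitness = gap.length - group_size   # f: space left over in the gap
    eft = gap.start + group_size        # earliest finish time in this gap
    return sigma * fitness + (1.0 - sigma) * eft


def select_gap(gaps: Sequence[Gap], group_size: float,
               sigma: float) -> Optional[Gap]:
    """Return the feasible gap with the smallest score, or None if none fits."""
    feasible = [g for g in gaps if g.length >= group_size]
    if not feasible:
        return None
    return min(feasible, key=lambda g: dual_criteria_score(g, group_size, sigma))
```

Under these assumptions, σ = 1 reduces the selection to pure Best Fit and σ = 0 to pure earliest-finish-time selection, so intermediate values trade one criterion against the other.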
Figure 3.3 gives an example comparing the pure BF principle with our dual-criteria idle time gap selection mechanism. There are three workflows to be scheduled, as shown in Figure 3.3(a). Figure 3.3(b) is the schedule produced by the pure BF principle and Figure 3.3(c) is the schedule generated by our dual-criteria idle time gap selection mechanism with σ set to 0.5. The dual-criteria mechanism improves the overall workflow execution performance: the finish times of two workflows become earlier, from 104 to 51 and from 117 to 116, respectively, while the finish time of the third workflow remains the same. As a result, the average makespan of the three workflows is reduced from 111.6 to 93.6, as shown in Figure 3.3(d).
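As a quick consistency check on these numbers, using only the finish times and averages quoted above, the total reduction in finish time spread over the three workflows matches the reported drop in average makespan:

```latex
\frac{(104 - 51) + (117 - 116) + 0}{3} = \frac{54}{3} = 18 = 111.6 - 93.6
```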
(d) Average makespan: schedule (b) 111.6, schedule (c) 93.6
Figure 3.3 Comparison of the pure Best Fit heuristic and our dual-criteria mechanism
The dual-criteria idle time gap selection mechanism discussed above tries to allocate an entire task group into a single gap on a specific resource. However, clustering-based workflow scheduling approaches sometimes produce task groups that are too large to fit into any single idle time gap. When this happens, both resource utilization and task completion time suffer. In the following, we propose an adaptive task group rearrangement mechanism that cooperates with the adjustable dual-criteria idle time gap selection mechanism to further improve the overall multi-workflow execution performance. In traditional clustering-based workflow scheduling approaches, the task groups are formed based solely on the workflow properties, before the task allocation stage. Our adaptive task group rearrangement mechanism allows a task group to be split into several subgroups for independent allocation at the task allocation stage, in order to utilize resources efficiently and, in turn, improve the overall workflow execution performance.
One might ask why we do not simply adopt a list-based workflow scheduling approach instead of allowing a task group in the clustering-based approach to be split into subgroups. Figures 3.4 and 3.5 illustrate the potential advantage of our adaptive task group rearrangement mechanism. In Figure 3.4, each task is allocated independently, as in list-based workflow scheduling approaches, which results in some unnecessary inter-task communication overhead. In our adaptive task group rearrangement mechanism, by contrast, a task group is cut into subgroups only when necessary at the task allocation stage. Each decomposition cuts the original task group into two new subgroups. The first subgroup contains the largest number of tasks that can fit into the gap under consideration, and the second subgroup consists of the remaining tasks. The first subgroup is allocated first, and the second subgroup is put back into the ready queue to wait for a later allocation decision. Since each subgroup contains as many tasks as possible, inter-task communication costs are kept as low as possible. In Figure 3.5, the largest subgroup that can be allocated is shown next to each idle time gap. All four tasks are eventually allocated to gap C, since that leads to the earliest EFT of the entire task group, resulting in better overall performance than in Figure 3.4. This example shows that our adaptive task group rearrangement mechanism retains the advantage of clustering-based workflow scheduling to the largest degree while providing additional flexibility for task allocation.
Figure 3.4 Individual task allocation in list-based workflow scheduling approaches
Figure 3.5 Advantage of adaptive task group rearrangement
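The decomposition step described above, which cuts a task group into the largest subgroup that fits a given gap plus the remaining tasks, can be sketched as follows. This is an illustrative sketch only. The function name `split_task_group` is hypothetical, and the sketch assumes that the tasks of a group are kept in their original order, so the first subgroup is the longest prefix whose total computation cost fits into the gap.

```python
from typing import List, Sequence, Tuple


def split_task_group(costs: Sequence[float],
                     gap_length: float) -> Tuple[List[float], List[float]]:
    """Split an ordered task group into (first subgroup, remaining subgroup).

    The first subgroup is the longest prefix whose total computation cost
    fits into gap_length. Keeping the original task order is an assumption
    of this sketch.
    """
    used = 0.0
    k = 0  # decomposition point: number of tasks in the first subgroup
    for cost in costs:
        if used + cost > gap_length:
            break
        used += cost
        k += 1
    return list(costs[:k]), list(costs[k:])
```

For a group with computation costs 4, 3, 5 and 2, a gap of length 8 would receive the first two tasks and the last two would return to the ready queue, whereas a gap of length 14 or more would receive the whole group and no split would occur.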
Score function (1) is not appropriate when adaptive task group rearrangement is adopted, since not every gap can accommodate the entire task group, and thus the EFT of the entire task group is not always available. To overcome this difficulty, a new score function is defined as follows, in which two adjustable parameters control the relative weights of the three terms, and the sum of the two parameters ranges between 0 and 1. For idle time gaps that are large enough to accommodate the entire task group, the last term of score function (2) is zero and the score function becomes identical to score function (1).
( ) ( ( )(
(2)
The following is an algorithmic description of our adaptive dual-criteria task group allocation approach. The algorithm evaluates each idle time gap in the system in turn, according to the above score function, in the two nested for loops between lines 1 and 17. Lines 3 to 5 deal with the case where the gap can accommodate the entire task group. Lines 6 to 9 handle the case where the current gap is not large enough for the entire task group, by cutting the task group into two subgroups so that the first subgroup can be allocated first. Lines 11 to 15 are common to both cases and choose the most appropriate gap. After the two nested loops, the best gap has been found, and the task group is split into two subgroups if the gap size requires it. The first subgroup is allocated to the gap and the second subgroup is put back into the ready queue for later allocation.
Algorithm: Adaptive Dual-Criteria Task Group Allocation
Input:
Tr: total number of resources
nt: the number of tasks in the task group to be allocated
ni: total number of gaps on resource i
two adjustable parameters whose sum ranges between 0 and 1
gapi(j): size of the jth gap on resource i
gapi(j).end: the end time of the jth gap on resource i
sizet: size (total computation cost) of the task group t
task_gapi(j).end: the expected finish time of the task group if allocated onto the jth gap on resource i without considering the gap size
Variables:
min: the lowest score found so far, initialized as ∞
tempmin: the temporary score of the current gap
finali.end: the expected finish time of the task group if allocated starting at the last task's finish time on resource i
final: the infinite gap starting at the last task's finish time on resource i
i: index of resource
j: index of gap on resource i
Output:
found_gap: the index of the gap for allocation
found_res: the index of the resource on which the gap is found
k: index of the decomposition point of the task group to be allocated
1. for i = 1 to Tr do
2.   for j = 1 to ni do
3.     if (gapi(j) ≥ sizet and task_gapi(j).end ≤ gapi(j).end) then
4.       tempmin = score according to formula (2), using the two adjustable parameters, with the last term being zero and the first subgroup equal to the entire task group
5.       k = nt
18. allocate the first subgroup into the jth gap on resource i, and put the second subgroup back to the ready queue
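The overall gap scan can be sketched in Python as below. This is a hedged illustration, not a transcription of the algorithm above. All names are hypothetical, the score is passed in as a callable, and `example_score` is only a stand-in three-term weighting, not formula (2) itself. The sketch assumes that a subgroup starts at the beginning of its gap, and it leaves out the infinite trailing gap at the end of each resource.

```python
from typing import Callable, Optional, Sequence, Tuple

# (start, end) of an idle time gap; gaps[i][j] is the j-th gap on resource i.
Gap = Tuple[float, float]
# score(fitness, eft_of_first_subgroup, unallocated_cost); lower is better.
ScoreFn = Callable[[float, float, float], float]


def choose_gap(gaps: Sequence[Sequence[Gap]], costs: Sequence[float],
               score: ScoreFn) -> Optional[Tuple[int, int, int]]:
    """Scan every gap and return (found_res, found_gap, k) with the lowest score.

    k is the decomposition point: tasks costs[:k] form the first subgroup.
    Returns None when no gap can hold even the first task.
    """
    best: Optional[Tuple[int, int, int]] = None
    best_score = float("inf")
    for i, resource_gaps in enumerate(gaps):
        for j, (start, end) in enumerate(resource_gaps):
            # Largest prefix of the task group that fits into this gap.
            used, k = 0.0, 0
            for cost in costs:
                if used + cost > end - start:
                    break
                used += cost
                k += 1
            if k == 0:
                continue  # nothing fits; skip this gap
            fitness = (end - start) - used
            eft = start + used            # assumes the subgroup starts at gap start
            unallocated = sum(costs[k:])  # zero when the whole group fits
            s = score(fitness, eft, unallocated)
            if s < best_score:
                best_score, best = s, (i, j, k)
    return best


def example_score(fitness: float, eft: float, unallocated: float,
                  w_fit: float = 0.3, w_eft: float = 0.4) -> float:
    """Hypothetical stand-in for a three-term score; NOT the source's formula (2)."""
    return w_fit * fitness + w_eft * eft + (1.0 - w_fit - w_eft) * unallocated
```

Passing the score as a parameter keeps the scan independent of the exact weighting, so different trade-offs between gap fitness, finish time, and the cost of the tasks left in the ready queue can be compared without changing the loop.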