Chapter 4: Two-tier Project and Job Scheduling with Backfilling, Slack Factor and
4.4. Preemptive Waiting Queues for Jobs and Projects
This scheduling policy allows both job finish time and project departure time to be re-arranged after a project has been accepted only if the resultant delay does not exceed the maximum amount of delay defined by the slack factor. Furthermore, we extend the policy into two versions: single type of projects, and two types of projects.
In the former, all projects have the same priority, which means that new projects can delay any existing reservations up to the maximum number of allowable times. On the other hand, the latter policy takes priority into account such that only some high-priority projects can preempt the low-priority ones.
Single Type of Project
The difference between this policy and 2TFB is that the slack factor SF > 0. As a result, both waiting projects and waiting jobs can be preempted by newly arrived ones.
The maximum allowable delay for a project is its turn-around time multiplied by SF, and the starting time of each of its jobs can be relaxed to any value between the job's scheduled starting time and its latest starting time, which is the project's departure time plus this allowable delay minus the job's service time. In comparison with 2TFB, this can increase the number of successfully backfilled jobs. The Two-tier Flexible Backfilling algorithm with slack factor SF > 0 (2TFB-SF) handles this scheduling policy. In addition, 2TFB-SF uses the parameter PL > 0 to control the number of preempted projects.
Algorithm 2: Two_Tier_FlexibleBackfilling(project, SF, PL)
1  Begin
2    for each job ∈ project do
3      if (job can run immediately) then
4        start job
5      else
         # Find all the feasible backfilling times for the job
6        BT ← feasibleBackfillingTime(job)
7        Let S be the current scheduling plan
8        for each b ∈ BT do
9          startingTime(job) ← b
           # Set up a resource reservation for the job
10         addReservation(job)
           # Check the availability of system resources and
           # delay existing reservations in order to
           # accommodate the job if necessary.
     # Update the latest starting time for the resource reservation of
     # the job
20   latestStartingTime(job) ← departureTime(project) + (turnAroundTime(project) ∙ SF) − serviceTime(job)
21   updateLatestStartingTime(job)
22   end for
23 End
Figure 4: Pseudo code of Two-tier Flexible Backfilling algorithm
Two Types of Projects
This scheduling policy considers a realistic cloud scenario in which some projects are more important than others and therefore need to be served as soon as possible. Here we consider only two types of projects: high-priority projects and low-priority projects. We design Two-tier Priority Backfilling (2TPB) to fulfill the requirement of this scheduling policy: high-priority projects are scheduled by 2TFB-SF, while low-priority projects are scheduled by 2TSB. Moreover, the priority of a project, a value p with 0 ≤ p ≤ 1, is taken into account in recalculating the slack factor, SF = (1 − p) ∙ SF, for each project.
The steps of 2TPB are outlined in Figure 5. If the newly arrived project is high-priority, it is scheduled by 2TFB-SF (line 4) with the slack factor SF recomputed for that project (line 3). Otherwise, it is scheduled by 2TSB (line 6). After that, we update the latest starting time of each job of the project using the recomputed slack factor SF (lines 8-10).
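The dispatch step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `schedule_with_priority`, its parameters, and the two callbacks standing in for 2TFB-SF and 2TSB are hypothetical and not part of the original pseudo code.

```python
from typing import Callable, Tuple


def schedule_with_priority(
    is_high_priority: bool,
    priority: float,
    base_sf: float,
    flexible: Callable[[float], str],
    strict: Callable[[], str],
) -> Tuple[float, str]:
    """Dispatch one arriving project the way the 2TPB description reads.

    `flexible` and `strict` are hypothetical stand-ins for the 2TFB-SF and
    2TSB schedulers; they are assumptions made for this sketch.
    """
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must lie in [0, 1]")
    # Slack factor recomputed from the project's priority: SF = (1 - p) * SF.
    sf = (1.0 - priority) * base_sf
    if is_high_priority:
        plan = flexible(sf)  # high-priority projects go through 2TFB-SF
    else:
        plan = strict()      # low-priority projects go through 2TSB
    return sf, plan


if __name__ == "__main__":
    sf, plan = schedule_with_priority(
        True, 0.5, 0.4, lambda s: "2TFB-SF", lambda: "2TSB"
    )
    print(sf, plan)
```

The recomputed slack factor is returned alongside the plan so that a caller could use it when updating the latest starting time of each job.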
Algorithm 3: Two_Tier_PriorityBackfilling(project, SF, PL)
1 Begin
Figure 5: Pseudo code of Two-tier Priority Backfilling algorithm
4.5. Case study: An Example Run for 2TSB and 2TFB-SF
An example run for 2TSB and 2TFB-SF is illustrated in Figure 6. Because there are two types of resources, we plot the schedules in two blocks. Table 3 lists the inputs of the two algorithms: a sequence of four submitted projects and their parameters.
In Figure 6(a), the schedule for the first four jobs is the same for both algorithms at time slot 2. Job 3,1 is backfilled successfully at time slot 2 by both algorithms. The difference appears when job 3,2 is scheduled. In 2TSB, a potential time slot for job 3,2 to run is time slot 3, but doing so would delay the reservation of job 2,1, which is not allowed by 2TSB. Therefore, job 3,2 is scheduled at time slot 7. On the other hand, job 3,2 can begin to execute at time slot 3 under 2TFB-SF. Doing so delays the execution of job 2,1 from time slot 6 to time slot 7, which is acceptable because the latest starting time of job 2,1 is 7.2 (7.0 + 6.0 ∙ 0.2 − 1.0 = 7.2).
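The acceptance test used in this example can be written as a small Python sketch. It assumes that the three numbers in the calculation above are, in order, the project's departure time (7.0), the project's turn-around time (6.0, multiplied by the slack factor 0.2), and the job's service time (1.0). The function and parameter names are illustrative only.

```python
import math


def latest_starting_time(departure, turnaround, service, sf):
    """Latest start of a job: departure + turnaround * SF - service.

    Reading the operands as departure time, project turn-around time and
    job service time is an assumption of this sketch.
    """
    return departure + turnaround * sf - service


def delay_is_acceptable(new_start, departure, turnaround, service, sf):
    """A delayed start is acceptable if it does not exceed the latest start."""
    return new_start <= latest_starting_time(departure, turnaround, service, sf)


if __name__ == "__main__":
    lst = latest_starting_time(7.0, 6.0, 1.0, 0.2)
    print(math.isclose(lst, 7.2))                          # latest start is about 7.2
    print(delay_is_acceptable(7.0, 7.0, 6.0, 1.0, 0.2))    # moving the job to slot 7
    print(delay_is_acceptable(8.0, 7.0, 6.0, 1.0, 0.2))    # moving the job to slot 8
```

With these inputs, a start at time slot 7 passes the check while a start at time slot 8 does not, which mirrors the reasoning of the example run.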
When job 4,1 arrives, it is scheduled by the two algorithms as shown in Figure 6(b). Time slot 6 is a potential flexible backfilling time for the job.
Unfortunately, this choice would delay job 2,1 beyond its latest starting time (see Figure 6(c)).
The performance of these two algorithms is compared in Table 4. As the table shows, 2TFB-SF reduces the mean project turn-around time by around 8% in comparison with 2TSB.
Table 3: A sequence of four submitted projects and their parameters
Project 1 2 3 4
Table 4: Performance of an example run for 2TSB and 2TFB-SF
Project 1 2 3 4
Figure 6: An example run for 2TSB and 2TFB
(a) Schedules for the first four jobs are the same for both algorithms. Job 3,2 is moved to the head of the waiting queue by the Two-tier Flexible Backfilling algorithm.
(b) Schedules for the four projects
(c) Time slot 6 is a potential backfilling time slot for the execution of job 4,1. Unfortunately, doing so would delay job 2,1 beyond its latest starting time.
4.6. Two-tier Backfilling Implementation
In order to implement the proposed algorithms, it is important to organize the information of resource availability and reservations in a data structure which can provide efficient operations for searching, adding, deleting, and updating. In this section, we introduce a data structure for advanced resource reservation which is used to implement our proposed scheduling algorithms. The data structure is based on the linked-list data structure [29] because of its simplicity and flexibility.
The proposed data structure is illustrated in Figure 7, and Table 5 lists the operations implemented on it. The data structure is a linked list. Each node in the list is defined as node(timeSlot, availability, reservation), where timeSlot denotes a time moment at which a change in reservations or resource availability occurs, availability denotes the number of available resources from this node to the next node, and reservation is a linked list of the resource reservation records at timeSlot. Each record is a 5-tuple consisting of the project index, the job index, the job's service time, the job's latest starting time, and the job's resource requirement.
Figure 7: Data structure used for resource reservation
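A minimal Python sketch of such a node list is given below. The class and field names (`Record`, `Node`, `time_slot`, and so on) are chosen for illustration and are assumptions; for brevity the reservation records of a node are kept in a Python list rather than in a second linked list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """One reservation record: the 5-tuple described in the text."""
    project: int
    job: int
    service_time: float
    latest_start: float
    requirement: int


@dataclass
class Node:
    """A point in time where reservations or availability change."""
    time_slot: float
    availability: int  # resources free from this node until the next one
    reservations: List[Record] = field(default_factory=list)
    next: Optional["Node"] = None


def to_pairs(head: Optional[Node]):
    """Walk the list and return (time_slot, availability) pairs."""
    pairs = []
    node = head
    while node is not None:
        pairs.append((node.time_slot, node.availability))
        node = node.next
    return pairs


if __name__ == "__main__":
    # Hypothetical example: capacity 4, one job holding 2 resources in [3, 5).
    tail = Node(5.0, 4)
    mid = Node(3.0, 2, [Record(1, 1, 2.0, 3.5, 2)], tail)
    head = Node(0.0, 4, next=mid)
    print(to_pairs(head))
```

Keeping one node per change point means that a search only has to visit the moments at which availability can differ, not every time unit.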
Table 5: Data structure operations
Operations                     Explanation
earliestStartingTime(job)      Search the earliest possible starting time for the job.
feasibleBackfillingTime(job)   Find all the feasible backfilling times for the job.
addReservation(job)            Add a resource reservation for the job at its starting time.
deleteReservation(job)         Delete the existing reservation of the waiting job.
checkReservation(job, PL)      Check the resource availability and delay some reservations to accommodate the job if necessary, given that the number of delayed projects ≤ PL.
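As an illustration of the first operation in Table 5, the following Python sketch searches an availability profile for the earliest time at which a job fits. It is a simplified stand-in, not the thesis implementation: the profile is assumed to be a sorted list of (time, available) breakpoints for a single resource type, the last availability is assumed to hold forever, and all names are hypothetical.

```python
import math


def earliest_starting_time(profile, need, duration, not_before=0.0):
    """Return the earliest start >= not_before at which `need` resources are
    free for `duration` time units, or None if the job never fits.

    `profile` is a sorted list of (time, available) breakpoints whose first
    entry is at or before `not_before` (an assumption of this sketch).
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    if not profile or profile[0][0] > not_before:
        raise ValueError("profile must start at or before not_before")
    # Turn breakpoints into (start, end, available) segments.
    segments = []
    for idx, (start, avail) in enumerate(profile):
        end = profile[idx + 1][0] if idx + 1 < len(profile) else math.inf
        segments.append((start, end, avail))
    # The earliest feasible start is not_before or a later breakpoint.
    candidates = [not_before] + [t for t, _ in profile if t > not_before]
    for start in candidates:
        finish = start + duration
        if all(avail >= need
               for seg_start, seg_end, avail in segments
               if seg_start < finish and seg_end > start):
            return start
    return None


if __name__ == "__main__":
    profile = [(0.0, 4), (3.0, 1), (5.0, 4)]
    print(earliest_starting_time(profile, need=2, duration=2.0))
    print(earliest_starting_time(profile, need=2, duration=4.0))
```

Only breakpoints need to be tried as candidate starts, because availability is constant between two consecutive breakpoints.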
Chapter 5: Experimental Evaluation
This chapter presents the experimental evaluation where the effectiveness of proposed algorithms is verified. We first introduce the simulation methodology and then present the experimental results and the analysis.
5.1. Simulation Methodology
In order to evaluate the proposed scheduling algorithms, we have developed a simulator for our scheduling problem. The simulator is implemented based on CSIM 20 for Java [10] which is a simulation package with a process-oriented discrete-event scheduling model. CSIM 20 has been widely used to simulate complex systems in academia as well as industry.
Table 6: Simulation parameters
Group      Parameter                        Distribution   Random function parameters
Project    Inter-arrival time               Exponential    mean can be adjusted
Project    Number of jobs                   Normal         mean = 5.0; standard deviation = 2.0
Job        Service time                     Exponential    mean = 500.0
Job        Resource requirement             Exponential    mean = 2.0
Resource   Types of resources               Constant       5
Resource   Capacity of each resource type   Uniform        min = 20; max = 40
Table 6 summarizes our simulation parameters, which are randomly generated according to the listed distributions. Note that the values of two parameters, the number of jobs per project and the resource requirement of a job, are the integer parts of the floating-point values generated by the random functions. In our experiments, the inter-arrival time between any two successive project arrivals is exponentially distributed with an adjustable mean value in order to control project arrivals. By doing so, we can observe the performance of the proposed algorithms under different system loads. All the simulation results shown here are obtained by averaging the results of 5 simulation runs with different seeds for random number generation. Each simulation run is terminated upon the successful completion of 1000 projects.
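For readers who want to reproduce a comparable workload, the following Python sketch draws parameters according to Table 6 using the standard `random` module. It is not the CSIM-based simulator of this work. The function names, the single resource requirement per job, the clamping of zero-valued draws to 1, and the use of integer capacities are all assumptions made for this sketch.

```python
import random


def generate_project(rng, mean_jobs=5.0, sd_jobs=2.0,
                     mean_service=500.0, mean_req=2.0):
    """Return a list of (service_time, resource_requirement) jobs."""
    # Integer part of a normal draw; clamping to at least 1 is an assumption.
    n_jobs = max(1, int(rng.gauss(mean_jobs, sd_jobs)))
    jobs = []
    for _ in range(n_jobs):
        service = rng.expovariate(1.0 / mean_service)
        # Integer part of an exponential draw, clamped to at least 1 (assumption).
        requirement = max(1, int(rng.expovariate(1.0 / mean_req)))
        jobs.append((service, requirement))
    return jobs


def generate_workload(seed, n_projects, mean_interarrival):
    """Return (capacities, projects); projects are (arrival_time, jobs)."""
    rng = random.Random(seed)
    # Five resource types, each with a capacity drawn uniformly from 20..40.
    capacities = [rng.randint(20, 40) for _ in range(5)]
    clock = 0.0
    projects = []
    for _ in range(n_projects):
        clock += rng.expovariate(1.0 / mean_interarrival)
        projects.append((clock, generate_project(rng)))
    return capacities, projects


if __name__ == "__main__":
    capacities, projects = generate_workload(seed=1, n_projects=3,
                                             mean_interarrival=40.0)
    print(capacities)
    for arrival, jobs in projects:
        print(round(arrival, 1), len(jobs))
```

Passing a different `seed` for each run corresponds to the averaging over several seeded runs described above.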
The overall performance of the proposed scheduling algorithms could be evaluated by two major metrics: mean project turn-around time and average resource utilization. The former is used to measure the performance from the customer’s point
of view, while the latter is the most common system-centric metric. However, it is shown that the average resource utilization does not change notably under different scheduling policies. Thus in the result analysis section, we do not present the results in terms of this metric.
5.2. Result Analysis
Job Scheduling vs. Project Scheduling
In this experiment, we compare the performance of three algorithms: Two-tier Strict Backfilling (2TSB), Two-tier Flexible Backfilling with SF = 0 (2TFB), and Two-tier Flexible Backfilling with SF > 0 (2TFB-SF). For 2TFB-SF, we use the parameters SF = 0.5 and PL = ∞. The 2TSB algorithm serves as the baseline for the comparison. In addition to the mean project turn-around time, we investigate another performance metric, the mean job turn-around time, which is defined as
$$\overline{T}_{job} = \frac{1}{|P|} \sum_{i=1}^{|P|} \frac{1}{|J_i|} \sum_{j=1}^{|J_i|} \left( f_{i,j} - a_i \right) \qquad (8)$$
where P is the set of projects, J_i is the set of jobs of project i, f_{i,j} is the finish time of job j of project i, and a_i is the arrival time of project i.
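Read as an average over projects of the average job turn-around time within each project, which is how Equation (8) appears to be structured, the metric can be computed as in the Python sketch below. The input layout (a list of arrival times paired with job finish times) and the function name are assumptions made for illustration.

```python
def mean_job_turnaround(projects):
    """projects: list of (arrival_time, [job_finish_time, ...]).

    Averages, over all projects, the mean of (finish - arrival) over the
    jobs of each project.
    """
    if not projects:
        raise ValueError("at least one project is required")
    per_project = []
    for arrival, finishes in projects:
        if not finishes:
            raise ValueError("every project needs at least one job")
        per_project.append(sum(f - arrival for f in finishes) / len(finishes))
    return sum(per_project) / len(per_project)


if __name__ == "__main__":
    # Hypothetical data: project 1 arrives at 0 with jobs finishing at 4 and 6,
    # project 2 arrives at 2 with one job finishing at 5.
    print(mean_job_turnaround([(0.0, [4.0, 6.0]), (2.0, [5.0])]))  # 4.0
```

Because each project contributes one averaged value, a project with many jobs does not outweigh a project with few jobs.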
Figure 8: Job scheduling vs. Project scheduling: (a) mean job turn-around time, (b) mean project turn-around time
As shown in Figure 8(a), it is not surprising that the mean job turn-around time is reduced considerably by 2TFB-SF algorithm because the effectiveness of the concept of slack factor has already been confirmed in several existing one-tier scheduling works [20, 21, 22]. Furthermore, the reduction in the mean job turn-around time greatly depends on the system load. For example, for the case where the mean
project inter-arrival time is set to 10, the performance difference between 2TSB and 2TFB-SF is about 4000 time units, which corresponds to a 7.5% improvement in the mean job turn-around time. On the other hand, when the mean project inter-arrival time is set to 160, the difference is merely 700 time units, yet it corresponds to an improvement of 15.5%.
Figure 8(a) also demonstrates that 2TSB and 2TFB have almost identical performance for all the values of the mean project inter-arrival time used in this experiment. This can be explained by the fact that opportunities to apply the flexible backfilling mechanism are rare in 2TFB because SF = 0.
Figure 8(b) clearly indicates that the performance of 2TSB, 2TFB, and 2TFB-SF in terms of the mean project turn-around time is roughly the same. This can be explained by Figure 9, which shows that 2TFB and 2TFB-SF cannot reduce the mean project waiting time and the mean project running time at the same time. Take 2TFB-SF for example: the mean project waiting time is decreased by 5% to 33% when the mean project inter-arrival time is changed from 10 to 160, but meanwhile the mean project running time is increased by 0.7% to 7%. These results indicate that 2TFB-SF can decrease the waiting time of the first job of a project but leads to an increase in the waiting time of the remaining jobs of the project. Overall, adopting 2TFB-SF does not lead to a significant improvement in the mean project turn-around time.
Figure 9: Mean project waiting time vs. Mean project running time
In order to understand more about the relationship between the mean job turn-around time and the mean project turn-around time, we conduct another experiment whose results are shown in Figure 10. In this experiment, we observe the performance of 2TFB-SF while varying the SF parameter. The results show that using larger slack factors improves the mean job turn-around time significantly. However, this causes a modest increase in the mean project turn-around time. Based on our observation, a decrease in the mean job turn-around time does not lead to a decrease in the mean project turn-around time.
Figure 10: The impact of slack factor SF on 2TFB-SF
To sum up, although 2TFB and 2TFB-SF can reduce the mean job turn-around time notably in comparison with 2TSB, the improvement in the mean project turn-around time is negligible. This suggests that 2TSB may be a good choice for two-tier cloud scheduling, since it achieves the same performance as 2TFB and 2TFB-SF in terms of the mean project turn-around time while being comparatively lightweight.
The Impact of Priority Scheduling
In order to test how well Two-tier Priority Backfilling algorithm (2TPB) schedules high-priority projects, we devise the following two scenarios for the experiments. In the first scenario, all the submitted projects are scheduled by 2TSB.
influenced by the slack factor SF and the preemption limit PL, we conduct the following two experiments to observe the effect of these parameters.
Figure 11: Improvement on the mean project turn-around time by 2TPB algorithm with different values of slack factor SF (x-axis: project inter-arrival time, slack factor; y-axis: improvement of project turn-around time [%]; series: entire, high-priority, and low-priority projects)
In the first experiment, we study the performance of 2TPB with PL = ∞ while varying the mean project inter-arrival time and the slack factor SF. Figure 11 shows the improvement on the mean turn-around time of all the high-priority and low-priority projects in comparison with 2TSB. As expected, 2TPB decreases the mean turn-around time of the high-priority projects by 6% to 27% but increases the turn-around time of the others by 1% to 26% when the value of (mean project inter-arrival time, SF) is increased from (10, 0.2) to (160, 1.0). Surprisingly, the mean turn-around time of all [...] project inter-arrival time and preemption limit PL. The results are similar to those in the first experiment. The mean turn-around time of all the projects stays stable except for the case with the lowest load of project arrivals. As the value of (mean project inter-arrival time, PL) increases from (10, 1) to (160, ∞), the mean turn-around time of the high-priority projects is reduced remarkably, by 1.5% to 20%. On the other hand, this also leads to an increase of around 0.01% to 10.7% in the mean turn-around time of the low-priority projects.
Figure 12: Improvement of project turn-around time of 2TPB with different limits on the number of allowable delayed projects PL (x-axis: project inter-arrival time, preemption limit, with points 10,1; 20,5; 40,10; 80,20; 160,infinite; y-axis: improvement of project turn-around time [%]; series: entire, high-priority, and low-priority projects)
The above experimental results indicate that 2TPB works well with priority scheduling where some projects are preferred over the others. Besides, 2TPB does not lead to a general degradation in system service in most cases. Since the two system parameters SF and PL have a strong impact on the mean turn-around time of both types of projects, the system behavior can be controlled by adjusting these parameters.
Chapter 6: Conclusions and Future Work
In this work, we study a two-tier scheduling problem that arises in cloud computing environments. This scheduling problem differs from the traditional one-tier scheduling problems since a submitted project consists of multiple jobs, each requiring several resources for its processing. In order to reduce the turn-around time of cloud services and support priority scheduling, we have developed a set of scheduling algorithms with different attributes. All the algorithms are derived from the conservative backfilling algorithm, but are enhanced with the concept of a project's slack, which is calculated by multiplying the project turn-around time by a system parameter called the slack factor. In addition, another system parameter, the preemption limit, is proposed to control the behavior of the cloud scheduler.
The algorithms developed in this study have been experimentally evaluated under different system loads by computer simulation. The experimental results indicate that Two-tier Flexible Backfilling with SF > 0 (2TFB-SF) can reduce the mean job turn-around time by 7.5% to 15.5% and achieve almost the same performance in terms of the mean project turn-around time as Two-tier Strict Backfilling (2TSB) when the mean project inter-arrival time is changed from 10.0 to 160.0. Based on these results, we also reach an interesting conclusion: a decrease in the mean job turn-around time does not always lead to a decrease in the mean project turn-around time.
The experimental results also indicate that Two-tier Priority Backfilling (2TPB) can efficiently reduce the mean turn-around time of high-priority projects without increasing the mean turn-around time of all the projects in most cases.
Furthermore, the behavior of the algorithm can be easily controlled by tuning two system parameters: slack factor SF and preemption limit PL, whose impact is analyzed in this work as well. More specifically, the mean turn-around time of high-priority projects is decreased by 6% to 27% when the value of (mean project inter-arrival time, SF) is increased from (10, 0.2) to (160, 1.0). As the value of (mean project inter-arrival time, PL) is relaxed from (10, 1) to (160, ∞), the mean turn-around time of high-priority projects is reduced by 1.5% to 20%.
Our proposed algorithms satisfy one fundamental requirement of the two-tier scheduling problem that a project should be granted a guaranteed departure time at the project’s arrival time. However, doing so might decrease the jobs’ backfilling opportunities since we must reserve resources for all the waiting jobs that have been
granted. Hence, the project turn-around time cannot be reduced considerably by our proposed algorithms. For future work, we plan to consider a less conservative backfilling approach in which only the jobs belonging to the first project in the waiting queue can receive resource reservations. Obviously, this less conservative approach will degrade the predictability of two-tier backfilling algorithms, but it may bring a more considerable reduction in cloud service’s turn-around time.
Furthermore, an optimal algorithm for the off-line version of the scheduling problem, in which all projects' characteristics are known beforehand, will be studied in our future work. Other scheduling objectives, e.g., project success ratio and cloud provider revenue, will be considered as well.
References
[1] L. Ai, M. Tang, C. Fidge, “Resource allocation and scheduling of multiple composite web services in cloud computing using Cooperative Coevolution,” in International Conference on Neural Information Processing (ICONIP 2011), Shanghai, China, pp.13-17, Sep. 2011.
[2] “Google Drive”, available at: https://drive.google.com
[3] A.P.A. Vestjens, “On-line Machine Scheduling,” Ph.D. Thesis, Department of Mathematics and Computing Science, Technische Universiteit Eindhoven, Eindhoven, The Netherlands, 1997.
[4] A. Benoit, L. Marchal, J.-F. Pineau, Y. Robert, F. Vivien, "Scheduling Concurrent Bag-of-Tasks Applications on Heterogeneous Platforms," in IEEE Transactions on Computers, vol.59, no.2, pp.202–217, Feb. 2010.
[5] C. Anglano, M. Canonico, "Scheduling algorithms for multiple Bag-of-Task applications on Desktop Grids: A knowledge-free approach," in IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), pp.14–18, April 2008.
[6] K. Gopalan, T. Chiueh, “Multi-resource allocation and scheduling for periodic soft real-time applications,” in Proceedings of Multimedia Computing and Networking, pp.34–45, Jan. 2002.
[7] M. Holenderski, R.J. Bril, J. Lukkien, “Parallel-Task Scheduling on Multiple Resources,” in Proceedings of the 24th Euromicro Conference on Real-Time Systems (ECRTS '12), pp.233–244, 2012.
[8] A.W. Mu'alem, D.G. Feitelson, "Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling," in IEEE Transactions on Parallel and Distributed Systems, vol.12, no.6, pp.529–543, Jun 2001.
[9] D.G. Feitelson, "Experimental analysis of the root causes of performance evaluation results: a backfilling case study," in IEEE Transactions on Parallel and Distributed Systems, vol.16, no.2, pp.175–182, Feb 2005.
[10] “CSIM 20”, available at: http://www.mesquite.com/products/csim20.htm
[11] D.D. Sleator, R.E. Tarjan, “Amortized efficiency of list update and paging rules,”