2 Related Work
2.3 QuID
This section briefly introduces QuID [4].
Given the following parameters:
• N, the current number of servers
• X, the number of task completions in the previous measurement interval
• A, the number of task arrivals in the previous measurement interval
• U, the average utilization, U = (U_1 + U_2 + ... + U_N) / N, where U_i is the utilization rate of resource i
• α, the target utilization rate
In QuID, the required number of resources for the next interval, N’, is defined based on the following equations.
$$D = U / X \qquad (2.1)$$
$$U' = \max(A, X) \cdot D \qquad (2.2)$$
$$N' = \lceil N \cdot U' / \alpha \rceil \qquad (2.3)$$
QuID is a utilization-targeted algorithm. First, it computes the average utilization demand per completion with (2.1). Second, it obtains the normalized utilization for the next interval with (2.2). Finally, it computes the upper bound on the number of resources needed to achieve the target utilization α with (2.3). If N' > N, QuID initiates a request to acquire (N' − N) resources; otherwise, it releases (N − N') resources.
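The three QuID steps can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the argument names, and the handling of an interval with no completions are assumptions made here and are not taken from [4].

```python
import math

def quid_next_servers(n, completions, arrivals, utilizations, target_util):
    """Return N', the number of servers QuID asks for in the next interval."""
    if completions <= 0:
        # Assumption: with no completions the demand per completion is
        # undefined, so this sketch simply keeps the current N.
        return n
    u = sum(utilizations) / n                 # average utilization U
    d = u / completions                       # (2.1) demand per completion
    u_next = max(arrivals, completions) * d   # (2.2) normalized utilization
    return math.ceil(n * u_next / target_util)  # (2.3)

# Four servers at 60% utilization, 100 completions, 150 arrivals, target 75%.
n = 4
n_next = quid_next_servers(n, 100, 150, [0.6, 0.6, 0.6, 0.6], 0.75)
delta = n_next - n   # positive: acquire resources, negative: release them
```

In this example U' = 150 × 0.006 = 0.9, so N' = ⌈4 × 0.9 / 0.75⌉ = 5 and one additional resource is requested.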
Chapter 3 A scalable computing framework for Interactive Workflow Applications on the Cloud
This chapter presents a scalable framework for interactive workflow applications on a cloud computing platform. The framework addresses the scenario in which an interactive workflow application, hosted on a cloud computing platform, runs many workflow instances simultaneously in response to incoming user requests. Since the volume of incoming requests changes over time and the cloud platform is a pay-per-use service, the application has to manage the resources it uses dynamically to maintain an acceptable response time and reduce the total cost of resource consumption under varying workloads.
In the framework, each resource, representing a distinct computing server, is capable of processing multiple interactive workflow requests. Before a resource is ready for service, the required data and workflow definitions have to be deployed to it in some way, e.g., through an Amazon Machine Image (AMI) on Amazon EC2 [22].
An AMI, which can be deployed dynamically in the Amazon Elastic Compute Cloud, is a pre-built package of software containing applications, libraries, data, and associated configuration settings.
To utilize resources efficiently, the framework considers two key issues. The first is finding the least loaded resource for dispatching incoming requests. The second is dynamic resource provisioning (DRP) for adaptively handling dynamic workloads. With resource state monitoring, each workflow enactment request is sent to the least loaded resource for service. The least load dispatching algorithm [2] is more effective than algorithms without a feedback loop, although the latter are easier to implement because they use only the information in the client requests.
Least load dispatching is a greedy approach that assumes the least loaded resource becomes idle first and thus produces the shortest response time. Therefore, its effectiveness depends largely on how accurately the computing load on each resource is captured. To find the most effective load metric, several candidate load metrics are proposed and evaluated with simulation studies, which are presented in Chapter 4. Our strategy for dynamic resource provisioning is presented in Chapter 5, where the policy is compared with the one in [4]. In the dynamic resource provisioning strategy, the most effective load metric found for request dispatching is used to represent the load status of the resources. In the following, we first introduce the components of the framework and then describe how they cooperate with each other.
Figure 3.1. The diagram of the framework.
Fig. 3.1 shows an overview of how the framework handles user requests for an interactive workflow application running on a resource cloud. The architecture consists of four main components: Dispatcher, Resource Allocator, Dynamic Resource Provisioning (DRP) Manager, and Resource Manager. The major capabilities of each component are listed below:
Dispatcher
• Receiving requests for workflow enactment.
• Fetching the ID of the least loaded resource from the Resource Allocator.
• Initiating an instance of the corresponding workflow definition on the least loaded resource according to the request.
• Dispatching each workflow enactment request to the least loaded resource.
Resource Allocator
• Retrieving the list of running resources and their latest load information from the Resource Manager.
• Finding the least loaded resource among the running resources according to the load information.
• Informing the Dispatcher of the ID of the least loaded resource.
Dynamic Resource Provisioning Manager
• Retrieving the list of running resources and their latest load information from the Resource Manager.
• Determining R_next, the number of running resources for the next time interval, based on REM_DRP.
• Initiating a resource provisioning request to the Resource Manager when R_next is not equal to the current number of running resources.
Resource Manager
• Monitoring the state of the resource pool: running resources, suspended resources, and idle resources.
• Adding or releasing running resources upon receiving a resource provisioning request.
When a user sends a workflow enactment request to the system, Dispatcher serves as the front end of the entire system. According to the request, Dispatcher initializes a workflow instance and places it on the least loaded resource, as returned by Resource Allocator. Subsequently, as shown in Fig. 3.2, when the user interacts with a workflow instance, the corresponding events are triggered during user navigation, and the designated tasks in the workflow are submitted to the workflow engine on the resource for execution. During the interaction, the workflow instance returns the execution results to the user and stores them in the database. We apply a session-affinity scheme in which all subsequent requests of a workflow are handled by the same application server. Tasks on a resource are executed in non-preemptive first-come, first-served (FCFS) order.
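The combination of least load placement and session affinity can be illustrated with a minimal sketch. The class and attribute names below are hypothetical, and the load values stand in for whichever load metric the Resource Allocator uses.

```python
class AffinityDispatcher:
    """Toy dispatcher: least loaded placement plus session affinity."""

    def __init__(self, loads):
        self.loads = loads      # resource ID -> current load (any metric)
        self.sessions = {}      # workflow ID -> resource ID it was placed on

    def dispatch(self, workflow_id):
        # A new enactment request is placed on the least loaded resource.
        if workflow_id not in self.sessions:
            self.sessions[workflow_id] = min(self.loads, key=self.loads.get)
        # Every later request of the same workflow goes to the same resource.
        return self.sessions[workflow_id]

dispatcher = AffinityDispatcher({"r1": 3, "r2": 1})
first = dispatcher.dispatch("wf1")    # placed on the least loaded resource
dispatcher.loads["r2"] = 10           # the load changes afterwards
again = dispatcher.dispatch("wf1")    # affinity keeps wf1 on the same resource
other = dispatcher.dispatch("wf2")    # a new workflow sees the updated loads
```

The point of the sketch is that the load information is consulted only once per workflow instance, at enactment time; subsequent task submissions follow the stored mapping.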
Fig. 3.2. An example of multiple workflow instances running on a resource.
Fig. 3.3 depicts how Resource Manager interacts with the Resource Allocator and the DRP.
Fig. 3.3. Cooperation among the Resource Allocator, the Resource Manager, and the DRP Manager.
Resource Manager is an entity similar to the Grid Information Service (GIS), which is responsible for grid resource registration and service discovery in a typical grid environment [34]. As shown in Fig. 3.2, the Resource Monitor on each resource is responsible for monitoring and recording the resource’s load status. Resource Manager aggregates the information from all Resource Monitors to track the state of each resource, and stores load metrics such as the length of the task waiting queue, response time, arrival rate, task queue waiting time, and resource utilization. Based on this load information, Resource Allocator identifies the least loaded resource and passes its ID to Dispatcher. Resource Manager also carries out the decisions made by the DRP. In each scheduling interval, the DRP fetches the load information from Resource Manager, diagnoses the state of the running resources, and initiates requests to Resource Manager to acquire or release resources whenever needed.
In the resource pool, all resources are classified into three groups according to their status: running, suspended, and idle. Resources in the running group can accept new workflow enactment requests. When making a dispatching decision, Resource Allocator considers the load information of running resources only. The number of running resources is determined by the DRP. If the DRP asks Resource Manager to release resources, Resource Manager shifts the designated running resources into the suspended group in last-in-first-out order. Resources in the suspended group take no new workflow enactment requests but continue to serve their existing workflow instances. A suspended resource is shifted into the idle group after all its workflow instances are done.
Conversely, when more resources are needed, Resource Manager can select resources from either the suspended group or the idle group. The difference between the two is that idle resources need a preparation time before they are ready to run, whereas suspended resources do not. The preparation time of a resource is typically the time needed to initialize the workflow engine or to boot the operating system [13]. Therefore, Resource Manager first reinstates resources from the suspended group, since they can be used immediately. If no resources are available in the suspended group, Resource Manager has to wait for the selected idle resources to complete a warm-up period.
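The transitions among the running, suspended, and idle groups can be sketched as follows. The class and method names are hypothetical. Two details are assumptions of this sketch rather than statements from the text: the most recently suspended resource is reinstated first, and idle resources are taken in list order.

```python
class ResourcePool:
    """Toy model of the running / suspended / idle groups."""

    def __init__(self, running, idle):
        self.running = list(running)   # activation order; last is newest
        self.suspended = []
        self.idle = list(idle)

    def release(self, count):
        # Shift running resources to the suspended group in LIFO order.
        for _ in range(min(count, len(self.running))):
            self.suspended.append(self.running.pop())

    def acquire(self, count):
        # Prefer suspended resources (usable at once); otherwise take idle
        # ones, which are returned so the caller can wait for their warm-up.
        warming = []
        for _ in range(count):
            if self.suspended:
                self.running.append(self.suspended.pop())
            elif self.idle:
                warming.append(self.idle.pop(0))
            else:
                break
        return warming

    def warmed_up(self, rid):
        # Called once an idle resource has finished its warm-up period.
        self.running.append(rid)

    def drained(self, rid):
        # A suspended resource has finished all its workflow instances.
        self.suspended.remove(rid)
        self.idle.append(rid)

pool = ResourcePool(running=["r1", "r2", "r3"], idle=["r4"])
pool.release(2)               # r3 and then r2 become suspended
warming = pool.acquire(3)     # r2 and r3 return at once; r4 must warm up
```

Only the resources returned by `acquire` incur the warm-up delay, which is why the suspended group is drained first.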
Chapter 4 Least Load Dispatching for Workload Balancing
For interactive workflows, which are stateful session-based applications, the observed response time is the major concern of the served clients. To ensure that stable and acceptable response times are continuously met, the dispatching policy must be fair, i.e., the workload must be balanced among the available resources. However, resource allocation for interactive workflows differs from that for scientific workflows, to which a complex static mapping can be applied. Moreover, the load of a resource is hard to measure. Therefore, to capture the workload of each resource at dispatching time, several candidate load metrics are proposed and compared. Our approach to workload balancing is based on the least loaded policy, which is state-aware and does not assume dedicated resources. The latter condition means that our approach is resource-blind and suitable for a heterogeneous environment.
4.1 Load Metrics
Fig. 4.1 Workload parameters upon resource execution.
Fig. 4.1 shows the workload parameters of a resource when users’ workflows are executed concurrently. Each workload parameter is defined as follows:
1) Arrival rate
The arrival rate of a resource is the number of new tasks arriving at the resource’s waiting queue within a time interval. According to queueing theory, if the arrival rate increases, the queue length increases, and so does the response time of a task.
2) Average response time
Let r_ij be the response time of task j of workflow i executed on resource R within the monitored time interval. Then the average response time of R is avg(r_ij), where r_ij is the sum of (1) the transmission time of the request to the resource, (2) the waiting time of the instantiated task in the waiting queue of R, (3) the processing time of the task at resource R, and (4) the transmission time of the result back to the user.
3) Remaining tasks
A task in an interactive workflow is executed only when all the preceding tasks required by its input condition have completed and the instantiation event is triggered. The number of remaining tasks of a resource is defined as the number of unexecuted tasks of all workflows served on it. Since the number of tasks in a workflow can be retrieved from a workflow repository, the number of remaining tasks can be used to predict the future workload.
4) WF counts
The number of workflows served at a resource.
5) Utilization rate
The utilization rate of a resource is defined as the percentage of time spent on task execution within a time interval. Unlike the above metrics, the utilization rate has a limited applicable range: it is an effective load metric only when its value is below 100%. On heavily loaded systems, the utilization rate saturates at 100% and therefore cannot distinguish between different load levels. The utilization rate is thus excluded from the following experimental comparisons.
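A small sketch makes the saturation argument concrete. The function names and the numbers below are illustrative assumptions: two fully busy resources report the same utilization rate, while the remaining tasks metric still separates them.

```python
def remaining_tasks(workflows):
    """workflows: (total_tasks, executed_tasks) pairs for one resource."""
    return sum(total - executed for total, executed in workflows)

def utilization_rate(busy_time, interval):
    """Fraction of the measurement interval spent executing tasks."""
    return busy_time / interval

# Two resources that were busy for the whole 60-second interval.
util_a = utilization_rate(60.0, 60.0)
util_b = utilization_rate(60.0, 60.0)

# Resource A serves two workflows, resource B serves five larger ones.
left_a = remaining_tasks([(10, 4), (6, 6)])
left_b = remaining_tasks([(12, 2), (15, 1), (9, 3), (8, 0), (14, 5)])
```

Here both utilization rates are 1.0, whereas the remaining tasks are 6 and 47, so only the latter metric identifies resource A as the better dispatching target.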
4.2 Simulation Environment
The details of the GridSim toolkit are introduced in Chapter 2. In the following, we briefly introduce the entities used for the simulations in this thesis and describe how they operate. The major entities in our simulation environment are listed in Table 4.1. The parameters for configuring a simulation case, listed in Table 4.2, are set in the SimulationMain class. When the simulation environment, which extends the GridSim package, is initiated, all entities and their corresponding Input and Output entities are instantiated. Once an entity is instantiated, it waits for events and takes appropriate actions according to the event type.
The Requester entity issues events to the Dispatcher entity to simulate incoming workflow enactment requests. The request arrivals are modeled as a Poisson process with rate X, which is a configurable parameter in our simulation environment. When the Dispatcher entity receives the event, it retrieves the ID of the least loaded resource from the ResourceAllocator entity and dispatches a UserEntity to that resource to run a randomly generated workflow. Each UserEntity corresponds to one workflow execution. A UserEntity suspends itself to simulate the user thinking time between consecutive tasks in the workflow. A UserEntity submits a task to the resource only when all its preceding tasks have finished execution. The memory used by a UserEntity is released when all tasks in the workflow are completed.
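A Poisson arrival process can be generated by drawing exponentially distributed inter-arrival times. The sketch below uses Python's standard `random` module rather than GridSim; the function name and parameters are assumptions for illustration and do not reproduce the Requester entity's actual code.

```python
import random

def poisson_arrival_times(rate, horizon, seed=None):
    """Arrival times of a Poisson process with the given rate on [0, horizon]."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        # Inter-arrival times of a Poisson process are exponential with
        # mean 1 / rate.
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

# Roughly rate * horizon requests are expected over the horizon.
arrivals = poisson_arrival_times(rate=2.0, horizon=1000.0, seed=1)
```

Each returned timestamp would correspond to one workflow enactment request sent to the Dispatcher.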
For each resource, ResourceManager entity specifies a corresponding ResourceMonitor entity to record the execution information for each task, such as submission time, waiting time, execution start time, finish time, etc.
The ResourceManager entity maintains three resource lists: the running list, the suspended list, and the idle list. Each list is a linked list containing resource IDs. The ResourceManager entity manipulates the lists according to the provisioning requests from the DRP_Manager entity.
ResourceAllocator entity retrieves load information about each resource from ResourceManager entity to find the least loaded resource for request dispatching.
DRP_Manager, with the help of ResourceManager, periodically diagnoses the total load of all running resources for decision making on dynamic resource adjustment.
The simulation continues until no more events are generated and all generated events have been processed. After the simulation finishes, the simulation report is stored in an Excel file for performance analysis.
Entity class         Functionality
SimulationMain       GridSim initialization, entity creation, simulation parameters
Requester            Modeling request arrivals as a Poisson process
Dispatcher           Dispatching requests
UserEntity           Each user entity runs a workflow
ResourceAllocator    Resource allocator
Resource             Resources
ResourceManager      Resource manager
ResourceMonitor      Resource monitor
DRP_Manager          DRP manager
ResStamp             TimeStamp object; attributes: (clock, running resources)
ResTimeStamp         TimeStamp object; attributes: (exeStartTime, exeEndTime)
Table 4.1 Simulation entities.
Parameter                           Description
Resource_number                     Number of resources for the entire simulation
Workflow_number                     Number of workflows for the entire simulation
rounds                              Number of dispatching rounds (each with different request arrivals)
Request_arrival_interval            Smaller value means faster arrival rate
Measurement_interval_resTime        Measurement interval of response time
Measurement_interval_arriRate       Measurement interval of arrival rate
Measurement_interval_utilization    Measurement interval of utilization rate
Run_initial                         Number of initial running resources
DRP_interval                        Measurement interval of REM_DRP and QuID
Utilization_rate                    Target utilization rate of QuID
Workload_limit                      Workload limit of REM_DRP
Dispatching_metric                  Load metric for dispatching
Warm_up_time                        Warm-up time for idle resources
Table 4.2 Simulation parameters.
Workflow tasks are modeled as Gridlet objects and executed on GridSim resources. A Gridlet object contains all the information related to a task and the execution details such as the task length, the size of input and output files, the task originator, etc. Task execution time and user thinking time are generated from the negative exponential distributions with the mean values of 3 seconds and 7 seconds respectively based on the TPC benchmark [23]. The Transaction Processing Performance Council (TPC) defines transaction processing and database benchmarks and delivers trusted results to the industry. TPC-W is a benchmark for Web applications. The task length is expressed in terms of the time it takes to run on a standard resource PE (Processing Element) with a MIPS rating of 100. Therefore, in our simulation environment, the processing capability of a resource is expressed in MIPS (Millions of Instructions Per Second). The workflow model in our simulations is summarized in Table 4.3.
Workflow tasks              4 ~ 15 (random generation)
Task execution time         3 sec. (negative exponential distribution)
User thinking time          7 sec. (negative exponential distribution)
Maximum degree of a task    3
Input file size             100 bytes
Output file size            100 bytes
Table 4.3 Workflow model.
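The relation between task execution time, task length, and resource speed can be sketched as follows. This is plain Python, not GridSim code; the function names are hypothetical, and the sketch ignores the task dependency structure (the maximum degree of 3) and the file sizes. A task that takes t seconds on the standard 100-MIPS PE has a length of 100 × t million instructions (MI) and runs in length / MIPS seconds on any other resource.

```python
import random

STANDARD_MIPS = 100   # rating of the standard PE used to express task length

def task_length_mi(exec_time_s):
    """Task length in MI for a given run time on the standard PE."""
    return exec_time_s * STANDARD_MIPS

def run_time_s(length_mi, resource_mips):
    """Processing time of a task on a resource with the given MIPS rating."""
    return length_mi / resource_mips

def random_workflow(rng):
    """Draw one workflow following Table 4.3 (dependencies omitted)."""
    n_tasks = rng.randint(4, 15)
    return [
        {
            "length_mi": task_length_mi(rng.expovariate(1 / 3)),  # mean 3 s
            "think_s": rng.expovariate(1 / 7),                    # mean 7 s
        }
        for _ in range(n_tasks)
    ]

workflow = random_workflow(random.Random(0))
fast = run_time_s(task_length_mi(3.0), 400)   # a 3 s task on a 400-MIPS resource
```

For instance, a task with a 3-second standard execution time has a length of 300 MI and takes 0.75 seconds on a 400-MIPS resource.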
4.3 Simulation Setup
Based on the workflow model in Section 4.2, we set up a simulation and evaluate the results. The parameters used to configure the simulation environment are listed in Table 4.4. We model request arrivals at the resource cluster as a Poisson process with rates of 2.2 and 2.0 in the homogeneous and the heterogeneous environments, respectively. The task execution time and user thinking time are generated according to the TPC-W benchmark for Web workloads. The measurement intervals for obtaining the request arrivals and the average response time in all our simulations are also listed in Table 4.4. In the heterogeneous environment, there are six 100-MIPS resources, six 200-MIPS resources, and four 400-MIPS resources. The resource cluster is interconnected at 100 Mbps. The measurement intervals of arrival rate and response time were chosen by running a series of simulations of 900 workflows with various arrival rate and response time measurement intervals; the values in Table 4.4 deliver the shortest average response time in these simulations. We also include random selection and the Round-Robin load balancing algorithm in the experiments for performance comparison.
Table 4.4 Simulation setup in homogeneous and heterogeneous environment.
To evaluate the effectiveness of each load metric, we define two performance metrics: the degree of load balancing and the degree of stability. Both are calculated from the task response times throughout the entire simulation.
Let
• avg() be the mean of a set of numbers.
• dev() be the standard deviation of a set of numbers.
• R_ij denote the average response time of resource i between time j − 5 and j. (The average response time is calculated every 5 seconds throughout the entire simulation.)
• d_j = dev(R_ij) over all running resources i at time j. d_j denotes the degree of load balancing among resources at time j.
Thus, the degree of load balancing and the degree of stability of the entire simulation can be obtained by the following equations. Since both are calculated from the standard deviation of response times, a smaller value indicates better performance.
The degree of load balancing = avg(d_j | j = 5, 10, 15, ..., end of simulation).
The degree of stability = dev(d_j | j = 5, 10, 15, ..., end of simulation).
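The two performance metrics can be computed with a short sketch. The function name and the input layout are hypothetical, and the population standard deviation is assumed here because the text does not say whether dev() is the population or the sample form.

```python
from statistics import mean, pstdev

def balance_and_stability(samples):
    """samples[k] holds the R_ij of all running resources at the k-th sampling time.

    Returns (degree of load balancing, degree of stability).
    """
    # d_j: spread of the per-resource average response times at time j.
    d = [pstdev(r_at_j) for r_at_j in samples]
    return mean(d), pstdev(d)

# Two sampling times (j = 5 and j = 10) with two running resources each.
balancing, stability = balance_and_stability([[1.0, 1.0], [1.0, 3.0]])
```

In this toy input d_5 = 0 and d_10 = 1, so both the degree of load balancing and the degree of stability equal 0.5.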
4.4 Performance Evaluation
Homogeneous environment
Figure 4.2 Average response time for each load metric.
Figure 4.3 Stability and degree of load balancing for each load metric.
Heterogeneous environment
Figure 4.4 Average response time for each load metric.
Figure 4.5 Stability and degree of load balancing for each load metric.
The simulation results, depicted in Figs. 4.2 to 4.5, show that Round-Robin does not perform well, especially in the heterogeneous environment. The following discusses the performance of the other metrics.
Arrival rate and response time are the most frequently used load metrics in web applications. The former performs better than the latter in the homogeneous environment, while the opposite holds in the heterogeneous environment. Both metrics are based on information collected in the past, and past information cannot always predict the future workload accurately. Moreover, as shown in Figure 4.6, these time-interval-based metrics share a potential problem: it is hard to find an interval that collects an appropriate amount of load information. Once an interval is chosen, any load variation outside the interval is ignored.
The remaining tasks metric outperforms the others and is a good indicator of the upcoming workload for interactive workflow applications in both the homogeneous and the heterogeneous environments. It not only gives more detailed load information than WF counts but also captures the interplay between request arrivals and completed tasks. Our simulations suggest that a desirable basis for determining the load of an interactive workflow application is the number of unfinished requests.
Figure 4.6 Different scales of measurement interval for the response time metric.
With least load dispatching, all requests arriving between two workload updates are sent to the same least loaded resource. This may overload the resource and lead to poor performance. To alleviate this potential problem, we modify least load dispatching as follows. Upon request dispatching, if the least loaded resource is the same as the one chosen in the preceding dispatch, the system dispatches the request to the second least loaded resource instead. Figure 4.7 shows that the modified least load dispatching outperforms the original one.
Fig. 4.7 Performance gained from the policy “don’t always dispatch requests to the least loaded resource.”
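The modified selection rule can be sketched as follows. The function name and the dictionary of loads are hypothetical, and the load values stand for whichever metric is used for dispatching.

```python
def pick_resource(loads, previous=None):
    """Least loaded resource, unless it was also chosen in the preceding dispatch."""
    ranked = sorted(loads, key=loads.get)   # resource IDs, lightest first
    if len(ranked) > 1 and ranked[0] == previous:
        return ranked[1]                    # fall back to the second least loaded
    return ranked[0]

# Loads stay unchanged between two workload updates.
loads = {"r1": 2, "r2": 5, "r3": 3}
first = pick_resource(loads)                    # least loaded resource
second = pick_resource(loads, previous=first)   # avoids repeating the same choice
third = pick_resource(loads, previous=second)   # least loaded is allowed again
```

With stale load information, consecutive requests therefore alternate between the two lightest resources instead of piling onto a single one.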
Chapter 5 Dynamic Resource Provisioning
In traditional server hosting environments, each application is equipped with a