Token-Based Load Balancing Architecture - A Token-Based Load Balancing Algorithm for Heterogeneous Multi-Resource Systems

The overall load balancing architecture is modeled as a set of servers interconnected by high-bandwidth links. Without loss of generality, we assume each server is configured to handle one application; that is, it serves as the entry point for all user requests related to that application. In practice, a server may be in charge of multiple applications, or multiple servers may share the load for the same application. An application is distinguished from other applications by the types of requests associated with it; each request type is characterized by its own arrival rate and multi-resource requirements. In our simulation model, described in more detail later, we assign to each server the mean and variance of its request arrival rate and of its requirement for each type of resource. In addition, each server contains multiple types of resources, each with a different capacity. In the simulation model, the capacity of each resource type on each server is drawn from a random distribution.

A request can be executed on a server only when the server's available resources satisfy all of the request's resource requirements; otherwise the request is placed in a wait queue. Note that once a request is in the wait queue, it cannot be rescheduled to another server. While a request is executing, it holds all the resources it claimed from the server and does not return them until it finishes. Therefore, the load of a server at a given time is the sum of the resources claimed by all the requests in execution. Each server receives requests from two sources: users outside the cluster, and other servers in the cluster. This implies that a server may also process requests belonging to other applications. For simplicity, we assume each server is capable of executing any kind of request when asked to. This assumption is not far from reality because, to be scalable, modern cloud computing environments such as Amazon EC2 or Google App Engine do replicate the same application across different servers based on the application's traffic.
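The admission rule above can be sketched in a few lines of Python. This is only an illustration, not the simulator used in this thesis: the `Server` class, the resource names, and the FIFO handling of the wait queue are assumptions introduced here, since the text does not specify a queue discipline.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server model: per-resource capacity plus a local wait queue."""
    capacity: dict                                   # e.g. {"cpu": 4, "mem": 8}
    in_use: dict = field(default_factory=dict)       # resources claimed by running requests
    wait_queue: deque = field(default_factory=deque)

    def fits(self, demand):
        # A request fits only if every resource it needs is still available.
        return all(self.in_use.get(r, 0) + amount <= self.capacity.get(r, 0)
                   for r, amount in demand.items())

    def _claim(self, demand):
        for r, amount in demand.items():
            self.in_use[r] = self.in_use.get(r, 0) + amount

    def submit(self, demand):
        """Start the request if it fits; otherwise queue it on this server."""
        if self.fits(demand):
            self._claim(demand)
            return True
        self.wait_queue.append(demand)   # a queued request is never moved elsewhere
        return False

    def finish(self, demand):
        """Release a finished request's resources, then admit queued requests.

        Assumption: the queue is served in FIFO order.
        """
        for r, amount in demand.items():
            self.in_use[r] -= amount
        while self.wait_queue and self.fits(self.wait_queue[0]):
            self._claim(self.wait_queue.popleft())

    def load(self):
        # Load = sum of the resources claimed by all requests in execution.
        return dict(self.in_use)
```

A request is a plain dictionary of resource demands here, for example `{"cpu": 3, "mem": 2}`; a request that does not fit stays in the local queue until `finish` frees enough of every resource it needs.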

Figure 3.1 depicts the system architecture described above. In the figure, Client 1 represents an application with a high CPU requirement; the CPU of Server 1 is overloaded, while Servers 2 and 3 have only moderate CPU load. From the perspective of Server 1, it is desirable to shift some of the requests from Client 1 to Server 2 or 3 to reduce its CPU load, and thereby improve its chance of admitting more requests from its wait queue. Since all servers share the same goal, the load balancing mechanism must ensure that the workload is divided among the servers so that overall resource utilization stays high.

The load balancing algorithm is implemented in a distributed manner. Each server contains a workload mapper which monitors and analyzes incoming workload continuously, and communicates with other servers’ mappers to decide how workload is divided among them. In other words, it is the set of workload mappers that together implement the load balancing strategy for the whole system. The dispatcher in each server is the one that takes care of actual request dispatching.

Figure 3.1: Workload distribution model

In our token-based approach, each workload mapper divides the workload corresponding to each request type into multiple tokens, where the number of tokens is proportional to the size of the workload. Specifically, if request type i has arrival rate R_i and average CPU time C_i, its workload L_i is

$$L_i = R_i \cdot C_i$$

We pick a system-wide constant L_g as the “unit” of workload that every token should represent. The number of tokens N_i for request type i is then the workload expressed in units of L_g:

$$N_i = \frac{L_i}{L_g}$$

Note that in this thesis we study only the case where all tokens represent roughly the same CPU workload. Other variations are also worth further investigation. For example, tokens could be allowed to represent different workload sizes, and the impact on the scheduling result could then be examined.
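As a small worked illustration of the token computation, the sketch below multiplies arrival rate by average CPU time and divides by the unit workload. The function names are hypothetical, and the rounding rule (nearest integer, with at least one token per request type) is an assumption made here so that tokens represent roughly equal workload; the text does not state how fractional results are handled.

```python
def workload(arrival_rate, avg_cpu_time):
    """Workload of a request type: arrival rate R_i times average CPU time C_i."""
    return arrival_rate * avg_cpu_time


def token_count(arrival_rate, avg_cpu_time, unit_workload):
    """Number of tokens for a request type, given the system-wide unit L_g.

    Assumption: round to the nearest integer and keep at least one token.
    """
    if unit_workload <= 0:
        raise ValueError("unit_workload must be positive")
    return max(1, round(workload(arrival_rate, avg_cpu_time) / unit_workload))
```

For instance, a request type arriving 120 times per second with 0.05 s of CPU time each has a workload of about 6 CPU-seconds per second, so a unit of 1.0 splits it into six tokens, matching the six-token example used in the figures.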

Initially, each server holds all the tokens created by its mapper. Tokens serve many purposes. Firstly, because tokens can be passed among servers, if a server holds a token, it is responsible for the associated workload, meaning it should accept and process the corresponding requests dispatched from the token originator. Secondly, the workload mapper also uses the tokens to determine how to process each incoming request. For example, if a request type is divided evenly into N tokens, the workload mapper first picks one token from the N tokens randomly (or in a round-robin manner), checks which server currently holds the token, and dispatches the request to that server. Thirdly, because the total workload of a server can be computed by summing up all the tokens the server holds, the load balancing algorithm can use this information to rearrange tokens among servers to increase system utilization.
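The dispatching role of tokens can be sketched as follows. The `WorkloadMapper` class and its method names are hypothetical; the sketch covers a single request type, uses uniform random token selection (the round-robin alternative is not shown), and assumes the mapper keeps the holder of each token in a local list.

```python
import random
from collections import Counter


class WorkloadMapper:
    """Hypothetical mapper for one request type on its originating server."""

    def __init__(self, origin, num_tokens, rng=None):
        self.origin = origin
        # holders[k] is the server currently holding token k; initially the originator.
        self.holders = [origin] * num_tokens
        self.rng = rng or random.Random()

    def pick_server(self):
        """Pick one token uniformly at random and return the server holding it."""
        return self.rng.choice(self.holders)

    def share_by_server(self):
        """Fraction of this request type's workload each server is responsible for."""
        counts = Counter(self.holders)
        return {server: n / len(self.holders) for server, n in counts.items()}
```

Because every token represents the same amount of workload, counting tokens per holder is enough to estimate how much of the request type each server carries, which is the quantity a rebalancing step would inspect.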

Figure 3.2 shows how tokens are managed within each server, where T_mgr stands for the token manager that controls the creation and management of tokens for the server. In this example, the server's own workload for request type i is divided into 6 tokens. The buddy set, which holds tokens received from other servers, is initially empty.

Figure 3.3 shows how tokens migrate between servers. In the figure, server A attempts to shift one of its tokens to server B. The token manager of server A marks the holder of one token as server B, meaning that future requests of type i have a 1/6 chance of being redirected to server B. The token manager then notifies server B of the token assignment, and server B adds a new token to its buddy set to record the additional workload it is now responsible for.
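The migration step can be sketched as below. The `Token` and `TokenManager` classes and their method names are hypothetical, and a direct method call stands in for the notification message that would travel between servers in a real cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    origin: str     # server whose mapper created the token
    req_type: str   # request type the token belongs to
    holder: str     # server currently responsible for this share of the workload


@dataclass
class TokenManager:
    server: str
    token_set: list = field(default_factory=list)   # tokens created by this server
    buddy_set: list = field(default_factory=list)   # tokens accepted from other servers

    def create_tokens(self, req_type, n):
        # Initially the originating server holds every token it creates.
        self.token_set += [Token(self.server, req_type, self.server) for _ in range(n)]

    def lease_token(self, req_type, target):
        """Hand one locally held token of req_type to the target server's manager."""
        for tok in self.token_set:
            if tok.req_type == req_type and tok.holder == self.server:
                tok.holder = target.server   # step 1: mark the new holder
                target.accept(tok)           # step 2: notify the new holder
                return tok
        return None                          # no locally held token left to lease

    def accept(self, tok):
        # Record the workload this server has taken over from another server.
        self.buddy_set.append(tok)

    def redirect_fraction(self, req_type, target_name):
        """Share of req_type requests that will be redirected to target_name."""
        toks = [t for t in self.token_set if t.req_type == req_type]
        if not toks:
            return 0.0
        return sum(t.holder == target_name for t in toks) / len(toks)
```

With six tokens for request type i on server A, a single call to `lease_token` leaves all six tokens in A's token set (one now marked as held by B) and places one entry in B's buddy set, so the redirected share is 1/6.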

Figure 3.2: Initial token set and buddy set for workload i on server A

Figure 3.3: Token distribution scheme

[Figure 3.3 labels: the token manager of server A (1) marks Holder_id = B on one token and (2) leases the token, worth 1/6 of λ_i; the token manager of server B accepts server A's token.]

Chapter 4 Token-based Load Balancing based on Market
