Related Work - Always in Position: An Efficient Request Dispatching Mechanism for Large-Scale Cloud Services

To serve a large number of users from all over the world, web service providers usually have to adopt a cluster of distributed servers to share the incoming workload.

Therefore, load balancing has been an important issue in enhancing the performance of distributed servers [3][4][5][6][9][10][11][14][18]. Incoming requests should be evenly distributed among the servers to achieve quick service response. Load balancing approaches can be divided into two categories: static load balancing and dynamic load balancing. A static approach has a priori information about the requests to be processed before the system starts, so it can arrange resources and requests in advance [10]. A dynamic approach makes request-dispatching decisions while the system is running [9]. For online web services, since incoming requests cannot be predicted in advance, dynamic load balancing mechanisms are usually chosen to optimize system performance.
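To make the static/dynamic distinction concrete, the following Python sketch contrasts the two. It is an illustration under assumptions, not a method taken from the cited works: the static function assumes every job size is known before the system starts and plans the whole assignment with the longest-processing-time-first (LPT) heuristic, while the dynamic function only sees the servers' current loads at the moment a request arrives and picks the least-loaded server. The names `static_assignment` and `dynamic_dispatch` are hypothetical.

```python
import heapq

def static_assignment(job_sizes, num_servers):
    """Static: all job sizes are known before the system starts, so the
    whole plan is computed in advance (longest-processing-time-first)."""
    heap = [(0, server) for server in range(num_servers)]  # (planned load, server id)
    heapq.heapify(heap)
    plan = {}
    # Place the largest remaining job on the currently least-loaded server.
    for job, size in sorted(enumerate(job_sizes), key=lambda item: -item[1]):
        load, server = heapq.heappop(heap)
        plan[job] = server
        heapq.heappush(heap, (load + size, server))
    return plan

def dynamic_dispatch(current_loads):
    """Dynamic: decide when the request arrives, using only the servers'
    current loads (least-loaded server, ties broken by lowest index)."""
    return min(range(len(current_loads)), key=lambda s: current_loads[s])

plan = static_assignment([5, 3, 3, 2, 2, 1], num_servers=2)
print(plan)  # {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0}
print(dynamic_dispatch([4, 1, 3]))  # 1
```

For the job sizes above, the static plan gives each of the two servers a total load of 8, which is possible only because all sizes were known in advance; the dynamic rule has to commit to each request without knowledge of future arrivals.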

Load balancing for large-scale services has been studied by many researchers, and the approaches can usually be classified into two major types: redirection-based and dispatching-based. Redirection-based approaches are usually adopted in systems where the computing resources belong to different administrative domains or even have different owners, whereas dispatching-based approaches suit systems with a single administrative domain, a single owner, and even a single application. In redirection-based approaches, each server can not only execute services but also redirect requests to other servers when it is overloaded [6]. In dispatching-based approaches [31][32], by contrast, there are two distinct components: dispatchers and servers. A dispatcher is responsible for distributing the incoming requests evenly across the pool of servers, and the servers do the actual processing of the requests.

A form of redirection-based load balancing is commonly applied in school classrooms, where each student has a desktop computer and all the computers in the classroom are interconnected through a local area network. Normally, each student uses his own desktop computer to do his work. However, when a student needs to execute a number of computing jobs simultaneously, his desktop computer, if equipped with load-balancing capability, automatically redirects some of the jobs to other computers so that all the jobs finish sooner. Nakai, Madeira, and Buzato [6] used RPS (requests per second) to determine whether a request needs to be redirected to other computing resources. Each resource can set an RPS value that limits its maximal load, and once it reaches this maximal load level, it redirects incoming requests to other, lightly loaded resources.
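The RPS-threshold redirection described above can be sketched in Python as follows. This is a simplified illustration, not the algorithm of [6]: the class `Resource`, the function `handle`, the one-second sliding window used to measure RPS, and the rule of redirecting to the peer with the lowest RPS utilization are all assumptions made for the example.

```python
from collections import deque

class Resource:
    """A computing resource with a configured RPS limit (hypothetical model)."""

    def __init__(self, name, rps_limit):
        self.name = name
        self.rps_limit = rps_limit
        self.arrivals = deque()  # times of requests accepted within the last second

    def current_rps(self, now):
        # Drop arrivals that fell out of the one-second window.
        while self.arrivals and now - self.arrivals[0] >= 1.0:
            self.arrivals.popleft()
        return len(self.arrivals)

    def at_limit(self, now):
        return self.current_rps(now) >= self.rps_limit

    def accept(self, now):
        self.arrivals.append(now)

def handle(now, local, peers):
    """Serve locally below the RPS limit; otherwise redirect to a lightly loaded peer."""
    if not local.at_limit(now):
        local.accept(now)
        return local.name
    candidates = [p for p in peers if not p.at_limit(now)]
    if not candidates:  # every peer is saturated too: keep the request local
        local.accept(now)
        return local.name
    target = min(candidates, key=lambda p: p.current_rps(now) / p.rps_limit)
    target.accept(now)
    return target.name

a, b, c = Resource("A", 2), Resource("B", 5), Resource("C", 5)
print([handle(t, a, [b, c]) for t in (0.0, 0.1, 0.2, 1.05)])  # ['A', 'A', 'B', 'A']
```

Resource A accepts its first two requests, redirects the third because its limit of 2 RPS is reached, and serves locally again once the oldest arrival has left the one-second window.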

Previous research [29][30] proposed a hierarchical redirection-based load-balancing model in which request redirection occurs between different organizations. Each organization maintains its own server cluster, which contains a controller and several servers. Each server cluster not only hosts the services for its own organization's needs but also deploys some applications for serving requests from other organizations. In this architecture, the controller monitors the status of the servers within its cluster and assigns requests to appropriate servers, and the controllers of different organizations communicate with each other to conduct load balancing through request redirection. The approaches in [29][30] take both expected server processing time and network latency into account in order to reduce service response time. If the system load of a server cluster is high and its controller decides to redirect some requests to other server clusters, the controller uses the model of expected server processing time and network latency to estimate which server cluster can complete the requests with the shortest response time, and then redirects the requests to that cluster.
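The controller's choice of a target cluster can be illustrated with the Python sketch below. It assumes, purely for illustration, that the expected response time of a remote cluster is the sum of its expected server processing time and the network latency to it; the actual models in [29][30] may be more detailed. `ClusterEstimate`, `route`, and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ClusterEstimate:
    name: str
    expected_processing_time: float  # seconds, from the remote controller's status report
    network_latency: float           # seconds, measured between the two controllers

    def expected_response_time(self) -> float:
        return self.expected_processing_time + self.network_latency

def route(local_name, local_overloaded, remotes):
    """Keep requests local unless overloaded; otherwise pick the remote cluster
    with the shortest expected response time."""
    if not local_overloaded or not remotes:
        return local_name
    return min(remotes, key=lambda c: c.expected_response_time()).name

remotes = [ClusterEstimate("B", 0.20, 0.15), ClusterEstimate("C", 0.30, 0.02)]
print(route("A", True, remotes))   # C: 0.32 s beats B's 0.35 s
print(route("A", False, remotes))  # A
```

Cluster B processes requests faster, but its higher network latency makes C the better target; considering only one of the two factors would pick the wrong cluster here.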

Dispatching-based approaches can be further divided into two categories: single-dispatcher and multiple-dispatcher architectures. In the single-dispatcher load-balancing architecture, all user requests pass through one dispatcher, which assigns them to suitable resources in the server pool. The SQ(d) algorithm has been studied in depth for this architecture [31][32]. On each request arrival, the dispatcher randomly samples d resources, obtains the number of requests at each of them, and sends the request to the sampled resource with the fewest requests.
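The SQ(d) dispatching rule can be sketched in Python as follows. The function names are hypothetical, and the simulation ignores request departures (each request simply adds one to a queue), which is enough to show the selection rule but not to reproduce the queueing analysis of [31][32].

```python
import random

def sq_d_dispatch(queue_lengths, d, rng=random):
    """SQ(d): sample d distinct servers uniformly at random and return the one
    with the fewest queued requests (ties go to the first sampled)."""
    sampled = rng.sample(range(len(queue_lengths)), d)
    return min(sampled, key=lambda s: queue_lengths[s])

def simulate(num_servers, num_requests, d, seed=0):
    """Dispatch requests one by one and return the final queue lengths."""
    rng = random.Random(seed)
    queues = [0] * num_servers
    for _ in range(num_requests):
        queues[sq_d_dispatch(queues, d, rng)] += 1
    return queues

for d in (1, 2):
    print(f"d={d}: longest queue = {max(simulate(100, 1000, d))}")
```

With d = 1 the rule reduces to purely random dispatching, and with d equal to the number of servers it becomes join-the-shortest-queue. Even d = 2 typically shortens the longest queue markedly while polling only two servers per request instead of all of them.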

For traditional small-scale web applications, the single-dispatcher load-balancing architecture is usually chosen because of its simplicity and effectiveness [2]. However, such a centralized dispatching structure cannot handle the huge volume of requests commonly seen in large-scale cloud services. Therefore, distributed dispatching approaches have recently been developed, adopting a set of distributed dispatchers that work independently, each distributing its portion of the incoming requests evenly across the servers [15].

In [1], Lu et al. proposed the Join-Idle-Queue (JIQ) algorithm for efficient distributed dispatching. The central idea of JIQ is to decouple the discovery of lightly loaded servers from request assignment. In JIQ, a server informs a particular dispatcher when it becomes idle, without interfering with request dispatching. This removes load-balancing work from the critical path of request processing and avoids excessive communication overhead between dispatchers and servers, making JIQ scalable for large-scale cloud services involving thousands of servers or more. However, in basic JIQ a server joins a dispatcher to receive incoming requests only when it becomes idle. This mechanism becomes ineffective when system load is high: few or no servers are idle, so the dispatchers have no idle servers registered with them and must dispatch requests randomly. In this thesis, we develop a Join-Queue-Anytime (JQA) mechanism in which each server is registered with a particular dispatcher at all times, avoiding the random dispatching that occurs in JIQ. The JQA mechanism is therefore expected to outperform JIQ under moderate or high system load.
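The JIQ behavior summarized above, including its fallback to random dispatching when no idle server is registered, can be sketched in Python. This is a simplified illustration, not the implementation of [1] and not the JQA mechanism of this thesis: the class and method names are hypothetical, each idle server is assumed to pick a dispatcher's idle queue (I-queue) uniformly at random, and service times are not modeled.

```python
import random
from collections import deque

class JIQSystem:
    """Sketch of Join-Idle-Queue: each dispatcher keeps an I-queue of idle
    servers, and a server that becomes idle registers with one dispatcher."""

    def __init__(self, num_servers, num_dispatchers, seed=0):
        self.rng = random.Random(seed)
        self.queue_len = [0] * num_servers                 # requests at each server
        self.i_queues = [deque() for _ in range(num_dispatchers)]
        self.registered = set()                            # servers currently in some I-queue
        for server in range(num_servers):                  # every server starts idle
            self._join_idle_queue(server)

    def _join_idle_queue(self, server):
        # Assumption: the server picks an I-queue uniformly at random.
        if server not in self.registered:
            self.i_queues[self.rng.randrange(len(self.i_queues))].append(server)
            self.registered.add(server)

    def dispatch(self, dispatcher):
        """Runs on the critical path: no server polling is needed."""
        i_queue = self.i_queues[dispatcher]
        if i_queue:
            server = i_queue.popleft()                     # a known idle server
            self.registered.discard(server)
        else:
            server = self.rng.randrange(len(self.queue_len))  # no idle server known: random
        self.queue_len[server] += 1
        return server

    def complete(self, server):
        """Called by a server when it finishes a request."""
        self.queue_len[server] -= 1
        if self.queue_len[server] == 0:
            self._join_idle_queue(server)                  # off the dispatch critical path

jiq = JIQSystem(num_servers=3, num_dispatchers=1)
print([jiq.dispatch(0) for _ in range(3)])  # [0, 1, 2]: idle servers are used first
print(jiq.dispatch(0))                      # I-queue now empty: a random server is chosen
```

Once every server is busy, the I-queues stay empty until some server drains its queue, so under high load nearly every dispatch falls through to the random branch. This is the weakness that motivates keeping servers registered with dispatchers at all times.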

Chapter 3. The Join-Queue-Anytime (JQA)
