Given a Web server and several classes of clients, the goal is to provide service differentiation by scheduling HTTP requests at the website gateway. All HTTP requests originating from clients pass through the website gateway, which schedules them to the Web server according to the QoS policies and the service rate of the Web server.
The website gateway first classifies incoming requests into service classes by inspecting the contents of IP headers, HTTP headers, and HTTP payloads. It then decides which request to fetch next from the service classes, according to the QoS policies, and when to release it to the Web server, according to the service rate of the Web server. To learn the characteristics of the Web pages stored on the Web server, the website gateway probes the server before on-line operation. In short, request classification, request scheduling, and server probing are the three functions the website gateway performs for service differentiation. Each is discussed in more detail below.
To concentrate on the design of request scheduling, dynamically generated Web pages and server clusters are not considered in this work. A dynamically generated page varies in both URL and response size, which makes it hard for the gateway to capture the characteristics of dynamic pages. In multiple-server scenarios, server load balancing would also need to be considered. Therefore, we focus only on a single Web server with static Web pages in this work.
2.1 Request Classification
A common classification paradigm is to inspect the IP 5-tuple (source IP address, destination IP address, source port number, destination port number, and protocol type) in a packet header. However, this type of classification is content-blind; that is, the classifier cannot see the information carried by application layer protocols. The website operator may wish to define more flexible QoS policies for service differentiation based on application layer protocols such as HTTP. Therefore, the classifier should be content-aware, i.e., able to see the information carried by application layer protocols. The following information in HTTP can be used for classifying Web requests.
1. URL (Uniform Resource Locator): A URL usually contains a host name, port number, username/password, URL path, and file extension.
2. HTTP header: Some useful fields are “Authorization”, “Proxy-Authorization”, and “User-Agent”.
3. HTTP payload: The HTTP payload may contain session-level identifiers. Cookie and SSL ID (Secure Sockets Layer Identifier) are two examples.
In our work, a content-aware classification mechanism on the website gateway is presented to provide flexible QoS policies for website operators.
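A content-aware classifier of this kind can be sketched as an ordered rule table. The sketch below is illustrative only: the `Request` type, the rule set, and the class names `premium`, `basic`, and `best_effort` are hypothetical and are not taken from the gateway described here. Header names are assumed to be stored with the capitalization shown.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit
import posixpath


@dataclass
class Request:
    """Hypothetical view of a parsed HTTP request as seen by the gateway."""
    src_ip: str
    url: str
    headers: dict = field(default_factory=dict)


def file_extension(url):
    """Return the lower-cased file extension of the URL path, e.g. '.html'."""
    return posixpath.splitext(urlsplit(url).path)[1].lower()


# Hypothetical QoS policy: rules are checked in order and the first match wins.
RULES = [
    ("premium", lambda r: "Authorization" in r.headers),
    ("premium", lambda r: urlsplit(r.url).path.startswith("/members/")),
    ("basic", lambda r: file_extension(r.url) in {".html", ".htm"}),
]


def classify(request, rules=RULES, default="best_effort"):
    """Map a request to a service class using URL and HTTP header content."""
    for service_class, matches in rules:
        if matches(request):
            return service_class
    return default
```

Because the rules are plain predicates over the request, an operator could add policies on other fields (for example “User-Agent”) without changing the classifier itself.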
2.2 Request Scheduling
After HTTP requests have been classified and accumulated in the corresponding queues, the request scheduler decides which request should be fetched next and when it should be released to the Web server. For service differentiation, each class queue is assigned a weight, and the server resource is partitioned in proportion to the weights. The larger the weight of a class, the more of the server resource the requests in that class can use. In our work, the server resource is partitioned by throughput because throughput directly represents the output rate of HTTP responses. The request scheduler therefore schedules requests so as to partition the throughput of the server.
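As a worked illustration of proportional partitioning, suppose every class has requests waiting. The symbols are introduced here for illustration only: $w_i$ is the weight of class $i$, $N$ the number of classes, $R$ the total throughput of the server, and $R_i$ the throughput share of class $i$. The numeric values are hypothetical.

```latex
\[
  R_i = \frac{w_i}{\sum_{j=1}^{N} w_j}\, R ,
  \qquad \text{e.g. } w_1 : w_2 = 3 : 1,\; R = 8\ \text{Mb/s}
  \;\Rightarrow\; R_1 = 6\ \text{Mb/s},\; R_2 = 2\ \text{Mb/s}.
\]
```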
Several packet scheduling disciplines, such as PQ (Priority Queuing) [13], WFQ (Weighted Fair Queuing) [14], WRR (Weighted Round Robin) [15], and DRR (Deficit Round Robin) [12], can be used to determine which queue to fetch from next. We intend to adapt these packet scheduling algorithms to request scheduling. However, unlike traditional packet scheduling algorithms, which are mostly work-conserving, our algorithm needs to be non-work-conserving with respect to the access link because the service time of a request depends on the size of its response, not the size of the request itself. With PQ, requests are scheduled from the head of a given queue only if all queues with higher priorities are empty. Within each priority queue, requests are scheduled in FIFO (First In First Out) order. However, if the volume of higher-priority requests becomes excessive, lower-priority requests will be dropped as the buffers allocated to the lower-priority queues overflow. With WFQ, WRR, and DRR, each class queue is assigned a percentage of the server resource according to its weight.
However, WFQ requires an O(log N) algorithm, where N is the number of queues, to select the smallest timestamp among all queues. The O(1) WRR gives each class the correct percentage of the server resource only if the requests in the queues have the same response size or the mean response size is known in advance. Therefore, the O(1) DRR is employed as our scheduling algorithm, since it is much simpler and overcomes the limitations of PQ, WFQ, and WRR.
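The following is a minimal sketch of textbook DRR applied to requests, in which the deficit counter is charged with the expected response size rather than the request size. It is not the exact algorithm of this work: the class names, the quantum values, and the assumption that each request arrives with a known response size are illustrative.

```python
from collections import deque


class DrrScheduler:
    """Deficit Round Robin over class queues, charged by response size."""

    def __init__(self, quanta):
        # quanta: class name -> quantum in bytes, chosen in proportion to
        # the class weight (hypothetical values supplied by the caller).
        self.quanta = dict(quanta)
        self.queues = {name: deque() for name in quanta}
        self.deficit = {name: 0 for name in quanta}

    def enqueue(self, service_class, url, response_size):
        """Queue a request together with its expected response size."""
        self.queues[service_class].append((url, response_size))

    def next_round(self):
        """Visit every backlogged queue once; return the requests selected."""
        selected = []
        for name, queue in self.queues.items():
            if not queue:
                continue
            self.deficit[name] += self.quanta[name]
            # Fetch requests while the head's response fits in the deficit.
            while queue and queue[0][1] <= self.deficit[name]:
                url, size = queue.popleft()
                self.deficit[name] -= size
                selected.append((name, url, size))
            # An emptied queue does not carry credit into later rounds.
            if not queue:
                self.deficit[name] = 0
        return selected
```

With quanta of 2000 and 1000 bytes, for example, a backlogged first class is granted roughly twice the response bytes of the second over many rounds, regardless of how individual response sizes vary.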
Once the scheduler fetches a request, the next step is to decide when to release it to the Web server. This is an extra step in request scheduling compared with packet scheduling. If the release rate is too high, incoming requests will be sent to the server too quickly: the queues at the gateway will empty, the requests will instead queue on the server, and the server may even be overwhelmed. Note that the server itself does not provide any differentiation in processing the requests. Thus, the scheduler should release requests according to the service rate of the Web server. In our work, a window control mechanism is employed to throttle the rate at which requests are released. By combining DRR with window control, the scheduler can partition the throughput of the server and release requests according to the service rate of the server.
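One way to picture window control is as a cap on the work outstanding at the server. The sketch below assumes the window is counted in outstanding requests and that a hypothetical callable `fetch_next` returns the scheduler's next request, or `None` when nothing is queued. The text above does not specify the window's unit or how it is sized, so both are assumptions.

```python
class WindowedReleaser:
    """Release requests only while fewer than window_size are outstanding."""

    def __init__(self, window_size, fetch_next):
        self.window_size = window_size  # assumed unit: outstanding requests
        self.fetch_next = fetch_next    # hypothetical hook into the scheduler
        self.outstanding = 0

    def release_ready(self):
        """Release as many requests as the window currently allows."""
        released = []
        while self.outstanding < self.window_size:
            request = self.fetch_next()
            if request is None:
                break  # nothing queued at the gateway
            self.outstanding += 1
            released.append(request)
        return released

    def on_response_complete(self):
        """Call when the server finishes a response; this opens the window."""
        if self.outstanding > 0:
            self.outstanding -= 1
```

Because `fetch_next` is called only when the window has room, requests that cannot yet be released stay in the class queues at the gateway, where the differentiation policy still applies to them.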
2.3 Server Probing
For the scheduler to partition the throughput of the Web server accurately, some characteristics of the Web pages stored on the server should be known in advance. Without modifying the system kernel or the daemon program on the Web server, the gateway probes the URL and the response size of each Web page on the server before on-line operation or during system initialization. The scheduler uses the probed results for the initial accesses to the Web pages. If a page is modified during operation, its entry is updated by later accesses, because the gateway learns the latest response size whenever it receives the page from the server.
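The probed results can be pictured as a table from URL to response size that is filled once and then refreshed from live traffic. In the sketch below, the `ResponseSizeTable` class is hypothetical, and the use of an HTTP HEAD request with the Content-Length header is an assumed probing method; the text above does not state how the gateway obtains the sizes.

```python
import urllib.request


def head_content_length(url, timeout=5.0):
    """Assumed probing method: read Content-Length from a HEAD response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


class ResponseSizeTable:
    """URL -> response size in bytes, probed once and updated on access."""

    def __init__(self):
        self.sizes = {}

    def probe(self, urls, fetch_size=head_content_length):
        """Fill the table before on-line operation; skip pages of unknown size."""
        for url in urls:
            size = fetch_size(url)
            if size is not None:
                self.sizes[url] = size

    def lookup(self, url, default=None):
        """Size the scheduler should charge for a request to this URL."""
        return self.sizes.get(url, default)

    def update(self, url, observed_size):
        """Record the size actually observed when the response passed through."""
        self.sizes[url] = observed_size
```

Calling `update` on every response keeps the table current without further probing, so a modified page is charged at its new size from its next access onward.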