Summary of Related Work - Related Work - Solving Bursty Traffic and Network Instability in Real-Time Financial Services through Task Assignment

CHAPTER 2 Related Work

2.5 Summary of Related Work

Balancing the load across servers is difficult under bursty traffic and network instability. Because mobile network bandwidth is unpredictable, SJF, Min-Min, Max-Min, and other algorithms that need to know each job's completion time in advance do not fit the setting of this research.

Because users are mobile, the transmission speed of the mobile network may be fast or slow, so the transmission time of each job cannot be known in advance. The load of each connection is unknown as well, because there is no evidence to show where the heavy users are. The Shortest-Queue-First algorithm may pick a queue that is the shortest yet has the longest waiting time. The Least-Connection-First algorithm may balance the number of connections on each host without balancing the queue length or the response time. The RR and Random algorithms do not consider the backend state at all under bursty traffic and network instability.

To achieve the best performance in such an uncertain network environment, this research analyzes the architecture and behavior of the system under real-world conditions and then proposes an algorithm to solve these problems.

CHAPTER 3 Mobile Banking Messaging as a Service Framework

3.3 Mobile Banking Messaging as a Service Framework (MBMaaS)

This research aims to build a framework that readily supports Instant Messaging, Instant Computing and Dissemination of Major Information, Bilateral Communication, Bilateral Trading, and many other mobile financial services.

For performance, flexible expansion and complete resource management are taken into consideration as well.

While the frameworks mentioned above make horizontal expansion feasible, the real-time state of the backend management system cannot be monitored. Thus, only static policies such as Round-robin and Random can be implemented, and these cannot satisfy the real-time demands of financial services.

As shown in Figure 2, a monitor component is added to each Connection Manager in the server farm of the framework above in order to monitor the state of the server. The monitored information includes the number of connections, the number of users, the number of jobs, the arrival rate, the service rate, and so forth. After monitoring, this information is returned to the controller component on the Load Balancer, where task assignment decisions are made accordingly.

Figure 2. Mobile Banking Messaging as a Service Framework

CHAPTER 4 Network Delay Autocorrelation Model

4.1 Hypothesis

In order to solve the network problems, it is necessary to understand the characteristics of real-world network delay so that we can propose an improved method and verify it.

As shown in Figure 3, we observe one of the receivers and record its network delay as a time series. Although the network delay varies continuously, some patterns can be observed. Drastic variations of network delay rarely happen within a short period of time. That is, the network delays just before and just after a short time period are correlated. For instance, if the network delay is high before a short time period, it is likely to be high during that period as well. This tendency continues after the period, meaning that network congestion may last for a while.

Figure 3. Time Series of Real-world Network Delay

To compare system performance in a real network under various task assignment policies, and to validate the correctness and feasibility of the task assignment policy in subsequent tests, the experimental environment needs a mobile network with tens of millions of users, matching the scenario of the future Bank 3.0.

However, it is hard to use thousands of real mobile devices and deploy them under different real network conditions to verify the scheduling policy. Moreover, the controlled factors would not remain the same across real experiments, so the performance of the algorithm under different parameters could not be compared. Hence, a network model is proposed to simulate real network conditions.

4.2 Autocorrelation

The time series of network delay is assumed to meet the characteristic of autocorrelation.

In time series analysis, autocorrelation measures how strongly the values of the same series at different time periods are related. A larger autocorrelation coefficient indicates a stronger relation and similarity between historical data and future data.

Based on the observation and assumption above, a network delay autocorrelation model is designed to produce network delay with realistic characteristics. With it, the system performance of different task assignment algorithms can be evaluated and compared in the same network environment.

4.3 Network Delay Autocorrelation Model

Table 1 lists the parameters used in the model:

Table 1. Symbols of Network Delay Autocorrelation Model

Symbol    Definition
𝑚_𝑖^𝑛    Message i targeted to user n, where i, n = 1, 2, 3, …
𝑡_𝑖^𝑛    Delivery time of 𝑚_𝑖^𝑛, where i, n = 1, 2, 3, …
𝜏_𝑖^𝑛    Interval time between 𝑚_𝑖^𝑛 and 𝑚_{𝑖−1}^𝑛
𝐷_𝑖^𝑛    Network delay of 𝑚_𝑖^𝑛, where i, n = 1, 2, 3, …

We assume that the autocorrelation of network delay is high across a short time interval and lower across a longer one, and a formulation is designed to satisfy this assumption. When the network delay generator is given a shorter sending interval, the current network delay should be more likely to resemble the previous one. This is achieved by the following formulation:

$$D_i^n = \mathit{weight} \times D_{i-1}^n + (1 - \mathit{weight}) \times D_{new} \tag{1}$$

The initial value 𝐷_𝑛𝑒𝑤 is drawn from a Pareto distribution with alpha = 1.1, as used in the SURGE web workload generator. The intent is that, under the law of large numbers, the generated delays follow the Pareto distribution while the time series of network delay remains autocorrelated. To this end, weight is defined on the interval from 0 to 1. As weight approaches 1, 𝐷_𝑖^𝑛 gets closer to 𝐷_{𝑖−1}^𝑛; as weight approaches 0, 𝐷_𝑖^𝑛 gets closer to 𝐷_𝑛𝑒𝑤. In other words, to ensure that the network delay time series is autocorrelated, a weight close to 1 is made more probable when the sending interval is shorter, and a weight close to 0 is made more probable when the sending interval is longer. The weight is generated as follows:
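The recurrence in equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the weight is held fixed at 0.9 instead of being generated by the steps that follow, and `random.paretovariate` draws from a Pareto distribution whose minimum value is 1, a scale the source does not specify.

```python
import random


def next_delay(prev_delay: float, new_delay: float, weight: float) -> float:
    """Equation (1): blend the previous delay with a freshly drawn one."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * prev_delay + (1.0 - weight) * new_delay


def draw_new_delay(rng: random.Random, alpha: float = 1.1) -> float:
    """Draw D_new from a Pareto distribution with shape alpha (minimum 1, assumed)."""
    return rng.paretovariate(alpha)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
delay = draw_new_delay(rng)
series = [delay]
for _ in range(5):
    # The weight is fixed here purely for illustration.
    delay = next_delay(delay, draw_new_delay(rng), weight=0.9)
    series.append(delay)
```

With weight = 1 the series never changes, and with weight = 0 every delay is an independent Pareto draw; values in between trade memory of the previous delay against fresh randomness.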

Step 1：

Figure 4. Interval between the Sending of Two Messages

First, as Figure 4 shows, the interval between the sending of two messages is recorded. Suppose the message 𝑚_𝑖^𝑛 for 𝑢𝑠𝑒𝑟_𝑛 is sent at time 𝑡_𝑖^𝑛 and the next message 𝑚_{𝑖+1}^𝑛 is sent at time 𝑡_{𝑖+1}^𝑛. The interval time between the two messages, 𝜏_{𝑖+1}^𝑛, is:

$$\tau_{i+1}^n = t_{i+1}^n - t_i^n \tag{2}$$

Since there is no message before the first message 𝑚₀^𝑛, 𝜏₀^𝑛 is preset as:

$$\tau_0^n = 0 \tag{3}$$

Step 2：

Once 𝜏_𝑖^𝑛 of the message 𝑚_𝑖^𝑛 has been computed, the relation between the value of 𝜏_𝑖^𝑛 and the correlation of network delay must be defined. For this purpose a parameter, the most possible weight, is introduced. If the most possible weight equals 1, the network delay 𝐷_𝑖^𝑛 of 𝑚_𝑖^𝑛 is very likely to be similar to the network delay 𝐷_{𝑖+1}^𝑛 of 𝑚_{𝑖+1}^𝑛; conversely, if the most possible weight equals 0, the probability that 𝐷_𝑖^𝑛 is similar to 𝐷_{𝑖+1}^𝑛 is very low.

Observation shows that network delay rarely varies drastically over a short time period. Therefore, a function 𝑓(𝜏_𝑖^𝑛) with the characteristics stated above is used to describe the most possible weight. A downward-opening parabola is chosen for this function:

$$f(\tau_i^n) = 1 + \frac{(\tau_i^n)^2}{-2 \times a} \tag{4}$$

where,

$$a = \frac{c^2}{2} \tag{5}$$

The width of the parabola is determined by 𝑎, and b is the value of 𝜏_𝑖^𝑛 at which 𝑓(𝜏_𝑖^𝑛) = 0. In other words, whenever 𝜏_𝑖^𝑛 ≥ 𝑏, there is no correlation between 𝐷_{𝑖+1}^𝑛 and 𝐷_𝑖^𝑛, the network delay preceding that of 𝑚_{𝑖+1}^𝑛.

As Figure 5 shows, choosing different values of c yields functions with stronger or weaker network delay autocorrelation. The model is denoted C100 when c equals 100, C200 when c equals 200, and so on.
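As a rough illustration of the most possible weight, the sketch below assumes that equation (4) reads f(τ) = 1 − τ² / (2a) with a = c² / 2, so that f falls to 0 at τ = c. It also assumes that f is held at 0 for larger intervals, which matches the statement that no correlation remains once τ ≥ b. The function name is hypothetical.

```python
def most_possible_weight(tau: float, c: float) -> float:
    """Downward parabola f(tau) = 1 - tau^2 / (2a), a = c^2 / 2, floored at 0."""
    if tau < 0 or c <= 0:
        raise ValueError("tau must be non-negative and c must be positive")
    a = c * c / 2.0
    # Flooring at 0 is an assumption: beyond tau = c no correlation is kept.
    return max(0.0, 1.0 - (tau * tau) / (2.0 * a))


# Models C100 and C200 evaluated at the same sending interval.
c100_at_50 = most_possible_weight(50, 100)  # 0.75
c200_at_50 = most_possible_weight(50, 200)  # 0.9375
```

Under this reading a larger c keeps the most possible weight high over longer intervals, which is why a model such as C200 would be expected to produce more strongly autocorrelated delay than C100.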

Figure 5. Functions that Decide the Most Possible Weight

Step 3：

With the most possible weight determined, the final weight is needed to compute the network delay. This calls for a function that generates weight values randomly while producing the most possible weight with the highest probability. Hence, two parabolas, g(x) and h(x), are defined.

g(x) opens upward and h(x) opens downward, and the vertices of the two parabolas meet at the most possible weight, as shown in Figure 6.

Figure 6. The Function that Decides the Final Weight

To decide the final weight, a value x is generated along the x axis within the range from 0 to 1. When 𝑓(𝜏_𝑖^𝑛) > x, weight = g(x); when 𝑓(𝜏_𝑖^𝑛) < x, weight = h(x). In this way, each value of the most possible weight corresponds to its own g(x) and h(x), which decide the final weight, as shown in Figure 7.

Figure 7. Using Most Possible Weight Functions to Generate the Corresponding Functions that Decide the Final Weight
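The three steps can be put together in a small generator sketch. The text does not give closed forms for g(x) and h(x), so this sketch deliberately swaps in a different technique: a triangular distribution on [0, 1] whose mode is the most possible weight. That keeps the stated property (the most possible weight is the most probable outcome) but is not the author's pair of parabolas. The reading of f(τ), the handling of the first message, and all names are likewise assumptions.

```python
import random


def most_possible_weight(tau: float, c: float) -> float:
    """Assumed reading of f(tau): 1 - (tau / c)^2, floored at 0."""
    return max(0.0, 1.0 - (tau / c) ** 2)


def sample_weight(rng: random.Random, mode: float) -> float:
    """Stand-in for g(x)/h(x): triangular distribution on [0, 1] peaking at mode."""
    return rng.triangular(0.0, 1.0, mode)


def generate_delays(send_times, c, rng, alpha=1.1):
    """Produce one delay per send time using steps 1 to 3 and equation (1)."""
    delays = []
    prev_time = None
    for t in send_times:
        new_delay = rng.paretovariate(alpha)  # D_new, Pareto with minimum 1
        if prev_time is None:
            # Assumption: the first message has no history, so it takes D_new.
            delay = new_delay
        else:
            tau = t - prev_time                           # step 1, equation (2)
            mode = most_possible_weight(tau, c)           # step 2
            weight = sample_weight(rng, mode)             # step 3 (stand-in)
            delay = weight * delays[-1] + (1.0 - weight) * new_delay
        delays.append(delay)
        prev_time = t
    return delays


# Messages sent every 10 time units under a C100-style model.
example = generate_delays(range(0, 200, 10), c=100, rng=random.Random(7))
```

Because the triangular stand-in only approximates the shape in Figure 6, the autocorrelation it produces should not be expected to match the figures reported for the original model.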

4.4 Result of the Network Delay Autocorrelation Model

The Network Delay Generator produces network delay time series with different autocorrelation coefficients, as shown in Figures 8 to 12.

Figure 8. Network Delay with c=300 When Average 𝞽=10

Figure 9. Network Delay with c=100 When Average 𝞽=10

Figure 10. Network Delay with c=50 When Average 𝞽=10

Figure 11. Network Delay with c=10 When Average 𝞽=10

Figure 12. Network Delay with c=1 When Average 𝞽=10

4.5 Autocorrelation Coefficient

The autocorrelation coefficient of the network delay time series produced by each model is calculated to check whether it matches expectations. Let s be the starting time and t the ending time. The sequence of network delay is denoted 𝑋_{𝑠,𝑡}: 𝑥_𝑠, 𝑥_{𝑠+1}, 𝑥_{𝑠+2}, 𝑥_{𝑠+3}, …, 𝑥_𝑡, and 𝜇_{𝑠,𝑡} and 𝜎_{𝑠,𝑡} are the mean and the standard deviation of 𝑋_{𝑠,𝑡} respectively. The autocorrelation at lag k is then:

$$R(k) = \frac{E\left[(X_{k+1,n} - \mu_{k+1,n})(X_{1,n-k} - \mu_{1,n-k})\right]}{\sigma_{k+1,n} \times \sigma_{1,n-k}} \tag{6}$$

where n is the length of the series. Figure 13 shows, for lags 0 to 10, the autocorrelation coefficient of the network delay time series generated by six models with c = 100 to 600. At the same lag, the autocorrelation coefficient increases with c. This shows that network delay time series with different autocorrelation coefficients can be generated by adopting different values of c in the network delay autocorrelation model.

The result is in accordance with the expectation. Next, the Network Delay Generator, which produces autocorrelated network delay with the characteristics of a real network, is used to verify the task assignment policy.

Figure 13. Autocorrelation Coefficient of Network Delay Time Series Which are Produced by Six Models Where c = 100~ 600.
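For readers who want to reproduce the lag-k coefficient, the following sketch computes it for a plain Python list. It assumes equation (6) is the correlation between the series with its first k values dropped and the series with its last k values dropped, using population standard deviations; the function name is hypothetical.

```python
from statistics import fmean, pstdev


def autocorrelation(series, k):
    """Lag-k autocorrelation: correlate X_{k+1..n} with X_{1..n-k}."""
    n = len(series)
    if not 0 <= k < n - 1:
        raise ValueError("lag must satisfy 0 <= k < len(series) - 1")
    head = list(series[: n - k])  # X_{1, n-k}
    tail = list(series[k:])       # X_{k+1, n}
    mu_head, mu_tail = fmean(head), fmean(tail)
    sd_head, sd_tail = pstdev(head), pstdev(tail)
    if sd_head == 0 or sd_tail == 0:
        raise ValueError("autocorrelation is undefined for a constant segment")
    # Expected product of deviations, taken over the aligned pairs.
    covariance = fmean([(a - mu_tail) * (b - mu_head) for a, b in zip(tail, head)])
    return covariance / (sd_tail * sd_head)


trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
flip = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
r_trend = autocorrelation(trend, 1)  # close to +1: each value tracks the last
r_flip = autocorrelation(flip, 1)    # close to -1: each value reverses the last
```

A steadily rising series gives a coefficient near 1 at lag 1, while a series that alternates in sign gives a coefficient near −1.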

CHAPTER 5 Research Method

This research aims to improve system performance based on the real behavior and bottlenecks of the system. An experiment is designed to test the hypothesis that the quality of users’ networks affects system performance. The result of the experiment is then used to create a task assignment algorithm capable of improving system performance.

5.1 Pre-study

Taking Financial Instant Messaging as an example, two receivers using mobile devices are connected to the Openfire Server through the same Connection Manager, as shown in Figure 14.

Figure 14. Architecture of Pre-study

Table 2. Experimental Parameters of the Two Sets

         Receiver 1           Receiver 2
Set 1    9.72 Mbps (Wi-Fi)    9.72 Mbps (Wi-Fi)
Set 2    9.72 Mbps (Wi-Fi)    2.09 Mbps (3G)

Table 2 compares Set 1 with Set 2. In Set 1, both mobile devices receive messages over Wi-Fi at an average download speed of 9.72 Mbps. In Set 2, one device uses Wi-Fi at an average of 9.72 Mbps and the other uses 3G at an average of 2.09 Mbps. Messages are then sent to Set 1 and Set 2 at the same rate, and the average waiting time of messages in the Connection Manager is measured.

Figure 15 shows that the average waiting time of Set 2 is much longer than that of Set 1. The cause is the impact of network delay, as shown in Figure 16.

Figure 15. Experiment Result of Pre-Study

Figure 16. The Impact of Network Delay

From the experiment, it can be inferred that the Connection Manager transmits in a stop-and-wait rather than a go-back-N manner. Go-back-N can be used within a single connection, and with it a message is not held up by the network delay of the previous message.

In practice, however, a Connection Manager using go-back-N in a chat room needs at least one TCP socket thread per connection, because the Connection Manager module has to manage a portion of the client connections. When tens of thousands of users send a huge number of messages, a go-back-N system performs poorly because thread usage and overhead exceed the capacity of the system.

For this reason, stop-and-wait is applied to avoid overloading.

5.2 Method Applied in the Research

The extension of the Openfire Instant Messaging System is taken as the example framework.

In this framework, to overcome the poor performance caused by network instability and bursty traffic, a method is needed to estimate the waiting time of a task in the system so that tasks can be assigned more efficiently.

When a user’s request enters the horizontally scalable system, it is dispatched immediately to the corresponding server. The load balancer and the Openfire main server are assumed to be located at one site of the system, so they have infinite queuing space and run without bottleneck. Since the network connectivity of mobile devices varies, a message being delivered to a user may not arrive instantly. While it is being sent to the receiver, the message has to be queued in the connection manager, which takes up system resources. In other words, the message will “block” other queued messages until it is delivered successfully.

Since the messages handled by the system are text, processing them on the CM takes little computing time. What makes the system inefficient is the congestion caused by network delay. The computing time of a message on the CM is therefore neglected in the designed scenario, and network delay is treated as the service time, as Figure 17 shows.

Figure 17. Network Delay is Seen as Service Time

The pre-study shows that variation in network delay affects the waiting time of messages in the CM queue. The observation in Chapter 4 shows that the network delays before and after a short period are correlated. Therefore, future network delay can be predicted from short-term historical data. Queuing theory [36][37] is the theory behind what happens when there are many jobs, scarce resources, and consequently long queues and delays [35]. Thus, we use queuing theory to model our system and to predict its performance.

The arrival of users’ requests is assumed to follow a Poisson process with arrival rate λ, and the service time of each Connection Manager follows a general distribution with service rate μ. The queue in each Connection Manager has an infinite buffer.

Requests at each CM are processed on a first-come first-served (FCFS) basis. Because network delay is highly variable, the Connection Manager is modeled as an M/G/1 queue rather than an M/M/1 queue. Table 3 lists the related parameters for this study:

Table 3. Symbols of Queuing Model

Symbol    Definition
λ_𝑘    Average arrival rate of 𝐶𝑀_𝑘 in the sliding window.
𝜇_𝑘    Average service rate of 𝐶𝑀_𝑘 in the sliding window.
𝑆̅_𝑘    Average service time of 𝐶𝑀_𝑘 in the sliding window. Network delay is regarded as service time in our scenario, so this is also the average network delay in the sliding window.
𝑞_𝑛𝑜𝑤^𝑘    Number of messages currently in the 𝐶𝑀_𝑘 queue.
𝑢𝑠𝑒𝑟_𝑛𝑜𝑤^𝑘    Number of users that have established a connection to 𝐶𝑀_𝑘.
𝑞_{𝑐𝑢𝑡_𝑖𝑛}^𝑘    𝑞_{𝑐𝑢𝑡_𝑖𝑛}^𝑘 = 𝑢𝑠𝑒𝑟_𝑛𝑜𝑤^𝑘 / 2. There is an interval between the time the prediction algorithm runs and the actual arrival time. During this interval, messages sent by other existing users of the same 𝐶𝑀_𝑘 queue have a probability of 0.5 of cutting in ahead of the incoming message.
𝑇_{𝑤𝑎𝑖𝑡𝑖𝑛𝑔}^𝑘    The waiting time in the 𝐶𝑀_𝑘 queue.
𝑇_{𝑞𝑢𝑒𝑢𝑖𝑛𝑔}^𝑘    The queuing time in the 𝐶𝑀_𝑘 queue.
𝑊₀^𝑘    The remaining service time in 𝐶𝑀_𝑘: the average remaining service time of the customer (if any) found in service by a new arrival, obtained from the mean residual life formula.
𝑊_𝑛𝑒𝑤^𝑘    The predicted waiting time of a new arrival.

The purpose of this study is to minimize the mean waiting time of messages. Knowing the queue length and the number of connections is not enough: the shortest queue does not guarantee the shortest waiting time, and the CM with the fewest connections does not necessarily carry the lightest load. To achieve better performance, a message should always be inserted into the queue with the shortest predicted waiting time. Waiting time is composed of queuing time and remaining service time:

$$T_{waiting}^k = T_{queuing}^k + W_0^k \tag{7}$$

where

$$W_0^k \triangleq \frac{\lambda_k \times \overline{S_k^2}}{2} \tag{8}$$
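The remaining service time in equation (8) depends on the second moment of the service time, not just its mean, which is the reason for choosing M/G/1. The sketch below assumes equation (8) is the standard M/G/1 mean residual life form, λ times the mean of S² divided by 2, estimated over whatever delays fall in the sliding window. The function name and sample values are illustrative only.

```python
def residual_service_time(arrival_rate: float, service_times) -> float:
    """W0 = lambda * mean(S^2) / 2, estimated from observed service times."""
    samples = list(service_times)
    if not samples:
        return 0.0  # assumption: no history means no estimated residual work
    second_moment = sum(s * s for s in samples) / len(samples)
    return arrival_rate * second_moment / 2.0


# Two windows with the same mean delay (2.0) but different variability.
steady = residual_service_time(0.5, [2.0, 2.0])  # 1.0
bursty = residual_service_time(0.5, [1.0, 3.0])  # 1.25
```

Both windows have the same average delay, yet the more variable one yields a larger residual term, so a queue with unstable network delay is penalized even when its mean delay looks the same.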

5.2.1 See the Future Network Delay (SeeFuND) Task Assignment Algorithm

The autocorrelation of network delay makes it possible to use historical network delay to predict future network delay, which is treated as service time. Thus, we propose a task assignment algorithm, named SeeFuND, that can see the future network delay from history.

To approximate the average service time (network delay) in the near term while accounting for network instability, a moving average is applied. Suppose the size of the sliding window is N; the average service time over the most recent N periods of historical information is taken as the mean service time 𝑆̅_𝑘. The queuing time is then:

$$T_{queuing}^k = q_{now}^k \times \bar{S}_k \tag{9}$$

Note that there is an interval between the time the algorithm runs and the actual arrival time. During this interval, messages targeted at other existing users of the same 𝐶𝑀_𝑘 queue have an average probability of 0.5 of cutting in ahead of the incoming message, as shown in Figure 18.

Figure 18. Other Existing Users Would Cut-in in Front of the Incoming Message

Given this phenomenon, a correction term is added to the equation. The waiting time 𝑊_𝑛𝑒𝑤^𝑘 is therefore predicted as follows:

$$W_{new}^k = (q_{now}^k + q_{cut\_in}^k) \times \bar{S}_k + W_0^k \tag{10}$$

where

$$q_{cut\_in}^k = \frac{user_{now}^k}{2} \tag{11}$$

The SeeFuND task assignment algorithm calculates the expected waiting time 𝑊_𝑛𝑒𝑤^𝑘 of each CM queue and connects the incoming user to the queue with the shortest expected waiting time.
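A minimal sketch of the SeeFuND selection rule follows, combining equations (8) to (11). It is not the author's implementation: the class, its field names, the window size of 50, and the sample numbers are hypothetical, and the sliding window is approximated by a fixed-length deque of recent network delays.

```python
from collections import deque
from dataclasses import dataclass, field

WINDOW_SIZE = 50  # N, the sliding-window size (hypothetical value)


@dataclass
class ConnectionManager:
    name: str
    queue_length: int = 0       # q_now
    users: int = 0              # user_now
    arrival_rate: float = 0.0   # lambda_k
    recent_delays: deque = field(default_factory=lambda: deque(maxlen=WINDOW_SIZE))


def predicted_wait(cm: ConnectionManager) -> float:
    """Equation (10): W_new = (q_now + q_cut_in) * mean(S) + W0."""
    delays = cm.recent_delays
    if not delays:
        return 0.0  # assumption: a CM without history is treated as idle
    mean_service = sum(delays) / len(delays)                  # moving average
    second_moment = sum(d * d for d in delays) / len(delays)
    w0 = cm.arrival_rate * second_moment / 2.0                # equation (8)
    q_cut_in = cm.users / 2.0                                 # equation (11)
    return (cm.queue_length + q_cut_in) * mean_service + w0


def seefund_select(cms):
    """Pick the CM with the shortest predicted waiting time."""
    return min(cms, key=predicted_wait)


cm1 = ConnectionManager("CM1", queue_length=2, users=10, arrival_rate=0.1)
cm2 = ConnectionManager("CM2", queue_length=5, users=4, arrival_rate=0.1)
cm1.recent_delays.extend([1.0, 1.0, 1.0])
cm2.recent_delays.extend([0.5, 0.5, 0.5])
chosen = seefund_select([cm1, cm2])
```

In this example CM1 has the shorter queue, but its slower recent delays and larger user count give it a longer predicted wait, so CM2 is chosen. This is the situation in which Shortest-Queue-First would make the opposite choice.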

5.2.2 Incubation Period Phenomenon

However, a problem remains if the model considers only the current waiting time. As Figure 19 shows, although the current waiting time of 𝐶𝑀₁ is shorter than that of 𝐶𝑀₂, 𝐶𝑀₁ has far more users than 𝐶𝑀₂. As time goes on, the probability of bursty message traffic in 𝐶𝑀₁ is very high. We call this the Incubation Period Phenomenon.

Figure 19. Incubation Period Phenomenon

5.2.3 Foresight-SeeFuND Task Assignment Algorithm

The Incubation Period Phenomenon happens when, within a short time, the load balancer assigns a large number of users to the same CM because it has the shortest expected waiting time.

To avoid the bursty traffic that follows, a task assignment policy that addresses the incubation period phenomenon at its root, named the Foresight-SeeFuND (F-Foresight-SeeFuND) Algorithm, is designed as follows:

1. The CM with the highest number of user connections is ruled out, to reduce the probability of bursty traffic as time goes on.

2. Among the remaining CMs, the load balancer selects the CM with the shortest 𝑊_𝑛𝑒𝑤^𝑘 and connects the incoming user to it.
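The two steps above can be sketched as follows. The sketch takes the predicted waiting time of each CM as given, and the type and function names are hypothetical. Two behaviors are assumptions the text does not cover: when several CMs tie for the most users only the first is ruled out, and a lone CM is never ruled out.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    users: int             # current number of user connections
    predicted_wait: float  # W_new for this CM, computed elsewhere


def foresight_select(candidates: List[Candidate]) -> Candidate:
    """Step 1: drop the CM with the most users. Step 2: pick the shortest W_new."""
    if not candidates:
        raise ValueError("at least one connection manager is required")
    if len(candidates) == 1:
        return candidates[0]  # assumption: nothing to rule out
    busiest = max(candidates, key=lambda c: c.users)  # first one wins on a tie
    remaining = [c for c in candidates if c is not busiest]
    return min(remaining, key=lambda c: c.predicted_wait)


pool = [
    Candidate("CM1", users=100, predicted_wait=1.0),
    Candidate("CM2", users=20, predicted_wait=3.0),
    Candidate("CM3", users=30, predicted_wait=2.0),
]
choice = foresight_select(pool)
```

Here CM1 has the shortest predicted wait but also the most users, so it is ruled out and CM3 is selected, trading a slightly longer wait now for a lower chance of a burst later.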

Both M/G/1 queuing model and method of moving average are incorporated in the task assignment policy to tackle the problem of network instability. If there is a huge
