Traffic Load Balancing Schemes for Devolved Controllers in Mega Data Centers


Xiaofeng Gao, Member, IEEE, Linghe Kong, Weichen Li, Wanchao Liang, Yuxiang Chen, and Guihai Chen, Senior Member, IEEE

Abstract—In most existing cloud services, a centralized controller is used for resource management and coordination. However, such infrastructure is increasingly insufficient to meet the rapid growth of mega data centers. In recent literature, a new approach named devolved controller was proposed to address scalability. This approach splits the whole network into several regions, each with one controller to monitor and reroute a portion of the flows. This technique alleviates the problem of an overloaded single controller, but brings other problems such as unbalanced workload among controllers and reconfiguration complexities. In this paper, we explore the usage of devolved controllers for mega data centers, and design some new schemes to overcome these shortcomings and improve the performance of the system. We first formulate the Load Balancing problem for Devolved Controllers (LBDC) in data centers, and prove that it is NP-complete. We then design an f-approximation for LBDC, where f is the largest number of potential controllers for a switch in the network. Furthermore, we propose both centralized and distributed greedy approaches to solve the LBDC problem effectively. The numerical results validate the efficiency of our schemes, which can become a solution to monitoring, managing, and coordinating mega data centers with multiple controllers working together.

1 INTRODUCTION

In recent years, the data center has emerged as a common infrastructure that holds thousands of servers and supports many cloud applications and services such as scientific computing, group collaboration, storage, financial applications, etc. This fast proliferation of cloud computing has promoted a rapid growth of mega data centers used for commercial purposes. Companies such as Amazon, Cisco, Google, and Microsoft have made huge investments to improve Data Center Networks (DCNs).

Typically, a DCN uses a centralized controller to monitor the global network status, manage resources and update routing information. For instance, Hedera [1] and SPAIN [2] both adopt such a centralized controller to aggregate the traffic statistics and reroute the flows for better load balancing.

However, for a large-scale DCN with thousands of racks (usually in a mega data center), a centralized controller suffers from many problems, such as scalability [3] and availability. Driven by the unprecedented goals of improving the performance and scale of DCNs, researchers have tried to deploy multiple controllers in such networks [4], [5], [6], [7], [8]. The concept of devolved controllers was first introduced in [4], which used dynamic flow [5] to illustrate the detailed configuration.

Devolved controllers are a set of controllers that collaborate as a single omniscient controller, similar to the scheme in [9]. However, none of the controllers has complete information about the whole network. Instead, every controller only maintains a portion of the pairwise multipath information beforehand, thus reducing the workload significantly.

Recently, software-defined networking (SDN) as proposed by OpenFlow [10] has been touted as one of the most promising solutions for the future Internet. SDN is characterized by two distinguishing features: decoupling the control plane from the data plane and providing programmability for network application development [11]. From these features we can divide the DCN flow control schemes into two layers: the lower layer focuses on traffic management and virtual machine (VM) migrations, which could relieve the intensive traffic in hot spots; the upper layer coordinates the control rights of switches among controllers, which deals with the load imbalance problem in a hierarchical manner. Combining the two layers, we can better improve the system performance and greatly reduce the load imbalance problem.

For the lower layer control, there are mature and well-developed methods to handle flow control and VM migration at present [12], [13], [14], [15]. For the upper layer control, managing DCNs with devolved controllers has gradually become a hot topic in recent years due to the expanding scale of DCNs. Similarly, if the switches in a region are relatively busy, then the controller monitoring this region becomes a hot spot, which could be harmful to the system. Many relevant studies emphasize the imbalanced load problem for devolved controllers [4], [11], [16], but none of them gives a clear formulation of the controller imbalance problem or analyzes the performance of its solutions. This leads to our concern with the imbalanced load issue for devolved controllers, in order to better control the traffic and manage the network.

X. Gao, L. Kong, W. Li, and G. Chen are with the Department of Computer Science and Engineering, Shanghai Key Laboratory of Scalable Computing and Systems, Shanghai Jiao Tong University, Shanghai 200240, China. E-mail: {gao-xf, linghe.kong, eizo.lee, gchen}@cs.sjtu.edu.cn.

W. Liang and Y. Chen are with the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15289. E-mail: {wanchaol, yuxiang1}@cs.sjtu.edu.cn.

Manuscript received 2 July 2015; revised 1 June 2016; accepted 1 June 2016. Date of publication 9 June 2016; date of current version 18 Jan. 2017. Recommended for acceptance by X. Wang.

For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below.

Digital Object Identifier no. 10.1109/TPDS.2016.2579622

1045-9219 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Motivated by these concerns, in this paper we propose a novel scheme to manage devolved controllers. In our scheme, each controller locally monitors the traffic of a subset of the switches. When traffic load imbalance occurs, some controllers migrate a portion of their monitored work to other controllers so that the workload is kept balanced dynamically. We define this problem as the Load Balancing problem for Devolved Controllers (LBDC).

We prove that LBDC is NP-complete, so it is unlikely to be solvable in polynomial time. We then design multiple solutions for LBDC, including linear programming with rounding approximation, three centralized greedy algorithms, and one distributed greedy algorithm. Using these solutions, we can dynamically balance the traffic load among controllers. Such methods can significantly reduce the occurrence of traffic hot spots, which degrade network performance. These schemes can also improve the availability and throughput of the DCN, supporting horizontal scaling and enhancing responsiveness to clients’ requests. In all, the main contributions of this paper are as follows:

1) We design and implement a traffic load balancing scheme using devolved controllers, which eliminates the scalability problem and balances the traffic load among multiple controllers. All these controllers are configured based on their physical placements, which is more realistic and makes the whole network more effective and reliable.

2) We prove the NP-completeness of LBDC, and design an f-approximation algorithm to obtain the solution. We also come up with both centralized and distributed heuristics for workload migration between controllers in dynamic situations. The distributed algorithm is scalable, stable, and more appropriate for real-world applications, especially for large-scale DCNs.

3) We evaluate our algorithms with various experiments. Numerical results validate the efficiency of our design. To the best of our knowledge, we are the first to discuss the workload balancing problem among multiple controllers in DCNs, which has both theoretical and practical significance.

This paper is the extended version of our conference paper [17]. Compared with the short conference version, we add a randomized rounding for the linear programming, as well as two novel centralized migration algorithms under limited conditions. Additionally, we develop a new evaluation section and obtain more reliable and precise results through various numerical experiments.

The rest of the paper is organized as follows. Section 2 presents the system architecture and problem statement; Sections 3 and 4 give our solutions to LBDC. Section 5 presents our performance evaluation and demonstrates the effectiveness of our algorithms. Section 6 introduces the related work. Finally, Section 7 concludes the paper.

2 PROBLEM STATEMENT

Traffic in a DCN can be considered as Virtual Machine communication. VMs in different servers collaborate with each other to complete designated tasks. To communicate between VMs, a communication flow goes through several switches.

Based on the concept of OpenFlow [10], there is a flow table in each switch, storing the flow entries to be used in routing. One responsibility of a controller is to modify these flow tables when communication occurs. Every controller has a corresponding routing component, which may be composed of several hierarchical switches, including Top of Rack (ToR) Switches, Aggregation Switches, and Core Switches. These switches are used for communication within the data center. Furthermore, every rack has a server called the designated server [18], which is responsible for aggregating and processing the network statistics for the rack. It is also in charge of sending the summarized traffic matrices to the network controller, using a mapping program which converts the traffic of this rack (server-to-server data) into ToR-to-ToR messages. Once a controller receives these data, it allocates them to a routing component, which computes the flow reroute and replies to the new flow messages sent to the controller. Then the controller installs this routing information on all associated switches by modifying their flow tables. Since this paper is not concerned with routing, we omit the details of table computing and flow rerouting.

Now we define our problem formally. In a typical DCN, denote by $s_i$ the $i$th switch, with corresponding traffic weight $w(s_i)$, which is defined precisely as the number of out-going flows. Note that this weight does not include the communication within the ToRs. Next, given $n$ switches $S = \{s_1, \ldots, s_n\}$ with their weights $w(s_i)$ and $m$ controllers $C = \{c_1, \ldots, c_m\}$, we want to make a weighted $m$-partition of the switches such that each controller monitors a subset of switches. The weight of a controller $w(c_i)$ is the sum of the weights of its monitored switches. Due to physical limitations, assume every $s_i$ has a potential controller set $PC(s_i)$ and can only be monitored by a controller in $PC(s_i)$. Every $c_i$ has a potential switch set $PS(c_i)$ and can only control switches in $PS(c_i)$. After the partition, the real controller of $s_i$ is denoted by $rc(s_i)$ and the real switch subset of $c_i$ is denoted by $RS(c_i)$.

The symbols used in this paper are listed in Table 1.

TABLE 1 Definition of Terms

| Term | Definition |
|---|---|
| $S$, $s_i$ | switch set with $n$ switches: $S = \{s_1, \ldots, s_n\}$ |
| $w(s_i)$ | weight of $s_i$, as the number of out-going flows |
| $PC(s_i)$ | potential controller set of the $i$th switch |
| $rc(s_i)$ | the real controller of the $i$th switch |
| $C$, $c_i$ | controller set with $m$ controllers: $C = \{c_1, \ldots, c_m\}$ |
| $w(c_i)$ | weight of $c_i$, as the sum of $RS(c_i)$'s weight |
| $PS(c_i)$ | potential switch set of the $i$th controller |
| $RS(c_i)$ | real switch set of the $i$th controller |
| $AN(c_i)$ | adjacent node set (1-hop neighbors) of $c_i$ |

To maintain the performance of network management, each controller should end up with almost the same amount of workload. Otherwise, if the hot switches always require routing information from the same controller, that controller will become the bottleneck of the network. To precisely quantify the balancing performance among devolved controllers, we define the standard deviation of the partitions’ weights as the metric, denoted by

$$\sigma = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( w(c_i) - \overline{w(c)} \right)^{2}},$$

where $\overline{w(c)}$ is the average weight of all controllers. If the traffic flow varies while the system is running, the weight of controller $c_i$ may grow explosively, leaving it unbalanced compared with other controllers. In this condition, we must regionally migrate some switches in $RS(c_i)$ to other available controllers, in order to reduce its workload and keep the whole network traffic balanced.
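As a concrete illustration of the metric, the following minimal Python sketch computes the controller weights and their standard deviation for a given assignment. The function names and the toy instance (four switches, two controllers) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math

def controller_weights(switch_weight, real_controller, controllers):
    """Sum the weights of the switches that each controller monitors."""
    load = {c: 0.0 for c in controllers}
    for s, c in real_controller.items():
        load[c] += switch_weight[s]
    return load

def load_std_dev(load):
    """Population standard deviation of the controller weights (sigma)."""
    mean = sum(load.values()) / len(load)
    return math.sqrt(sum((w - mean) ** 2 for w in load.values()) / len(load))

# Hypothetical toy instance: switch weights and the real controller rc(s).
switch_weight = {"s1": 4, "s2": 2, "s3": 3, "s4": 1}
real_controller = {"s1": "c1", "s2": "c1", "s3": "c2", "s4": "c2"}

load = controller_weights(switch_weight, real_controller, ["c1", "c2"])
sigma = load_std_dev(load)
print(load, sigma)
```

Here the two controllers carry weights 6 and 4 around an average of 5, so the deviation is 1; a perfectly balanced assignment would give 0.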

Then our problem becomes balancing the traffic load among $m$ partitions in a real-time environment, and migrating switches among controllers when the balance is broken. We define this problem as the Load Balancing problem for Devolved Controllers. In our scheme, each controller can dynamically migrate switches to, or receive switches from, logically adjacent controllers to keep the traffic load balanced.

Fig. 1 illustrates the migration pattern. Here Controller $c_j$ dominates 17 switches (the red switches) and Controller $c_i$ dominates 13 switches (the blue switches). Since the traffic between $c_i$ and $c_j$ is unbalanced, $c_j$ is migrating one of its switches to $c_i$.

Let $x_{ij} = 1$ if $c_i$ monitors $s_j$, and $x_{ij} = 0$ otherwise. Then the LBDC problem can be further formulated as the following program:

$$\min \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} w(s_j)\, x_{ij} - \overline{w(c)} \right)^{2}} \tag{1}$$

$$\text{s.t.} \quad \overline{w(c)} = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} w(s_j)\, x_{ij} \tag{2}$$

$$\sum_{i=1}^{m} x_{ij} = 1, \quad \forall\, 1 \le j \le n \tag{3}$$

$$x_{ij} = 0, \quad \text{if } s_j \notin PS(c_i) \text{ or } c_i \notin PC(s_j), \quad \forall i, j \tag{4}$$

$$x_{ij} \in \{0, 1\}, \quad \forall i, j. \tag{5}$$

Here, Eqn. (1) is the objective standard deviation. Eqn. (2) calculates the average weight of all controllers. Eqn. (3) means that each switch should be monitored by exactly one controller. Eqn. (4) gives the regional constraints, and Eqn. (5) gives the integer constraints.
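To make the formulation tangible, the sketch below enumerates every assignment that satisfies Eqns. (3) to (5) on a tiny instance and keeps the one with the smallest objective value from Eqn. (1). It is an exhaustive search, exponential in the number of switches, and is meant only as an illustration of the constraints, not as a practical solver. The function name and the instance are hypothetical, and the sketch assumes every controller index appears in at least one potential controller set.

```python
import itertools
import math

def solve_lbdc_brute_force(weights, potential):
    """Return (best_sigma, best_assignment) by trying every feasible assignment.

    weights[j]   : weight w(s_j) of switch j
    potential[j] : controller indices in PC(s_j), i.e. where x_ij may be 1
    """
    # Assumption: controllers are numbered 0..m-1 and each appears somewhere.
    m = 1 + max(c for pcs in potential for c in pcs)
    best = (math.inf, None)
    # Picking exactly one controller per switch enforces Eqns. (3)-(5).
    for choice in itertools.product(*potential):
        load = [0.0] * m
        for j, c in enumerate(choice):
            load[c] += weights[j]
        mean = sum(load) / m                                      # Eqn. (2)
        sigma = math.sqrt(sum((w - mean) ** 2 for w in load) / m)  # Eqn. (1)
        if sigma < best[0]:
            best = (sigma, choice)
    return best

# Hypothetical instance: switch 0 is pinned to controller 0, switch 3 to 1.
weights = [4, 2, 3, 1]
potential = [[0], [0, 1], [0, 1], [1]]
sigma, assignment = solve_lbdc_brute_force(weights, potential)
print(sigma, assignment)
```

On this instance the total weight is 10, but no feasible assignment splits it 5 and 5, so the best achievable loads are 6 and 4.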

Theorem 1. LBDC is NP-complete.

Proof. We prove the NP-completeness of LBDC by considering a decision version of the problem and showing a reduction from the PARTITION problem [19].

An instance of PARTITION is: given a finite set $A$ and a $size(a) \in \mathbb{Z}^{+}$ for each $a \in A$, is there a subset $A' \subseteq A$ such that $\sum_{a \in A'} size(a) = \sum_{a \in A \setminus A'} size(a)$? Now we construct an instance of LBDC. In this instance there are two controllers $c_1$, $c_2$ and $|A|$ switches. Each switch $s_a$ represents an element $a \in A$, with weight $w(s_a) = size(a)$. Both controllers can control every switch in the network ($PS(c_1) = PS(c_2) = \{s_a \mid a \in A\}$).

Then, given a YES solution $A'$ for PARTITION, we have a solution $RS(c_1) = \{s_a \mid a \in A'\}$, $RS(c_2) = \{s_a \mid a \in A \setminus A'\}$ with $\sigma = 0$. The reverse part is trivial. The reduction can be done within polynomial time, which completes the proof. □
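The construction in the proof can be sketched in a few lines of Python. The helper names below are hypothetical, and the decision check is a brute-force search used only to illustrate that a PARTITION instance has a YES answer exactly when the constructed two-controller LBDC instance admits a zero-deviation assignment.

```python
import itertools

def partition_to_lbdc(sizes):
    """Build the LBDC instance of the proof: one switch per element,
    two controllers, and both controllers may monitor every switch."""
    weights = list(sizes)
    potential = [[0, 1] for _ in sizes]
    return weights, potential

def has_zero_deviation(weights, potential):
    """Decision version: is there an assignment with equal controller loads?"""
    for choice in itertools.product(*potential):
        load = [0, 0]
        for w, c in zip(weights, choice):
            load[c] += w
        if load[0] == load[1]:      # sigma = 0 exactly when the loads match
            return True
    return False

yes_instance = has_zero_deviation(*partition_to_lbdc([3, 1, 1, 2, 2, 1]))
no_instance = has_zero_deviation(*partition_to_lbdc([1, 2, 4]))
print(yes_instance, no_instance)
```

The first list can be split into two halves of weight 5, so the answer is YES; the second has an odd total, so no balanced split exists.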

Next we present our solutions for LBDC. We implement the schemes within the OpenFlow framework, which makes the system comparatively easy to configure and implement. It turns the devolved controllers from a mathematical model into an implementable prototype. Furthermore, our schemes are topology free, so they scale to any DCN topology such as Fat-Tree, BCube, Portland, etc.

3 LINEAR PROGRAMMING AND ROUNDING

Given the traffic status of the current DCN with devolved controllers, we can solve the LBDC problem using the above program. To simplify it, we transform it into a similar integer program. First, we convert the standard deviation to an average of absolute values:

$$\min \frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x_{ij} - \overline{w(c)} \right|. \tag{6}$$

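We rewrite Eqn. (6), and obtain an integer program as follows:

$$\min \frac{1}{m} \sum_{i=1}^{m} y_i \tag{7}$$

$$\text{s.t.} \quad y_i \ge \sum_{j=1}^{n} w(s_j)\, x_{ij} - \overline{w(c)} \tag{8}$$

$$y_i \ge \overline{w(c)} - \sum_{j=1}^{n} w(s_j)\, x_{ij} \tag{9}$$

$$\overline{w(c)} = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{n} w(s_j)\, x_{ij} \tag{10}$$

$$\sum_{i=1}^{m} x_{ij} = 1, \quad \forall\, 1 \le j \le n \tag{11}$$

$$x_{ij} = 0, \quad \text{if } s_j \notin PS(c_i) \text{ or } c_i \notin PC(s_j), \quad \forall i, j \tag{12}$$

$$x_{ij} \in \{0, 1\}, \quad \forall i, j. \tag{13}$$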

In general, integer programs may not be solvable in polynomial time, so we adopt relaxation to transform our integer program into a linear program (LP). We then acquire a fractional solution and round it to a feasible solution of the original program. To obtain the linear program, we replace Eqn. (13) with $x_{ij} \ge 0\ (\forall i, j)$.

Fig. 1. An example of regional balancing migration.

After solving this LP, we can discover a feasible solution to LBDC by a deterministic rounding [20], which is stated in Algorithm 1.

Algorithm 1. Deterministic Rounding (LBDC-DR)
foreach switch $s_j$ do
    Search the solution space of the LP;
    Let $\ell = \arg\max_i \{x_{ij} \mid 1 \le i \le m\}$;
    if there exist several maximal $x_{ij}$ then
        Let $\ell = \arg\min_i \{w(c_i) \mid x_{ij} \text{ is maximal}\}$;
    Round $x_{\ell j} = 1$;
    for $c_i \ne c_\ell$ do
        Round $x_{ij} = 0$;

For instance, if a switch $s_j$ has $x_{1j} = 0.2$, $x_{2j} = 0.7$, $x_{3j} = 0.1$ in the solution space of the LP, then according to Algorithm 1 we round $x_{2j} = x_{\ell j} = 1$ and $x_{1j} = x_{3j} = 0$.
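The rounding rule of Algorithm 1 can be sketched as follows, assuming the fractional LP solution is already available as a matrix. The function name, the matrix layout, and the example values in the second column are hypothetical; the first column reuses the values 0.2, 0.7, 0.1 from the example above. Ties are detected with exact floating-point equality, which is a simplification.

```python
def deterministic_rounding(x_lp, controller_weight):
    """Round a fractional LP solution column by column (one column per switch).

    x_lp[i][j]           : fractional share of switch j given to controller i
    controller_weight[i] : weight w(c_i), used only to break ties
    Returns a 0/1 matrix with exactly one 1 in every column.
    """
    m, n = len(x_lp), len(x_lp[0])
    x_r = [[0] * n for _ in range(m)]
    for j in range(n):
        best = max(x_lp[i][j] for i in range(m))
        # All controllers that attain the maximum for this switch.
        tied = [i for i in range(m) if x_lp[i][j] == best]
        # Tie-break: prefer the controller with the smallest weight.
        chosen = min(tied, key=lambda i: controller_weight[i])
        x_r[chosen][j] = 1
    return x_r

x_lp = [[0.2, 0.5],
        [0.7, 0.5],
        [0.1, 0.0]]
x_r = deterministic_rounding(x_lp, controller_weight=[3.0, 5.0, 1.0])
print(x_r)
```

The first switch goes to the second controller (largest share 0.7). The second switch is tied between the first two controllers and goes to the lighter one.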

Next, we prove that this solution is feasible for LBDC.

Theorem 2. LBDC-DR (Algorithm 1) results in a feasible solution for the integer programming of LBDC.

Proof. According to LBDC-DR, for each $s_j$ we round only the maximum $x_{ij}$ ($1 \le i \le m$) to 1, and all other $x_{ij}$'s are set to 0. Then each switch is monitored by exactly one controller and no switch is left in the idle state. Thus we get a feasible solution for the integer program. □

Now let us analyze the performance of LBDC-DR. We define $Z^{*}$, $Z^{LP}$, and $Z^{R}$ as the optimal solution of the integer program, the solution of the linear program, and the solution after the rounding process, respectively. Then define $f$ as the maximum number of controllers in which any switch potentially appears. More formally, $f = \max_{i=1,\ldots,n} |PC(s_i)|$.

We claim that LBDC-DR is an f-approximation. To prove it, we first prove the following two lemmas.

Lemma 1. $\overline{w(c)}^{LP} = \overline{w(c)}^{*} = \overline{w(c)}^{R}$.

Proof. From the definition of the original $\overline{w(c)}$, the ideal weight of each controller is the sum of the weights of all switches divided by the number of controllers. This definition holds over the whole solution space, so we can conclude that $\overline{w(c)}^{LP} = \overline{w(c)}^{*} = \overline{w(c)}^{R} = \frac{1}{m} \sum_{i=1}^{n} w(s_i)$. □

Lemma 2. $x^{R}_{ij} \le x^{LP}_{ij} \cdot f$.

Proof. We have the constraint $\sum_{i=1}^{m} x^{LP}_{ij} = 1\ (\forall\, 1 \le j \le n)$. Also, according to LBDC-DR, $x^{LP}_{\ell j}$ is the largest of all $x^{LP}_{ij}\ (\forall\, 1 \le i \le m)$. Since at most $f$ of these values can be nonzero, by the pigeonhole principle we must have $x^{LP}_{\ell j} \ge 1/f$. For each switch $s_j$, $x^{R}_{\ell j}$ equals 1 and the others equal zero, so each rounded value is less than or equal to the corresponding LP value times the factor $f$. Then for any controller $c_i$, we have $x^{R}_{ij} \le x^{LP}_{ij} \cdot f$. □

According to the above lemmas, we obtain the following theorem:

Theorem 3. LBDC-DR is an f-approximation algorithm.

Proof. Since the linear program is a relaxation of the integer program, we have $Z^{LP} \le Z^{*}$. Also we have $Z^{*} \le Z^{R}$ because the solution of LBDC-DR is feasible according to Theorem 2, while $Z^{*}$ denotes the optimal solution.

Because $\overline{w(c)}$ represents the ideal weight of each controller, it must be the same in all the solutions according to Lemma 1. Therefore we let $w = \overline{w(c)}$. From $Z^{LP} \le Z^{*}$ we can derive

$$\frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{LP}_{ij} - w \right| \le \frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{*}_{ij} - w \right|.$$

Since we already know the inequality $|x| - |y| \le |x - y| \le |x| + |y|$, we get the following relationship:

$$\frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{LP}_{ij} \right| \le \frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{*}_{ij} \right| + 2w.$$

Then the approximation ratio can be obtained from the following inequalities:

$$\begin{aligned}
\frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{R}_{ij} - w \right|
&\le \frac{1}{m} \sum_{i=1}^{m} \left( \left| \sum_{j=1}^{n} w(s_j)\, x^{R}_{ij} \right| + w \right) \\
&\le \frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{LP}_{ij} \cdot f \right| + w \\
&\le f \cdot \frac{1}{m} \sum_{i=1}^{m} \left| \sum_{j=1}^{n} w(s_j)\, x^{*}_{ij} \right| + (1 + 2f)\, w \\
&= f \cdot OPT + (1 + 2f)\, w.
\end{aligned}$$

Thus LBDC-DR is an $f$-approximation. □

Another idea for rounding an optimal fractional solution is to view the fractions as probabilities, flipping coins with these biases and rounding accordingly. We will show how this idea leads to an $O(\log n)$ factor randomized approximation for the LBDC problem. We then present our LBDC-Randomized Rounding (LBDC-RR) algorithm as described below.

First, we claim that our LBDC problem can be described in another way, using the definition and properties of set cover:

Given a universe $U$ of $n$ switch elements, $S$ is a collection of subsets of $U$, with $S = \{S_1, \ldots, S_n\}$, and there is a cost assignment function $c : S \to \mathbb{Z}^{+}$. Find the subcollection of $S$ with the minimum deviation that covers all the switches of the universal switch set $U$.

We will show that each switch element is covered with constant probability by the controllers with a specific switch set, which is picked by this process. Repeating this process $O(\log n)$ times, and picking a subset of switches if it is chosen in any of the iterations, we get a set cover with high probability, by a standard coupon collector argument. The expected minimum deviation of the cover (that is, the controller-switch matching) picked in this way is $O(\log n) \cdot OPT_f \le O(\log n) \cdot OPT$, where $OPT_f$ is the cost of an optimal solution to the LP-relaxation.

Algorithm 2 shows the formal description of LBDC-RR.


Algorithm 2. Randomized Rounding (LBDC-RR)
Let $x = p$ be an optimal solution to the LP;
foreach set $S_i \in S$ do
    Pick $S_i$ with probability $x_{S_i}$;
repeat    (get $c \log n$ subcollections)
    Pick a subcollection as a min-cover;
until executed $c \log n$ times
Compute the union of subcollections in $C$.

Next let us compute the probability that a switch element $a \in U$ is covered by $C$. Suppose that $a$ occurs in $k$ sets of $S$. Let the probabilities associated with these sets be $p_1, \ldots, p_k$. Since $a$ is fractionally covered in the optimal solution, $p_1 + p_2 + \cdots + p_k \ge 1$. Using elementary calculus, it is easy to show that under this condition, the probability that $a$ is covered by $C$ is minimized when each of the $p_i$'s is $1/k$. Thus,

$$\Pr[a \text{ is covered by } C] \ge 1 - \left(1 - \frac{1}{k}\right)^{k} \ge 1 - \frac{1}{e},$$

where $e$ is the base of natural logarithms. Hence each element is covered with constant probability by $C$.

To get a complete switch set cover, we independently pick $c \log n$ such subcollections and compute their union, say $C'$, where $c$ is a constant such that $\left(\frac{1}{e}\right)^{c \log n} \le \frac{1}{4n}$. Then we obtain the following probability:

$$\Pr[a \text{ is not covered by } C'] \le \left(\frac{1}{e}\right)^{c \log n} \le \frac{1}{4n}.$$

Summing over all switch elements $a \in U$, we get

$$\Pr[C' \text{ is not a valid switch set cover}] \le n \cdot \frac{1}{4n} \le \frac{1}{4}.$$

Therefore the LBDC-RR algorithm is efficient, and we can solve the LBDC problem using linear programming and randomized rounding.
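The repeated coin-flipping step can be sketched in Python as below. This is a simplified illustration under stated assumptions: the LP values are passed in directly as pick probabilities, the logarithm is taken as the natural logarithm, and the default constant `c = 2` is a hypothetical choice (with the natural logarithm it satisfies $(1/e)^{c \log n} \le 1/(4n)$ once $n \ge 4$). The function names are not from the paper.

```python
import math
import random

def randomized_rounding(sets, prob, universe, c=2, rng=None):
    """Union of ceil(c * ln n) independent random subcollections.

    sets[i]  : the switches contained in set S_i
    prob[i]  : LP value x_{S_i}, used as the probability of picking S_i
    universe : all switches that must be covered
    Returns (picked set indices, whether the union covers the universe).
    """
    rng = rng or random.Random()
    rounds = max(1, math.ceil(c * math.log(len(universe))))
    picked = set()
    for _ in range(rounds):
        for i, p in enumerate(prob):
            if rng.random() < p:        # biased coin flip for S_i
                picked.add(i)
    covered = set().union(*(sets[i] for i in picked))
    return picked, covered >= set(universe)

def single_round_cover_bound(k):
    """Lower bound 1 - (1 - 1/k)^k for an element that lies in k sets."""
    return 1 - (1 - 1 / k) ** k

picked, ok = randomized_rounding([{1, 2}, {2, 3}], [1.0, 1.0], {1, 2, 3},
                                 rng=random.Random(0))
print(picked, ok, single_round_cover_bound(3))
```

With both probabilities equal to 1, every set is picked and the cover is always valid. For fractional probabilities the result is random and may need to be re-drawn when the cover is incomplete.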

4 ALGORITHM DESIGN

Linear programming with rounding solves LBDC with theoretical guarantees. However, it is usually time consuming and impractical to solve an LP in real-world applications. Thus, designing efficient and practical heuristics for real systems is essential. In this section, we propose a centralized and a distributed greedy algorithm for switch migration, to be used when the traffic load becomes unbalanced among the controllers. We then describe the OpenFlow-based migration protocols that we use in this system.

4.1 Centralized Migration

Centralized Migration is split up into two phases. The first phase is used for configuring and initializing the DCN. As the traffic load changes due to various applications, we have to come to the second phase for dynamical migration among devolved controllers.

Fig. 2 illustrates the general workflow of Centralized Migration, which includes Centralized Initialization and Centralized Regional Balanced Migration.

Centralized Initialization. First we need to initialize the current DCN and assign each switch to a controller in its potential controller set. We design a centralized initialization algorithm (LBDC-CI) for the initialization process. To avoid the dilemma of having to select among conflicting switches or controllers, we first present the Break Tie Law.

Break Tie Law. (1) When choosing $s_i$ from $S$, we select the one with the largest weight. If several switches have the same weight, the one with the smallest $|PC(s_i)|$ is preferred. If there are still several candidates, we randomly choose one. (2) When choosing $c_i$ from $C$, we select the one with the minimum weight. If several controllers have the same weight, the one with the smallest $|RS(c_i)|$ is preferred. If there are still several candidates, we choose the closer controller by physical distance. Finally, if we still cannot make a decision, we randomly choose one.

Then we design LBDC-CI as shown in Algorithm 3.

Algorithm 3. Centralized Initialization (LBDC-CI)
Input: $S$ with $w(s_i)$; $C$ with $w(c_i)$;
Output: An $m$-partition of $S$ to $C$
RemList = $\{s_1, s_2, \ldots, s_n\}$;
while RemList $\ne \emptyset$ do
    Pick $s_i$ from RemList;
    Let $\ell = \arg\min_j \{w(c_j) \mid c_j \in PC(s_i)\}$;
    Assign $s_i$ to $c_\ell$ (by Break Tie Law);
    Remove $s_i$ from RemList;

LBDC-CI searches the RemList to assign the switches, which takes $O(n)$ time. The while loop is executed once for each switch in RemList, and each iteration takes $O(m)$. Hence in the worst case the running time is $O(mn)$. If we use a priority heap to store the RemList, we can improve the performance and reduce the overall running time to $O(m \log n)$.
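A compact Python sketch of this greedy initialization is given below. It follows the Break Tie Law for the first two criteria, but as a simplifying assumption it breaks any remaining ties by name instead of by physical distance or at random, so that the output is deterministic. It also sorts the switches once rather than maintaining a heap. All identifiers and the toy instance are hypothetical.

```python
def centralized_initialization(switch_weight, potential_controllers, controllers):
    """Greedy initial assignment in the spirit of LBDC-CI.

    Switches are taken heaviest first (fewest potential controllers on ties);
    each goes to its lightest potential controller (fewest switches on ties).
    """
    load = {c: 0 for c in controllers}
    monitored = {c: [] for c in controllers}
    # Break Tie Law (1): largest weight, then smallest |PC(s)|, then name.
    order = sorted(switch_weight,
                   key=lambda s: (-switch_weight[s],
                                  len(potential_controllers[s]), s))
    for s in order:
        # Break Tie Law (2): smallest weight, then smallest |RS(c)|, then name.
        target = min(potential_controllers[s],
                     key=lambda c: (load[c], len(monitored[c]), c))
        monitored[target].append(s)
        load[target] += switch_weight[s]
    return monitored, load

switch_weight = {"s1": 5, "s2": 4, "s3": 3, "s4": 2}
potential_controllers = {"s1": ["c1", "c2"], "s2": ["c1", "c2"],
                         "s3": ["c2"], "s4": ["c1", "c2"]}
monitored, load = centralized_initialization(
    switch_weight, potential_controllers, ["c1", "c2"])
print(monitored, load)
```

On this instance both controllers end with a load of 7, even though switch `s3` can only be monitored by `c2`.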

As the system runs, traffic load may vary frequently and will influence the balanced status among devolved controllers. Correspondingly, we have to begin the second phase and design the centralized migration algorithm (LBDC-CM) to alleviate the situation.

Centralized Regional Balanced Migration. During the migration process, we must assess when a controller needs to execute a migration. Thus we introduce a threshold and an effluence to judge the traffic load balancing status of the controllers. Here we define Thd as the threshold and Efn as the effluence. If the workload of a controller is lower than or equal to Thd, it is relatively idle and available to receive more switches migrated from overloaded controllers. If the workload of a controller is higher than Efn, it is in an overload status and should assign its switches to other idle controllers. Some measurement studies [21] of data center traffic have shown that data center traffic is expected to be linear. Thus we set the threshold according to the current traffic sample and the historical records, by imitating the Round-Trip Time (RTT) and Timeout estimation of TCP [22]. This linear expectation uses two constant weighting factors $\alpha$ and $\beta$, depending on the traffic features of the data center, where $0 \le \alpha \le 1$ and $\beta > 1$.

Fig. 2. Dynamic load balancing workflow of LBDC.

(1) Naive LBDC-CM. We first present a naive algorithm for LBDC-CM. We run naive LBDC-CM periodically and divide the running time of the system into several rounds. We use $Avg_{last}$ and $Avg_{now}$ to represent the average workload of the last sample round and the current sample round. These two parameters are used together to decide when to start and stop the migration. In each round, we sample the current weight of each controller, and calculate $Avg_{now} = \sum_{i=1}^{m} w(c_i)/m$. In all, the Linear Expectation can be computed as follows:

$$\begin{cases}
Thd = \alpha \cdot Avg_{now} + (1 - \alpha) \cdot Avg_{last} \\
Efn = \beta \cdot Thd.
\end{cases} \tag{14}$$
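As a small worked example of Eqn. (14), the sketch below computes the threshold and the effluence from two sampled averages. The function name and the numeric values are hypothetical; the parameter check mirrors the stated ranges of the two weighting factors.

```python
def linear_expectation(avg_now, avg_last, alpha, beta):
    """Return (Thd, Efn) following Eqn. (14)."""
    if not 0 <= alpha <= 1 or beta <= 1:
        raise ValueError("expected 0 <= alpha <= 1 and beta > 1")
    # Weighted blend of the current and the previous average workload.
    thd = alpha * avg_now + (1 - alpha) * avg_last
    # The effluence is a fixed multiple of the threshold.
    return thd, beta * thd

thd, efn = linear_expectation(avg_now=120.0, avg_last=100.0, alpha=0.5, beta=1.5)
print(thd, efn)
```

With equal weighting of the two rounds, the threshold sits halfway between the averages (110), and a controller counts as overloaded above 165.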

The core principle of LBDC-CM is migrating the heaviest switch to the lightest controller greedily. Algorithm 4 describes the details. Note that ANðciÞ denotes the neighbor set of ci.

Algorithm 4. Centralized Migration (LBDC-CM)
Input: $S$ with $w'(s_i)$; $C$ with $w'(c_i)$; PendList = OverList = $\emptyset$;
1  Step 1: Add $c_i \to$ OverList if $w'(c_i) > Efn$;
2  Step 2: Find $c_m$ of max weight in OverList;
3  if $\exists\, c_n \in AN(c_m) : w'(c_n) < Thd$ then
4      repeat
5          Pick $s_m \in RS(c_m)$ of max weight;
6          if $\exists\, c_f \in AN(c_m) \cap PC(s_m) : w'(c_f) < Thd$ then Send $s_m \to c_f$
7          else Ignore the current $s_m$ in $c_m$
8      until $w'(c_m) \le Thd$ or all $w'(c_f) \ge Thd$;
9      if $w'(c_m) > Efn$ then move $c_m$ to PendList
10     else remove $c_m$ from OverList
11 else
12     Move $c_m$ from OverList to PendList;
13 Step 3: Repeat Step 2 until OverList $= \emptyset$;
14 Let OverList = PendList, Repeat Step 2 until PendList becomes stable;
15 Step 4: Now PendList has several connected components $CC_i$ ($1 \le i \le |CC|$);
16 foreach $CC_i \in CC$ do
17     Search the $\bigcup_{c_j \in CC_i} AN(c_j)$;
18     Compute $avg_{local} = \frac{w'(CC_i \cup AN(CC_i))}{|CC_i| + |AN(CC_i)|}$;
19     while $w'(c_j) > \gamma \cdot avg_{local} : c_j \in CC_i$ do
20         Migrate $s_{max} \in RS(c_j)$ to $c_{min} \in AN(CC_i)$;
21     remove $c_j \in CC_i$ from PendList;
22 Step 5: Repeat Step 4 until PendList is stable.

The naive LBDC-CM consists of five steps. In Step 2, it searches the OverList to find $c_m$, which takes $O(m)$. Next, it repeatedly migrates switches from the OverList to corresponding controllers, which takes $O(mn)$. Step 3 invokes Step 2 several times until the OverList is empty and the PendList becomes stable, which takes $O(m^2 n)$. Steps 4 and 5 balance the PendList locally in the same way as Steps 2 and 3. In the worst case, the running time is $O(m^2 n)$. By using a priority heap to store the OverList and PendList, we can reduce the time complexity to $O(mn \log m)$.
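The greedy core of the migration (roughly Steps 1 to 3 of Algorithm 4) can be sketched as a single pass in Python. This is a deliberately simplified illustration: it builds the overloaded list once, does not re-run on the pending list, and omits the local balancing of Steps 4 and 5. The function name, the data layout, and the toy instance are all hypothetical.

```python
def migrate_overloaded(load, monitored, switch_weight, neighbors,
                       potential_controllers, thd, efn):
    """One greedy pass: heaviest overloaded controller first, heaviest switch first.

    A switch moves only to an adjacent controller that belongs to its
    potential controller set and whose load is below thd. Controllers that
    are still above efn afterwards are returned as the pending list.
    """
    over = sorted((c for c in load if load[c] > efn), key=lambda c: -load[c])
    pending = []
    for c in over:
        for s in sorted(monitored[c], key=lambda s: -switch_weight[s]):
            if load[c] <= thd:
                break                 # this controller is relieved
            idle = [d for d in neighbors[c]
                    if d in potential_controllers[s] and load[d] < thd]
            if not idle:
                continue              # ignore this switch, try the next one
            target = min(idle, key=lambda d: load[d])
            monitored[c].remove(s)
            monitored[target].append(s)
            load[c] -= switch_weight[s]
            load[target] += switch_weight[s]
        if load[c] > efn:
            pending.append(c)
    return pending

switch_weight = {"a": 5, "b": 3, "c": 2, "d": 2, "e": 3}
monitored = {"c1": ["a", "b", "c"], "c2": ["d"], "c3": ["e"]}
load = {"c1": 10, "c2": 2, "c3": 3}
neighbors = {"c1": ["c2", "c3"], "c2": ["c1"], "c3": ["c1"]}
potential_controllers = {"a": {"c1", "c2"}, "b": {"c1", "c3"},
                         "c": {"c1", "c2", "c3"}, "d": {"c2"}, "e": {"c3"}}
pending = migrate_overloaded(load, monitored, switch_weight, neighbors,
                             potential_controllers, thd=5, efn=7.5)
print(load, pending)
```

In this run the heaviest switch `a` moves from `c1` to `c2`, after which `c1` reaches the threshold and the pass stops. The receiving controller ends above the threshold but below the effluence, which shows why the full algorithm iterates and then balances locally.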

(2) Limited LBDC-CM. In our naive version, we simply suppose that all controllers have unlimited processing abilities. However, in real conditions, the performance of each controller will vary a lot. Thus, although naive LBDC-CM balances every controller with almost the same traffic load after several rounds, some of them will work in an overloaded state. For example, consider the following condition: there are three controllers $c_1$, $c_2$, $c_3$. The maximum capacity for $c_1$ is , for $c_2$ is 2, and for $c_3$ is 4. The total weight of all switches in this system is 6. If our naive LBDC-CM works perfectly, then each controller will have a load of 2 in the end. Clearly, $c_1$ works in an overloaded status and will become the bottleneck of the system, yet $c_3$ only uses 50 percent of its maximum capacity. Thus in fact, the naive LBDC-CM only balances the value of the load among devolved controllers, instead of balancing the performance of processing the traffic load.

Correspondingly, we design an improved algorithm, limited LBDC-CM. To reconfigure the system when it is unbalanced, we still need a threshold parameter and an effluence parameter for each controller. But now different controllers have different parameter values, and we use two sets to store them: ThdList = $\{Thd_1, \ldots, Thd_m\}$ and EfnList = $\{Efn_1, \ldots, Efn_m\}$. For controller $c_i$, we use $Des^{i}_{now}$ to denote its deserved workload of the current round, and $Des^{i}_{last}$ to denote the deserved workload of the last round. Then these parameters are computed as follows:

$$\begin{cases}
Des^{i}_{now} = \dfrac{\sum_{k=1}^{n} w'(s_k)}{\sum_{j=1}^{m} w_m(c_j)} \cdot w_m(c_i) \\
Thd_i = \alpha \cdot Des^{i}_{now} + (1 - \alpha) \cdot Des^{i}_{last} \\
Efn_i = \beta \cdot Thd_i.
\end{cases} \tag{15}$$
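Eqn. (15) can be illustrated with a short Python sketch. The `capacity` dictionary stands for the maximum load each controller can hold; its values, the switch weights, and the function names are hypothetical and are not taken from the example in the text.

```python
def deserved_loads(switch_weight, capacity):
    """Des^i_now: total traffic shared in proportion to controller capacity."""
    total_weight = sum(switch_weight.values())
    total_capacity = sum(capacity.values())
    return {c: total_weight * cap / total_capacity
            for c, cap in capacity.items()}

def limited_thresholds(des_now, des_last, alpha, beta):
    """Per-controller (Thd_i, Efn_i) following Eqn. (15)."""
    out = {}
    for c in des_now:
        thd = alpha * des_now[c] + (1 - alpha) * des_last[c]
        out[c] = (thd, beta * thd)
    return out

# Hypothetical values: total switch weight 12, capacities in ratio 1:2:3.
switch_weight = {"s1": 5, "s2": 4, "s3": 3}
capacity = {"c1": 10, "c2": 20, "c3": 30}
des_now = deserved_loads(switch_weight, capacity)
thresholds = limited_thresholds(des_now, des_now, alpha=0.5, beta=1.5)
print(des_now, thresholds)
```

A controller with three times the capacity deserves three times the load, so its threshold and effluence scale accordingly.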

Here the maximum load that controller $c_i$ can hold is denoted as $w_m(c_i)$. Meanwhile, we modify the definition of standard deviation, and define $\sigma'$ as the Relative Weight Deviation (RWD):

$$\sigma' = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left( \sum_{j=1}^{n} w(s_j)\, x_{ij} - Des^{i}_{now} \right)^{2}}.$$

We believe this reference index is more appropriate. We use $Des^{i}_{now}$ in the Relative Weight Deviation to evaluate limited LBDC-CM and LBDC-CM with switch priority. We use $Avg_{now}$ in place of $Des^{i}_{now}$ in the RWD to evaluate naive LBDC-CM and LBDC-DM.
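The Relative Weight Deviation can be computed with a few lines of Python, assuming the actual and deserved loads of each controller are already known. The function name and the numbers are hypothetical.

```python
import math

def relative_weight_deviation(load, deserved):
    """sigma': root mean squared gap between actual and deserved load."""
    m = len(load)
    return math.sqrt(sum((load[c] - deserved[c]) ** 2 for c in load) / m)

# Hypothetical values: equal actual loads, unequal deserved loads.
load = {"c1": 2.0, "c2": 2.0, "c3": 2.0}
deserved = {"c1": 1.5, "c2": 1.5, "c3": 3.0}
rwd = relative_weight_deviation(load, deserved)
print(rwd)
```

Although the three actual loads are identical (so the plain standard deviation would be 0), the deviation from the deserved loads is nonzero, which is the imbalance that the limited variant is designed to expose.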

According to Eqn. (15), the procedure of limited LBDC-CM is very similar to that of naive LBDC-CM in Algorithm 4. The only difference lies in the comparison steps: to judge whether a controller is overloaded, we compare $w'(c_i)$ to its local $Efn_i$ and $Thd_i$. Furthermore, in Step 4 (Line 18 of Algorithm 4), we need to calculate $local = \frac{w'(CC_i \cup AN(CC_i))}{w_m(CC_i \cup AN(CC_i))}$, and consider candidate controller $c_j$ if $w'(c_j) > \gamma \cdot local \cdot w_m(c_j)$ instead of $\gamma \cdot avg_{local}$.

Limited LBDC-CM uses the current load ratio of each controller, rather than the value of its current weight, to judge whether the devolved controllers are unbalanced. Thus, we only need to calculate the average percentage of resources utilized in the system and migrate switches from the controllers with high percentages to those with low percentages. The time complexity is the same as for naive LBDC-CM, $O(m^2 n)$, and can be reduced to $O(mn \log m)$ using a priority heap. For space complexity, we need several lists to store the following parameters: the weight of each switch, the current load of each controller, the maximum capacity of each controller, the threshold and effluence of each controller, as well as the PendList and the OverList. Each of them requires a linear array, which takes $O(n)$. We also need two matrices to store the potential mapping and the real mapping between controllers and switches, which take $O(n^2)$. Thus, the space complexity is $O(n^2)$.

(3) LBDC-CM with Switch Priority. Our limited LBDC-CM scheme works well in a comparatively dense structure. That is, if every switch is close enough to all of its potential controllers that migrating switch $s_i$ from controller $c_1$ to controller $c_2$ does not influence the processing speed of messages, then limited LBDC-CM performs well. However, in some distributed data centers with a very sparse structure, it is better to attach a switch to a nearby controller. Meanwhile, as we have mentioned, the performance of the controllers in the network may differ considerably. Some controllers may have strong computing capacities and can thus process messages at a higher speed. In real network systems, we sometimes want certain messages or certain areas to have a higher priority in the whole structure, and we want to allocate the switches in such a region to those strong controllers to increase the value of the system.

Thus, although a switch $s_i$ can be attached to any controller $c_j$ in its potential controller set $PC(s_i)$, the performance, or the value, of the whole system may vary with the real mapping strategy. If the value we get for $s_i$ monitored by $c_1$ is $u$, and for $s_i$ monitored by $c_2$ is $2u$, then it is better to assign $s_i$ to $c_2$, provided that the current loads of both controllers are below their thresholds.

Thus we come up with LBDC-CM with switch priority. In this scheme, each switch has a value list, which stores the value of each mapping between this switch and its potential controllers. We want to balance the traffic load of the network and make the total value as large as possible. We use $v_{ij}$ to denote the value we can get by attaching switch $s_i$ to controller $c_j$. These values are stored in a matrix Value, and if $c_j$ is not in the potential controller set of $s_i$, then $v_{ij} = 0$. We also consider the maximum capacity of each controller as we did in limited LBDC-CM.

The implementation of this algorithm is quite similar to that of limited LBDC-CM, except that we change the migration scheme used in Step 2 of limited LBDC-CM, as shown in Algorithm 5.

Algorithm 5. LBDC-CM with Switch Priority
 1 Step 2: Find $c_m \in OverList$ with max $\frac{w'(c_m)}{w_m(c_m)}$;
 2 if $\exists c_n \in AN(c_m) : w'(c_n) < Thd_n$ then
 3   repeat
 4     if $\exists c_f \in AN(c_m) : w'(c_f) < Thd_f$ then
 5       Sort $PS(c_m)$ by $v_{if}$ : $s_i \in PS(c_m)$;
 6       Pick $s_k$ with max $v_{kf}$ in $c_m$, and pick max $s_k$ to break tie;
 7       Send $s_k \to c_f$;
 8   until $w'(c_m) \le Thd_m$ or all $w'(c_f) \ge Thd_f$;
 9   if $w'(c_m) > Efn_m$ then move $c_m$ to PendList;
10   else remove $c_m$ from OverList
11 else
12   Move $c_m$ from OverList to PendList;
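The following Python sketch illustrates the idea behind Step 2 of Algorithm 5 under simplifying assumptions; it is not the authors' implementation. In particular, it picks the best (switch, receiver) pair across all neighbors that still have room, rather than fixing one receiver first, and it breaks ties in value by the heavier switch. All names and the example data are hypothetical.

```python
def migrate_with_priority(cm, neighbors, load, thd, efn, controlled, weight, value):
    """Relieve the overloaded controller cm, preferring high-value mappings.

    load, thd, efn : dicts keyed by controller
    controlled     : dict controller -> set of switches it currently manages
    weight         : dict switch -> weight
    value          : dict (switch, controller) -> v_ij (missing means 0)
    Returns "pend" if cm should go to PendList, "done" if it leaves OverList.
    """
    if not any(load[c] < thd[c] for c in neighbors):
        return "pend"                       # no neighbor has spare room
    while load[cm] > thd[cm]:
        receivers = [c for c in neighbors if load[c] < thd[c]]
        # Candidate moves with a non-zero value, i.e. c is a potential controller of s.
        pairs = [(value.get((s, c), 0), weight[s], s, c)
                 for c in receivers for s in controlled[cm]
                 if value.get((s, c), 0) > 0]
        if not pairs:
            break                           # nothing more can be moved
        _, _, s, c = max(pairs)             # highest value, then heaviest switch
        controlled[cm].remove(s)
        controlled[c].add(s)
        load[cm] -= weight[s]
        load[c] += weight[s]
    return "pend" if load[cm] > efn[cm] else "done"


# Hypothetical example: c1 is overloaded, c2 and c3 are its neighbors.
weight = {"s1": 4, "s2": 3, "s3": 3, "s4": 2, "s5": 5}
controlled = {"c1": {"s1", "s2", "s3"}, "c2": {"s4"}, "c3": {"s5"}}
load = {"c1": 10, "c2": 2, "c3": 5}
thd = {"c1": 6, "c2": 6, "c3": 6}
efn = {"c1": 9, "c2": 9, "c3": 9}
value = {("s1", "c2"): 2, ("s2", "c2"): 1, ("s1", "c3"): 1, ("s3", "c3"): 3}
status = migrate_with_priority("c1", ["c2", "c3"], load, thd, efn,
                               controlled, weight, value)
```

In this example the switch with the highest value, `s3`, is sent to `c3` first; `c3` then has no room left, so the next best move sends `s1` to `c2`.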

In this scheme, we add the process of sorting the switch list according to the value matrix, which takes $O(\log n)$ if we use heap sorting. Thus the time complexity is $O(n \log m \log n)$ if we use a priority heap to store the PendList and the OverList. The space complexity is still $O(n^2)$, since we need matrices to store the values and the mapping relations.

4.2 Distributed Migration

The centralized algorithm is sometimes unrealistic for real-world applications, especially for large data centers with regional controllers. It is time consuming and complicated for a devolved controller to obtain the global information of the whole system. Thus it is natural to design a practical and reliable distributed algorithm [23]. We assume a synchronous environment in which to deploy our algorithm. The distributed algorithm is again divided into two phases.

Distributed Initialization. During this phase, we assign each switch a corresponding controller randomly. By sending control messages to its potential switch set, each controller can determine the correct assignment. Algorithm 6 shows the distributed initialization process.

Algorithm 6. Distributed Initialization (LBDC-DI)
1 Send "CONTROL" message to my own $PS(c_{my})$.
2 $s_i$ replies to the first "CONTROL" message with "YES", and to all other messages after that with "NO".
3 Move each $s_i$ that replied "YES" from $PS(c_{my})$ to $RS(c_{my})$.
4 Wait until all the switches in $PS(c_{my})$ reply, and then terminate.

The correctness of LBDC-DI is easy to check. After initialization, we then design the distributed migration algorithm (LBDC-DM) to balance the workload of the system dynamically.
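To make the correctness argument concrete, the sketch below simulates LBDC-DI in a single process: every switch answers "YES" only to the first "CONTROL" message it sees, so it ends up in exactly one real switch set. Shuffling the messages is an assumption used to model arbitrary network delays; the function and variable names are illustrative.

```python
import random


def distributed_init(potential_switches, seed=0):
    """Simulate LBDC-DI.

    potential_switches: dict controller -> list of switches in PS(c)
    Returns dict controller -> set RS(c) of switches that replied "YES".
    """
    rng = random.Random(seed)
    # Every controller sends one CONTROL message to each switch in PS(c).
    messages = [(c, s) for c, switches in potential_switches.items()
                for s in switches]
    rng.shuffle(messages)               # arbitrary arrival order at the switches
    owner = {}
    real = {c: set() for c in potential_switches}
    for c, s in messages:
        if s not in owner:              # first CONTROL message: reply "YES"
            owner[s] = c
            real[c].add(s)
        # any later CONTROL message for s is answered with "NO"
    return real


# Hypothetical example: s2 can be controlled by either controller.
ps = {"c1": ["s1", "s2"], "c2": ["s2", "s3"]}
rs = distributed_init(ps)
```

Whichever controller reaches `s2` first owns it; the total number of assigned switches always equals the number of distinct switches.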

Distributed Regional Balanced Migration. In the second phase, each controller uses the threshold and the effluence to judge its status and decide whether it should start a migration. Since a controller in a distributed system can only obtain the information of its neighborhood, the threshold is not a global one that suits all the controllers, but an independent value that each controller calculates locally. The algorithm again runs periodically for several rounds. In each round, each controller samples $AN(c_i)$ and applies the Linear Expectation again:

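$$
\begin{cases}
Avg_{now} = \dfrac{\sum_{c_k \in AN(c_i) + c_i} w(c_k)}{|AN(c_i)| + 1} \\[2mm]
Thd = \alpha \cdot Avg_{now} + (1-\alpha) \cdot Avg_{last} \\[1mm]
Efn = \beta \cdot Thd
\end{cases}
\tag{16}
$$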
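Eqn. (16) only needs the loads of a controller and its adjacent controllers. A minimal sketch of this local computation follows; the names and the example loads are assumptions, and the defaults for `alpha` and `beta` are the default configuration from Section 5.1.

```python
def local_thresholds(ci, neighbors, load, avg_last, alpha=0.7, beta=1.5):
    """Return (avg_now, thd, efn) for controller ci following Eqn. (16).

    neighbors: the adjacent controllers AN(ci)
    load:      dict controller -> current load w(c)
    avg_last:  the local average computed in the previous round
    """
    group = list(neighbors) + [ci]                       # AN(ci) plus ci itself
    avg_now = sum(load[c] for c in group) / len(group)   # |AN(ci)| + 1 controllers
    thd = alpha * avg_now + (1 - alpha) * avg_last
    efn = beta * thd
    return avg_now, thd, efn


# Hypothetical example: controller "a" with two neighbors.
avg_now, thd, efn = local_thresholds(
    "a", ["b", "c"], {"a": 9, "b": 3, "c": 6}, avg_last=6)
```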

In LBDC-DM, each controller monitors its own traffic status by comparing its current load with its threshold. When the load is larger than Efn, the controller enters the sending state and initiates a double-commit transaction to transfer heavy switches to nearby controllers.

Algorithm 7 shows the distributed migration procedure.

Algorithm 7. Distributed Migration (LBDC-DM)
Sending Mode: (when $w'(c_{my}) \ge Efn$)
 1 if $\exists c_i \in AN(c_{my})$ in receiving or idle mode then
 2   add $c_i \to RList$ (receiving > idle).
 3 repeat
 4   Pick $s_{max}$ with max weight, refer to $PC(s_{max})$, find $c_j \in RList$ with min weight, send "HELP[$c_{my}, s_{max}$]" to $c_j$, then check the response:
 5   if response = "ACC" then
 6     send "MIG[$c_{my}, s_{max}$]" to $c_j$
 7   else if response = "REJ" then
 8     remove $c_j$ from RList, find the next $c_j$, send "HELP" again, check the response.
 9   Check the response, delete $s_{max}$ when receiving the "CONFIRM" message, terminate.
10 until $w'(c_{my}) \le Efn$;
Receiving Mode: (when $w'(c_{my}) \le Thd$)
11 When receiving "HELP" messages:
12 repeat
13   receive switches for $c_j$ and return "ACC";
14 until $w(c_j) + s_{max} \ge Thd$;
15 Now all "HELP" messages will be answered with "REJ"
16 When receiving a "MIG" message:
17   $s_{max} \to c_j$, send back a "CONFIRM" message;
Idle Mode: (when $Thd \le w'(c_{my}) \le Efn$)
18 When receiving a "HELP" message:
19 repeat
20   receive switches for $c_j$ and return "ACC";
21 until $w(c_j) + s_{max} > Efn$;
22 When receiving "MIG", migrate as above;
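The mode selection and the reply to a "HELP" message in Algorithm 7 can be sketched as two small functions. This is an illustrative reading rather than the authors' code: the mode conditions overlap at their boundaries, and the sketch assumes that the sending mode takes precedence, followed by the receiving mode. The function names are hypothetical.

```python
def mode(load, thd, efn):
    """Classify a controller by its current load (boundary handling is an assumption)."""
    if load >= efn:
        return "sending"
    if load <= thd:
        return "receiving"
    return "idle"


def answer_help(own_load, switch_weight, thd, efn):
    """Reply of a controller to HELP[c_my, s_max] for a switch of the given weight."""
    current = mode(own_load, thd, efn)
    if current == "sending":
        return "REJ"                          # an overloaded controller cannot help
    if current == "receiving":
        # Receiving mode accepts until the new load would reach Thd (line 14).
        return "ACC" if own_load + switch_weight < thd else "REJ"
    # Idle mode accepts until the new load would exceed Efn (line 21).
    return "ACC" if own_load + switch_weight <= efn else "REJ"
```

For instance, with `thd=6` and `efn=9`, a controller with load 2 accepts a switch of weight 3, while a controller with load 7 accepts a switch of weight 2 but rejects one of weight 3.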

The main difference between the centralized and the distributed migration is that the former obtains information from a global view and can make better decisions, but it also requires more processing time and may become a bottleneck of the system. In the distributed version, in contrast, each controller only collects information from its neighborhood and can only make migrations within this area. Although the distributed version cannot reach a globally optimal balancing status, it is more practical to deploy in real systems. Meanwhile, it avoids the problem of the centralized scheme that a collapse or mistake of the central processor affects the whole system.

Their difference is also reflected in the definition of the threshold (Thd). In the centralized version, the threshold is determined by the utilization ratio of the whole system, which is the same for every controller. In the distributed version, the threshold of each controller is calculated from its local information instead of the global information, so the deserved utilization ratio differs from controller to controller.

With our distributed scheme, for the conditions shown in Fig. 1, controller $c_i$ and controller $c_j$ obtain each other's information, and each calculates its own Thd and Efn values and decides its status. If controller $c_i$ is in the sending mode and controller $c_j$ is in the receiving mode, then $c_i$ migrates some of its dominating switches to $c_j$ according to Algorithm 7.

4.3 OpenFlow Based Migration Protocol

To maintain a well-balanced operating mode when a peak flow appears, switches should change the roles of their current controllers while controllers should change their roles by sending Role-Request messages to the switches. These operations require the system to perform a switch migration operation. However, there is no such mechanism provided in the OpenFlow standard. OpenFlow 1.3 defines three operational modes for a controller: master, slave, and equal. Both master and equal controllers can modify switch state and receive asynchronous messages from the switch. Next, we design a specific protocol to migrate a switch from its initial controller to a new controller.

We assume that we are not able to manipulate the switch in our migration protocol design, although it is technically feasible to update the OpenFlow standard to implement our scheme. However, there are two additional issues. First, the OpenFlow standard clearly states that a switch may process messages in an order different from the one in which they are received, mainly to allow multi-threaded implementations. Second, the standard does not specify explicitly whether the order of the messages transmitted by the switch remains consistent between two controllers that are in master or equal mode. We need this ordering assumption for our protocol to work, since allowing arbitrary reordering of messages between two controllers would make an already hard problem significantly harder.

Our protocol is built on the key idea that we first need to create a single trigger event that stops message processing in the first controller and starts it in the second one. We can exploit the fact that Flow-Removed messages are transmitted to all controllers operating in the equal mode. We therefore simply insert a dummy flow into the switch from the first controller and then remove the flow, which provides a single trigger event to both controllers in equal mode to signal the handoff. Our proposed protocol for migrating switch $s_m$ from initial controller $c_i$ to target controller $c_j$ works in four phases as shown below.

Phase 1. Change the role of target $c_j$ to equal mode. Here, controller $c_j$ is first transitioned to the equal mode for switch $s_m$. The initial master $c_i$ initiates this phase by sending a start migration message to $c_j$ on the controller-to-controller channel. $c_j$ sends a Role-Request message to $s_m$ informing it that $c_j$ is an equal. After $c_j$ receives a Role-Reply message from $s_m$, it informs the initial master $c_i$ that its role change is completed. Since $c_j$ changes its role to equal, it can receive asynchronous messages from other switches, but will ignore them. During this phase, $c_i$ remains the only master and processes all messages from the switch, guaranteeing liveness and safety.

Phase 2. Insert and remove a dummy flow. To determine an exact instant for the migration, $c_i$ sends a dummy Flow-Mod command to $s_m$ to add a new flow table entry that does not match any incoming packets. We assume that all controllers know this dummy flow entry a priori as part of the protocol. Then, $c_i$ sends another Flow-Mod command to delete this entry. In response, the switch sends a Flow-Removed message to both controllers, since $c_j$ is in the equal mode. This Flow-Removed event provides a time point at which the ownership of switch $s_m$ is transferred from $c_i$ to $c_j$, after which only $c_j$ processes the messages transmitted by $s_m$. An additional barrier message is required after the insertion of the dummy flow and before its deletion, to prevent any chance of the delete message being processed before the insert. Note that we do not assume that the Flow-Removed message is received by $c_i$ and $c_j$ simultaneously. We only assume that the message order is consistent between $c_i$ and $c_j$ after these controllers enter the equal mode, meaning that all messages before Flow-Removed are processed by $c_i$ and all messages after it are processed by $c_j$.

Phase 3. Flush pending requests with a barrier. While $c_j$ has assumed the ownership of $s_m$ in the previous phase, the protocol is not complete until $c_i$ is detached from $s_m$. However, $c_i$ cannot be detached from $s_m$ immediately, since there may be pending requests at $c_i$ that arrived before the Flow-Removed message. This is easy to handle since we assume the same message ordering at $c_i$ and $c_j$: all $c_i$ needs to do is process all messages that arrived before Flow-Removed and commit them to $s_m$. However, there is no explicit acknowledgment from the switch that these messages are committed. Thus, to guarantee that all these messages are committed, $c_i$ transmits a Barrier-Request and waits for the Barrier-Reply, only after which it signals end migration to the final master $c_j$.

Phase 4. Assign controller $c_j$ as the final master of $s_m$. $c_j$ sets its role as the master of $s_m$ by sending a Role-Request message to $s_m$. It also updates the distributed data store to indicate this. The switch sets $c_i$ to slave when it receives the Role-Request message from $c_j$. Then $c_j$ remains active and processes all messages from $s_m$ for this phase.

The above migration protocol requires six round-trip times to complete the migration. But note that we need to trigger migration only once in a while when the load conditions change, as we discussed in the algorithm design subsections.
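The four phases can be summarized as an ordered message trace. The sketch below is a pure simulation that records who sends what to whom; it does not use any real OpenFlow library, and the message labels such as `"role-change-done"` as well as the `"unassigned"` starting role of $c_j$ are placeholders chosen for illustration.

```python
def migrate_switch(switch, ci, cj):
    """Return (final roles, message trace) for migrating `switch` from ci to cj."""
    roles = {ci: "master", cj: "unassigned"}   # starting role of cj is an assumption
    trace = []

    def send(src, dst, msg):
        trace.append((src, dst, msg))

    # Phase 1: the target controller becomes an equal.
    send(ci, cj, "start-migration")
    send(cj, switch, "Role-Request(equal)")
    send(switch, cj, "Role-Reply")
    roles[cj] = "equal"
    send(cj, ci, "role-change-done")

    # Phase 2: insert and remove a dummy flow, separated by a barrier.
    send(ci, switch, "Flow-Mod(add dummy)")
    send(ci, switch, "Barrier-Request")
    send(switch, ci, "Barrier-Reply")
    send(ci, switch, "Flow-Mod(delete dummy)")
    send(switch, ci, "Flow-Removed")           # the trigger event reaches both controllers
    send(switch, cj, "Flow-Removed")

    # Phase 3: the old master flushes its pending requests.
    send(ci, switch, "Barrier-Request")
    send(switch, ci, "Barrier-Reply")
    send(ci, cj, "end-migration")

    # Phase 4: the target becomes master and the switch demotes the old master.
    send(cj, switch, "Role-Request(master)")
    roles[cj] = "master"
    roles[ci] = "slave"
    return roles, trace


roles, trace = migrate_switch("sm", "ci", "cj")
```

The trace makes the ordering constraints explicit: the target only receives `end-migration` after the old master has seen the Barrier-Reply of Phase 3.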

5 PERFORMANCE EVALUATION

In this section, we evaluate the performance of our centralized and distributed protocols. We consider the case where the traffic demand changes and examine whether the workload deviation among controllers is reduced. We also take the number of migrated switches into consideration. Furthermore, we check how different parameters influence the results.

5.1 Environment Setup

We build our simulations in Python 2.7 to evaluate the performance of our designs. We place 10,000 switches and 100 controllers in a 100 × 100 m² square. The switches are evenly distributed in this square, that is, each switch is 1 m away from any of its neighbors. The controllers are also evenly distributed, and each controller is 10 m away from its neighbors. Each controller can control all the switches within 30 m and can communicate with other controllers within a range of 40 m. We assume that the weight of each switch follows a Pareto distribution with parameter $\alpha_p = 3$. We run a small simulation to choose the most appropriate $\alpha$, $\beta$, and $\gamma$, so that the environment we build is close to the real situation in terms of the traffic condition, the workload of controllers, the migration frequency, etc. [13], [14], [15], [24]. Thus we set $\alpha = 0.7$, $\beta = 1.5$, $\gamma = 1.3$ as the default configuration.
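A setup of this kind could be generated along the following lines. This is a sketch in Python 3 rather than the authors' Python 2.7 code, and the exact coordinates (switches at cell centers, controllers at the centers of 10 m blocks) are assumptions, since the text only gives the spacing. `random.paretovariate` draws Pareto-distributed weights with the given shape parameter.

```python
import math
import random


def build_topology(side=100, ctrl_step=10, control_range=30.0, comm_range=40.0,
                   pareto_shape=3.0, seed=0):
    """Build switches, controllers, potential mappings, adjacency, and weights."""
    rng = random.Random(seed)
    # One switch per 1 m x 1 m cell, placed at the cell center (assumed layout).
    switches = [(x + 0.5, y + 0.5) for x in range(side) for y in range(side)]
    # One controller per ctrl_step x ctrl_step block, placed at the block center.
    centers = [ctrl_step * k + ctrl_step / 2 for k in range(side // ctrl_step)]
    controllers = [(x, y) for x in centers for y in centers]
    # Potential controllers of a switch: all controllers within control_range.
    potential = {s: [c for c in controllers if math.dist(s, c) <= control_range]
                 for s in switches}
    # Adjacent controllers: all other controllers within comm_range.
    adjacent = {c: [d for d in controllers
                    if d != c and math.dist(c, d) <= comm_range]
                for c in controllers}
    # Switch weights follow a Pareto distribution with the given shape.
    weight = {s: rng.paretovariate(pareto_shape) for s in switches}
    return switches, controllers, potential, adjacent, weight


# A small 20 m x 20 m instance keeps the example fast; the default arguments
# correspond to the 100 m x 100 m setting described above.
switches, controllers, potential, adjacent, weight = build_topology(side=20)
```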

5.2 System Performance Visualization Results

We use the default configuration described above to test the performance of the system. We first apply the initialization and then change the traffic demands dynamically to emulate unpredictable user requests. Then we apply naive LBDC-CM and the other variants to alleviate the hot-spot congestion. We use the relative weight deviation to evaluate the performance of our algorithms.

We examine the performance of our four algorithms. Consider a DCN with 10 × 10 controllers arranged in a square array. At the beginning of a time slot, the weights of the switches are updated and then we run the migration algorithms. The weights of the switches follow a Pareto distribution with $\alpha_p = 3$. Fig. 3 shows the initial traffic state of the system. Different color shades represent different working states of a controller: the darker the color, the busier the controller. Figs. 4, 5, 6, and 7 illustrate the performance of naive LBDC-CM, limited LBDC-CM, LBDC-CM with switch priority (Priori LBDC-CM), and LBDC-DM, respectively. We can see that after the migration, the whole system becomes more balanced.

In fact, the performance of LBDC-DM is poor when the number of controllers is relatively small. This phenomenon is attributed to the system setting that one controller can only cover switches within 30 m. When there are few controllers, more switches must be controlled by one particular controller without many alternatives. As the number of controllers increases, LBDC-DM achieves a better performance and a higher improvement ratio.

Fig. 3. Initial state.

Intuitively, increasing the number of controllers may increase the deviation, but it may lead to a lower migration frequency. To balance the number of controllers against the migration frequency, we need to set the $\alpha$, $\beta$, and $\gamma$ values carefully. If there are sufficient controllers to manage the whole system, we can adjust the three parameters so that the system stays in a stable state longer. If the number of controllers decreases, we have to raise the migration threshold to fully utilize the controllers. The effects of these parameters are further discussed in Section 5.4.

5.3 Horizontal Protocol Performance Comparison

We designed three variations of LBDC-CM. Naive LBDC-CM is the simplest and is applicable to most cases. If the controllers are heterogeneous, or if the switches have a spatial priority toward their physically closest controller, we can implement limited LBDC-CM or LBDC-CM with switch priority, respectively. Finally, we have a distributed protocol, LBDC-DM. We now compare the performance of the four migration protocols.

Comparison on Number of Controllers. First, we vary the number of controllers from 30 to 210 with a step of 20 and check the change in the relative weight deviation of the system. The simulation results are shown in Figs. 8 and 9. We compare the relative weight deviation of the initial bursty traffic state with that of the state after the migration. After the migration, the relative weight deviation of all the controllers decreases. This shows that our four protocols improve the system performance significantly compared with the initial state, in terms of both the relative weight deviation and the improvement ratio. As the number of controllers increases, the improvement ratio also increases, which is intuitive since more controllers share the load to reach a balanced state. Both figures show that our algorithms perform well as the number of controllers grows, which indicates that our scheme is suitable for mega data centers.

Fig. 4. Naive LBDC-CM migration.

Fig. 5. Limited LBDC-CM migration.

Fig. 6. Priori LBDC-CM migration.

Fig. 7. LBDC-DM migration.

Fig. 8. Relative weight deviation protocol comparison.

Fig. 9. Performance improvement for different protocols.

Naive LBDC-CM performs the best because it considers all possible migrations from a global perspective. It even outperforms LBDC-DM, but the difference between them decreases as the number of controllers increases. Adding more controllers to the network helps achieve a balanced traffic load. In reality, we may only be able to run the other protocols, such as LBDC-DM, limited LBDC-CM, and LBDC-CM with switch priority. For limited LBDC-CM, the maximum workload of the controllers also follows a Pareto distribution with $\alpha_p = 3$, and we amplify it by a constant to make sure that the total traffic load does not exceed the capacity of all controllers. For LBDC-CM with switch priority, we allocate to each mapping of a switch and a controller a value that is inversely proportional to their distance. We can see that its improvement also grows significantly as the number of controllers increases. Overall, we can conclude that all four protocols perform quite well in balancing the workload of the entire system.

Run-Time Performance w.r.t. Static Traffic Loads. Figs. 10 and 11 show the relative weight deviation and the number of migrated switches for the four protocols at different time slots when the global traffic load does not change (the weight of each switch is constant). We can see that the relative weight deviation decreases, but the values of limited LBDC-CM and LBDC-CM with switch priority are higher than that of naive LBDC-CM. This is because in limited LBDC-CM and LBDC-CM with switch priority, each controller has a different upper bound, which influences the migration. For example, if some switches can only be monitored by a certain controller and that controller is overloaded, then the relative weight deviation stays high since we cannot move these switches to other controllers. In addition, controllers in LBDC-CM with switch priority even have a preference when choosing potential switches. In terms of the number of migrated switches, we can see that all four protocols remain stable as time goes by. LBDC-DM has the lowest number of migrated switches because its controllers can only observe the local traffic situation, resulting in a relatively low migration frequency.

Run-Time Performance w.r.t. Dynamic Traffic Loads. Figs. 12 and 13 show the relative weight deviation and the number of migrated switches for the four protocols at different time slots when the global traffic load changes dynamically (the weight of each switch is dynamic). Even though the traffic load changes across time slots, the number of migrated switches stays relatively stable. If controller $c_1$ is overloaded, it releases some of its dominating switches to its nearby controllers. However, if in the next round the switches monitored by those controllers gain a higher traffic load and overload the nearby controllers, then the switches may be sent back to controller $c_1$. To avoid such frequent swapping, we can set an additional parameter for each switch: if its role was changed in the previous slot, then it stays at its current controller in the current slot.
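One simple way to realize such a per-switch parameter is to remember the slot in which each switch was last migrated and to exclude recently moved switches from the candidates. The sketch below is a hypothetical illustration; the function name and the `cooldown` parameter are not part of the paper.

```python
def eligible_switches(switches, last_moved_slot, current_slot, cooldown=1):
    """Return the switches that may be migrated in the current slot.

    last_moved_slot: dict switch -> slot index of its most recent migration
    cooldown:        number of slots a switch stays put after a migration
    """
    eligible = []
    for s in switches:
        last = last_moved_slot.get(s)
        # A switch that has never moved, or moved long enough ago, is eligible.
        if last is None or current_slot - last > cooldown:
            eligible.append(s)
    return eligible


# Hypothetical example in slot 5: s1 moved in slot 4 and is frozen,
# s2 moved in slot 3 and may move again, s3 has never moved.
movable = eligible_switches(["s1", "s2", "s3"], {"s1": 4, "s2": 3}, current_slot=5)
```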

We may also consider the deviation of load balancing among switches to further improve the system performance. Since we consider the balancing problem among controllers, which is a "higher level" of the balancing problem among switches, we can implement load balancing strategies among switches [25], [26], [27], [28] and combine the two layers to achieve a better solution.

Fig. 10. Relative weight deviation without traffic changes.

Fig. 11. Migrated switch without traffic changes.

Fig. 12. Relative weight deviation as traffic changes.

Fig. 13. Migrated switch as traffic changes.

5.4 Parameter Specification

Next we explore the impact of the threshold parameters $\alpha$, $\beta$, and $\gamma$. Here $\alpha$ balances conservativeness against aggressiveness, $\beta$ is a crucial parameter that decides whether to migrate switches in all four protocols, and $\gamma$ is used in Step 4 of LBDC-CM. We examine the impact of changing $\alpha$, $\beta$, and $\gamma$ together. Table 2 lists the statistics for $\alpha$ ranging between 0.25 and 0.75, $\beta$ ranging between 1.15 and 1.35, and $\gamma$ ranging between 1.15 and 1.35. The improvement rate and the number of migrated switches mostly decrease as $\beta$ increases, which is consistent with the definition of the threshold.
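As a quick sanity check on Table 2, the Rate column appears to equal the relative reduction of the deviation, $100 \cdot (1 - \text{LBDC-CM} / \text{Initial})$, in percent. This reading is our assumption and is not stated explicitly in the text; the sketch below recomputes it from the Initial and LBDC-CM columns and compares it with the reported value, allowing a small tolerance because the table entries are rounded.

```python
# (initial deviation, deviation after LBDC-CM, reported rate) taken from Table 2.
TABLE_2 = [
    (181.87, 14.41, 92.08),
    (188.54, 16.90, 91.04),
    (193.81, 11.83, 93.90),
    (182.76, 16.18, 91.15),
    (196.73, 12.01, 93.90),
    (187.62, 17.29, 90.79),
    (178.77, 15.46, 91.35),
    (181.01, 14.29, 92.11),
]


def improvement_rate(initial, after):
    """Relative reduction of the deviation, in percent (assumed definition)."""
    return 100.0 * (1.0 - after / initial)


# Largest gap between the recomputed and the reported rate over all rows.
max_gap = max(abs(improvement_rate(init, after) - rate)
              for init, after, rate in TABLE_2)
```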

6 RELATED WORK

As data centers become more important in industry, there has been tremendous interest in designing efficient DCNs [1], [2], [29], [30], [31], [32]. Traffic engineering has also been identified as one of the most crucial issues in the area of cloud computing.

Existing DCNs usually adopt a centralized controller for aggregation, coordination, and resource management [1], [2], [10], [31], which can be energy efficient and can leverage a global view of traffic to make routing decisions. In fact, using a centralized controller makes the design simpler and is sufficient for a fairly large DCN.

However, using a single omniscient controller introduces scalability concerns when the scale of the DCN grows dramatically. To address these issues, researchers installed multiple controllers across the DCN by introducing devolved controllers [4], [5], [6], [7], [8], [33] and used dynamic flow as an example [5] to illustrate the detailed configuration. The introduction of devolved controllers alleviates the scalability issue, but still introduces some additional problems.

Meanwhile, several studies on distributed controllers [6], [7], [8] have been proposed for SDN [34] to address the scalability and reliability issues from which a centralized controller suffers. Software-Defined Networking is a new network technology that decouples the control plane logic from the data plane and uses a programmable software controller to manage network operation and the state of network components.

The SDN paradigm has emerged over the past few years through several initiatives and standards. The leading SDN protocol in the industry is the OpenFlow protocol. It is specified by the Open Networking Foundation (ONF) [35], which brings together the major network service providers and network manufacturers. The majority of current SDN architectures, OpenFlow-based or vendor-specific, rely on a single controller or on master/slave controllers, which is a physically centralized control plane. Recently, proposals have been made to physically distribute the SDN control plane, either with a hierarchical organization [36] or with a flat organization [7]. These approaches avoid a single point of failure (SPOF) and make it possible to scale up by sharing the load among several controllers. In [34], the authors present distributed NOX-based controllers that interwork through extended GMPLS protocols. HyperFlow [7] is, to the best of our knowledge, the only work so far that also tackles the issue of distributing the OpenFlow control plane for the sake of scalability. In contrast to our approach, which is based on a traffic load balancing scheme with a well-designed migration protocol under the OpenFlow framework, HyperFlow proposes to push (and passively synchronize) all state (controller-relevant events) to all controllers. This way, each controller behaves as if it were the only controller, at the cost of requiring minor modifications to applications.

HyperFlow [7], Onix [34], and Devolved Controllers [4] try to distribute the control plane while keeping it logically centralized, using a distributed file system, a distributed hash table, and a pre-computation of all possible combinations, respectively. These approaches, despite their ability to distribute the SDN control plane, impose a strong requirement: a consistent network-wide view in all the controllers. In contrast, Kandoo [36] proposes a hierarchical distribution of the controllers based on two layers of controllers. Meanwhile, DevoFlow [37] and DAIM [38] also solve these problems by devolving network control to switches.

In addition, [39] analyzes the trade-off between centralized and distributed control states in SDN, while [40] proposes a method to optimally place a single controller in an SDN network. Authors in [41] also presented a low cost network emulator called Distributed OpenFlow Testbed (DOT), which can emulate large SDN deployments.

Recently, Google has presented their experience with B4 [42], a global SDN deployment interconnecting their data centers. In B4, each site hosts a set of master/slave controllers that are managed by a gateway. The different gateways communicate with a logically centralized Traffic Engineering (TE) service to decide on path computations.

The authors in [6] implemented a migration protocol on the current OpenFlow standard. Thus switch migration becomes possible, and we are able to balance the workload dynamically with the schemes presented in this paper, which overcome the shortcomings of devolved controllers and improve the system performance in several respects.

7 CONCLUSION

TABLE 2
Influence of the $\alpha$, $\beta$, and $\gamma$ Factors

$\alpha$  $\beta$  $\gamma$  Initial  LBDC-CM  Rate (%)  Switch #
0.25      1.15     1.15      181.87   14.41    92.08     6,344
0.25      1.15     1.35      188.54   16.90    91.04     6,236
0.25      1.35     1.15      193.81   11.83    93.90     6,536
0.25      1.35     1.35      182.76   16.18    91.15     6,224
0.75      1.15     1.15      196.73   12.01    93.90     6,705
0.75      1.15     1.35      187.62   17.29    90.79     6,244
0.75      1.35     1.15      178.77   15.46    91.35     6,305
0.75      1.35     1.35      181.01   14.29    92.11     6,231

With the evolution of data center networks, the usage of a centralized controller has become the bottleneck of the entire system, and the traffic management problem also