Server Placement in the Presence of Competition


Pangfeng Liu¹, Yi-Min Chung¹, Jan-Jan Wu², and Chien-Min Wang²

¹ Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, R.O.C.

² Institute of Information Science, Academia Sinica, Taipei, Taiwan, R.O.C.

Abstract. This paper addresses the optimization problems of placing servers in the presence of competition. We place a set of extra servers on a graph to compete with the set of original servers. Our objective is to find the placement that maximizes the benefit, which is defined as the profits from the requests made to the extra servers despite the competition, minus the cost of constructing those extra servers. We propose an $O(|V|^3 k)$ time dynamic programming algorithm to find the optimal placement of $k$ extra servers that maximizes the benefit in a tree with $|V|$ nodes. We also propose an $O(|V|^3)$ time dynamic programming algorithm for finding the optimal placement of extra servers that maximizes the benefit, without any constraint on the number of extra servers. For general connected graphs, we prove that the optimization problems are NP-complete. As a result, we present a greedy heuristic for the problems. Experiment results indicate that the greedy heuristic achieves good results, even when compared with the upper bounds found by a linear programming algorithm. The greedy heuristic yields performance within 15% of the upper bound in the worst case, and within 2% of the same theoretical upper bound on average.

1 Introduction

This paper considers a strategy for setting up servers to compete with existing ones. For example, we assume that there are originally a number of McDonald’s restaurants in a city, but no Kentucky Fried Chicken (KFC) restaurants. Now, if we decide to set up a number of KFC restaurants in the same city, where should we place them? We need to determine the locations for KFC so that they can compete with McDonald’s and maximize their profits. Due to heavy competition among businesses of a similar nature, it is important to choose the locations of new servers carefully in an area where competitors have already deployed their servers.

We define the servers we would like to set up as extra servers, and the existing (competitor) servers as original servers. Thus, in the above example, KFC restaurants are the extra servers and McDonald’s restaurants are the original servers. We use a graph to model the locations of the servers and users. A node in the graph represents a geographic location, and an edge represents a path between two locations. Building servers in these locations enables users at a node to request services from the servers. Each edge has a communication cost. The distance between two nodes is the length of the shortest path that connects them. For efficiency, we assume that requests from users always go to the nearest server. However, when the shortest distances from a user to the original and extra servers are the same, the user will go to the original server. That is, a user will normally go to the nearest restaurant, either McDonald’s or KFC; however, if the distances to the two restaurants are the same, the user will go to McDonald’s.

C. Cérin and K.-C. Li (Eds.): GPC 2007, LNCS 4459, pp. 124–135, 2007.

After extra servers have been established, users who previously went to McDonald’s may now go to KFC. We define the benefit of an extra server placement to be the profit derived from user requests made to the server, minus the cost of constructing the server. The cost may vary, depending on the location of the extra server. This paper considers two placement problems related to extra servers, in the presence of competition from original servers.

1. Given the city conﬁguration and a number k, locate k extra servers such that they will earn the most proﬁt;

2. Given the city conﬁguration, locate extra servers such that they earn the most proﬁt, without any constraint on the number of extra servers.

We solve these two problems for a tree graph in $O(|V|^3 k)$ and $O(|V|^3)$ time, respectively. For a general graph, we show that the two problems are intractable (NP-complete) and propose a heuristic to solve them. We also run experiments and compare our results for the heuristic with theoretical upper bounds.

Similar server placement problems, such as replica placement problems [4,3,6,10], p-medians [5], and facility location problems [8], have been studied in the literature. For example, Kariv and Hakimi [5] formulate the p-median problem as locating p points such that the sum of each node’s weight multiplied by its shortest distance to the p points is minimized. However, the p-median problem they considered does not take the building costs into account, and it minimizes the costs, instead of maximizing the profits. The facility location problem is similar to the p-median problem, with the additional consideration of the facility’s costs.

Our extra server model differs from the model in [5] because it introduces the concept of competition. Extra servers must compete with original servers for user requests, in order to maximize their profits. The number of extra servers established is controlled by the building costs, which differ from location to location. Our dynamic programming model uses a similar technique to that in [4]. The presence of competition demands innovative proof techniques.

Tamir [9] described a dynamic programming model that solves p-median problems on a tree topology with building and access costs. The algorithm assumes that the cost for a client to request services is an increasing function of the distance between the client and the server. If the benefit function in our model is a decreasing function of the distance between the client and the server, our placement problem can be solved by transforming it into a p-median problem, and solving it by the dynamic programming described in [9]. However, the method proposed in this paper can deal with arbitrary benefit functions, and still obtain the optimal solution for a tree topology.

The remainder of this paper is organized as follows. Section 2 formally describes our server placement models. In Section 3, we introduce the dynamic programming for finding the optimal extra server placement in a tree. Section 4 contains the proof that the problems are NP-complete for general graphs and presents a heuristic algorithm to solve them. Section 5 reports the experiment results, and Section 6 contains our conclusions.

2 Problem Formulation

We consider a connected graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges. Each edge $(u, v) \in E$ has a positive integer distance denoted by $d(u, v)$. For any two nodes $u, v \in V$, $d(u, v)$ also denotes the length of the shortest path between them. For ease of representation, we also let $d(v, \mathcal{X}) = \min_{u \in \mathcal{X}} d(v, u)$ be the length of the shortest path from $v$ to any node in $\mathcal{X}$, where $\mathcal{X} \subseteq V$.

We consider servers that provide service to nodes in the graph. Every node $v$ must go to the nearest server $u$ for service. If a server is located at node $v$, then $v$ will be serviced by that server. To simplify the concept of “the nearest server”, we assume that for every node $v$, its distances to all other nodes are different, i.e., $d(v, u) \neq d(v, w)$ for $u \neq w$. As a result, the nearest server for every node is uniquely defined.

By serving a client v, a server node u earns a beneﬁt of b(v, u). Note that the function b can be arbitrary. For example, unlike [9], we do not assume that, for the same client node v, the function value must be monotonic with respect to the distance between v and the server node u.

We assume that there are a number of original servers $\mathcal{O} \subseteq V$ in $G$. In addition to the original server set $\mathcal{O}$, we would like to add a number of extra servers to $G$ to obtain the maximum benefit. Let $c(v)$ be the cost of building a server at node $v \in V$, and $\mathcal{X}$ be the set of new servers we would like to add to the system. A node $v \in V$ goes to either $\mathcal{O}$ or $\mathcal{X}$ for service: $v$ goes to $\mathcal{X}$ when $d(v, \mathcal{X}) < d(v, \mathcal{O})$; otherwise ($d(v, \mathcal{X}) > d(v, \mathcal{O})$), $v$ goes to $\mathcal{O}$. Let $V_{\mathcal{X}}$ denote the set of nodes that go to $\mathcal{X}$ for service, and $V_{\mathcal{O}} = V - V_{\mathcal{X}}$ be the set of nodes that go to $\mathcal{O}$ for service.

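We define the nearest server $NS(v)$ of $v$ as the server that $v$ uses. Consequently, $NS(v) \in \mathcal{O}$ if $v \in V_{\mathcal{O}}$, and $NS(v) \in \mathcal{X}$ if $v \in V_{\mathcal{X}}$. We can now define the benefit function of adding the servers $\mathcal{X}$ as follows.

$$B(\mathcal{X}) = \sum_{v \in V_{\mathcal{X}}} b(v, NS(v)) - \sum_{v \in \mathcal{X}} c(v). \tag{1}$$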
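As an illustration of Equation (1), the following minimal Python sketch evaluates the benefit of a given set of extra servers on a small graph. The function names (`shortest_distances`, `benefit`), the adjacency-list layout, and the toy instance are assumptions made for this example; they are not taken from the paper. Distances are computed with Dijkstra's algorithm, and a node counts toward the benefit only when its nearest extra server is strictly closer than its nearest original server.

```python
import heapq

def shortest_distances(adj, source):
    """Dijkstra's algorithm; adj maps node -> list of (neighbor, positive length)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, length in adj[v]:
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist

def benefit(adj, original, extra, b, c):
    """Evaluate B(X) of Equation (1) for the extra-server set `extra`.

    b(v, u) is the benefit of serving client v at server u; c(s) is the
    building cost at node s. Both are caller-supplied functions.
    """
    total = -sum(c(s) for s in extra)
    if not extra:
        return total
    for v in adj:
        dist = shortest_distances(adj, v)
        d_orig = min((dist[o] for o in original), default=float("inf"))
        d_extra, nearest = min((dist[x], x) for x in extra)
        if d_extra < d_orig:  # ties go to the original server
            total += b(v, nearest)
    return total

# Toy path graph 0 - 1 - 2 - 3 with edge lengths 1, 2, 4 (hypothetical data).
adj = {0: [(1, 1)], 1: [(0, 1), (2, 2)], 2: [(1, 2), (3, 4)], 3: [(2, 4)]}
value = benefit(adj, original={0}, extra={2}, b=lambda v, u: 10, c=lambda s: 5)
```

In this toy instance, nodes 2 and 3 are closer to the extra server at node 2 than to the original server at node 0, so the benefit is 10 + 10 − 5 = 15.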

We now deﬁne the problem as follows.

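k-Extra-Server Problem. Given an integer $k$, $1 \le k \le |V - \mathcal{O}|$, we want to find the optimal placement of $k$ extra servers such that the benefit function is maximized (Equation (2)).

$$\max_{\mathcal{X} \subseteq V - \mathcal{O},\ |\mathcal{X}| = k} B(\mathcal{X}). \tag{2}$$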

Extra-Server Problem. We want to place extra servers to maximize the benefit function, without any constraint on the number of extra servers. We call this optimization problem the extra-server problem.

3 Finding Extra Server Locations

We present algorithms that utilize global information to solve server placement problems. The use of global information makes it possible for the algorithm to be optimal, and the assumption of global information is reasonable since we are dealing with a city or grid configuration in which the locations of the servers are static and can be known completely in advance.

We focus on the case where the graph $G = (V, E)$ is a tree. Let $T$ be the tree and $r$ be the root of $T$. For each node $v \in V$, let $T_v$ be the subtree of $T$ rooted at $v$. If $v$ is an internal node, then we use $child(v) = \{v_1, v_2, \ldots, v_{|child(v)|}\}$ to denote the children of $v$. Following the notation in [4], let $T_v^{(i)}$ be the subtree of $T$ that consists of $v$ and the subtrees rooted at the first $i$ children of $v$, i.e., $T_v^{(i)} = \{v\} \cup \bigcup_{j=1}^{i} T_{v_j}$.

Definition 1 (Benefit function, $B$). For nodes $v, u \in V$, an integer $k$, and an integer $i$ between 0 and $|child(v)|$, we define $B^{v,u}_{k,i}$ to be the maximum benefit derived by placing $k$ extra servers in $T_v^{(i)}$, under the condition that $u = NS(v)$. Consequently, $u$ is either an original server or an extra server.

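We now consider the benefit function $B^{v,u}_{k,i}$ obtained by placing $\mathcal{X}$ in $T_v^{(i)}$. We define $\mathcal{X}$ to be the set of $k$ extra servers that maximizes the following benefit function. Recall that $\mathcal{O}$ is the set of original servers.

$$B^{v,u}_{k,i} = \max_{\mathcal{X}} \Big\{ \sum_{w \in T_v^{(i)},\ NS(w) \in \mathcal{X} \cup \{u\}} b(w, NS(w)) - \sum_{s \in \mathcal{X}} c(s) \Big\}, \quad u \notin \mathcal{O},$$

$$B^{v,u}_{k,i} = \max_{\mathcal{X}} \Big\{ \sum_{w \in T_v^{(i)},\ NS(w) \in \mathcal{X}} b(w, NS(w)) - \sum_{s \in \mathcal{X}} c(s) \Big\}, \quad u \in \mathcal{O}.$$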

The definition indicates that the benefit includes those nodes that will go either to the extra servers $\mathcal{X}$ or to $u$ (when $u \notin \mathcal{O}$) for service, minus the construction cost of the extra server set $\mathcal{X}$.

For the case where $u$ is not in $\mathcal{O}$, by definition $u$ is $v$’s nearest server, so $u$ has an extra server. However, $u$ can be a node outside of $T_v^{(i)}$, in which case it will not be in $\mathcal{X}$ because $\mathcal{X}$ is a subset of $T_v^{(i)}$. We still need to add the benefit that nodes in $T_v^{(i)}$ contribute to $u$, since we assume that an extra server is placed at $u$.

Lemma 1. For every node $v \in V$ and every child $v_i$ of $v$, if $u \in T_{v_i}$ is the nearest server to $v$, then $u$ is also the nearest server to $v_i$.

Proof. We prove this lemma by contradiction. Assume that the nearest server of $v_i$ is some $u' \neq u$; then $d(v_i, u')$ must be strictly smaller than $d(v_i, u)$. The length of the shortest path between $v$ and $u'$ is $d(v, u') \le d(v, v_i) + d(v_i, u') < d(v, v_i) + d(v_i, u) = d(v, u)$, which suggests that $u'$ is closer to $v$ than $u$; however, this contradicts the assumption that $u$ is the nearest server of $v$.

For ease of discussion of the following lemma, we define a node set $V_{v,u,i}$. This set contains those nodes in $T_{v_i}$ that could be the nearest server for $v_i$, under the condition that $u$ is the nearest server for $v$, but not for $v_i$, i.e., $NS(v) = u$ and $NS(v_i) \neq u$. Intuitively, the set $V_{v,u,i}$ stands for those nodes in $T_{v_i}$ that are far enough from $v$ that they cannot be the nearest server for $v$ (when compared with $u$), but close enough to $v_i$ that they can be the nearest server of $v_i$.

Definition 2 ($V_{v,u,i}$). Let $u$ be the nearest server of $v$ and $i$ be an integer between 1 and $|child(v)|$. $V_{v,u,i}$ is the subset of those $u'$ in $T_{v_i}$ such that $u'$ is the nearest server to $v_i$, but it is not the nearest server to $v$. That is, $V_{v,u,i} = \{u' \mid u' \in T_{v_i},\ d(v_i, u') < d(v_i, u),\ d(v, u') > d(v, u)\}$.
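The set in Definition 2 can be computed directly from pairwise distances. The short Python sketch below is only an illustration: the function name `candidate_set`, the nested-dictionary layout of `dist`, and the toy distances are assumptions of this example.

```python
def candidate_set(subtree_nodes, dist, v, vi, u):
    """Nodes in the subtree of child vi that are closer to vi than u is,
    yet farther from v than u is (the two conditions of Definition 2)."""
    return {
        u2
        for u2 in subtree_nodes
        if dist[vi][u2] < dist[vi][u] and dist[v][u2] > dist[v][u]
    }

# Hypothetical tree: v = 0, child vi = 1 with subtree {1, 2, 3}, and u = 9
# outside that subtree. Edge lengths: 0-9: 3, 0-1: 2, 1-2: 4, 1-3: 10.
dist = {
    0: {9: 3, 1: 2, 2: 6, 3: 12},
    1: {9: 5, 1: 0, 2: 4, 3: 10},
}
candidates = candidate_set({1, 2, 3}, dist, v=0, vi=1, u=9)
```

Here node 1 is excluded because it is closer to $v$ than $u$ is, node 3 is excluded because it is farther from $v_i$ than $u$ is, and only node 2 satisfies both conditions.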

Lemma 2. For every node $v \in V$ and every child $v_i$ of $v$, if $u \notin T_{v_i}$ is the nearest server of $v$, then either $u$ is the nearest server of $v_i$ or there exists a node $u' \in V_{v,u,i}$ that is the nearest server of $v_i$.

Proof. If $u$ is the nearest server of $v_i$, the lemma follows. Otherwise, we conclude that the nearest server of $v_i$ must be within $T_{v_i}$, since the path from $v_i$ to nodes not in $T_{v_i}$ must pass through $v$, which already has $u$ as its nearest server. The lemma then follows by the definition of $V_{v,u,i}$.

Theorem 1. For every node $v \in V$ and an integer $i$ between 0 and $|child(v)|$, if $u$ is the nearest server of $v$, then for every node $w$ in $T_{v_i}$, we can find the nearest server for $w$ in $T_{v_i} \cup \{u\}$.

Proof. The only way for a shortest path from a node $w$ in $T_{v_i}$ to reach any node outside $T_{v_i}$ is to go through the edge $(v_i, v)$. However, any such shortest path must end at node $u$, since $u$ is the nearest server for $v$; otherwise we would be able to find a closer server for $v$ other than $u$, which contradicts the fact that $NS(v) = u$.

Terminal Conditions. We first derive two terminal conditions for the recursion of $B$, the benefit function.

$k = 0$. When $k$ is 0, we do not place any extra servers in $T_v^{(i)}$. If $u$ is an original server in $\mathcal{O}$, every node in $T_v^{(i)}$ will go to $\mathcal{O}$ for service, so the benefit is 0. If $u$ is not in $\mathcal{O}$, we consider two cases. First, if $u$ is not in $T_v^{(i)}$, every node in $T_v^{(i)}$ will go either to an original server or to $u$ for service; thus, the benefit can be determined by Equation (3).

$$\bar{B} = \sum_{w \in T_v^{(i)},\ d(w,u) < d(w,\mathcal{O})} b(w, u). \tag{3}$$

In the second case, $u$ is not an original server but $u$ is in $T_v^{(i)}$, which means that there is at least one extra server in $T_v^{(i)}$. This contradicts the assumption that $k$ is 0. For the purpose of dynamic programming, we define the benefit to be $-\infty$.

$k = 1$, $u \notin \mathcal{O}$, $u \in T_v^{(i)}$. When $k$ is 1 and $u$ is in $T_v^{(i)}$ but is not an original server, $u$ is definitely the only extra server in $T_v^{(i)}$. Every node in $T_v^{(i)}$ will go either to $\mathcal{O}$ or to $u$ for service; thus, the benefit can be calculated as $\bar{B} - c(u)$, where $\bar{B}$ is the sum in Equation (3). Note that, since $u$ is now in the $\mathcal{X}$ that maximizes the benefit of $T_v^{(i)}$, $c(u)$ should be deducted from the benefit.

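Recursion. Next, we derive the recursion function for $B^{v,u}_{k,i}$.

$$B^{v,u}_{k,i} = \begin{cases}
0, & \text{if } k = 0 \text{ and } u \in \mathcal{O}, \\
\bar{B}, & \text{if } k = 0,\ u \notin \mathcal{O}, \text{ and } u \notin T_v^{(i)}, \\
\bar{B} - c(u), & \text{if } k = 1,\ u \notin \mathcal{O}, \text{ and } u \in T_v^{(i)}, \\
\hat{B}, & \text{if } u \in T_{v_i}, \\
\max\{\hat{B}, \tilde{B}\}, & \text{if } u \notin T_{v_i}, \\
-\infty, & \text{otherwise},
\end{cases} \tag{4}$$

where $\bar{B}$ is the sum in Equation (3),

$$\hat{B} = \max_{0 \le j \le k} \left( B^{v,u}_{k-j,i-1} + B^{v_i,u}_{j,|child(v_i)|} \right), \tag{5}$$

and

$$\tilde{B} = \max_{0 \le j \le k} \left( B^{v,u}_{k-j,i-1} + E^{v,u}_{j,i} \right). \tag{6}$$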

The ﬁrst three cases were discussed as the terminal conditions in Section 3, so we only need to consider the rest.

$u \in T_{v_i}$

If $u \in T_{v_i}$, $u$ will also be the nearest server to $v_i$ by Lemma 1, since $u$ is the nearest server of $v$. Then, by Theorem 1, every node in $T_{v_i}$ goes to either $T_{v_i}$ or $u$ for service. In addition, $u$ is the nearest server to $v$. By Theorem 1, all nodes in $T_v^{(i-1)}$ obtain service from $u$ or $T_v^{(i-1)}$.

Assume that there are $j$ extra servers in $T_{v_i}$; then there will be $k - j$ extra servers in $T_v^{(i-1)}$, where $0 \le j \le k$. To obtain the best $\mathcal{X}$ that maximizes the benefit, we need to consider all possible values of $j$, as formulated in Equation (5). The recursion follows.

$u \notin T_{v_i}$

If $u$ is not in $T_{v_i}$, we need to consider two sub-cases.

Case 1: If $u$ is the nearest server of $v_i$, the value of $B^{v,u}_{k,i}$ is defined as in Equation (5), because we can isolate two subtrees, as we did in the previous case where $u \in T_{v_i}$.

Case 2: If the nearest server of $v_i$ is not $u$, by Lemma 2, we can find the nearest server $u'$ for $v_i$ in $T_{v_i}$. We formulate the benefit as in Equation (6).

Considering these two sub-cases, if $u \notin T_{v_i}$, $B^{v,u}_{k,i}$ is the larger of the values given by Equations (5) and (6), as stated in Equation (4).
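The maximization over $j$ in Equations (5) and (6) splits a budget of $k$ extra servers between the part of the tree already processed and the subtree of the next child. The Python sketch below illustrates only this splitting step, not the complete dynamic program; the function names and the example tables are hypothetical, and infeasible entries are represented by negative infinity as in Equation (4).

```python
NEG_INF = float("-inf")

def combine(parent_table, child_table, k):
    """Best total when k servers are split as (k - j) for the already
    processed part and j for the child's subtree, over all 0 <= j <= k."""
    return max(parent_table[k - j] + child_table[j] for j in range(k + 1))

def merge_tables(parent_table, child_table):
    """Apply combine() for every budget 0..K, where K is the largest index."""
    budget = len(parent_table) - 1
    return [combine(parent_table, child_table, k) for k in range(budget + 1)]

# Hypothetical benefit tables indexed by the number of extra servers used.
parent_table = [0, 5, 7]
child_table = [0, 4, NEG_INF]   # placing two servers in the child is infeasible
merged = merge_tables(parent_table, child_table)
```

For a budget of two servers, the best split in this example puts one server on each side, giving 5 + 4 = 9. Each call to `combine` scans at most $k + 1$ splits.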

Now, in order to finish the recursion, the only missing element is the new function $E^{v,u}_{k,i}$.

Definition 3 ($E^{v,u}_{k,i}$). For nodes $v, u \in V$, an integer $k$, and the $i$-th child of node $v$ (denoted by $v_i$), we define $E^{v,u}_{k,i}$ to be the maximum benefit derived by placing $k$ extra servers in the subtree $T_{v_i}$, where $u \notin T_{v_i}$ is the nearest server of $v$, but $u$ is not the nearest server of $v_i$. Instead, the nearest server of $v_i$ is a $u'$ in $T_{v_i}$. The benefit is similarly defined in Equation (7):

$$E^{v,u}_{k,i} = \max_{\mathcal{X}} \Big\{ \sum_{w \in T_{v_i},\ NS(w) \in \mathcal{X}} b(w, NS(w)) - \sum_{s \in \mathcal{X}} c(s) \Big\}. \tag{7}$$

From the above discussion, the maximum benefit $E^{v,u}_{k,i}$ is derived by Equation (8). That is, we need to enumerate all the possible $u'$ and use the one that maximizes $B^{v_i,u'}_{k,|child(v_i)|}$. The set $V_{v,u,i}$ is exactly the set from which to select $u'$, since $v_i$ will go to $u'$ for service, but not to $u$. This is exactly the definition of $V_{v,u,i}$.

$$E^{v,u}_{k,i} = \max_{u' \in V_{v,u,i}} B^{v_i,u'}_{k,|child(v_i)|}. \tag{8}$$

The Final Solution. Finally, the maximum benefit of locating $k$ extra servers in the tree $T$ can be calculated by Equation (9):

$$\max_{u \in T} B^{r,u}_{k,|child(r)|}. \tag{9}$$

The possible candidates for $u$ are subject to the following constraints. If $u$ is an original server, $d(r, u)$ must be $d(r, \mathcal{O})$, i.e., $u$ is the nearest original server to the root. If $u$ is not an original server, the distance $d(r, u)$ must be smaller than $d(r, \mathcal{O})$ to ensure that $u$ is the nearest extra server to the root.

Theorem 2. Given a tree $T = (V, E)$ and a set $\mathcal{O} \subseteq V$ as the original servers, the k-extra-server problem for $T$ can be solved in $O(|V|^3 k)$ time, where $0 \le k \le |V - \mathcal{O}|$ is an integer.

Proof. The problem can be solved by Equations (3) to (9). The time of the dynamic programming is derived by calculating all the entries of $B^{v,u}_{k,i}$ and $E^{v,u}_{k,i}$. Consider each pair of $v$ and $i$; there are $\sum_{v \in V} |child(v)| = |V| - 1$ such pairs in total. Thus, the number of entries of $B^{v,u}_{k,i}$ is $(k+1) \cdot |V| \cdot (|V| - 1) = O(|V|^2 k)$, and it takes $O(|V|)$ time to calculate each entry; hence, the time required to calculate all the entries of $B^{v,u}_{k,i}$ is bounded by $O(|V|^3 k)$. Similarly, there are $O(|V|^2 k)$ entries of $E^{v,u}_{k,i}$, and it takes $O(|V|)$ time to calculate each entry; therefore, the time required to calculate all the entries of $E^{v,u}_{k,i}$ is $O(|V|^3 k)$. The total time required is therefore $O(|V|^3 k)$.

Using similar techniques, we derive the following theorem. The full proof is omitted due to space limitations.

Theorem 3. Given a tree $T = (V, E)$ and a set $\mathcal{O} \subseteq V$ as the original servers of $T$, the extra-server problem for $T$ can be solved in $O(|V|^3)$ time.

Proof. The proof is similar to that of Theorem 2. There are $O(|V|^2)$ entries of $B^{v,u}_{i}$ and $O(|V|^2)$ entries of $E^{v,u}_{i}$, and the calculation of each entry requires at most $O(|V|)$ computing time. Hence, the problem can be solved in $O(|V|^3)$ time.

4 NP-Completeness

The NP-completeness proof is derived from the dominating set problem [2], and is omitted due to space limitations. A subset $V' \subseteq V$ is a dominating set if for all $u \in V - V'$, there is a $v \in V'$ such that the edge $(u, v)$ is in $E$. The decision problem of the dominating set can be formulated as follows: Given a graph $G = (V, E)$ and a positive integer $K \le |V|$, is there a dominating set of size $K$ or less?
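To make the dominating set condition concrete, the following Python sketch checks whether a given subset dominates a graph. The function name `is_dominating_set` and the adjacency-set representation are assumptions of this example. Note that it only verifies a candidate set; it does not search for one.

```python
def is_dominating_set(adj, subset):
    """Return True if every node is in `subset` or has a neighbor in it.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    chosen = set(subset)
    return all(
        u in chosen or any(w in chosen for w in adj[u])
        for u in adj
    )

# Hypothetical example: a star with center 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
center_dominates = is_dominating_set(star, {0})
leaf_dominates = is_dominating_set(star, {1})
```

In the star, the center alone is a dominating set, whereas a single leaf is not, because the other leaves are not adjacent to it.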

k-EXTRA-SERVER. We now consider the k-extra-server problem and define the corresponding decision problem as follows: In a k-extra-server problem instance, is there a placement of $k$ extra servers such that the benefit is at least $B$?

EXTRA-SERVER. Similarly, we define the decision problem of EXTRA-SERVER as follows: In an extra-server problem instance, is there a placement of extra servers such that the benefit is at least $B$?

Theorem 4. The k-EXTRA-SERVER problem is NP-complete.

Theorem 5. The EXTRA-SERVER problem is NP-complete.

Since the k-extra-server problem and the extra-server problem are both NP-complete, we propose a greedy heuristic (denoted as Greedy) for these problems. Here, we only describe Greedy for the k-extra-server problem because the method for the extra-server problem is very similar.

The greedy method works in rounds. In each round, we locate an extra server that maximizes its benefit. We add the benefit produced by the selected extra server to the total benefit, which was set to 0 initially, and then mark the selected server as an original server. We repeat the process until k extra servers are selected.
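The rounds described above can be sketched in Python as follows. This is a minimal reading of the description, not the authors' implementation: the function name `greedy_placement`, the precomputed distance table `dist[v][u]`, the tie-breaking by iteration order, and the toy instance are all assumptions of this example.

```python
def greedy_placement(dist, original, b, c, k):
    """Select up to k extra servers, one per round.

    dist[v][u] is the shortest-path distance between v and u. In each round
    the site with the largest gain is chosen, where the gain is the benefit
    of the clients strictly closer to the site than to any server already
    present, minus the building cost. Chosen sites are then treated like
    original servers in later rounds.
    """
    occupied = set(original)
    chosen, gains = [], []
    for _ in range(k):
        best_gain, best_site = None, None
        for s in dist:
            if s in occupied:
                continue
            gain = -c(s)
            for v in dist:
                nearest = min((dist[v][o] for o in occupied), default=float("inf"))
                if dist[v][s] < nearest:
                    gain += b(v, s)
            if best_gain is None or gain > best_gain:
                best_gain, best_site = gain, s
        if best_site is None:
            break  # no free site left
        chosen.append(best_site)
        gains.append(best_gain)
        occupied.add(best_site)
    return chosen, gains

# Toy path graph 0 - 1 - 2 - 3 with edge lengths 1, 2, 4 (hypothetical data).
dist = {
    0: {0: 0, 1: 1, 2: 3, 3: 7},
    1: {0: 1, 1: 0, 2: 2, 3: 6},
    2: {0: 3, 1: 2, 2: 0, 3: 4},
    3: {0: 7, 1: 6, 2: 4, 3: 0},
}
sites, round_gains = greedy_placement(dist, {0}, lambda v, u: 10, lambda s: 5, k=2)
```

In this toy instance the first round selects node 1 and the second round selects node 2. Because a selected server is treated as an original server afterwards, a client that a later server takes from an earlier extra server is counted in both rounds, so the final placement can be re-evaluated with Equation (1) when the exact benefit is needed.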

5 Experiment Results

We conduct simulations to compare the performance of Greedy with the linear programming optimal solutions acquired using GLPK (GNU Linear Programming Kit) [7] for the k-extra-server problem. GLPK is a set of routines designed to solve large-scale linear programming (LP), mixed integer programming (MIP), and other related problems. It is written in ANSI C and organized in the form of a library [7]. Let the 0-1 variable $X_u$, $u \in V$, denote whether there is an extra server on $u$, and let the 0-1 variable $Z_{uv}$, $u \in V$, $v \in V$, denote whether $v$ is a client of $u$. The integer programming for the k-extra-server problem is formulated as follows:

$$\text{maximize} \quad \sum_{u \in (V - \mathcal{O})} \sum_{v \in V} Z_{uv}\, b(v, u) - \sum_{u \in V} X_u\, c(u), \tag{10}$$

subject to

$$X_u \in \{0, 1\}, \quad \text{for each } u \in V, \tag{11a}$$
$$Z_{uv} \in \{0, 1\}, \quad \text{for each } u \in V,\ v \in V, \tag{11b}$$
$$X_u = 0, \quad \text{for each } u \in \mathcal{O}, \tag{11c}$$
$$\sum_{u \in V} X_u = k, \tag{11d}$$
$$\sum_{u \in V} Z_{uv} = 1, \quad \text{for each } v \in V, \tag{11e}$$
$$X_u - Z_{uv} \ge 0, \quad \text{for each } u \in (V - \mathcal{O}), \text{ each } v \in V, \tag{11f}$$
$$Z_{uv} = 0, \quad \text{for each } u \in V, \text{ each } v \in V, \text{ and } d(v, u) > d(v, \mathcal{O}). \tag{11g}$$

Consider the 0-1 variables $X_u$ and $Z_{uv}$ in constraints (11a) and (11b), respectively. We replace these constraints with (12a) and (12b), respectively, so that we have a linear programming formulation.

$$0 \le X_u \le 1, \quad \text{for each } u \in V, \tag{12a}$$
$$0 \le Z_{uv} \le 1, \quad \text{for each } u \in V,\ v \in V. \tag{12b}$$

The optimal benefit gained from linear programming only serves as an upper bound, since it allows a fraction of an extra server to be placed on a node. However, in our experiments, we find that, in most cases, linear programming produces integer solutions, i.e., $X_u$ and $Z_{uv}$ take values in $\{0, 1\}$.

5.1 Experiment Setting

In our experiments, we use GT-ITM [1] to generate random graphs according to the Waxman model [11]. Each of the graphs is connected, and nodes are added randomly in an $s \times s$ square. The probability of an edge between $u$ and $v$ is given by

$$p(u, v) = \alpha e^{-d/(\beta L)},$$

where $0 < \alpha, \beta \le 1$, $d$ is the Euclidean distance between $u$ and $v$, and $L = \sqrt{2}\,s$ is the largest possible distance between any two nodes. In our experiments, we set $s$ to 20, $\alpha$ to 0.2 and $\beta$ to 1.
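The edge probability above is straightforward to evaluate. The Python sketch below is an illustration only; the function name `waxman_probability` and the coordinate-pair arguments are assumptions of this example and are not part of GT-ITM.

```python
import math

def waxman_probability(p, q, alpha, beta, s):
    """Edge probability alpha * exp(-d / (beta * L)) for points p and q
    placed in an s-by-s square, with L = sqrt(2) * s."""
    d = math.hypot(p[0] - q[0], p[1] - q[1])   # Euclidean distance
    largest = math.sqrt(2) * s                 # diagonal of the square
    return alpha * math.exp(-d / (beta * largest))

# Parameter values quoted in the text: s = 20, alpha = 0.2, beta = 1.
same_point = waxman_probability((3, 4), (3, 4), alpha=0.2, beta=1.0, s=20)
corners = waxman_probability((0, 0), (20, 20), alpha=0.2, beta=1.0, s=20)
```

Two coincident points give the maximum probability $\alpha$, while two opposite corners of the square give $\alpha e^{-1/\beta}$, so a larger $\alpha$ raises the edge probability at every distance.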

For each $v$, we set a value $r(v)$ to be a random integer between 20 and 40, and set the building cost $c(v)$ to be $r(v)$ plus a random integer between 1 and 10. The benefit function $b(v, u)$ is defined as $r(v)$ divided by the distance from $v$ to $u$. Finally, we place original servers randomly in the graph. We simulate up

5.2 Eﬀect of α

We evaluate the performance of Greedy compared with the upper bounds found by linear programming under different values of $\alpha$. In these experiments, for each $\alpha$ we set $|V|$ from 50 to 150, and for each $|V|$ we set $|\mathcal{O}|$ from 0 to $0.1|V|$. As a result, we have 1066 graphs to simulate, and for each graph we set $k$ from 1 to $0.1|V|$. Figure 1 shows that when $\alpha$ increases, the average degree of each node also increases. Figure 1 shows that Greedy performs very well; on average, its performance differs from the theoretical upper bounds by less than 1%, and in the worst case the difference is no more than 15% of the upper bound.

Figure 1 also shows that as $\alpha$ increases, the average difference between Greedy and the upper bound derived by linear programming also increases. Since the average degree of each node increases as $\alpha$ increases, there is a higher probability that the extra servers will affect each other. However, to maximize the benefit, Greedy only considers the current configuration when it selects the next location to place an extra server; thus, it cannot predict the “long range” effects and the interaction among the extra servers. Hence, as $\alpha$ increases, the average difference (as a percentage) between Greedy and the upper bound also increases.

|V|        α = 0.2   α = 0.3   α = 0.4   α = 0.5
50         3.56      5.37      7.11      8.78
150        10.45     15.59     20.87     26.12
Average    8.11      12.01     16.03     20.03

α      Avg. difference   Max. difference
0.2    0.43%             9.54%
0.3    0.49%             14.35%
0.4    0.52%             13.20%
0.5    0.58%             11.95%

Fig. 1. The average degree of a node under different values of α and the average difference (as a percentage) between Greedy and the upper bound under different values of α

5.3 Eﬀect of the Number of Original Servers

We now consider the effect of the number of original servers on the average difference as a percentage of the upper bounds. In these experiments we set $|V|$ to 100, $|\mathcal{O}|$ from 1 to 50, and $k$ to 10.

Figure 2 (a) shows the error bars between Greedy and the upper bounds derived by the linear programming. The upper markers are the average upper bounds and the lower markers are the average benefits of Greedy. In the figure, the average benefits produced by Greedy are so close to the upper bounds that they coincide. Furthermore, the figure suggests that as $|\mathcal{O}|$ increases, the benefit decreases. This is reasonable since a large number of competitors can only have a negative impact on the extra servers.

5.4 Eﬀect of k

Next, we consider the effects of $k$ on the average difference as a percentage between Greedy and the theoretical upper bound. In these experiments we set $|V|$ to 100 and $|\mathcal{O}|$ to 10, so we generate 100 graphs in total. For each graph we set $k$ from 1 to 50, which gives us 5000 simulation results.

[Figure 2, three panels. (a) The benefits of Greedy and the average upper bounds under different numbers of original servers (x-axis: number of original servers; y-axis: average benefit). (b) The average benefits of Greedy and the upper bounds under different values of k (x-axis: k; y-axis: average benefit). (c) The average percentage difference for Greedy under different values of k (x-axis: k; y-axis: average difference percentage [%]).]

Fig. 2. Average benefits under different numbers of original and extra servers ((a) and (b)), and deviation percentage from the theoretical bounds (c)

Figure 2 (b) shows the error bars in our simulations. We observe that the benefit of Greedy is extremely close to the theoretical upper bounds. The figure also shows that, initially, as $k$ increases, the benefit increases because we can make more profit. As the number of extra servers increases substantially, the benefit decreases due to the cost of constructing the extra servers.

Figure 2 (c) shows that as $k$ increases, the average difference between Greedy and the theoretical upper bound also increases. This is because Greedy places each extra server to maximize the benefit at the current step and cannot consider the overall situation; thus, the difference accumulates at each step, and more servers mean a larger difference between Greedy and the upper bound.

In summary, we conclude that the Greedy algorithm performs extremely well. Considering all the simulation parameter settings, the greedy algorithm yields average benefits that are within 2% of the average theoretical upper bounds. It is also extremely efficient and easy to implement.

6 Conclusion

We have formulated two optimization problems, the k-extra-server problem and the extra-server problem. We consider the profit and construction costs at each location, and place extra servers to maximize the benefit in the presence of competition from original servers. For trees, we formulate dynamic programming algorithms to solve the k-extra-server problem and the extra-server problem in $O(|V|^3 k)$ time and $O(|V|^3)$ time, respectively. For general graphs, we prove that the problems are NP-complete and propose a greedy heuristic to solve them. The experiment results demonstrate that the greedy heuristic yields performance within 15% of the theoretical upper bound in the worst case, and within 2% of the same theoretical upper bound on average.

In the future we will investigate the possibility of designing efficient and effective algorithms for graphs other than trees. For example, our greedy algorithms perform well on general graphs, so we would like to investigate whether the performance of the greedy algorithm can be guaranteed to be within a constant factor of the optimum. We would also like to generalize dynamic programming to other graphs, such as planar graphs.

References

1. K. Calvert and E. Zegura. Gt-itm: Georgia tech internetwork topology models. http://www-static.cc.gatech.edu/projects/gtitm/.

2. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1979.

3. X. Jia, D. Li, X. Hu, W. Wu, and D. Du. Placement of web-server proxies with consideration of read and update operations on the internet. The Computer Journal, 46(4):378–390, 2003.

4. K. Kalpakis, K. Dasgupta, and O. Wolfson. Optimal placement of replicas in trees with read, write, and storage costs. IEEE Transactions on Parallel and Distributed Systems, 12(6):628–637, June 2001.

5. O. Kariv and S. L. Hakimi. An algorithmic approach to network location problems. ii: The p-medians. SIAM J. Appl. Math., 37(3):539–560, 1979.

6. B.-J. Ko and D. Rubenstein. A greedy approach to replicated content placement using graph coloring. In SPIE ITCom Conference on Scalability and Traffic Control in IP Networks II, Boston, MA, July 2002.

7. A. Makhorin. http://www.gnu.org/software/glpk/glpk.html.

8. D. B. Shmoys, E. Tardos, and K. Aardal. Approximation algorithms for facility location problems (extended abstract). In Proc. 29th ACM STOC., pages 265–274, 1997.

9. A. Tamir. An O(pn^2) algorithm for the p-median and related problems on tree graphs. Operations Research Letters, 19(2):59–64, 1996.

10. O. Unger and I. Cidon. Optimal content location in multicast based overlay networks with content updates. World Wide Web, 7(3):315–336, 2004.