Clique-Distance-Approx - Computing the Farthest Distances Between Complete Subgraphs on Large Network Graphs

Input: the graph G and the given list of cliques C.

Output: CD, storing the approximate clique distances.

1 r ← |C|, n ← |V(G)|
2 Initialize CD as an r × r matrix with every element set to ∞
3 Initialize D as an n × n matrix with every element set to ∞
4 Initialize Π as an n × n matrix with every element set to NIL
5 for C_i ∈ C

The algorithm BFS-Revised maintains a newly added variable BRANCH. Consider a BFS tree T computed by BFS-Revised. Every node except s must have a parent node in T. If BFS-Revised starts from a node s in the clique C_s, then all neighbors of s in C_s are at level one of T (the node s is at level zero). For every node w, the variable BRANCH[w] refers to the level-one ancestor of w if s reaches w via a neighbor that is also in the clique C_s, or refers to s if s reaches w without passing through any neighbor belonging to C_s.

If v = BRANCH[w] ≠ s, we may have reached a clique from s along a shortest path that passes through one of the neighbors of s in the clique C_s. By Lemma 3.4, the distance D[v, w] is then the clique distance. Otherwise, we compare the distance D[s, w] with the currently known clique distance estimate.
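The BRANCH bookkeeping described above can be sketched as follows. This is a minimal illustration and not the thesis's own listing: the function name `bfs_revised`, the adjacency-dictionary representation, and the example graph are assumptions made here, and the graph is taken to be unweighted.

```python
from collections import deque

def bfs_revised(adj, s, clique_s):
    """BFS from s that also records BRANCH for every reached node.

    adj      -- dict mapping each node to a list of neighbors (assumed format)
    s        -- start node, a member of clique_s
    clique_s -- set of nodes forming the clique C_s
    """
    dist = {s: 0}
    parent = {s: None}
    branch = {s: s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in dist:
                continue  # already discovered at an earlier or equal level
            dist[w] = dist[u] + 1
            parent[w] = u
            if u == s:
                # Level one: a clique neighbor starts its own branch,
                # any other neighbor is attributed to s itself.
                branch[w] = w if w in clique_s else s
            else:
                # Deeper levels inherit the branch of their parent.
                branch[w] = branch[u]
            queue.append(w)
    return dist, parent, branch

# Hypothetical example: clique {a, b, c}, plus a-x and the path b-y-z.
example_adj = {
    "a": ["b", "c", "x"],
    "b": ["a", "c", "y"],
    "c": ["a", "b"],
    "x": ["a"],
    "y": ["b", "z"],
    "z": ["y"],
}
dist, parent, branch = bfs_revised(example_adj, "a", {"a", "b", "c"})
```

In this example, y and z are reached through the clique neighbor b, so BRANCH points to b for both, while x is reached directly from a and BRANCH[x] is a.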

Note that Algorithm 3.2 computes only approximate results. To see how it fails to compute exact results, we provide the graphs shown in Figure 3.2 and Figure 3.3 as examples. The correct clique distance cd(C₁, C₃), shown in Figure 3.2, is d(b, j) = 4. If BFS-Revised, starting from node a, expands the children of e before expanding b, then BFS-Revised(a) reaches C₃ at node j without taking the correct clique shortest path. In this particular case Algorithm 3.2 returns an estimate with an additive error of one. Because Algorithm 3.2 expands nodes in arbitrary order, such failures cannot be prevented. Since every node of the starting clique is adjacent to s, a path found from s is at most one edge longer than the clique shortest path. We therefore conclude that Algorithm 3.2 computes the approximate clique distances with an additive error of at most one.

Figure 3.3: The Failed BFS-Tree Rooted at a

Now we prove the correctness of our algorithms. Algorithm 3.3 is just a trivial loop. Algorithm 3.2 is a modified version of breadth-first search: it traverses every node and computes the shortest path from the starting node to every node in the graph G. If a newly found clique is reached via a neighbor in the same clique as the starting node, then by Lemma 3.3 and Lemma 3.4 the correct clique distance is reported. Otherwise, we cannot filter out the error cases.

The procedure BFS-Revised runs in O(m + n) time. The time complexity of Algorithm 3.3 is O(r(m + n)), since it performs BFS-Revised at most r times, where r is the number of cliques in C. Our algorithm performs better than the straightforward algorithm if r is much smaller than the number of nodes n.
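The outer loop, one BFS per clique, might look like the sketch below. The full listing of Algorithm 3.3 is not reproduced in this section, so several details are assumptions made here for illustration: the start node of each clique is simply its smallest member, the update rule uses d(s, w) − 1 in place of D[v, w] when BRANCH[w] ≠ s, and the names `clique_distance_approx` and `bfs_revised` are hypothetical.

```python
from collections import deque
from math import inf

def bfs_revised(adj, s, clique_s):
    """Compact BFS from s returning distances and BRANCH (unweighted graph)."""
    dist, branch, queue = {s: 0}, {s: s}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                if u == s:
                    branch[w] = w if w in clique_s else s
                else:
                    branch[w] = branch[u]
                queue.append(w)
    return dist, branch

def clique_distance_approx(adj, cliques):
    """Run one BFS per clique, so r BFS calls in total: O(r(m + n))."""
    r = len(cliques)
    cd = [[inf] * r for _ in range(r)]  # r x r matrix initialized to infinity
    for i, c_i in enumerate(cliques):
        s = min(c_i)  # assumption: any member may serve as the start node
        dist, branch = bfs_revised(adj, s, c_i)
        for j, c_j in enumerate(cliques):
            if i == j:
                continue
            for w in c_j:
                if w not in dist:
                    continue  # w is unreachable from s
                # Assumed update rule: if w was reached through a clique
                # neighbor v of s, use d(v, w) = d(s, w) - 1; else d(s, w).
                estimate = dist[w] - 1 if branch[w] != s else dist[w]
                cd[i][j] = min(cd[i][j], estimate)
    return cd

# Hypothetical example: triangles {a, b, c} and {x, y, z} joined by b-p-x.
example_adj = {
    "a": ["b", "c"], "b": ["a", "c", "p"], "c": ["a", "b"],
    "p": ["b", "x"],
    "x": ["p", "y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
example_cd = clique_distance_approx(example_adj, [{"a", "b", "c"}, {"x", "y", "z"}])
```

Here the shortest connection between the two triangles is the path b-p-x of length two, and the sketch reports 2 in both directions. On other graphs the estimate may exceed the true value by one, as discussed above.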

Chapter 4 On Transformation to All-Pairs Shortest Paths Problem

In this chapter we report a technique that transforms any instance of the clique distances problem into an instance of the all-pairs shortest paths (APSP) problem. Transforming the clique distances problem into the APSP problem is beneficial because we may then use the algorithms established for the APSP problem to solve the clique distances problem. The converse does not hold: we cannot solve the APSP problem by solving the clique distances problem, since doing so yields only a partial solution of the APSP problem.

4.1 A Failed Attempt

Before introducing the final construction, we demonstrate a failed attempt.

Definition 4.1 (Pitfall roof node). For each clique C_i = {u₁, u₂, u₃, . . . , u_{k−1}, u_k} in C, a newly created roof node v_c_i is inserted into V(G). Each roof node v_c_i is adjacent only to the nodes belonging to the corresponding clique C_i and is disconnected from all other nodes (including every other roof node). We then add the edges (v_c_i, u) for u ∈ C_i, each with weight zero.

However, this design has a pitfall. Assume v_c_i is a roof node of clique C_i. For any two nodes x, y ∈ C_i there is a path P(x, v_c_i, y) whose length is smaller than the weight of any edge belonging to C_i, because both of its edges have weight zero. The shortest path found by an APSP algorithm is then a path that includes P(x, v_c_i, y). This newly found shortest path passes through the roof node v_c_i and is not a possible path in the original graph. Therefore, the transformation cannot fulfill our objectives.
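The pitfall can be reproduced on a tiny instance. The sketch below is illustrative only: the helper names `dijkstra` and `with_roof_node`, the adjacency-list format, and the three-node clique with edge weight 5 are all assumptions made here, and Dijkstra's algorithm stands in for whichever shortest path algorithm is actually used.

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest path lengths on a non-negatively weighted graph."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

def with_roof_node(adj, clique, roof, weight):
    """Return a copy of adj with a roof node joined to every clique member."""
    new_adj = {u: list(nbrs) for u, nbrs in adj.items()}
    new_adj[roof] = [(u, weight) for u in clique]
    for u in clique:
        new_adj[u].append((roof, weight))
    return new_adj

# Hypothetical clique {u1, u2, u3} whose edges all have weight 5.
clique = ["u1", "u2", "u3"]
graph = {u: [(v, 5) for v in clique if v != u] for u in clique}

before = dijkstra(graph, "u1")["u2"]
pitfall = dijkstra(with_roof_node(graph, clique, "roof", 0), "u1")["u2"]
print(before, pitfall)  # 5 0
```

With zero-weight roof edges the distance between u1 and u2 collapses from 5 to 0, because the path through the roof node undercuts the genuine clique edge.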

4.2 Correction

We must therefore assign weights to the edges adjacent to the roof nodes.

Definition 4.2 (Roof node). The construction steps remain unchanged, but instead of creating edges of weight zero, we assign a heavier edge weight. We add the edges (v_c_i, u_j) for u_j ∈ C_i, each with weight (w_max + 1), where w_max = max_{e ∈ E(G)} w(e). The set R = {v_c_i | C_i ∈ C} contains all roof nodes on G.

In fact, smaller edge weights suffice. Given a clique C_i and the corresponding roof node v_c_i, let w_c_i be the maximum edge weight among the edges in C_i. Then it is enough for the edges adjacent to the roof node to have weight at least (1 + w_c_i)/2, since this already prevents the roof node from being included in any shortest path.
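The bound can be checked with a short calculation. As a sketch, write ω_i for the weight assigned to every edge adjacent to v_c_i (the symbol ω_i is introduced here for illustration only) and take any two nodes x, y ∈ C_i:

```latex
\[
  \underbrace{\omega_i + \omega_i}_{\text{length of } P(x,\, v_{c_i},\, y)}
  \;\ge\; 2 \cdot \frac{1 + w_{c_i}}{2}
  \;=\; 1 + w_{c_i}
  \;>\; w_{c_i}
  \;\ge\; w(x, y).
\]
```

The detour through the roof node is therefore always strictly longer than the direct clique edge (x, y), so no shortest path between nodes of the original graph uses it.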

Lemma 4.1. For every pair of cliques C_i and C_j with corresponding roof nodes v_c_i and v_c_j, the desired clique distance is cd(C_i, C_j) = d(v_c_i, v_c_j) − 2 × (w_max + 1).

Proof. The roof nodes are constructed and inserted into G according to Definition 4.2. After all the roof nodes are created, the result is a new graph G′. Because negative edge weights are not allowed, the all-pairs shortest paths on G′ can be computed by any algorithm that solves the APSP problem.

Let the shortest path between v_c_i and v_c_j be P(v_c_i, s, . . . , t, v_c_j), where the nodes s and t belong to C_i and C_j, respectively. The path must have this form because each roof node is connected only to the nodes of its corresponding clique and is isolated from the remaining nodes of the original graph G. The edge (v_c_i, s) must therefore be taken, and the edge (t, v_c_j) is taken for the same reason.

We argue that the path P(s, . . . , t) is the clique shortest path between C_i and C_j, because P(v_c_i, s, . . . , t, v_c_j) is the shortest path between v_c_i and v_c_j. If the path between s and t were not of minimal length, there would be nodes s′ ∈ C_i and t′ ∈ C_j, with s′ ≠ s or t′ ≠ t, such that d(s′, t′) < d(s, t), and the shortest path algorithm would have found the shorter path through them. This contradicts our assumption. Since the edges (v_c_i, s) and (t, v_c_j) both have weight w_max + 1, we obtain cd(C_i, C_j) = d(v_c_i, v_c_j) − 2 × (w_max + 1).

After the transformation, we may compute the clique distances by computing the all-pairs shortest paths among all of the roof nodes. Algorithm 4.2 illustrates the whole algorithm.
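The transformation and Lemma 4.1 can be combined into a short sketch. This is an illustration under stated assumptions rather than the thesis's listing: the function name `clique_distances_roof_nodes`, the edge-list input format, the `("roof", i)` node labels, and the choice of Floyd-Warshall as the APSP routine are all made up here, and the diagonal of the result is set to 0 by convention.

```python
from math import inf

def clique_distances_roof_nodes(nodes, edges, cliques):
    """Clique distances via roof nodes (Definition 4.2) and Floyd-Warshall.

    nodes   -- iterable of node labels of G
    edges   -- list of undirected edges (u, v, weight), weights non-negative
    cliques -- list of node collections, one per clique C_i
    """
    w_max = max(w for _, _, w in edges)
    roof_w = w_max + 1  # weight of every edge adjacent to a roof node
    roofs = [("roof", i) for i in range(len(cliques))]  # hypothetical labels
    all_nodes = list(nodes) + roofs

    # Distance matrix of the transformed graph G'.
    d = {u: {v: (0 if u == v else inf) for v in all_nodes} for u in all_nodes}
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for roof, clique in zip(roofs, cliques):
        for u in clique:
            d[roof][u] = d[u][roof] = roof_w

    # Floyd-Warshall; any APSP algorithm could be substituted here.
    for k in all_nodes:
        for i in all_nodes:
            for j in all_nodes:
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # Lemma 4.1: cd(C_i, C_j) = d(v_ci, v_cj) - 2 * (w_max + 1).
    r = len(cliques)
    return [[0 if i == j else d[roofs[i]][roofs[j]] - 2 * roof_w
             for j in range(r)] for i in range(r)]

# Hypothetical example: two weighted triangles joined by the path b-p-x.
example_nodes = ["a", "b", "c", "p", "x", "y", "z"]
example_edges = [
    ("a", "b", 2), ("b", "c", 2), ("a", "c", 2),
    ("x", "y", 3), ("y", "z", 3), ("x", "z", 3),
    ("b", "p", 4), ("p", "x", 1),
]
example_cd = clique_distances_roof_nodes(
    example_nodes, example_edges, [["a", "b", "c"], ["x", "y", "z"]]
)
```

In this example w_max = 4, so each roof edge has weight 5. The roof-to-roof distance is 5 + 4 + 1 + 5 = 15, and subtracting 2 × 5 leaves the clique distance d(b, x) = 5.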

Algorithm 4.1 Clique-Distances-Roof-Nodes
