Algorithm Design - Social Group Query - Efficient Link Prediction and Group Query in Social Networks

3.3 Social Group Query

3.3.2 Algorithm Design

In this section, we propose a novel algorithm, namely SGSelect, to solve SGQ efficiently. Our idea is to first derive a feasible graph G_F = (V_F, E_F) from G based on the social radius constraint, such that there exists a path with at most s edges from q to each vertex in G_F. Starting from q, we then iteratively explore G_F to derive the optimal solution. At each iteration, we keep track of the intermediate solution obtained so far (denoted as V_S), i.e., a set of vertices that satisfy the acquaintance constraint.

Initially, we set V_S = {q} and let V_A denote the set of remaining vertices in V_F, i.e., V_A = V_F − V_S. At each iteration, we select a vertex in V_A and examine whether moving it to V_S is feasible (i.e., whether the acquaintance constraint is still followed), until V_S has p vertices and the process stops.

The selection of a vertex from V_A at each iteration is critical to the performance of query processing. It is essential to avoid choosing a vertex v that may significantly increase the total social distance or lead to a violation of the acquaintance constraint. We observe that the order in which vertices are accessed when constructing candidate groups is a key factor in the overall performance. Vertices that are very likely to be included in the final answer group, i.e., the optimal solution, should be considered first, because this facilitates effective early pruning of unqualified solutions. Additionally, the social radius and acquaintance constraints can be exploited to efficiently prune vertices that cannot lead to the final answer. We summarize our ideas as follows.

Access ordering. To guide an efficient exploration of the solution space, we access vertices in V_A following an order that incorporates (1) the increment of the total social distance and (2) the feasibility with respect to the acquaintance constraint. Accordingly, we define the notions of interior unfamiliarity and exterior expansibility of V_S to test whether the examined vertices can satisfy the acquaintance constraint during vertex selection.

Distance pruning. To avoid exploring vertices in V_A that do not lead to a better solution in terms of total social distance, Algorithm SGSelect keeps track of the best feasible solution obtained so far and leverages its total social distance to prune redundant examinations of parts of the search space.

Acquaintance pruning. We exploit the properties of the acquaintance constraint to facilitate search space pruning. Specifically, we define the notion of inner degree of the vertices in V_A and derive a lower bound on it that must be met for a feasible solution to be derived from the vertices in V_S and V_A. This lower bound is designed to detect the stop condition, i.e., when no feasible solution exists after including any vertices from V_A.

To find the optimal solution, Algorithm SGSelect may incur exponential time in query processing because SGQ is NP-hard; in the worst case, all candidate groups may need to be considered. However, by employing the above pruning strategies, the average running time of Algorithm SGSelect can be effectively reduced, as shown in Section 3.5. In the following, we present the details of the proposed algorithm.

Radius Graph Extraction

The social radius constraint can effectively prune redundant candidates in the social network of the activity initiator. Thus, Algorithm SGSelect first extracts the vertices that satisfy the social radius constraint. A simple approach to meeting the social radius constraint is to find the minimum-edge path (i.e., the path with the fewest edges) between q and every other vertex, and then remove the vertices whose minimum-edge paths are longer than s edges. Nevertheless, the minimum-distance path with at most s edges and the minimum-edge path can be different, so the total distance of the minimum-edge path may not be the minimum distance. Moreover, the unconstrained minimum-distance path may consist of more than s edges, which violates the social radius constraint. To address these problems, we define the notion of i-edge minimum distance, which represents the total distance of the minimum-distance path with no more than i edges, as follows.

Definition 3.3.1. The i-edge minimum distance between the vertex v and the vertex q is

$$d^{i}_{v,q} = \min_{u \in N_v} \left\{ d^{i-1}_{v,q},\; d^{i-1}_{u,q} + c_{u,v} \right\},$$

where N_v is the set of neighboring vertices of v.

Based on dynamic programming, Algorithm SGSelect computes the i-edge minimum distance between the vertex v and the vertex q by iteratively deriving d^i_{v,q} from d^{i-1}_{u,q} of each neighboring vertex u, for 1 ≤ i ≤ s. Initially, we set d^0_{v,q} = ∞ for every vertex v ≠ q and d^0_{q,q} = 0, and then derive d^1_{v,q} for every vertex v in N_q. At the next iteration, we update d^2_{v,q} for v if there exists a neighbor u of v such that d^1_{u,q} + c_{u,v} is smaller than d^1_{v,q}. This case indicates that there is an alternative path from v to q via the neighbor u with a smaller total distance. Our algorithm iterates at most s times for each vertex.

Each vertex v with d^s_{v,q} < ∞ is then extracted to construct a feasible graph G_F = (V_F, E_F). In this graph, the total distance of the minimum-distance path with at most s edges (i.e., d^s_{v,q}) is adopted as the social distance between v and q (i.e., d_{v,q}). In other words, we ensure that every vertex in V_F satisfies the social radius constraint. Therefore, we consider G_F in evaluating SGQ for the rest of this chapter.
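The recurrence in Definition 3.3.1 amounts to a hop-bounded Bellman-Ford relaxation. The following Python sketch illustrates radius graph extraction under our own assumptions: the graph is an undirected adjacency dict in which every vertex is a key and c_{u,v} is the edge weight, the function names are hypothetical, and E_F is taken to be the subgraph induced by V_F. Each round relaxes edges against a frozen copy of the previous round, so after round i every distance uses at most i edges.

```python
import math
from typing import Dict, Hashable, Tuple

# Weighted undirected graph: vertex -> {neighbor: c_{u,v}}
Graph = Dict[Hashable, Dict[Hashable, float]]


def i_edge_min_distances(graph: Graph, q: Hashable, s: int) -> Dict[Hashable, float]:
    """Return d^s_{v,q} for every vertex v (Definition 3.3.1)."""
    # d^0_{q,q} = 0 and d^0_{v,q} = infinity for every v != q.
    d = {v: math.inf for v in graph}
    d[q] = 0.0
    for _ in range(s):
        prev = dict(d)  # d^{i-1}, frozen so each round extends paths by at most one edge
        for v, neighbors in graph.items():
            for u, c_uv in neighbors.items():
                # d^i_{v,q} = min(d^{i-1}_{v,q}, d^{i-1}_{u,q} + c_{u,v})
                if prev[u] + c_uv < d[v]:
                    d[v] = prev[u] + c_uv
        if d == prev:  # nothing improved, so later rounds cannot change anything
            break
    return d


def extract_feasible_graph(graph: Graph, q: Hashable, s: int) -> Tuple[Graph, Dict[Hashable, float]]:
    """Keep vertices with d^s_{v,q} < infinity; E_F is assumed to be the induced subgraph."""
    d = i_edge_min_distances(graph, q, s)
    vf = {v for v, dist in d.items() if dist < math.inf}
    gf = {v: {u: c for u, c in graph[v].items() if u in vf} for v in vf}
    return gf, {v: d[v] for v in vf}
```

For example, with hypothetical edges q–a (5), a–v (5), q–v (30), v–b (1), and b–e (1) and s = 2, the sketch yields distances 0, 5, 10, 31, and ∞ for q, a, v, b, and e. The minimum-edge path from q to v is the direct edge of weight 30, whereas its 2-edge minimum distance is 10 via a; b keeps 31 because its cheaper route of weight 11 needs three edges, and e is excluded from V_F.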

Access Ordering

After constructing the feasible graph G_F, Algorithm SGSelect iteratively explores G_F to find the optimal solution. Initially, the intermediate solution set V_S includes only q, and the remaining vertex set V_A is V_F − {q}. At each subsequent iteration, we select a vertex from V_A and move it to V_S to expand the intermediate solution, until V_S has p vertices. Therefore, V_S represents a feasible solution when |V_S| = p and the vertices in V_S satisfy the acquaintance constraint. Next, our algorithm improves the feasible solution by backtracking the above exploration procedure to previous iterations and choosing an alternative vertex in V_A to expand V_S. A branch-and-bound tree is maintained to record the exploration history for backtracking.

To reduce the running time and the search space, the selection of a vertex at each iteration is critical. Naturally, we would like to include a vertex that minimizes the increment of the total social distance. Nevertheless, the connectivity of the selected vertex imposes additional requirements for satisfying the acquaintance constraint. Thus, we introduce the notions of interior unfamiliarity and exterior expansibility with respect to the intermediate solution set V_S to exploit the acquaintance constraint in query processing.

Definition 3.3.2. The interior unfamiliarity of V_S is

$$U(V_S) = \max_{v \in V_S} \left| V_S - \{v\} - N_v \right|,$$

where N_v is the set of neighboring vertices of v in G_F. The set V_S − {v} − N_v refers to the set of non-neighboring vertices of v in V_S.
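Definition 3.3.2 translates directly into set operations. A minimal Python sketch, assuming G_F is given as a dict that maps each vertex to the set of its neighbors; the helper names are hypothetical. Since V_S always contains q, the maximum is taken over a non-empty set.

```python
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def strangers_in(vs: Set[Hashable], v: Hashable, adj: Adjacency) -> Set[Hashable]:
    """V_S - {v} - N_v: the members of V_S that v is not acquainted with."""
    return vs - {v} - adj[v]


def interior_unfamiliarity(vs: Set[Hashable], adj: Adjacency) -> int:
    """U(V_S) = max over v in V_S of |V_S - {v} - N_v| (Definition 3.3.2); V_S must be non-empty."""
    return max(len(strangers_in(vs, v, adj)) for v in vs)
```

On a path a–b–c, U({a, b, c}) = 1 because a and c do not know each other, while U({a, b}) = 0.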

The interior unfamiliarity describes the connectivity within the intermediate solution set; a smaller interior unfamiliarity means that the vertices in V_S are more densely connected. As shown later, the interior unfamiliarity of possible intermediate solution sets is taken into account in deciding which vertex to include when generating candidate groups. It is preferable to first include a well-connected vertex that results in an intermediate solution set with low interior unfamiliarity, since this may make the selection of other vertices in later iterations easier. Next, we define the exterior expansibility of an intermediate solution set V_S, denoted by A(V_S), as the maximum number of vertices in V_A that can be used to expand V_S.

Definition 3.3.3. The exterior expansibility of V_S is

$$A(V_S) = \min_{v \in V_S} \left\{ |V_A \cap N_v| + \left(k - |V_S - \{v\} - N_v|\right) \right\}, \tag{3.1}$$

where the first set (i.e., V_A ∩ N_v) contains the neighboring vertices of v in V_A and the second set (i.e., V_S − {v} − N_v) contains the non-neighboring vertices of v in V_S.

The exterior expansibility counts the number of options when selecting vertices from V_A to expand V_S, and a larger exterior expansibility means that the acquaintance constraint is easier to follow during vertex selection. Specifically, since v already has |V_S − {v} − N_v| non-neighboring vertices in V_S, we can select at most k − |V_S − {v} − N_v| additional non-neighboring vertices of v from V_A to expand V_S; otherwise, vertex v would have more than k non-neighboring vertices in V_S and violate the acquaintance constraint. Therefore, for a vertex v in V_S, at most |V_A ∩ N_v| neighboring vertices and k − |V_S − {v} − N_v| non-neighboring vertices can be selected from V_A to expand V_S.
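Eq. (3.1) can be sketched in Python as follows, assuming G_F is given as a dict mapping each vertex to its set of neighbors; the function name is hypothetical. Each term adds v's available neighbors in V_A to v's remaining quota of strangers.

```python
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def exterior_expansibility(vs: Set[Hashable], va: Set[Hashable], adj: Adjacency, k: int) -> int:
    """A(V_S) from Eq. (3.1): min over v in V_S of |V_A ∩ N_v| + (k - |V_S - {v} - N_v|)."""
    return min(
        len(va & adj[v])                  # neighbors of v still available in V_A
        + (k - len(vs - {v} - adj[v]))    # remaining quota of strangers for v
        for v in vs
    )
```

Applied to the two per-vertex terms in footnote 7 of Example 3.3.1, the minimum is min(4 + 1, 2 + 1) = 3. In a hypothetical toy graph where q knows x, y, and z and x also knows y, V_S = {q, x}, V_A = {y, z}, and k = 1 give A(V_S) = min(2 + 1, 1 + 1) = 2.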

When selecting a vertex v to expand V_S, we consider both the increment of the total social distance caused by v and the connectivity of the vertices in the new intermediate solution set containing v, which is captured by U(V_S ∪ {v}) and A(V_S ∪ {v}). Specifically, Algorithm SGSelect chooses the vertex v with the minimum social distance to q that satisfies the following two conditions on interior unfamiliarity and exterior expansibility, respectively.

Interior Unfamiliarity Condition. The first condition considers the interior unfamiliarity. A small interior unfamiliarity indicates that every vertex v ∈ V_S has plenty of neighboring vertices in V_S, i.e., the current intermediate solution set V_S is likely to be expanded into feasible solutions satisfying the acquaintance constraint.

Based on this observation, we employ the interior unfamiliarity condition, i.e.,

$$U(V_S \cup \{v\}) \le k \left[ \frac{|V_S \cup \{v\}|}{p} \right]^{\theta},$$

where θ ≥ 0 and |V_S ∪ {v}|/p is the proportion of attendees that have been considered, to ensure that the interior unfamiliarity remains small when the vertex v is selected. Since |V_S ∪ {v}|/p ≤ 1, the right-hand side (RHS) of the inequality reaches its maximum, i.e., k, when θ is set to 0. With θ = 0, the condition is the most permissive, so a vertex v with a small social distance is easiest to select. However, if a vertex v resulting in U(V_S ∪ {v}) = k is selected, the vertex that already has k non-neighboring vertices in V_S ∪ {v} must be connected to all the vertices chosen from V_A in later iterations, which reduces the feasibility of selecting other qualified vertices later on. In contrast, a larger θ tightens the condition, so SGSelect chooses a vertex from V_A that connects to more vertices in V_S, preserving feasibility in later iterations. Note that, for θ > 0, the RHS of the condition increases as V_S includes more vertices. On the other hand, the algorithm reduces θ if no vertex in V_A satisfies the above condition.
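The interior unfamiliarity condition needs only U(V_S ∪ {v}) and the size of V_S ∪ {v}. The small Python helpers below (hypothetical names) evaluate the threshold and the condition; how far SGSelect lowers θ between attempts is not specified here, so θ is simply left to the caller.

```python
def unfamiliarity_threshold(new_size: int, p: int, k: int, theta: float) -> float:
    """RHS of the condition: k * (|V_S ∪ {v}| / p) ** theta."""
    if theta < 0:
        raise ValueError("theta must be >= 0")
    return k * (new_size / p) ** theta


def interior_condition_holds(u_new: int, new_size: int, p: int, k: int, theta: float) -> bool:
    """True if U(V_S ∪ {v}) <= k * (|V_S ∪ {v}| / p) ** theta.

    u_new    -- U(V_S ∪ {v}), computed beforehand
    new_size -- |V_S ∪ {v}|
    """
    return u_new <= unfamiliarity_threshold(new_size, p, k, theta)
```

For instance, unfamiliarity_threshold(2, 4, 1, 2) evaluates to 0.25, the value 1 × (2/4)² = 1/4 used for v₂ in Example 3.3.1. A candidate with U(V_S ∪ {v}) = 1 and |V_S ∪ {v}| = 3 fails with θ = 2 (threshold 0.5625) but passes once θ is lowered to 0, where the threshold equals k.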

When θ has decreased to 0 and the above condition still does not hold, i.e., U(V_S ∪ {v}) > k, Algorithm SGSelect stops expanding the new intermediate solution set V_S ∪ {v}, because adding any vertices from V_A to this set cannot generate a feasible solution, as shown by the following lemma.

Lemma 3.3.1. Given that

$$U(V_S) > k, \tag{3.2}$$

there must exist at least one vertex v in V_S such that v cannot satisfy the acquaintance constraint for any possible selection of vertices from V_A.

Proof. If U(V_S) > k, then we can find at least one vertex v in V_S such that |V_S − {v} − N_v| > k, which means that v is already unacquainted with more than k other vertices in V_S. Note that adding vertices from V_A to V_S cannot increase the connectivity among the existing vertices in V_S. When we expand V_S by adding any vertex u from V_A, the number of vertices that are unacquainted with v remains unchanged if u and v are connected, and increases by one otherwise. Therefore, after any expansion, the number of vertices in V_S that are unacquainted with v still exceeds k. That is, v still violates the acquaintance constraint, and hence V_S cannot be expanded into a feasible solution. The lemma follows.
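Lemma 3.3.1 can be sanity-checked by brute force on a tiny hypothetical graph. The Python sketch below (hypothetical names; G_F as a dict of neighbor sets) enumerates every way to add p − |V_S| vertices from V_A, so it is exponential and intended only for checking, not for query processing.

```python
from itertools import combinations
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def satisfies_acquaintance(group: Set[Hashable], adj: Adjacency, k: int) -> bool:
    """Every member has at most k non-neighbors inside the group."""
    return all(len(group - {v} - adj[v]) <= k for v in group)


def has_feasible_completion(vs: Set[Hashable], va: Set[Hashable], adj: Adjacency, k: int, p: int) -> bool:
    """Brute force: try every choice of p - |V_S| vertices from V_A (tiny inputs only)."""
    need = p - len(vs)
    return any(
        satisfies_acquaintance(vs | set(extra), adj, k)
        for extra in combinations(sorted(va), need)
    )
```

For example, with k = 0, p = 4, and a toy graph in which q, c, and d know everyone while a and b do not know each other, V_S = {q, a, b} has U(V_S) = 1 > k, and no choice from V_A = {c, d} yields a feasible group, exactly as the proof argues; V_S = {q, a}, by contrast, can still be completed with {c, d}.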

Exterior Expansibility Condition. We now discuss the second condition, which is based on the exterior expansibility. The exterior expansibility represents the maximum number of vertices in V_A that can be considered for expanding the intermediate solution set V_S, and this value must be no smaller than the number of attendees that still need to be added, i.e., p − |V_S|. Therefore, SGSelect chooses a vertex v from V_A that satisfies the exterior expansibility condition, i.e.,

$$A(V_S \cup \{v\}) \ge p - |V_S \cup \{v\}|.$$

If the inequality does not hold, the new intermediate solution set obtained by adding v cannot be expanded into a feasible solution, as shown by the following lemma.

Lemma 3.3.2. Given that

$$A(V_S) < p - |V_S|, \tag{3.3}$$

there must exist at least one vertex v in V_S such that v cannot satisfy the acquaintance constraint for any possible selection of vertices from V_A.

Proof. If A(V_S) < p − |V_S|, then we can find at least one vertex v in V_S such that |V_A ∩ N_v| + (k − |V_S − {v} − N_v|) < p − |V_S|. In other words, k − |V_S − {v} − N_v| < (p − |V_S|) − |V_A ∩ N_v|. As mentioned above, |V_S − {v} − N_v| is the number of non-neighboring vertices of v in V_S, and k − |V_S − {v} − N_v| thereby represents the "quota" for v to choose non-neighboring vertices from V_A. For any possible selection V̂_A ⊆ V_A of p − |V_S| vertices, let λ̂_A denote the number of neighboring vertices of v in V̂_A, so that V̂_A contains (p − |V_S|) − λ̂_A non-neighboring vertices of v. Since λ̂_A ≤ |V_A ∩ N_v|, we have (p − |V_S|) − |V_A ∩ N_v| ≤ (p − |V_S|) − λ̂_A. Therefore, if A(V_S) < p − |V_S|, then k − |V_S − {v} − N_v| < (p − |V_S|) − λ̂_A, i.e., v does not have enough quota to accommodate its non-neighboring vertices in V̂_A while satisfying the acquaintance constraint. The lemma follows.

Distance and Acquaintance Pruning

In the following, we further exploit two pruning strategies to reduce the search space.

Our algorithm aims to obtain a feasible solution early, since the total social distance of this solution can be used to prune redundant candidates. At each iteration, the following distance pruning strategy avoids exploring the vertices in the remaining vertex set V_A if they cannot lead to a solution with a smaller total social distance.

Lemma 3.3.3. The distance pruning strategy stops selecting a vertex from V_A to V_S if

$$D - \sum_{v \in V_S} d_{v,q} < (p - |V_S|) \min_{v \in V_A} d_{v,q}, \tag{3.4}$$

where D is the total social distance of the best feasible solution obtained so far. The distance pruning strategy prunes only the part of the search space that contains no better solution.

Proof. If the above condition holds, it is impossible to find an improved solution by exploring V_A: each of the p − |V_S| vertices selected from V_A has a social distance of at least min_{v ∈ V_A} d_{v,q}, so the total social distance of any new solution is at least ∑_{v ∈ V_S} d_{v,q} + (p − |V_S|) min_{v ∈ V_A} d_{v,q}, which exceeds D. Algorithm SGSelect considers min_{v ∈ V_A} d_{v,q} in distance pruning to avoid sorting the distances of all vertices in V_A, which requires additional computation and may not be scalable for a large social network. Note that as the best obtained solution improves in later iterations, D and hence the LHS become smaller, so distance pruning prunes a larger part of the search space. The lemma follows.
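A direct Python transcription of Eq. (3.4) follows; it is a sketch with hypothetical names that takes plain lists of social distances. The minimum over V_A is found with one linear scan, in line with the remark that sorting V_A is avoided.

```python
from typing import Sequence


def distance_prune(best_total: float, vs_dists: Sequence[float], va_dists: Sequence[float], p: int) -> bool:
    """Eq. (3.4): True if no completion of V_S can beat the best total D found so far.

    vs_dists -- d_{v,q} for v in V_S (including d_{q,q} = 0)
    va_dists -- d_{v,q} for v in V_A (must be non-empty when V_S is incomplete)
    """
    remaining = p - len(vs_dists)
    if remaining <= 0:
        return False  # V_S is already complete; nothing is left to prune
    # One linear scan for min_{v in V_A} d_{v,q} instead of sorting V_A.
    return best_total - sum(vs_dists) < remaining * min(va_dists)
```

With hypothetical values D = 60, V_S distances 0, 10, and 20 (q contributes 0), and p = 4, a branch whose closest remaining vertex is at distance 40 is pruned, since 60 − 30 = 30 < 40. Before any feasible solution is found, D can be initialized to infinity, so nothing is pruned.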

In addition to pruning the search space that does not lead to a smaller total social distance, we also propose an acquaintance pruning strategy that considers the feasibility of selecting vertices from V_A and stops exploring V_A if no solution can satisfy the acquaintance constraint. Earlier in this section, the interior unfamiliarity considers the connectivity among the vertices in V_S only, and the exterior expansibility considers the connectivity between the vertices in V_S and V_A. Here, the acquaintance pruning strategy focuses on the edges among the vertices in V_A. Note that all vertices in V_A are excluded from expansion (and thus the corresponding V_S is pruned) if

$$\sum_{v \in V_A} |V_A \cap N_v| < (p - |V_S|)(p - |V_S| - k - 1).$$

The LHS of the above inequality is the total inner degree of all vertices in V_A, where the inner degree of a vertex in V_A counts only the edges connecting it to other vertices in V_A. The RHS is a lower bound on the total inner degree of any set of vertices extracted from V_A to expand V_S into a solution satisfying the acquaintance constraint. Specifically, our algorithm needs to select p − |V_S| vertices from V_A, and each selected vertex must be acquainted with at least p − |V_S| − k − 1 of the other selected vertices, so its inner degree cannot be smaller than p − |V_S| − k − 1; otherwise, the vertex would be unacquainted with more than k vertices and violate the acquaintance constraint.
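The basic acquaintance pruning test can be sketched in Python as follows, assuming G_F is given as a dict mapping each vertex to its set of neighbors; the function names are hypothetical. Inner degrees count only neighbors inside V_A.

```python
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def inner_degrees(va: Set[Hashable], adj: Adjacency) -> Dict[Hashable, int]:
    """Inner degree |V_A ∩ N_v|: edges from v to other vertices of V_A only."""
    return {v: len(va & adj[v]) for v in va}


def acquaintance_prune_total(vs_size: int, va: Set[Hashable], adj: Adjacency, p: int, k: int) -> bool:
    """Prune V_S if the total inner degree of V_A is below (p - |V_S|)(p - |V_S| - k - 1)."""
    m = p - vs_size  # number of vertices still to be selected
    return sum(inner_degrees(va, adj).values()) < m * (m - k - 1)
```

For example, with |V_S| = 1, p = 4, and k = 0, three candidates that share no edges are pruned (0 < 3 × 2), whereas a triangle survives (6 is not below 6). When p − |V_S| − k − 1 ≤ 0, the RHS is non-positive and the test can never fire.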

The above strategy can be improved by replacing the LHS of the inequality with ∑_{v ∈ M_A} |V_A ∩ N_v|, where M_A denotes the set of p − |V_S| vertices in V_A with the largest inner degrees. Therefore, with M_A, our algorithm is able to stop the search earlier and prune more infeasible solutions, because M_A ⊆ V_A and

$$\sum_{v \in M_A} |V_A \cap N_v| \le \sum_{v \in V_A} |V_A \cap N_v|.$$

To obtain the exact value of ∑_{v ∈ M_A} |V_A ∩ N_v|, we first need to identify the vertices in M_A, which may require sorting the vertices in V_A. However, since the size of M_A (i.e., p − |V_S|) is usually small, we can identify the vertices in M_A with a few repeated maximum extractions rather than sorting the entire V_A. Specifically, the acquaintance pruning strategy is specified as follows.

Lemma 3.3.4. The acquaintance pruning strategy stops selecting a vertex from V_A to V_S if

$$\sum_{v \in M_A} |V_A \cap N_v| < (p - |V_S|)(p - |V_S| - k - 1), \tag{3.5}$$

and the acquaintance pruning strategy prunes only the part of the search space that contains no feasible solution.

Proof. To acquire a tighter bound for acquaintance pruning, we consider ∑_{v ∈ M_A} |V_A ∩ N_v| instead of ∑_{v ∈ V_A} |V_A ∩ N_v| in the LHS of Eq. (3.5). Since we extract only p − |V_S| vertices from V_A to join V_S, the total inner degree of the extracted vertices is upper-bounded by ∑_{v ∈ M_A} |V_A ∩ N_v|, where M_A denotes the set of p − |V_S| vertices in V_A with the largest inner degrees. If these p − |V_S| extracted vertices follow the acquaintance constraint, each of them must be acquainted with at least p − |V_S| − k − 1 other extracted vertices, which means that their total inner degree is at least (p − |V_S|)(p − |V_S| − k − 1). When acquaintance pruning is triggered, even the upper bound on the total inner degree of the extracted vertices is smaller than (p − |V_S|)(p − |V_S| − k − 1), which indicates that at least one extracted vertex is unacquainted with more than k vertices and violates the acquaintance constraint. Therefore, the pruned search space contains no feasible solution. The lemma follows.
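The tighter test of Eq. (3.5) needs only the p − |V_S| largest inner degrees. In this Python sketch (hypothetical names; G_F as a dict of neighbor sets), heapq.nlargest from the standard library stands in for the "few maximum extractions" described above; it keeps a heap of p − |V_S| items rather than sorting V_A.

```python
import heapq
from typing import Dict, Hashable, Set

Adjacency = Dict[Hashable, Set[Hashable]]


def acquaintance_prune(vs_size: int, va: Set[Hashable], adj: Adjacency, p: int, k: int) -> bool:
    """Eq. (3.5): prune if the p - |V_S| largest inner degrees in V_A sum below the bound."""
    m = p - vs_size
    bound = m * (m - k - 1)
    if bound <= 0:
        return False  # inner degrees are non-negative, so the inequality cannot hold
    # Sum of inner degrees over M_A, the m vertices of V_A with the largest inner degrees.
    top_m = heapq.nlargest(m, (len(va & adj[v]) for v in va))
    return sum(top_m) < bound
```

On a hypothetical V_A of six vertices forming three disjoint edges, with p − |V_S| = 3 and k = 0, the total inner degree is 6 and does not trigger the basic test (6 is not below 3 × 2), but the three largest inner degrees sum to only 3, so Eq. (3.5) prunes the branch; indeed, no three of these vertices are pairwise acquainted.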

In the following, we first prove that our algorithm with the above strategies finds the optimal solution. After that, Example 3.3.1 illustrates Algorithm SGSelect; the pseudocode of SGSelect can be found in Appendix B.

Theorem 3.3.2. SGSelect obtains the optimal solution to SGQ.

Proof. In radius graph extraction, each removed vertex has no path with at most s edges to q, so no feasible solution contains these vertices. Algorithm SGSelect includes three strategies: access ordering, distance pruning, and acquaintance pruning. For each V_S, the interior unfamiliarity and exterior expansibility conditions stop the expansion of V_S only when it contains a vertex that must violate the acquaintance constraint, as proven in Lemma 3.3.1 and Lemma 3.3.2, respectively. After we choose a vertex u from V_A, Lemma 3.3.3 shows that distance pruning applies a lower bound on the total social distance derived from V_A; therefore, distance pruning prunes only solutions with a larger total social distance. Moreover, Lemma 3.3.4 shows that acquaintance pruning applies a lower bound on the total inner degree of any set of vertices extracted from V_A in any feasible solution. If even the total inner degree of the required number of vertices with the largest inner degrees in V_A falls below this lower bound, the connectivity within V_A is too low to obtain a feasible solution. The theorem follows.

Figure 3.2: Another illustrative example for SGQ and STGQ. (a) The sample social network, (b) the social distances of candidate attendees, and (c) the schedules of candidate attendees.
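To show how the lemmas combine, the following is a deliberately simplified Python branch-and-bound sketch in the spirit of SGSelect; it is not the pseudocode of Appendix B. Assumptions: the input is G_F with precomputed d_{v,q} and neighbor sets; candidates are visited only in ascending social distance, omitting the θ-driven access ordering (which affects how soon good solutions are found, not the result); each branch may add only vertices that come later in that order, so every group is enumerated at most once; and Lemma 3.3.2 is applied to V_S at each node rather than to each candidate. Because every rule discards only branches that Lemmas 3.3.1 to 3.3.4 show to be infeasible or unable to beat D, the search still returns an optimal group. The name sg_select_sketch is hypothetical.

```python
import heapq
import math
from typing import Dict, Hashable, Optional, Set, Tuple


def sg_select_sketch(
    q: Hashable,
    dist: Dict[Hashable, float],
    adj: Dict[Hashable, Set[Hashable]],
    p: int,
    k: int,
) -> Tuple[float, Optional[Set[Hashable]]]:
    """Simplified branch-and-bound for SGQ on the feasible graph G_F.

    dist holds d_{v,q} for every vertex of V_F (dist[q] == 0); adj holds the
    undirected neighbor sets of G_F. Returns (math.inf, None) if no group exists.
    """
    # Candidates in ascending social distance (a simplification of access ordering).
    order = sorted((v for v in dist if v != q), key=lambda v: dist[v])
    best_total: float = math.inf
    best_group: Optional[Set[Hashable]] = None

    def strangers(vs, v):
        return len(vs - {v} - adj[v])

    def dfs(vs, total, start):
        nonlocal best_total, best_group
        if max(strangers(vs, v) for v in vs) > k:  # Lemma 3.3.1: U(V_S) > k
            return
        if len(vs) == p:  # complete group that satisfies the acquaintance constraint
            if total < best_total:
                best_total, best_group = total, set(vs)
            return
        va = set(order[start:])  # vertices this branch may still add
        need = p - len(vs)
        if len(va) < need:
            return
        # Lemma 3.3.2: A(V_S) < p - |V_S|
        if min(len(va & adj[v]) + k - strangers(vs, v) for v in vs) < need:
            return
        # Lemma 3.3.3: order is sorted, so order[start] attains min over V_A of d_{v,q}
        if best_total - total < need * dist[order[start]]:
            return
        # Lemma 3.3.4: the need largest inner degrees of V_A fall below the bound
        bound = need * (need - k - 1)
        if bound > 0 and sum(heapq.nlargest(need, (len(va & adj[v]) for v in va))) < bound:
            return
        for i in range(start, len(order)):
            v = order[i]
            dfs(vs | {v}, total + dist[v], i + 1)

    dfs(frozenset({q}), dist[q], 0)
    return best_total, best_group
```

For instance, on a hypothetical G_F where q knows a, b, c, and d (distances 10, 25, 30, and 40) and the remaining edges are a–d, b–c, and c–d, the query p = 3, k = 0 returns a total of 50 with {q, a, d}, while k = 1 relaxes the constraint and returns 35 with {q, a, b}. With p = 5 and k = 0, acquaintance pruning rejects the root immediately and no group is returned.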

Example 3.3.1. In this illustrative example of SGSelect, we revisit the social network in Figure 3.1(a) and assume that v₇ issues an SGQ with p = 4, s = 1, and k = 1. All candidate attendees v_i with d^1_{v_i,v₇} < ∞ are shown in Figure 3.2(a), and their social distances to v₇ are listed in Figure 3.2(b).⁶ In the beginning, V_S = {v₇} and V_A = {v₂, v₃, v₄, v₆, v₈}.

We first consider selecting v₂ (i.e., the vertex with the smallest social distance) from V_A to expand V_S. We then have A(V_S ∪ {v₂}) = 3 and p − |V_S ∪ {v₂}| = 4 − 2 = 2, which means that the exterior expansibility condition holds if we select v₂.⁷ In addition, U(V_S ∪ {v₂}) = 0 and k[|V_S ∪ {v₂}|/p]^θ = 1 × (2/4)² = 1/4 (assuming θ = 2), i.e., the interior unfamiliarity condition also holds, and hence v₂ is selected.⁸

⁶ Some small modifications are made for better illustration.

⁷ To find A(V_S ∪ {v₂}), we derive |V_A ∩ N_{v₇}| + (k − |V_S − {v₇} − N_{v₇}|) = 4 + (1 − 0) = 5 and |V_A ∩ N_{v₂}| + (k − |V_S − {v₂} − N_{v₂}|) = 2 + (1 − 0) = 3, and then take the smaller one. Therefore, A(V_S ∪ {v₂}) = 3.

Now we have V_S = {v₂, v₇} and V_A = {v₃, v₄, v₆, v₈}, and the next vertex to be considered is v₃ according to the social distance. The exterior expansibility condition holds when v₃ is selected, since A(V_S ∪ {v₃}) = 1 ≥ p − |V_S ∪ {v₃}| = 1. However, selecting v₃ violates the interior unfamiliarity condition, and since V_A still contains other vertices, we put v₃ in parentheses and temporarily skip it, i.e., V_A = {(v₃), v₄, v₆, v₈}. The next vertex considered is v₆, for which both the exterior expansibility condition and the interior unfamiliarity condition hold. As a result, we have V_S = {v₂, v₆, v₇} and V_A = {v₃, v₄, v₈}. Again, selecting v₃ violates the interior unfamiliarity condition, so we temporarily skip v₃. When selecting v₈, we observe that it violates the exterior expansibility condition, and we therefore remove v₈ from V_A. We thus choose v₄ instead and obtain the first feasible solution {v₂, v₄, v₆, v₇} (total social distance = 64). Note that if we set a small θ and allow v₃ to be selected earlier, it leads to the generation of an infeasible candidate group {v₂, v₃, v₆, v₇} instead of the first feasible solution. If we can acquire the first feasible solution early, we are able to
