
3.3 Social Group Query

3.3.2 Algorithm Design

In this section, we propose a novel algorithm, namely SGSelect, to solve SGQ efficiently. Our idea is to first derive a feasible graph GF = (VF, EF) from G based on our observation on the social radius constraint, such that there exists a path with at most s edges from q to each vertex in GF. Starting from q, we iteratively explore GF to derive the optimal solution. At each iteration, we keep track of the set of vertices that satisfy the acquaintance constraint as the intermediate solution obtained so far (denoted as VS).

Initially, we set VS = {q}, and let VA denote the set of remaining vertices in VF, i.e., VA = VF − VS. At each iteration, we select a vertex in VA and examine whether it is feasible (i.e., whether it follows the acquaintance constraint) to move this vertex to VS, until VS has p vertices and the process stops.

The selection of a vertex from VA at each iteration is critical to the performance of query processing. It is essential to avoid choosing a vertex v that may significantly increase the total social distance or lead to the violation of the acquaintance constraint. We observe that the access order of nodes in constructing candidate groups is a key factor in the overall performance. It is important to give priority to nodes that are very likely to be included in the final answer group, i.e., the optimal solution. This may facilitate effective early pruning of unqualified solutions. Additionally, the social radius and acquaintance constraints can be exploited to facilitate efficient pruning of vertices that would not lead to the eventual answer. We summarize our ideas as follows.

Access ordering. To guide an efficient exploration of the solution space, we access vertices in VA following an order that incorporates (1) the increment of the total social distance and (2) the feasibility for the acquaintance constraint. Accordingly, we define the notion of interior unfamiliarity and exterior expansibility of VS to test the feasibility of examined vertices to the acquaintance constraint during the vertex selection.

Distance pruning. To avoid exploring vertices in VA that do not lead to a better solution in terms of total social distance, Algorithm SGSelect keeps track of the best feasible solution obtained so far and leverages its total social distance to prune redundant examinations of certain search space.

Acquaintance pruning. We explore the properties of the acquaintance constraint to facilitate search space pruning. Specifically, we define the notion of inner degree of the vertices in VA and derive its lower bound, such that a feasible solution can be derived from vertices in VS and VA. The lower bound is designed to detect the stop condition when there exists no feasible solution after including any vertex in VA.

To find the optimal solution, Algorithm SGSelect may incur exponential time in query processing because SGQ is NP-hard. In the worst case, all candidate groups may need to be considered. However, by employing the above pruning strategies, the average running time of Algorithm SGSelect can be effectively reduced, as shown in Section 3.5. In the following, we present the details of the proposed algorithm.
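To make the worst case concrete, the following is a minimal brute-force sketch that enumerates every candidate group. It is not SGSelect; it is only a reference baseline under stated assumptions: the function name brute_force_sgq, the adjacency format (a dict mapping each vertex to a set of neighbors), and the dist dict (social distance to q for every vertex of the feasible graph, including q itself) are all hypothetical. The acquaintance constraint is read here as "each attendee has at most k non-neighbors in the group".

```python
from itertools import combinations

def brute_force_sgq(adj, dist, q, p, k):
    """Enumerate all groups of size p containing q (exponential baseline).

    adj:  hypothetical format, vertex -> set of neighboring vertices
    dist: hypothetical format, vertex -> social distance to q (includes q)
    Returns (best_group, best_total_distance), or (None, inf) if infeasible.
    """
    others = [v for v in dist if v != q]
    best_group, best_total = None, float("inf")
    for combo in combinations(others, p - 1):
        group = {q, *combo}
        # Acquaintance constraint: at most k non-neighbors inside the group.
        feasible = all(len(group - {v} - set(adj[v])) <= k for v in group)
        if not feasible:
            continue
        total = sum(dist[v] for v in group)
        if total < best_total:
            best_group, best_total = group, total
    return best_group, best_total
```

The number of groups examined grows as C(|VF| − 1, p − 1), which is the search space the pruning strategies above aim to cut down.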

Radius Graph Extraction

Obviously, the social radius constraint can effectively prune redundant candidates in the social network of the activity initiator. Thus, Algorithm SGSelect first extracts the vertices that satisfy the social radius constraint. A simple approach to meet the social radius constraint is to find the minimum-edge path (i.e., the shortest path with the minimum number of edges) between q and every other vertex, and then remove those vertices that have their minimum-edge paths longer than s edges. Nevertheless, the minimum-distance path with at most s edges and the minimum-edge path can be different. As a result, the total distance of the minimum-edge path may not be the minimum distance. Moreover, the minimum-distance path may consist of more than s edges which does not satisfy the social radius constraint. To address the above problem, we define the notion of i-edge minimum distance, which represents the total distance of the minimum-distance path with no more than i edges as follows.

Definition 3.3.1. The i-edge minimum distance between the vertex v and the vertex q is

$$ d^{i}_{v,q} = \min_{u \in N_v} \left\{ d^{i-1}_{v,q},\; d^{i-1}_{u,q} + c_{u,v} \right\}, $$

where Nv is the set of neighboring vertices of v.

Based on dynamic programming, Algorithm SGSelect computes the i-edge minimum distance between the vertex v and the vertex q by iteratively deriving $d^{i}_{v,q}$ in terms of $d^{i-1}_{u,q}$ of each neighboring vertex u, for 1 ≤ i ≤ s. Initially, we set $d^{0}_{v,q}$ as ∞ for every vertex v, v ≠ q. We set $d^{0}_{q,q}$ as 0 and derive $d^{1}_{v,q}$ for every vertex v in Nq. At the next iteration, we update $d^{2}_{v,q}$ for v if there exists a neighbor u of v such that $d^{1}_{u,q} + c_{u,v}$ is smaller than $d^{1}_{v,q}$. This case indicates that there is an alternate path from v to q via a neighbor u, and the path has a smaller total distance. Our algorithm iterates at most s times for each vertex.

Therefore, each vertex v with $d^{s}_{v,q} < \infty$ is extracted in our algorithm to construct a feasible graph GF = (VF, EF). In the graph, the total distance of the minimum-distance path with at most s edges (i.e., $d^{s}_{v,q}$) is adopted as the social distance between v and q (i.e., $d_{v,q}$).

In other words, we ensure that every vertex in VF satisfies the social radius constraint.

Therefore, we consider GF in evaluating the SGQ for the rest of this chapter.
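The dynamic program of Definition 3.3.1 can be sketched as follows. This is a minimal illustration rather than the thesis implementation; the function name radius_graph_distances and the input format (a dict mapping each vertex to a dict of neighbor to edge distance, for an undirected graph) are assumptions made for the example.

```python
import math

def radius_graph_distances(adj, q, s):
    """Return {v: d^s_{v,q}} for every vertex v with d^s_{v,q} < infinity.

    adj: hypothetical format, vertex -> {neighbor: edge distance c_{u,v}}
    """
    dist = {v: math.inf for v in adj}   # d^0_{v,q} = infinity for v != q
    dist[q] = 0.0                        # d^0_{q,q} = 0
    for _ in range(s):                   # iterations i = 1, ..., s
        prev = dist                      # d^{i-1}
        dist = dict(prev)                # start from d^{i-1}_{v,q}
        for v in adj:
            for u, c in adj[v].items():
                # Relax through neighbor u using only (i-1)-edge distances.
                if prev[u] + c < dist[v]:
                    dist[v] = prev[u] + c
    # Vertices that stay at infinity violate the social radius constraint.
    return {v: d for v, d in dist.items() if d < math.inf}
```

Reading from the previous iteration's table (prev) rather than updating in place is what bounds each path to at most i edges after i iterations; an in-place update could chain several relaxations within one pass and exceed the radius s.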

Access Ordering

After constructing the feasible graph GF, Algorithm SGSelect iteratively explores GF to find the optimal solution. Initially, the intermediate solution set VS includes only q, and the remaining vertex set VA is VF − {q}. At each iteration afterwards, we select and move a vertex from VA to VS in order to expand the intermediate solution in VS. Therefore, VS represents a feasible solution when |VS| = p and the vertices in VS satisfy the acquaintance constraint. Next, our algorithm improves the feasible solution by backtracking the above exploration procedure to previous iterations and choosing an alternative vertex in VA to expand VS. A branch-and-bound tree is maintained to record the exploration history for backtracking. This process continues until VS has p vertices.

To reduce the running time and search space, the selection of a vertex at each iteration is critical. Naturally, we would like to include a vertex that minimizes the increment of the total social distance. Nevertheless, the connectivity of the selected vertex imposes additional requirements for satisfying the acquaintance constraint. Thus, we introduce the notion of interior unfamiliarity and exterior expansibility with respect to the intermediate solution set VS to exploit the acquaintance constraint in query processing.

Definition 3.3.2. The interior unfamiliarity of VS is

$$ U(V_S) = \max_{v \in V_S} \left| V_S - \{v\} - N_v \right|, $$

where Nv is the set of neighboring vertices of v in GF. The set VS − {v} − Nv refers to the set of non-neighboring vertices of v in VS.
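Definition 3.3.2 translates almost directly into set operations. The sketch below is illustrative only; the function name interior_unfamiliarity and the adjacency format (vertex to set of neighbors in GF) are assumptions.

```python
def interior_unfamiliarity(vs, adj):
    """U(VS): the largest number of non-neighbors any v in VS has inside VS.

    vs:  iterable of vertices in the intermediate solution set (non-empty)
    adj: hypothetical format, vertex -> set of neighboring vertices in GF
    """
    vs = set(vs)
    return max(len(vs - {v} - set(adj[v])) for v in vs)
```

For instance, in a triangle a, b, c with an extra vertex d attached only to a, the set {a, b, c} has interior unfamiliarity 0, while {a, b, c, d} has interior unfamiliarity 2 because d is unacquainted with both b and c.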

The interior unfamiliarity describes the connectivity within the intermediate solution set, and a smaller interior unfamiliarity means the vertices in VS are more densely connected. As shown later, the interior unfamiliarity of possible intermediate solution sets is taken into account in deciding which vertex is to be included in the process of generating the candidate groups. It is preferable to first include a well-connected vertex that results in an intermediate solution set with low interior unfamiliarity, since it may make the selection of other vertices in later iterations easier. Next, we define the exterior expansibility of an intermediate solution set VS, denoted by A(VS), as the maximum number of vertices in VA that can be used to expand VS.

Definition 3.3.3. The exterior expansibility of VS is

$$ A(V_S) = \min_{v \in V_S} \left\{ |V_A \cap N_v| + \left( k - |V_S - \{v\} - N_v| \right) \right\}, \tag{3.1} $$

where the first set (i.e., VA ∩ Nv) contains the neighboring vertices of v in VA and the second set (i.e., VS − {v} − Nv) contains the non-neighboring vertices of v in VS.

The exterior expansibility counts the number of options when selecting vertices from VA to expand VS, and a larger exterior expansibility means the acquaintance constraint is easier to follow during the vertex selection. Specifically, since the number of existing non-neighboring vertices of v in VS is |VS − {v} − Nv|, we can select at most k − |VS − {v} − Nv| extra non-neighboring vertices of v from VA to expand VS; otherwise, vertex v would have more than k non-neighboring vertices in VS and violate the acquaintance constraint. Therefore, for a vertex v in VS, there are at most |VA ∩ Nv| neighboring vertices and k − |VS − {v} − Nv| non-neighboring vertices to be selected from VA in order to expand VS.
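Equation (3.1) can be sketched in the same style. The function name exterior_expansibility and the adjacency format (vertex to set of neighbors) are assumptions for illustration, and the toy graph in the usage note is not taken from the thesis figures.

```python
def exterior_expansibility(vs, va, adj, k):
    """A(VS) per Eq. (3.1): minimum over v in VS of
    (neighbors of v in VA) + (remaining quota of non-neighbors for v).

    vs, va: iterables of vertices (VS must be non-empty)
    adj:    hypothetical format, vertex -> set of neighboring vertices
    """
    vs, va = set(vs), set(va)
    return min(
        len(va & set(adj[v])) + (k - len(vs - {v} - set(adj[v])))
        for v in vs
    )
```

As a usage example, take a star centered at q with leaves a, b, c, d plus one extra edge a-b, and let VS = {q, a}, VA = {b, c, d}, k = 1. Vertex q contributes 3 + (1 − 0) = 4 and vertex a contributes 1 + (1 − 0) = 2, so A(VS) = 2.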

When selecting a vertex v to expand VS, we consider both the increment of the total social distance caused by v and the connectivity of vertices in the new intermediate solution set containing v, which is captured by U(VS ∪ {v}) and A(VS ∪ {v}). Specifically, Algorithm SGSelect chooses the vertex v with the minimum social distance to q that satisfies the following two conditions for interior unfamiliarity and exterior expansibility, respectively.

Interior Unfamiliarity Condition. The first condition considers the interior unfamiliarity. Note that a small value of interior unfamiliarity indicates that every vertex v ∈ VS has plenty of neighboring vertices in VS, i.e., the current intermediate solution set VS is likely to be expanded into feasible solutions satisfying the acquaintance constraint.

Based on this observation, we employ the interior unfamiliarity condition, i.e.,

$$ U(V_S \cup \{v\}) \le k \left[ \frac{|V_S \cup \{v\}|}{p} \right]^{\theta}, $$

where θ ≥ 0 and |VS ∪ {v}|/p is the proportion of attendees that have been considered, to ensure that the value of interior unfamiliarity remains small when the vertex v is selected. Note that the right-hand side (RHS) of the inequality reaches its maximum, i.e., k, when θ is fixed as 0. With θ = 0, it is flexible to find a vertex v with a small social distance. However, if a vertex v resulting in U(VS ∪ {v}) = k is selected, the vertex with k non-neighboring vertices in the set VS ∪ {v} is required to connect to all the vertices chosen from VA at later iterations. Thus, the feasibility of selecting other qualified vertices in later iterations is thereby decreased. In contrast, a larger θ allows SGSelect to choose a vertex from VA that connects to more vertices in VS to ensure the feasibility at later iterations. Note that the RHS of the condition increases when VS includes more vertices. On the other hand, the algorithm reduces θ if there exists no vertex in VA that can satisfy the above condition.

When θ decreases to 0 and the above condition still does not hold, i.e., U(VS ∪ {v}) > k, Algorithm SGSelect stops expanding the new intermediate solution set VS = VS ∪ {v}, because adding any vertex from VA to this new intermediate solution set does not generate a feasible solution, as shown by the following lemma.

Lemma 3.3.1. Given that

U (VS) > k, (3.2)

there must exist at least one vertex v in VS such that v cannot follow the acquaintance constraint for every possible selection of vertices from VA.

Proof. If U(VS) > k, then we can find at least one vertex v in VS such that |VS − {v} − Nv| > k, which means v is already unacquainted with more than k other vertices in VS. Please note that adding any vertex from VA to VS cannot increase the connectivity between the existing vertices in VS. When we expand VS by adding any vertex u from VA, the number of vertices that are unacquainted with v will remain unchanged if u and v are connected, but increase by one otherwise. Therefore, after the expansion, the number of vertices in VS that are unacquainted with v must still exceed k. That is, v still violates the acquaintance constraint, and hence VS cannot form a feasible solution. The lemma follows.
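The interior unfamiliarity condition can be checked as sketched below. The function names unfamiliarity_threshold and interior_condition_holds, and the adjacency format (vertex to set of neighbors), are assumptions; how θ is lowered between attempts is not modeled here, so the caller is assumed to supply the current θ.

```python
def unfamiliarity_threshold(k, size, p, theta):
    """RHS of the condition: k * (|VS ∪ {v}| / p) ** theta."""
    return k * (size / p) ** theta

def interior_condition_holds(vs, v, adj, k, p, theta):
    """Check U(VS ∪ {v}) <= k * (|VS ∪ {v}| / p) ** theta.

    adj: hypothetical format, vertex -> set of neighboring vertices
    """
    cand = set(vs) | {v}
    # Interior unfamiliarity of the candidate set (Definition 3.3.2).
    u = max(len(cand - {w} - set(adj[w])) for w in cand)
    return u <= unfamiliarity_threshold(k, len(cand), p, theta)
```

With k = 1, p = 4, and θ = 2, a two-vertex candidate set has threshold 1 × (2/4)² = 1/4, so only U = 0 passes. With θ = 0 the threshold is k itself, and by Lemma 3.3.1 a candidate set that fails even then cannot be expanded into a feasible solution.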

Exterior Expansibility Condition. Now we discuss the second condition based on the exterior expansibility, which represents the maximum number of vertices in VA that can be considered for expanding the intermediate solution set VS, and this value must be no smaller than the number of attendees required to be added later, i.e., p − |VS|. Therefore, SGSelect chooses the vertex v from VA that can satisfy the exterior expansibility condition, i.e.,

A (VS∪ {v}) ≥ (p − |VS∪ {v}|) .

If the inequality does not hold, the new intermediate solution set obtained by adding v is not expansible, as shown by the following lemma.

Lemma 3.3.2. Given that

A (VS) < (p− |VS|) , (3.3)

there must exist at least one vertex v in VS such that v cannot follow the acquaintance constraint for every possible selection of vertices from VA.

Proof. If A(VS) < (p − |VS|), then we can find at least one vertex v in VS such that |VA ∩ Nv| + (k − |VS − {v} − Nv|) < (p − |VS|). In other words, (k − |VS − {v} − Nv|) < (p − |VS|) − |VA ∩ Nv|. As mentioned above, |VS − {v} − Nv| is the number of non-neighboring vertices of v in VS, and k − |VS − {v} − Nv| thereby represents the "quota" for v to choose non-neighboring vertices from VA. For any possible selection $\hat{V}_A \subseteq V_A$, let $\hat{\lambda}_A$ denote the number of neighboring vertices of v in $\hat{V}_A$. Since $\hat{\lambda}_A \le |V_A \cap N_v|$, we have $(p - |V_S|) - |V_A \cap N_v| \le (p - |V_S|) - \hat{\lambda}_A$. Therefore, if A(VS) < (p − |VS|), then $(k - |V_S - \{v\} - N_v|) < (p - |V_S|) - \hat{\lambda}_A$, and v does not have enough quota to support $\hat{V}_A$ for satisfying the acquaintance constraint. The lemma follows.

Distance and Acquaintance Pruning

In the following, we further exploit two pruning strategies to reduce the search space.

Our algorithm aims to obtain a feasible solution early since the total social distance of this solution can be used for pruning redundant candidates. At each iteration, the following distance pruning strategy avoids exploring the vertices in the remaining vertex set VAif they do not lead to a solution with a smaller total social distance.

Lemma 3.3.3. The distance pruning strategy stops selecting a vertex from VA to VS if

$$ D - \sum_{v \in V_S} d_{v,q} < (p - |V_S|) \min_{v \in V_A} d_{v,q}, \tag{3.4} $$

where D is the total social distance of the best feasible solution obtained so far. The distance pruning strategy can prune the search space with no better solution.

Proof. If the above condition holds, it is impossible to find an improved solution by exploring VA, since the total social distance of any new solution must exceed D when we select p − |VS| vertices from VA. Algorithm SGSelect considers $\min_{v \in V_A} d_{v,q}$ in distance pruning to avoid sorting the distances of all vertices in VA, which requires additional computation and may not be scalable for a large social network. Please note that as the best obtained solution improves at later iterations, we are able to derive a smaller upper bound in the LHS, and thus prune a larger search space with distance pruning. The lemma follows.
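The test in Eq. (3.4) can be sketched as a single predicate. The function name should_prune_by_distance and the dist dict (vertex to social distance to q) are assumptions, and the two guard clauses for edge cases are additions of this sketch rather than part of the lemma.

```python
import math

def should_prune_by_distance(best_total, vs, va, dist, p):
    """Return True when Eq. (3.4) holds, i.e. no better solution lies in VA.

    best_total: D, total social distance of the best feasible solution so far
                (math.inf if none has been found yet)
    dist:       hypothetical format, vertex -> social distance to q
    """
    remaining = p - len(vs)
    if remaining <= 0:       # assumption: nothing left to select
        return False
    if not va:               # assumption: VS cannot be completed at all
        return True
    slack = best_total - sum(dist[v] for v in vs)   # LHS of Eq. (3.4)
    lower_bound = remaining * min(dist[v] for v in va)  # RHS of Eq. (3.4)
    return slack < lower_bound
```

Before any feasible solution is found, D can be taken as infinity, in which case the LHS is infinite and the predicate never prunes; the bound only starts to bite once a first feasible solution exists.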

In addition to pruning the search space that does not lead to a smaller total social distance, we also propose an acquaintance pruning strategy that considers the feasibility of selecting vertices from VA, and stops exploring VA if there exists no solution that can satisfy the acquaintance constraint. Earlier in this section, the interior unfamiliarity and the exterior expansibility consider the connectivity between the vertices in only VS, and the connectivity between the vertices in VS and VA, respectively. Here the acquaintance pruning strategy focuses on the edges between the vertices in VA. Note that all vertices in VA are excluded from expansion (and thus the corresponding VS is pruned) if

$$ \sum_{v \in V_A} |V_A \cap N_v| < (p - |V_S|)(p - |V_S| - k - 1). $$

The LHS of the above inequality is the total inner degree of all vertices in VA, where the inner degree of a vertex in VA considers only the edges connecting to other vertices in VA. The RHS is the lower bound on the total inner degree on any set of vertices extracted from VA to expand VS into a solution satisfying the acquaintance constraint. Specifically, our algorithm needs to select p − |VS| vertices from VA, and hence the inner degree of any selected vertex cannot be smaller than p − |VS| − k − 1; otherwise, the vertex must be unacquainted with more than k vertices and violate the acquaintance constraint.

The above strategy can be improved by replacing the LHS of the inequality with $\sum_{v \in M_A} |V_A \cap N_v|$, where MA denotes the set of p − |VS| vertices in VA with the largest inner degrees. Therefore, with MA, our algorithm is able to stop the search earlier and prune off more infeasible solutions, because MA ⊆ VA and $\sum_{v \in M_A} |V_A \cap N_v| \le \sum_{v \in V_A} |V_A \cap N_v|$.

To obtain the exact value of $\sum_{v \in M_A} |V_A \cap N_v|$, we need to identify the vertices in the set MA first, which may require sorting the vertices in VA. However, since the size of MA (i.e., p − |VS|) is usually small, we can identify the vertices in MA with a few extract-maximum operations, rather than sorting the entire VA. Specifically, the acquaintance pruning is specified as follows.

Lemma 3.3.4. The acquaintance pruning strategy stops selecting a vertex from VA to VS if

$$ \sum_{v \in M_A} |V_A \cap N_v| < (p - |V_S|)(p - |V_S| - k - 1), \tag{3.5} $$

and the acquaintance pruning strategy can prune the search space with no feasible solution.

Proof. To acquire a tighter bound for the acquaintance pruning, we consider $\sum_{v \in M_A} |V_A \cap N_v|$ instead of $\sum_{v \in V_A} |V_A \cap N_v|$ in the LHS of Eq. (3.5). Since we will only extract p − |VS| vertices from VA to join VS, the upper bound of the total inner degree of the extracted vertices is $\sum_{v \in M_A} |V_A \cap N_v|$, where MA denotes the set of p − |VS| vertices in VA with the largest inner degrees. If these p − |VS| extracted vertices follow the acquaintance constraint, each of them must be acquainted with at least p − |VS| − k − 1 extracted vertices, which means there will be at least (p − |VS|)(p − |VS| − k − 1) inner degrees. When the acquaintance pruning happens, it means even the upper bound of the total inner degree of the extracted vertices is smaller than (p − |VS|)(p − |VS| − k − 1), which indicates that there is at least one vertex being unacquainted with more than k vertices and violating the acquaintance constraint. Therefore, the pruned search space contains no feasible solution. The lemma follows.
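The test in Eq. (3.5) can be sketched with a partial selection of the largest inner degrees instead of a full sort. The function name should_prune_by_acquaintance and the adjacency format (vertex to set of neighbors) are assumptions; heapq.nlargest from the Python standard library stands in for the repeated extract-maximum step.

```python
import heapq

def should_prune_by_acquaintance(vs, va, adj, k, p):
    """Return True when Eq. (3.5) holds, i.e. VA cannot complete VS feasibly.

    adj: hypothetical format, vertex -> set of neighboring vertices
    """
    va = set(va)
    need = p - len(vs)                       # vertices still to be selected
    # Inner degree: edges from a vertex in VA to other vertices in VA.
    inner_degrees = (len(va & set(adj[v])) for v in va)
    # M_A: the `need` largest inner degrees, found without sorting all of VA.
    top = heapq.nlargest(need, inner_degrees)
    return sum(top) < need * (need - k - 1)
```

When need − k − 1 is zero or negative, the RHS is non-positive and the predicate never prunes, which matches the intuition that the quota k alone already covers the vertices still to be chosen.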

In the following, we first prove that our algorithm with the above strategies finds the optimal solution. After that, Example 3.3.1 illustrates Algorithm SGSelect, and the pseudocode of SGSelect can be found in Appendix B.

Theorem 3.3.2. SGSelect obtains the optimal solution to SGQ.

Proof. In radius graph extraction, each of the removed vertices has no path with at most s edges connecting to q, and no feasible solution thereby contains these vertices. Algorithm SGSelect includes three strategies: access ordering, distance pruning, and acquaintance pruning. For each VS, interior unfamiliarity and exterior expansibility do not consider the vertex violating the acquaintance constraint, which is proven in Lemma 3.3.1 and Lemma 3.3.2, respectively. After we choose a vertex u from VA, Lemma 3.3.3 shows that the distance pruning specifies a lower bound on the total social distance that is derived from VA. Therefore, the distance pruning will prune off only the solution with a larger total social distance. Moreover, Lemma 3.3.4 shows that the acquaintance pruning specifies the lower bound on the total inner degree on any set of vertices extracted from VA in any feasible solution. If we choose the required number of vertices with the largest inner degrees from VA and the result cannot exceed the above lower bound, the connectivity is too small for VA to obtain a feasible solution. The theorem follows.

Figure 3.2: Another illustrative example for SGQ and STGQ. (a) The sample social network, (b) the social distances of candidate attendees and (c) the schedules of candidate attendees.

Example 3.3.1. In this illustrating example for SGSelect, we revisit the social network in Figure 3.1(a) and assume that v7 issues an SGQ with p = 4, s = 1, and k = 1. All candidate attendees vi with $d^{1}_{v_i,v_7} < \infty$ are shown in Figure 3.2(a), with their social distances to v7 listed in Figure 3.2(b).⁶ In the beginning, VS = {v7} and VA = {v2, v3, v4, v6, v8}.

We first consider selecting v2 (i.e., the vertex with the smallest social distance) from VA to expand VS. Afterwards, we have A(VS ∪ {v2}) = 3 and (p − |VS ∪ {v2}|) = 4 − 2 = 2, which means that the exterior expansibility condition holds if we select v2.⁷

⁶ Some small modifications are made for better illustration.

⁷ To find A(VS ∪ {v2}), we derive |VA ∩ Nv7| + (k − |VS − {v7} − Nv7|) = 4 + (1 − 0) = 5 and |VA ∩ Nv2| + (k − |VS − {v2} − Nv2|) = 2 + (1 − 0) = 3, and then choose the smaller one. Therefore, A(VS ∪ {v2}) = 3 holds.

In addition, U(VS ∪ {v2}) = 0 and $k \left[ \frac{|V_S \cup \{v_2\}|}{p} \right]^{\theta} = 1 \times (2/4)^{2} = 1/4$ (assume θ = 2), i.e., the interior unfamiliarity condition also holds, and hence v2 is selected.⁸ Now we have VS = {v2, v7}, VA = {v3, v4, v6, v8}, and the next vertex to be considered is v3 according to the social distance. The exterior expansibility condition holds when v3 is selected, since A(VS ∪ {v3}) = 1 ≥ (p − |VS ∪ {v3}|) = 1. However, it violates the interior unfamiliarity condition here because there are still more vertices in VA (we put v3 in parentheses and temporarily skip it, i.e., VA = {(v3), v4, v6, v8}). Now the next vertex to be considered is v6, since both the exterior expansibility condition and the interior unfamiliarity condition hold. As a result, we have VS = {v2, v6, v7} and VA = {v3, v4, v8}. Again, selecting v3 violates the interior unfamiliarity condition, so we temporarily skip v3. When selecting v8, we observe that it violates the exterior expansibility condition and then remove v8 from VA. Therefore, we choose v4 instead and obtain the first feasible solution {v2, v4, v6, v7} (total social distance = 64). Note that if we set a small θ and allow v3 to be selected earlier, it leads to the generation of an infeasible candidate group {v2, v3, v6, v7}, instead of the first feasible solution. If we can acquire the first feasible solution early, we are able to
