Accumulative Search Tree and Social Boundary

4.3 Consecutive Social Group Query

4.3.1 Accumulative Search Tree and Social Boundary

Figure 4.1 illustrates two search trees T₁ and T₂ corresponding to two SGQs with slightly different parameters (s, k) = (2, 1) and (2, 2), respectively. It can be observed that T₁ and T₂ share many nodes, including G1−G4, G6, I1, S1, D1, and E2.

Nevertheless, some nodes differ because of the pruning strategies. For example, G5 in T₁ does not appear in T₂ due to distance pruning (thus it is marked as D2 in T₂). Meanwhile, G7 in T₂ does not appear in T₁ because of acquaintance pruning (thus it is marked as A1 in T₁). In addition to distance pruning and acquaintance pruning, the interior unfamiliarity condition with θ = 0 and the exterior expansibility condition also avoid traversing redundant branches; we refer to them as interior unfamiliarity pruning (IU pruning) and exterior expansibility pruning (EE pruning), respectively.

It is important in the design of AST to cache not only the common parts but also the different parts of T₁ and T₂, so that future SGQs with a variety of query parameters can be supported consecutively. Also, to support quick traversal of the tree in consecutive queries, we propose SB to index the nodes in AST. For example, indexing I1 in T₁ allows future queries with parameter k = 3 to start with this node, instead of the root G1, and thus avoid traversing unnecessary nodes. In the following, we first introduce AST, which caches the intermediate results of historical queries in a compact way.

Definition 4.3.1. An accumulative search tree is a tree structure that includes (1) internal nodes (i.e., the nodes successfully expanded in historical queries), (2) pruned nodes (i.e., the nodes where prunings happen and that act as the roots of pruned branches), and (3) solution nodes. Each tree node contains the information generated during query processing, such as V_S and V_A.
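As an illustration only, the three node types of Definition 4.3.1 could be represented as in the following Python sketch. It is not the implementation used in this work: the field names (label, kind, v_s, v_a, pruned_by), the helper count_by_kind, and the choice to store V_S and V_A as sets of vertex identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List, Optional


class NodeKind(Enum):
    INTERNAL = "internal"   # successfully expanded in a historical query
    PRUNED = "pruned"       # root of a branch cut off by a pruning rule
    SOLUTION = "solution"   # solution node


@dataclass
class ASTNode:
    label: str
    kind: NodeKind
    # Assumption: V_S and V_A are stored as sets of vertex identifiers.
    v_s: FrozenSet[str] = frozenset()
    v_a: FrozenSet[str] = frozenset()
    # Hypothetical field: which pruning rule cut the branch, if any.
    pruned_by: Optional[str] = None
    children: List["ASTNode"] = field(default_factory=list)

    def add_child(self, child: "ASTNode") -> "ASTNode":
        self.children.append(child)
        return child


def count_by_kind(root: ASTNode) -> dict:
    """Count the nodes of each kind with an iterative depth-first walk."""
    counts = {kind: 0 for kind in NodeKind}
    stack = [root]
    while stack:
        node = stack.pop()
        counts[node.kind] += 1
        stack.extend(node.children)
    return counts


# Toy tree (not the tree of Figure 4.1): two internal nodes,
# one solution node, and one node pruned by acquaintance pruning.
root = ASTNode("G1", NodeKind.INTERNAL)
g2 = root.add_child(ASTNode("G2", NodeKind.INTERNAL))
g2.add_child(ASTNode("S1", NodeKind.SOLUTION))
root.add_child(ASTNode("A1", NodeKind.PRUNED, pruned_by="acquaintance"))
counts = count_by_kind(root)
```

Keeping pruned nodes in the tree, rather than discarding them, is what allows a later query with looser parameters to resume from the exact point where an earlier query stopped.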

The initial AST is the search tree generated in the first SGQ. Taking Figure 4.1 as an example, the first query is with parameters (s, k) = (2, 1), and the initial AST is T₁, where A1 is a pruned node since there is a branch pruned by acquaintance pruning. Nodes Gi and Si stand for an internal node and a solution node, respectively. When processing the succeeding query, AST is updated by replacing the pruned node with an internal node to explore the branch not considered in the previous query. For example, the second query is with parameters (s, k) = (2, 2), and A1 in T₁ is replaced by G7 in T₃, implying that the previously pruned branch is explored in the new query.
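The update step described above, in which a pruned node is replaced by an internal node, can be sketched as follows. This is a minimal illustration under stated assumptions: the AST is stored as a plain dictionary from node label to a record with "kind" and "children", the function name reexpand is hypothetical, and the toy tree is not the one in Figure 4.1 (the parent of A1 is chosen arbitrarily).

```python
def reexpand(ast, pruned_label, new_label):
    """Replace a pruned node with a fresh internal node, in place.

    `ast` maps a node label to {"kind": str, "children": [labels]}.
    Returns the record of the new internal node.
    """
    if ast[pruned_label]["kind"] != "pruned":
        raise ValueError(f"{pruned_label} is not a pruned node")
    # The new internal node takes over the pruned node's slot in the tree.
    ast[new_label] = {"kind": "internal", "children": []}
    for record in ast.values():
        record["children"] = [
            new_label if child == pruned_label else child
            for child in record["children"]
        ]
    del ast[pruned_label]
    return ast[new_label]


# Toy AST after a first query: A1 marks a branch cut by acquaintance pruning.
ast = {
    "G1": {"kind": "internal", "children": ["G2", "A1"]},
    "G2": {"kind": "internal", "children": ["S1"]},
    "S1": {"kind": "solution", "children": []},
    "A1": {"kind": "pruned", "children": []},
}

# A later query with a larger k explores the branch that A1 stood for.
g7 = reexpand(ast, "A1", "G7")
```

All other nodes are left untouched, so the work cached from the earlier query is reused as is.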

When the user specifies a tighter constraint, such as k = 0, not all the nodes in the existing AST (i.e., T₃) are feasible for the tight constraint. On the other hand, although the root node is always feasible, it is not efficient to start the query processing with G1 because it leads to duplicate traversal. Therefore, it is desirable to index the nodes of T₃ for different social constraints in order to support the consecutive queries, and we propose SB to address this issue.

Definition 4.3.2. An (s_b, k_b)-social boundary contains pointers to a list of nodes in AST to accelerate the processing of the query with s = s_b and k = k_b, such that expanding the nodes in the list leads to the optimal solution to this query.

Note that, during the construction of the SBs for different s and k, nodes that cannot lead to the optimal solution are excluded.¹ While a pruned node may be included in an SB with a larger k for re-expansion, it may be excluded from another SB with a smaller k because it violates the tighter acquaintance constraint. In Figure 4.1, there are two dashed-line regions in T₃ representing (1,2)-SB and (1,3)-SB, respectively. There are only two nodes in (1,2)-SB, since the other five nodes in (1,3)-SB (i.e., I1, S2, E1, E2, and E3) violate the acquaintance constraint with k = 2 and thus are excluded from (1,2)-SB. Therefore, to answer a new query with (s, k) = (1, 2), we only need to expand the two nodes in (1,2)-SB, instead of all 15 nodes in T₃. Specifically, the SBs can be viewed as a table containing pointers to a set of nodes, as shown in Table 4.1 with the query example in Figure 4.2.

The content of this table is filled using the nodes in AST after the first SGQ is processed. For each succeeding SGQ with specified s and k, we simply extract the nodes in the corresponding (s,k)-SB and expand these nodes to find the optimal solution. The nodes in the SB can be treated as shortcuts on AST: expanding them directly avoids traversing from the root of AST and thereby reduces the computational cost of new queries.
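To make the table view concrete, the following Python sketch stores the SBs as a mapping from (s, k) to a list of node pointers and answers a query by expanding only the indexed nodes. It is a sketch under stated assumptions rather than the actual procedure of Section 4.3.2: the class SocialBoundaryTable, the function answer_query, the expand callback (assumed to return a (cost, group) pair, or None when the node yields no feasible solution), and the convention that a lower cost is better are all hypothetical. The range check follows the bounds 1 ≤ s ≤ s_max and 0 ≤ k ≤ p − 1 given in the footnote.

```python
class SocialBoundaryTable:
    """Maps (s, k) to the AST nodes indexed by the (s, k)-SB."""

    def __init__(self, s_max, p):
        self.s_max = s_max
        self.p = p
        self._entries = {}

    def _check(self, s, k):
        if not (1 <= s <= self.s_max and 0 <= k <= self.p - 1):
            raise ValueError(f"(s, k) = ({s}, {k}) is out of range")

    def index(self, s, k, node):
        """Add a pointer to `node` to the (s, k)-SB."""
        self._check(s, k)
        self._entries.setdefault((s, k), []).append(node)

    def nodes(self, s, k):
        """Return a copy of the node list of the (s, k)-SB."""
        self._check(s, k)
        return list(self._entries.get((s, k), []))


def answer_query(table, s, k, expand):
    """Expand only the nodes in the (s, k)-SB and keep the cheapest result."""
    best = None
    for node in table.nodes(s, k):
        result = expand(node, s, k)  # hypothetical: (cost, group) or None
        if result is not None and (best is None or result[0] < best[0]):
            best = result
    return best


# Toy usage with placeholder node names and made-up expansion results.
table = SocialBoundaryTable(s_max=4, p=5)
table.index(1, 2, "n1")
table.index(1, 2, "n2")
toy_results = {"n1": (7.5, ("v1", "v2")), "n2": None}
best = answer_query(table, 1, 2, lambda node, s, k: toy_results[node])
```

In this sketch an SB with no indexed nodes yields no result; how such a case is actually handled is not specified here.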

Finding the correct nodes for each SB is crucial for the following reasons. First, if the SB contains some nodes too close to the root, the query still needs to traverse some redundant internal nodes in AST, leading to duplicate exploration. Second, while a node close to the leaf nodes lowers the traversal cost, the new branch expanded from this node covers only a small portion of the solution space, and the optimal solution is thereby not guaranteed. Third, if the SB includes nodes that do not generate feasible solutions under the new social constraints, it incurs redundant caching overhead and computational cost.

Therefore, it is important to select a sound and complete set of nodes for each SB. In the following, we first present how to efficiently acquire the solution by leveraging AST and SB in Section 4.3.2. We will then detail the construction of SBs in Section 4.4 and prove in Theorem 4.5.1 that the nodes indexed by (s,k)-SB are sufficient for finding the optimal solution.

Table 4.1: The (s,k)-SBs constructed after the first SGQ.

(s,k)    Nodes indexed by the (s,k)-SB
...      ...

¹ The ranges of s and k are 1 ≤ s ≤ s_max and 0 ≤ k ≤ p − 1, where s_max is the largest possible s. According to the small-world phenomenon [36], s_max does not need to be very large (e.g., 4), and users can also specify a desired s_max.
