
4.4 Index Construction and Maintenance

4.4.1 Node Indexing of AST Using SB

To effectively reduce redundant node processing in CSGQs, it is crucial to build SBs with the minimum number of nodes while ensuring solution optimality. This requires considering each kind of node in the AST (i.e., pruned nodes, solution nodes, and internal nodes). We address this issue by deriving a set of node selection rules for building SBs under various query parameters. We first focus on the acquaintance constraint k in Rule 1, then turn to the social radius constraint s in Rule 2, and finally combine both in Rule 3.

Rule 1: node indexing for different k

(1) Pruned nodes. We categorize pruned nodes into four types: IU-pruned nodes, EE-pruned nodes, acquaintance-pruned nodes, and distance-pruned nodes, which correspond to Eqs. (3.2), (3.3), (3.5), and (3.4) in Section 3.3.2, respectively. Given s and k of the first SGQ, we examine as follows whether a pruned node is needed in the (s, k′)-SB for processing a new SGQ with k′.

• IU-pruned nodes. No IU-pruned node appears in any (s, k′)-SB with k′ ≤ k, since k′ then represents an equal or tighter acquaintance constraint. When k′ > k, an IU-pruned node is still excluded from the (s, k′)-SB if k′ < U(V_S), since insufficient social tightness within V_S prevents this node from becoming a solution.

Therefore, an IU-pruned node only appears in the (s, k′)-SBs where

$$k' \geq \max\{U(V_S),\ k + 1\}. \tag{4.1}$$

Example 4.4.1. Figure 4.2(b) shows an IU-pruned node P1, which we use to identify the corresponding SBs. P1 is generated in the first query with (s, k) = (3, 3), and its V_S and V_A are {v2, v6, v7, v8, v9, v11} and {v1, v4, v5, v10}, respectively. U(V_S) need not be recalculated here, since U(V_S) = 4 was already derived when solving the first query. According to Eq. (4.1), when s remains unchanged, P1 only needs to appear in the (3, k′)-SBs with k′ ≥ max{4, 3 + 1} = 4, namely the (3,4)-SB, (3,5)-SB, and (3,6)-SB.
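Because Eq. (4.1) depends only on the cached U(V_S) and on k, placing IU-pruned nodes into SBs reduces to a range computation over k′. The Python sketch below illustrates this under stated assumptions. The function name index_iu_nodes is hypothetical, and the upper limit k_max = 6 is taken from the largest k′ listed in Example 4.4.1 rather than from any stated bound.

```python
from collections import defaultdict


def index_iu_nodes(u_cache: dict[str, int], k: int, k_max: int) -> dict[int, list[str]]:
    """Map each k' to the IU-pruned nodes its (s, k')-SB must hold, per Eq. (4.1).

    u_cache maps a node ID to U(V_S), reused from the first query (constraint k).
    k_max is an assumed upper limit on the k' values being indexed (hypothetical).
    """
    sbs = defaultdict(list)
    for node_id, u_vs in u_cache.items():
        # Needed only when k' >= max{U(V_S), k + 1}; U(V_S) is not recomputed.
        for k_prime in range(max(u_vs, k + 1), k_max + 1):
            sbs[k_prime].append(node_id)
    return dict(sbs)


# Example 4.4.1: P1 was IU-pruned under (s, k) = (3, 3) with U(V_S) = 4.
print(index_iu_nodes({"P1": 4}, k=3, k_max=6))  # {4: ['P1'], 5: ['P1'], 6: ['P1']}
```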

• EE-pruned nodes. As with the IU-pruned nodes, no EE-pruned node appears in any (s, k′)-SB with k′ ≤ k, for the same reason. Moreover, an EE-pruned node will be pruned again in any (s, k′)-SB with k′ − k < p − |V_S| − A(V_S), since the social connectivity between V_S and V_A is still too small with respect to k′. Therefore, an EE-pruned node only appears in the (s, k′)-SBs where

$$k' \geq \max\{p - |V_S| - A(V_S) + k,\ k + 1\}.$$

• Acquaintance-pruned nodes. An acquaintance-pruned node is included in an (s, k′)-SB only if k′ > k and

$$\sum_{v \in M_A} |V_A \cap N_v| \geq (p - |V_S|)(p - |V_S| - k' - 1),$$

where M_A is the set of p − |V_S| vertices in V_A with the largest inner degrees. In other words, Eq. (3.5) does not hold, so acquaintance pruning is not triggered. That is, an acquaintance-pruned node only appears in the (s, k′)-SBs where

$$k' \geq \max\left\{p - |V_S| - 1 - \frac{\sum_{v \in M_A} |V_A \cap N_v|}{p - |V_S|},\ k + 1\right\}.$$

Note that the value of $\sum_{v \in M_A} |V_A \cap N_v|$ has already been derived in the first query and does not change when k is replaced by k′. Reusing such unchanged parts reduces the computation needed for succeeding SGQs.

• Distance-pruned nodes. In contrast, distance-pruned nodes need to appear in the (s, k′)-SB when k′ < k, and they may be expanded there. This is because k′ then represents a tighter acquaintance constraint, so the solutions that caused these nodes to be trimmed off may no longer be feasible. However, it is unnecessary to include every distance-pruned node in all such (s, k′)-SBs. Instead, we apply the distance pruning strategy again to filter out, for each SB, the distance-pruned nodes that can never yield a better solution.

Specifically, among the solutions generated in previous queries that remain feasible under k′, the one with the smallest total social distance is kept in the (s, k′)-SB. Its total social distance is then used to update D in distance pruning for this filtering. On the other hand, distance-pruned nodes are not included in the (s, k′)-SBs when k′ ≥ k, because the original solutions that trimmed off these nodes remain feasible and are still better.
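Both the EE and acquaintance thresholds above reuse quantities cached from the first query, so evaluating them for a new k′ is cheap. The Python sketch below computes the smallest admissible k′ for an EE-pruned node and for an acquaintance-pruned node. The helper names, the adjacency-set representation of N_v, and the toy graph are hypothetical. For the acquaintance case, the sketch evaluates the bound in integer form, since k′ is an integer and ⌈a − x⌉ = a − ⌊x⌋ for an integer a.

```python
def ee_min_k(p: int, size_vs: int, a_vs: int, k: int) -> int:
    """Smallest k' whose (s, k')-SB needs an EE-pruned node: max{p - |V_S| - A(V_S) + k, k + 1}."""
    return max(p - size_vs - a_vs + k, k + 1)


def acquaintance_sum(v_a: set[str], adj: dict[str, set[str]], need: int) -> int:
    """Sum of |V_A & N_v| over M_A, the `need` vertices of V_A with the largest inner degrees."""
    inner = sorted((len(v_a & adj.get(v, set())) for v in v_a), reverse=True)
    return sum(inner[:need])


def acq_min_k(p: int, size_vs: int, s_sum: int, k: int) -> int:
    """Smallest integer k' > k with s_sum >= (p - |V_S|)(p - |V_S| - k' - 1)."""
    need = p - size_vs  # assumed positive: the node still needs more attendees
    # ceil(need - 1 - s_sum / need) == need - 1 - floor(s_sum / need)
    return max(need - 1 - s_sum // need, k + 1)


# Hypothetical node: p = 8, |V_S| = 3, first query with k = 2.
edges = [("a", "b"), ("b", "c"), ("d", "e")]
adj: dict[str, set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
v_a = {"a", "b", "c", "d", "e", "f"}

s_sum = acquaintance_sum(v_a, adj, need=8 - 3)  # cached once, reused for every k'
# 6 < 5 * (5 - 2 - 1) = 10, so this node was acquaintance-pruned at k = 2.
print(s_sum, acq_min_k(8, 3, s_sum, k=2))  # 6 3
print(ee_min_k(8, 3, a_vs=2, k=2))  # 5
```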

In the above cases, we examined whether a pruned node appears in an (s, k′)-SB according to its original pruning type. However, a pruned node may also be trimmed off by a different strategy under the new parameters. For example, when a distance-pruned node is included in an (s, k′)-SB with k′ < k, it may violate the new, tighter interior unfamiliarity condition and be trimmed off by IU pruning. To further reduce the number of nodes, we therefore also test each pruned node against the other three pruning strategies with the corresponding s and k′. These tests are almost the same as those in Section 3.3.2, except that some parts of the inequalities have already been derived and can be reused directly.

(2) Solution nodes. Including solution nodes in the SBs is desirable because they facilitate early pruning in the new query. A solution node here can be any node with |V_S| = p, i.e., any feasible solution (not necessarily the optimal one). Specifically, a solution node can be selected for an (s, k′)-SB if k′ ≥ U(V_S), since it then still satisfies the acquaintance constraint k′. However, if the (s, k′)-SB already contains another solution node, only the one with the smaller total social distance needs to be kept, which facilitates subsequent distance pruning.
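Selecting the solution node for an (s, k′)-SB and refiltering distance-pruned nodes for a smaller k′ can be combined, since the retained solution supplies the updated bound D. In the Python sketch below, a solution is feasible under k′ when U(V_S) ≤ k′, as stated above. Eq. (3.4) is not restated in this section, so each distance-pruned node carries a hypothetical precomputed lower_bound on the total social distance it can reach. The node is kept only when that bound is below D. This field and all class and function names are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SolutionNode:
    name: str
    u_vs: int  # U(V_S), cached when the solution was generated
    dist: int  # total social distance


@dataclass(frozen=True)
class DistancePrunedNode:
    name: str
    lower_bound: float  # HYPOTHETICAL stand-in for the quantity tested in Eq. (3.4)


def best_solution(solutions: list[SolutionNode], k_prime: int) -> Optional[SolutionNode]:
    """Feasible solution (U(V_S) <= k') with the smallest total social distance, or None."""
    feasible = [sol for sol in solutions if sol.u_vs <= k_prime]
    return min(feasible, key=lambda sol: sol.dist, default=None)


def build_k_sb(solutions, pruned, k, k_prime):
    """Return the solution node and the distance-pruned nodes kept in the (s, k')-SB."""
    best = best_solution(solutions, k_prime)
    if k_prime >= k:
        return best, []  # the original solutions still dominate these nodes
    bound = best.dist if best is not None else math.inf  # updated D
    return best, [node for node in pruned if node.lower_bound < bound]


sols = [SolutionNode("S1", u_vs=3, dist=40), SolutionNode("S2", u_vs=1, dist=55)]
pruned = [DistancePrunedNode("N1", 50), DistancePrunedNode("N2", 60)]
best, kept = build_k_sb(sols, pruned, k=3, k_prime=2)
print(best.name, [node.name for node in kept])  # S2 ['N1']
```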

(3) Internal nodes. To minimize storage overhead, no internal node is included in the (s, k′)-SBs. Every feasible solution expanded from an internal node either is a solution node in its sub-tree or can be expanded from a pruned node in its sub-tree.

Rule 2: node indexing for different s

(1) Pruned nodes.

• IU-pruned nodes. No IU-pruned node needs to be included in any (s′, k)-SB, since changing the social radius constraint does not increase the connectivity among the existing vertices in V_S. Thus, every IU-pruned node remains infeasible under any s′.

• EE-pruned nodes. In contrast to the IU-pruned nodes, some EE-pruned nodes may be successfully expanded to generate new sub-trees when s′ > s, since new candidate attendees may appear in V_A. Therefore, if s′ > s, the corresponding V_A^{s′} (i.e., the candidate attendees within s′ hops from the initiator) must be derived for an EE-pruned node.² We then update the social distances for each s′ to keep track of the status of the pruned node.³ An (s′, k)-SB includes an EE-pruned node only if its V_A^{s′} is large enough that A(V_S) ≥ p − |V_S|. In that case Eq. (3.3) does not hold, and EE pruning is not triggered.

² The tightest social radius constraint that allows a vertex v to be included as a candidate can be identified from the radius graph extraction procedure; it is the smallest i such that d^i_{v,q} < ∞.

Example 4.4.2. Figure 4.2(b) also shows an EE-pruned node P5, which we use to identify the corresponding SBs. P5 is generated in the first query with (s, k) = (3, 3), and its V_S and V_A are {v3, v6, v7, v8, v9} and {v1, v4, v5, v10, v11}, respectively. According to Rule 2-(1), an EE-pruned node is only considered for the (s′, k)-SBs with s′ > s. Since s_max = 4, where s_max is the largest possible s′, P5 can only stay in the (4,3)-SB. The radius graph extraction procedure identifies the tightest social radius constraint that admits each vertex as a candidate. It shows that no new candidate joins when s′ increases from 3 to 4; therefore, V_A⁴ = V_A³. Since V_A^{s′} is unchanged, A(V_S) remains the same when s′ = 4. This indicates that Eq. (3.3) still holds and trims off P5 again. By excluding P5 from the (s′, 3)-SBs, the node selection rules effectively reduce the processing time of succeeding queries.
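Footnotes 2 and 3 suggest how V_A^{s′} can be derived without rerunning the search. A vertex joins at the smallest radius i with d^i_{v,q} < ∞, and its distance under s′ is d^{s′}_{v,q}. The Python sketch below assumes these distances are stored as a per-vertex map from radius to distance (a hypothetical layout) and uses toy values. The definition of A(V_S) from Eq. (3.3) is not restated here, so keeps_ee_node takes A(V_S), recomputed over V_A^{s′}, as an input.

```python
import math
from typing import Optional

INF = math.inf


def tightest_radius(d_v: dict[int, float]) -> Optional[int]:
    """Smallest radius i with d^i_{v,q} < inf (footnote 2), or None if v never qualifies."""
    return next((i for i in sorted(d_v) if d_v[i] < INF), None)


def extend_candidates(v_a: set[str], d: dict[str, dict[int, float]],
                      s: int, s_prime: int) -> dict[str, float]:
    """V_A^{s'} with each candidate's distance d^{s'}_{v,q} (footnote 3).

    Keeps the node's current candidates and adds every vertex whose tightest
    radius lies in (s, s'], i.e., vertices that were not candidates under s.
    """
    joined = {v for v, d_v in d.items()
              if (r := tightest_radius(d_v)) is not None and s < r <= s_prime}
    return {v: d[v][s_prime] for v in v_a | joined}


def keeps_ee_node(a_vs: int, p: int, size_vs: int) -> bool:
    """An (s', k)-SB keeps an EE-pruned node only if A(V_S) >= p - |V_S|."""
    return a_vs >= p - size_vs


# Toy radius-graph distances d[v][i]; all values are hypothetical.
d = {
    "x": {1: INF, 2: 12, 3: 12, 4: 12},
    "y": {1: 20, 2: 20, 3: 15, 4: 15},
    "z": {1: INF, 2: INF, 3: INF, 4: 30},
}
v_a = {"y"}  # the node's candidates under s = 1
print(sorted(extend_candidates(v_a, d, s=1, s_prime=3).items()))  # [('x', 12), ('y', 15)]
```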

Note that, although V_S contains the same set of vertices under different s′, the social distances of the vertices in V_S may change and affect later distance pruning.

Therefore, in addition to tracking each node's V_A^{s′} for different s′, we also update its V_S for each s′, denoted as V_S^{s′}. The V_S^{s′} (or V_A^{s′}) of the same node under different s′ tend to share many vertices, so we maintain them efficiently by storing their differences hierarchically. That is, we first save a base node holding V_S^{s′} and V_A^{s′} for the smallest s′. For a larger s′, new candidates may join, or the social distance of a vertex may decrease; in either case, the new candidates or the differences in social distance are recorded in a delta node. With the base node and the delta nodes, we can dynamically generate V_S^{s′} and V_A^{s′} for any specified s′ for further expansion. Example 4.4.3 illustrates how the base node and delta nodes work.

³ The social distance of any vertex in V_A^{s′} for different s′ can also be derived from the radius graph extraction procedure, since the social distance of a vertex v for s′ is exactly d^i_{v,q} with i = s′.

Table 4.2: (a) The social distances from different vertices to v7 under various s, and (b)

Example 4.4.3. This example uses a query with a smaller group size p to illustrate the function of the base node and delta nodes. Assume that v7 in Figure 4.2(a) issues a query with (p, s, k) = (4, 1, 1). With the radius graph extraction, we obtain the social distance of each vertex under different s. Part of the results are listed in Table 4.2(a); note that the social distance of a vertex may decrease as s increases. Moreover, a vertex is included as a candidate in V_A^{s′} if its social distance becomes smaller than ∞ under the social radius constraint s = s′. Consider an EE-pruned node P5 generated in this query with V_S = {v6(20), v7(0)} and V_A = {v4(27), v9(13), v11(23)}, where the number in parentheses next to v_i is the social distance from v_i to the initiator v7. A naive approach to handling various s′ is to generate a standalone copy of P5 for each s′ from s + 1 = 2 to s_max = 4, i.e., C_{P5}², C_{P5}³, and C_{P5}⁴ in Table 4.2(b), where s_max is the largest possible s. However, C_{P5}², C_{P5}³, and C_{P5}⁴ overlap substantially. In the following, we therefore show how to maintain these copies compactly using the base node and the delta nodes.

First, the base node B_{P5}^{2-4} contains V_S^{s′} and V_A^{s′} of P5 for the smallest s′ (i.e., 2). The superscript 2-4 means that this node is used when reconstructing V_S^{s′} and V_A^{s′} for s′ = 2, 3, or 4. When s′ increases to 3, the newly joined vertex (i.e., v12) and the difference in social distance (i.e., −5 for v5) must be recorded in the delta node D_{P5}^{3-4}. Unchanged vertices are omitted to save space. Since nothing changes further when s′ = 4, no additional delta node needs to be generated.

When a new query arrives, we only need to take the base node and apply the delta nodes to add new candidates or update social distances as necessary. Thus, the V_S^{s′} and V_A^{s′} that fit the new social radius constraint can be generated dynamically when needed.
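The base/delta scheme of Example 4.4.3 can be sketched in Python as follows. compress turns standalone copies C^{s′} into one base node plus delta nodes that record only newly joined candidates and distance differences. materialize then replays the deltas for a requested s′. The class and function names are hypothetical. The toy distances are invented rather than taken from Table 4.2, but they loosely mirror the example: one new candidate and a −5 change at s′ = 3, and nothing new at s′ = 4. The sketch also assumes, as the text states, that V_S keeps the same vertices for all s′ and that distances only decrease as s′ grows.

```python
from dataclasses import dataclass, field

Dist = dict[str, int]  # vertex -> social distance to the initiator


@dataclass
class DeltaNode:
    start: int                                  # first s' this delta applies to
    joined: Dist = field(default_factory=dict)  # new candidates in V_A with their distances
    diff_s: Dist = field(default_factory=dict)  # V_S vertex -> change in social distance
    diff_a: Dist = field(default_factory=dict)  # V_A vertex -> change in social distance


@dataclass
class BaseNode:
    start: int  # smallest s' covered by this base node
    end: int    # largest s' covered (s_max)
    v_s: Dist
    v_a: Dist
    deltas: list[DeltaNode] = field(default_factory=list)  # sorted by start

    def materialize(self, s_prime: int) -> tuple[Dist, Dist]:
        """Rebuild (V_S^{s'}, V_A^{s'}) by replaying every delta with start <= s'."""
        if not self.start <= s_prime <= self.end:
            raise ValueError(f"s' = {s_prime} is outside {self.start}-{self.end}")
        v_s, v_a = dict(self.v_s), dict(self.v_a)
        for delta in self.deltas:
            if delta.start > s_prime:
                break
            for v, diff in delta.diff_s.items():
                v_s[v] += diff
            for v, diff in delta.diff_a.items():
                v_a[v] += diff
            v_a.update(delta.joined)
        return v_s, v_a


def compress(copies: dict[int, tuple[Dist, Dist]]) -> BaseNode:
    """Fold standalone copies C^{s'} into a base node plus non-empty delta nodes."""
    radii = sorted(copies)
    prev_s, prev_a = copies[radii[0]]
    node = BaseNode(radii[0], radii[-1], dict(prev_s), dict(prev_a))
    for r in radii[1:]:
        cur_s, cur_a = copies[r]
        delta = DeltaNode(
            start=r,
            joined={v: x for v, x in cur_a.items() if v not in prev_a},
            diff_s={v: x - prev_s[v] for v, x in cur_s.items() if x != prev_s[v]},
            diff_a={v: x - prev_a[v] for v, x in cur_a.items()
                    if v in prev_a and x != prev_a[v]},
        )
        if delta.joined or delta.diff_s or delta.diff_a:  # unchanged radii need no delta
            node.deltas.append(delta)
        prev_s, prev_a = cur_s, cur_a
    return node


# Hypothetical copies for s' = 2, 3, 4 (not the values of Table 4.2).
copies = {
    2: ({"u0": 0, "u1": 20}, {"u2": 27, "u3": 13, "u4": 23, "u5": 30}),
    3: ({"u0": 0, "u1": 20}, {"u2": 27, "u3": 13, "u4": 23, "u5": 25, "u6": 18}),
    4: ({"u0": 0, "u1": 20}, {"u2": 27, "u3": 13, "u4": 23, "u5": 25, "u6": 18}),
}
node = compress(copies)
print(len(node.deltas), node.deltas[0].joined, node.deltas[0].diff_a)  # 1 {'u6': 18} {'u5': -5}
print(node.materialize(3) == copies[3])  # True
```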

• Acquaintance-pruned nodes. Similar to the EE-pruned nodes, some acquaintance-pruned nodes may be expanded into new sub-trees when s′ > s, since new candidate attendees may appear in V_A. Specifically, an (s′, k)-SB includes an acquaintance-pruned node only if its V_A^{s′} is large enough that

$$\sum_{v \in M_A^{s'}} |V_A^{s'} \cap N_v| \geq (p - |V_S|)(p - |V_S| - k - 1),$$

where M_A^{s′} is the set of p − |V_S| vertices in V_A^{s′} with the largest inner degrees. This inequality indicates that Eq. (3.5) does not hold, so the node is not pruned.

• Distance-pruned nodes. In contrast, most distance-pruned nodes need to be reconsidered when s′ changes. The exceptions are nodes whose V_S^{s′} violates the social radius constraint (i.e., $\max_{v \in V_S^{s'}} h_v > s'$, where h_v is the number of hops from the initiator to vertex v). When s′ > s, the newly included vertices in V_A^{s′} may create shorter paths to the initiator. Conversely, when s′ < s, the total social distance of the solution used in the previous distance pruning may increase. In either case, the distance pruning condition may no longer hold, so these pruned nodes need to be included in the (s′, k)-SB for further examination.

Here we also create base and delta nodes for each distance-pruned node. They compactly maintain its V_S^{s′} and V_A^{s′} for different s′, so that the social distance of any vertex can be updated. We then apply the distance pruning strategy again, so that the (s′, k)-SB includes only the updated distance-pruned nodes that can generate a better solution.
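For acquaintance-pruned nodes under a larger s′, only the left-hand side of the Rule 2 inequality changes, because M_A^{s′} and the inner degrees are recomputed over V_A^{s′}. The Python sketch below lists the radii s′ > s whose (s′, k)-SB must keep such a node. The helper names, the adjacency-set form of N_v, and the toy graph (in which vertex e becomes a candidate at s′ = 2) are hypothetical.

```python
def acquaintance_ok(v_a: set[str], adj: dict[str, set[str]], need: int, k: int) -> bool:
    """True when Eq. (3.5) does not hold: sum over M_A of |V_A & N_v| >= need * (need - k - 1)."""
    inner = sorted((len(v_a & adj.get(v, set())) for v in v_a), reverse=True)
    return sum(inner[:need]) >= need * (need - k - 1)


def radii_keeping_node(va_by_radius: dict[int, set[str]], adj: dict[str, set[str]],
                       p: int, size_vs: int, k: int, s: int) -> list[int]:
    """Radii s' > s whose (s', k)-SB must include the acquaintance-pruned node."""
    need = p - size_vs
    return [r for r in sorted(va_by_radius)
            if r > s and acquaintance_ok(va_by_radius[r], adj, need, k)]


edges = [("a", "b"), ("c", "d"), ("e", "a"), ("e", "b"), ("e", "c"), ("e", "d")]
adj: dict[str, set[str]] = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

va_by_radius = {1: {"a", "b", "c", "d"}, 2: {"a", "b", "c", "d", "e"}, 3: {"a", "b", "c", "d", "e"}}
# p = 6, |V_S| = 2, k = 1: the sum is 4 < 8 under s = 1, but 10 >= 8 once e joins.
print(radii_keeping_node(va_by_radius, adj, p=6, size_vs=2, k=1, s=1))  # [2, 3]
```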

(2) Solution nodes. To facilitate early pruning for succeeding queries and avoid missing the optimal solution, the (s′, k)-SB includes the solution nodes that satisfy the social radius constraint. For each solution node, we update the social distance of every vertex in V_S^{s′}, so that the node is associated with the correct total social distance. For each (s′, k)-SB, only the solution node with the smallest total social distance is kept to reduce the storage overhead.

(3) Internal nodes. In contrast to the (s, k′)-SB, internal nodes of the AST play a more important role in the (s′, k)-SB, because new candidate attendees may join when s′ > s.

Therefore, the internal nodes of the AST need to be cached for the (s′, k)-SBs with s′ > s. New candidates can then be added to the existing internal nodes without regenerating them from scratch. As with the pruned nodes, we maintain V_S^{s′} and V_A^{s′} of each internal node for different s′ using base and delta nodes. The V_S^{s′} and V_A^{s′} of any specified s′ can then be generated dynamically for further expansion.

Rule 3: node indexing for different s and k

Rules 1 and 2 present the node indexes for different k and different s, respectively. When both s and k change, carefully combining the two rules can further reduce the number of nodes included in SBs. We therefore explore this generalized case as follows.

(1) Pruned nodes.

• IU-pruned nodes. Rule 1-(1) indicates that an IU-pruned node is included in the (s, k′)-SB if k′ ≥ U(V_S).⁴ Rule 2-(1) shows that, when k is fixed, an IU-pruned node is not included in any (s′, k)-SB regardless of s′. However, when constructing (s′, k′)-SBs, s′ can still be used to reduce the number of IU-pruned nodes. That is, a pruned node is included in an (s′, k′)-SB only if all vertices in its V_S are within s′ hops of the initiator, i.e., $\max_{v \in V_S} h_v \leq s'$. Combining the two conditions, an IU-pruned node is included in an (s′, k′)-SB only if k′ ≥ U(V_S) and $\max_{v \in V_S} h_v \leq s'$.

⁴ The original condition in Rule 1-(1) is k′ ≥ max{U(V_S), k + 1}. However, it can be simplified by observing that U(V_S) must exceed k; otherwise, IU pruning would not have been triggered. The conditions for EE-pruned and acquaintance-pruned nodes used later are simplified in the same way.

Example 4.4.4. We revisit P1 in Example 4.4.1, with V_S = {v2, v6, v7, v8, v9, v11} and V_A = {v1, v4, v5, v10}. Since U(V_S) = 4 was already derived in the first query, P1 only appears in (s′, k′)-SBs with k′ ≥ 4. Moreover, because all vertices in V_S are within one hop of the initiator, $\max_{v \in V_S} h_v = 1$. Therefore, the IU-pruned node P1 only appears in the (s′, k′)-SBs with s′ ≥ 1 and k′ ≥ 4, such as the (2,4)-SB and (3,4)-SB in Table 4.1.
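Rule 3 for IU-pruned nodes amounts to filtering a grid of (s′, k′) pairs with two cached values, U(V_S) and max h_v. The Python sketch below applies this filter to P1 from Example 4.4.4. The function name is hypothetical. The grid bounds are assumptions about which SBs exist: s′ from 1 to s_max = 4, as in Example 4.4.2, and k′ from 1 to 6, as in Example 4.4.1.

```python
def iu_sbs(u_vs: int, max_hops: int, s_values: range, k_values: range) -> list[tuple[int, int]]:
    """(s', k') pairs whose SB includes the IU-pruned node: k' >= U(V_S) and max h_v <= s'."""
    return [(s2, k2) for s2 in s_values for k2 in k_values
            if k2 >= u_vs and max_hops <= s2]


# Example 4.4.4: U(V_S) = 4, and every vertex of V_S is within one hop (max h_v = 1).
sbs = iu_sbs(4, 1, range(1, 5), range(1, 7))
print((2, 4) in sbs, (3, 4) in sbs, (3, 3) in sbs)  # True True False
print(len(sbs))  # 12
```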

• EE-pruned nodes. According to Rule 1-(1), an EE-pruned node is included in the (s, k′)-SB if

$$k' \geq p - |V_S| - A(V_S) + k. \tag{4.2}$$

We employ the social radius constraint to reduce the number of nodes included.

Specifically, it is not necessary to keep EE-pruned nodes for the (s′, k′)-SBs with s′ < $\max_{v \in V_S} h_v$, even if their k′ satisfies Eq. (4.2). Other (s′, k′)-SBs whose k′ does not satisfy Eq. (4.2) can include an EE-pruned node only if their s′ is large enough that A(V_S), computed with the new candidates in V_A^{s′}, is no smaller than p − |V_S|. This implies that the node will not be trimmed off by EE pruning again.

• Acquaintance-pruned nodes. According to Rule 1-(1), an acquaintance-pruned node is included in the (s, k′)-SB if

$$k' \geq p - |V_S| - 1 - \frac{\sum_{v \in M_A} |V_A \cap N_v|}{p - |V_S|}. \tag{4.3}$$

We again use the social radius constraint to reduce the number of nodes included.

Acquaintance-pruned nodes are not needed in the (s′, k′)-SBs with s′ < $\max_{v \in V_S} h_v$, even if their k′ satisfies Eq. (4.3). Other (s′, k′)-SBs whose k′ does not satisfy Eq. (4.3) can include an acquaintance-pruned node only if their s′ is large enough that

$$\sum_{v \in M_A^{s'}} |V_A^{s'} \cap N_v| \geq (p - |V_S|)(p - |V_S| - k' - 1).$$

• Distance-pruned nodes. A distance-pruned node may be successfully expanded when k′ < k or when s′ ≠ s, according to Rule 1-(1) or Rule 2-(1), respectively.

Similarly, we can reduce the number of included nodes by excluding the distance-pruned nodes with $\max_{v \in V_S} h_v > s'$, which violate the social radius constraint. We further reuse the inequality of the distance pruning strategy (i.e., Eq. (3.4)) by replacing D with the total social distance of the current best solution in the (s′, k′)-SB. If the inequality holds, the distance-pruned node is not required in the SB, since it would be pruned again.
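The Rule 3 tests for EE-pruned and acquaintance-pruned nodes share one shape. First, reject SBs whose s′ is below max h_v. Next, accept those whose k′ passes the cached Eq. (4.2) or Eq. (4.3). Otherwise, fall back to the quantity recomputed over V_A^{s′}. The Python sketch below assumes that A(V_S) and the acquaintance sum for each larger s′ have already been recomputed and are passed in as maps (a_by_radius and sum_by_radius, both hypothetical). It also multiplies Eq. (4.3) through by p − |V_S| to avoid fractions. All inputs are toy values.

```python
def ee_in_sb(s2: int, k2: int, *, max_hops: int, p: int, size_vs: int,
             a_vs: int, k: int, a_by_radius: dict[int, int]) -> bool:
    """Does the (s', k')-SB need this EE-pruned node?"""
    if s2 < max_hops:
        return False  # V_S violates the social radius constraint s'
    if k2 >= p - size_vs - a_vs + k:  # Eq. (4.2)
        return True
    # Otherwise the new candidates in V_A^{s'} must lift A(V_S) to p - |V_S|.
    return a_by_radius.get(s2, a_vs) >= p - size_vs


def acq_in_sb(s2: int, k2: int, *, max_hops: int, p: int, size_vs: int,
              base_sum: int, sum_by_radius: dict[int, int]) -> bool:
    """Does the (s', k')-SB need this acquaintance-pruned node?"""
    if s2 < max_hops:
        return False
    need = p - size_vs
    bound = need * (need - k2 - 1)
    if base_sum >= bound:  # Eq. (4.3), multiplied through by p - |V_S|
        return True
    return sum_by_radius.get(s2, base_sum) >= bound  # recomputed over V_A^{s'}


ee = dict(max_hops=2, p=8, size_vs=3, a_vs=2, k=2, a_by_radius={3: 2, 4: 5})
print([ee_in_sb(s2, k2, **ee) for s2, k2 in [(1, 6), (2, 5), (2, 4), (4, 3)]])
# [False, True, False, True]

acq = dict(max_hops=1, p=6, size_vs=2, base_sum=4, sum_by_radius={2: 10})
print([acq_in_sb(s2, k2, **acq) for s2, k2 in [(1, 1), (1, 2), (2, 1)]])
# [False, True, True]
```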

(2) Solution nodes. Combining Rule 1-(2) and Rule 2-(2), a solution node is included in the (s′, k′)-SBs with k′ ≥ U(V_S) and s′ ≥ $\max_{v \in V_S} h_v$, i.e., those satisfying the acquaintance and social radius constraints, respectively.

(3) Internal nodes. Rule 2-(3) indicates that the (s′, k)-SBs with s′ > s should include internal nodes, since new candidates may join as s′ increases. For changes in k′, the internal nodes with U(V_S) > k′ violate the acquaintance constraint k′ and cannot generate any solution. Therefore, the (s′, k′)-SB only includes internal nodes with k′ ≥ U(V_S) and s′ ≥ $\max_{v \in V_S} h_v$.
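Rule 3-(2) and Rule 3-(3) apply the same two-part test to solution nodes and internal nodes. The following is a minimal Python sketch, assuming each cached node stores its U(V_S) and max h_v. The function names, node IDs, and values are hypothetical.

```python
def admits(u_vs: int, max_hops: int, s2: int, k2: int) -> bool:
    """Acquaintance constraint k' >= U(V_S) and social radius constraint s' >= max h_v."""
    return k2 >= u_vs and s2 >= max_hops


def nodes_for_sb(nodes: dict[str, tuple[int, int]], s2: int, k2: int) -> list[str]:
    """IDs of the cached solution or internal nodes kept in the (s', k')-SB."""
    return [nid for nid, (u_vs, hops) in nodes.items() if admits(u_vs, hops, s2, k2)]


# node ID -> (U(V_S), max h_v)
nodes = {"S1": (3, 2), "I1": (2, 1), "I2": (5, 1)}
print(nodes_for_sb(nodes, s2=2, k2=3))  # ['S1', 'I1']
print(nodes_for_sb(nodes, s2=1, k2=3))  # ['I1']
```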

From the document: Efficient Link Prediction and Group Queries in Social Networks (pp. 94-103)