Performance Analysis of SGQ and STGQ

3.5 Experimental Results

3.5.2 Performance Analysis of SGQ and STGQ

In this section, we first analyze the proposed strategies in SGQ and its extension to the temporal dimension (i.e., STGQ). We then evaluate the proposed algorithms on the large YouTube dataset.

Analysis of SGQ. We first compare the running time of SGSelect against SGBasic, KNN, and DKS with different numbers of attendees, i.e., p. Figure 3.3(a) presents the experimental results with s = 2 and k = 3. The trends in other parameter settings, such as s = 1, are similar. The results indicate that SGSelect outperforms SGBasic and DKS, and the improvement becomes more significant as p grows because SGBasic and DKS need to carefully examine numerous candidate groups, and the processing effort of each candidate group also increases with p. SGBasic outperforms DKS for a larger p because the constraint with the same k becomes stricter under a larger p. Moreover, SGBasic is likely to detect and then discard an infeasible candidate group at an early stage (i.e., it finds a candidate group infeasible after checking the first one or two vertices instead of checking all the vertices in the group). In contrast, DKS focuses only on maximizing the density of the candidate group and does not leverage the k constraint. Unlike the above approaches, SGSelect is able to effectively prune the solution space with the proposed access ordering, distance pruning, and acquaintance pruning strategies.

(a) Comparison of running time with different p.

(b) Solution quality analysis of KNN.

(d) Comparison of running time with different k.

Figure 3.3: Experimental results of SGQ.
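The early rejection attributed to SGBasic above can be illustrated with a minimal sketch. It assumes that the acquaintance constraint means every member of a group may be unacquainted with at most k other members, which is consistent with the edge bound p(p − k − 1)/2 used later in this section. The function name `is_feasible` and the adjacency-set representation `adj` are hypothetical and are not taken from the original implementation.

```python
from typing import Dict, Iterable, Set


def is_feasible(group: Iterable[str], adj: Dict[str, Set[str]], k: int) -> bool:
    """Check the assumed acquaintance constraint for one candidate group.

    A group is treated as feasible when every member is unacquainted with
    at most k other members. The loop returns as soon as one violating
    member is found, so the remaining members are never examined.
    """
    members = set(group)
    for v in members:
        # Other members of the group that v has no edge to.
        strangers = len(members) - 1 - len(adj.get(v, set()) & members)
        if strangers > k:
            return False  # early rejection: skip the remaining members
    return True


# Toy undirected graph (illustrative only): d knows only a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(is_feasible(["a", "b", "c"], adj, k=0))
print(is_feasible(["a", "b", "c", "d"], adj, k=1))
```

A density-only objective such as the one used by DKS has no comparable per-vertex test, which is one way to read why it cannot discard candidate groups this early.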

Although KNN is the fastest approach in Figure 3.3(a), Figure 3.3(b) shows that many solutions returned by KNN are infeasible for SGQ. More specifically, the feasibility ratio is the percentage of groups returned by KNN that are feasible. It drops quickly as p grows because the candidates with small social distances to the initiator do not necessarily know each other. In addition, Figure 3.3(b) compares the total social distances of KNN and SGSelect. The distance ratio is the total social distance returned by KNN divided by the total social distance returned by SGSelect. Note that the solution of KNN can be regarded as a lower bound on the total social distance of SGQ since the social constraint is relaxed in KNN. Figure 3.3(b) indicates that the ratio remains above 70% even for a large p.
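The two metrics defined above can be written down directly. The sketch below is only an illustration: the function names are hypothetical, the input values are made up rather than taken from the experiments, and how the ratios are aggregated over multiple queries is not specified in the text.

```python
from typing import Sequence


def feasibility_ratio(feasible_flags: Sequence[bool]) -> float:
    """Fraction of groups returned by KNN that are feasible for SGQ."""
    return sum(feasible_flags) / len(feasible_flags)


def distance_ratio(knn_total: float, sgselect_total: float) -> float:
    """Total social distance of the KNN group over that of the SGSelect group."""
    return knn_total / sgselect_total


# Made-up values for illustration only.
flags = [True, False, True, True]  # feasibility of four KNN answers
print(feasibility_ratio(flags))
print(distance_ratio(7.0, 10.0))
```

Because KNN relaxes the social constraint, its total distance cannot exceed that of the optimal SGQ group, so the distance ratio is at most 1.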

Figure 3.3(c) shows the results with different social radius constraints, i.e., s. As s rises, the number of candidate vertices considered (i.e., friends within s hops) increases quickly, and the running time grows rapidly as a consequence. For example, when s changes from 2 to 3, the running time of SGBasic becomes nearly 1,000 times greater, whereas the running time of SGSelect increases only 11 times. This result indicates that the proposed pruning strategies become more effective as the number of candidates increases. SGSelect is therefore much more scalable than SGBasic. In addition to the social radius constraint, we also compare the running time of these two approaches under different acquaintance constraints, i.e., k. As shown in Figure 3.3(d), the running time of SGBasic changes only slightly for different k. In contrast, the running time of SGSelect is reduced for a smaller k, since IU pruning, EE pruning, and acquaintance pruning become more effective under a tighter acquaintance constraint. Finally, Figure 3.3(d) shows that SGSelect consistently outperforms SGBasic by more than one order of magnitude, even under the loosest k.

Detailed Analysis on Proposed Strategies. The above experimental results show that the proposed algorithm requires much less time than the baseline algorithms due to the proposed strategies. In the following, we first investigate the effectiveness of the access ordering strategy, which guides an efficient exploration of the solution space. Figure 3.4(a) shows that either interior unfamiliarity (IU) or exterior expansibility (EE) alone can reduce the running time, and the proposed access ordering strategy that combines both of them leads to the greatest improvement.

Afterwards, Figures 3.4(b), 3.4(c), and 3.4(d) analyze the pruning power of acquaintance pruning, distance pruning, the interior unfamiliarity condition (IU pruning), and the exterior expansibility condition (EE pruning), where a node in Figure 3.4(d) represents a visited state in the branch-and-bound tree. Note that each pruning can remove a branch in the dendrogram (i.e., remove more than one node), and the pruned node count is therefore larger than the pruning count.

(a) Comparison of running time with different access ordering strategies used.

(b) Comparison of running time with different pruning strategies used.

(d) Comparison between the pruning count and the pruned node count. (The postfix represents the group size p.)

Figure 3.4: Analysis on pruning ability of proposed strategies.

More specifically, Figure 3.4(b) first compares the running time of SGSelect with different pruning strategies. Figure 3.4(c) further analyzes the effectiveness of these pruning strategies by comparing their pruning counts in SGSelect. Distance pruning is the most effective one with the help of the access ordering strategy, i.e., the first feasible solution with a small total social distance returned by access ordering can be exploited to facilitate effective distance pruning. On the other hand, the pruning count of EE pruning exceeds that of IU pruning as p increases. This is because, under the same k, the number of edges required inside a size-p feasible group (i.e., p(p − k − 1)/2) increases as p grows. Therefore, the exterior expansibility condition is more difficult to satisfy, and EE pruning tends to occur more frequently.

Table 3.1: The percentage of prunings located near the root of the dendrogram (i.e., with |V_S| ≤ ⌊p/2⌋). IUP, EEP, and DISP denote IU pruning, EE pruning, and distance pruning, respectively.

Group Size    IUP    EEP    DISP
p = 7         0%     44%    61%
p = 9         0%     52%    60%
p = 11        0%     61%    64%
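The growth of the edge requirement p(p − k − 1)/2 can be checked with a short calculation. The sketch below assumes, as the formula suggests, that every member must be acquainted with at least p − 1 − k other members; it rounds the bound up because an edge count is an integer, which the formula in the text does not state explicitly. The function name and the choice k = 3 are illustrative only.

```python
def min_internal_edges(p: int, k: int) -> int:
    """Lower bound on the number of edges inside a feasible size-p group.

    Each of the p members needs at least (p - 1 - k) acquaintances in the
    group, and every edge contributes to two degrees, so the bound is
    p * (p - 1 - k) / 2, rounded up to the next integer.
    """
    min_degree = max(p - 1 - k, 0)
    return (p * min_degree + 1) // 2


k = 3  # illustrative setting
for p in (7, 9, 11):
    required = min_internal_edges(p, k)
    possible = p * (p - 1) // 2  # number of vertex pairs in the group
    print(f"p={p}: at least {required} of {possible} possible edges")
```

Under this assumption, the required share of internal edges rises with p for a fixed k, which matches the explanation of why the exterior expansibility condition becomes harder to satisfy for larger groups.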

Finally, Figure 3.4(d) compares the pruned node count with the pruning count for the different strategies. The pruning ratio (i.e., the pruning count versus the pruned node count) of distance pruning reaches 1:90. When a pruning happens closer to the root of the dendrogram, the number of pruned nodes tends to increase because all the downstream nodes in the dendrogram are pruned as well. Therefore, we further investigate the position of pruning in the different strategies. Table 3.1 shows the percentage of prunings that occur when |V_S| ≤ ⌊p/2⌋, where a smaller |V_S| implies that the pruning is closer to the root. IU pruning usually occurs farther from the root because the pruning requires that the left-hand side of Eq. (3.2) exceed the right-hand side, and the value of the left-hand side tends to increase as |V_S| becomes larger. Nevertheless, IU pruning still plays an important role in SGSelect because it prunes off the infeasible |V_S| and ensures that the final solution satisfies the acquaintance constraint.

Analysis on Temporal Dimension. Recall that Algorithm STGSelect leverages pivot time slots to explore the temporal dimension and find a suitable activity time efficiently. To evaluate the performance on STGQ, we compare STGSelect with the following three algorithms: MultiSGSelect, MultiKNN, and MultiDKS, which sequentially consider each candidate activity period and solve the corresponding SGQ problem using SGSelect, KNN, and DKS, respectively. Figure 3.5(a) first compares the running time of these algorithms under different activity lengths, i.e., m. Note that the running time of MultiDKS is more than 7 hours and is therefore not shown in this figure. The results show that STGSelect consistently outperforms MultiSGSelect, especially for a larger m, because fewer pivot time slots need to be examined in STGSelect.
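The sequential strategy shared by the three Multi baselines can be sketched as an outer loop over candidate activity periods. The sketch below is an assumption-laden illustration: schedules are modeled as sets of free slot indices, a candidate period is taken to be m consecutive slots, and the names `candidate_periods` and `schedules` are hypothetical. The per-period SGQ solver (SGSelect, KNN, or DKS) is deliberately left out.

```python
from typing import Dict, Iterator, Set, Tuple


def candidate_periods(
    schedules: Dict[str, Set[int]], num_slots: int, m: int
) -> Iterator[Tuple[int, Set[str]]]:
    """Yield (start_slot, available_users) for every window of m slots.

    A user is available for a window only if all m slots in the window
    appear in that user's set of free slots.
    """
    for start in range(num_slots - m + 1):
        window = set(range(start, start + m))
        available = {u for u, free in schedules.items() if window <= free}
        yield start, available


# Toy schedules (illustrative only): free slots per user.
schedules = {"a": {0, 1, 2, 3}, "b": {1, 2}, "c": {2, 3}}
for start, users in candidate_periods(schedules, num_slots=4, m=2):
    # A Multi baseline would solve one SGQ instance on `users` here.
    print(start, sorted(users))
```

In this sketch, every window triggers a separate SGQ instance, which is the cost that STGSelect is described as avoiding by examining only pivot time slots.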

(a) Comparison of running time with different m.

(b) Solution quality analysis of KNN.

(d) Comparison of solution quality with different m.

Figure 3.5: Experimental results of STGQ.

Like KNN, MultiKNN is the fastest approach but cannot guarantee solution feasibility for the STGQ problem. Figure 3.5(b) shows that the percentage of feasible groups returned by MultiKNN drops quickly as p grows. Note that the distance ratio in Figure 3.5(b) remains above 85%, which is higher than the 70% in Figure 3.3(b). This is because, when solving STGQ, MultiKNN can only choose the candidates that are available in the activity period, rather than all the candidates in the entire social network. Therefore, the difference in solution quality between MultiKNN and STGSelect diminishes in this case.

Figure 3.5(c) further presents the running time of STGSelect and MultiSGSelect with different lengths of schedules provided by users. More time slots need to be examined in a longer schedule. The results show that STGSelect consistently outperforms MultiSGSelect for varied lengths of schedules. Finally, Figure 3.5(d) analyzes the solution quality with various m. For each p, the total social distance steadily increases as m becomes larger, because a candidate vertex with a small social distance may not be available in all time slots during the examined period. It is thus necessary to choose other candidates with larger social distances.

(a) Comparison of running time with different p on the YouTube dataset.

(Parameters m and schedule length are for STGSelect.)

(b) Comparison of running time with different k on the YouTube dataset.

(Parameters m and schedule length are for STGSelect.)

(d) Comparison of solution quality with different s.

Figure 3.6: Experimental results with the YouTube dataset.

Analysis with Large Dataset. In the following, we compare different algorithms on a large YouTube dataset. Figure 3.6(a) shows that the difference in running time becomes even more significant than in Figure 3.3(a). When p = 12, SGBasic requires more than 3 days to find the optimal solution, while SGSelect needs merely 5 seconds. When the schedules of users are considered, STGSelect finds the optimal group and a suitable activity time efficiently. In Figure 3.6(b), SGSelect still outperforms SGBasic by approximately five orders of magnitude. STGSelect, even with the extra effort of considering the user schedules, outperforms SGBasic by more than three orders of magnitude.
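As a rough check on the scale of the p = 12 data point quoted above, the two running times can be converted to a common unit. The calculation uses only the figures stated in the text (3 days and 5 seconds). Since SGBasic is reported to need more than 3 days, the result is a lower bound on the speed-up for that setting.

```latex
\[
3~\text{days} = 3 \times 24 \times 3600~\text{s} = 259{,}200~\text{s},
\qquad
\frac{259{,}200~\text{s}}{5~\text{s}} = 51{,}840 \approx 10^{4.7}.
\]
```

The speed-up at p = 12 is therefore more than four orders of magnitude, approaching five.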

Finally, we compare the solution quality in the two different datasets in Figure 3.6(c) and Figure 3.6(d). For a fair comparison, we first normalize the edge weights into the range [0,1]. Figure 3.6(c) shows that the large YouTube dataset leads to better solution quality, and the difference becomes more significant when p increases. The reason is that more suitable candidates tend to appear in a larger dataset, and hence there is a higher chance of forming a better group. However, as shown in Figure 3.6(d), the solution quality in the smaller Coauthor dataset can be effectively improved when the number of candidates grows as s increases.

(a) Comparison of coordination time with different network sizes.

(b) Comparison of coordination time with different k.

(d) Comparison of solution quality with different s.

(e) The percentage of qualified results obtained from manual coordination.

(f) The percentage of users that prefer SGSelect or manual coordination.

Figure 3.7: Experimental results of the user study.
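The normalization of edge weights into [0,1] mentioned in the dataset comparison above can be done in several ways, and the text does not say which one was used. The sketch below assumes positive weights and divides each weight by the largest weight in the dataset, which preserves the ratios between weights. Min-max scaling would be another option. The function name and the edge representation are hypothetical.

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]


def normalize_weights(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Scale positive edge weights into (0, 1] by dividing by the maximum.

    This is one possible normalization, assumed here for illustration.
    """
    if not weights:
        return {}
    largest = max(weights.values())
    if largest <= 0:
        raise ValueError("expected at least one positive edge weight")
    return {edge: w / largest for edge, w in weights.items()}


# Toy weights (illustrative only).
raw = {("a", "b"): 2.0, ("b", "c"): 4.0, ("a", "c"): 1.0}
print(normalize_weights(raw))
```

Applying the same scaling to each dataset puts the total social distances of the two datasets on a comparable scale.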
