3.5 Experimental results of STGQ

In this section, we evaluate the performance and analyze the solution quality of the proposed algorithms. First, we describe the experiment setup in Section 3.5.1. We then perform a series of sensitivity tests to study the impact of query parameters with real datasets and evaluate the performance of Algorithms SGSelect and STGSelect in Section 3.5.2. Finally, we invite users to answer activity planning problems and compare the proposed algorithm with manual coordination in Section 3.5.3.

3.5.1 Experiment Setup

To evaluate the performance and analyze the solution quality of the proposed algorithms, we conduct various experiments using real datasets. The existing approaches (e.g., [2,27,41]) cannot be applied to solve SGQ and STGQ because their problem formulations do not include the temporal dimension and social constraints s and k. Therefore, we compare the proposed algorithms with three other approaches: SGBasic (i.e., enumerating all possible candidate groups), KNN (i.e., selecting the p−1 people with the smallest social distances to the initiator), and DKS [12] (i.e., choosing the candidate group out of all possible ones that maximizes the number of edges within the group). Note that DKS is the core of the algorithms in the aforementioned works [2,27,41] and correlated to [11,39,55].
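As a rough illustration of the two selection heuristics used as baselines, the sketch below assumes a dictionary of social distances to the initiator and an undirected edge list. The function names `knn_group` and `dks_group`, the data layout, and the choice to always include the initiator in the DKS group are assumptions made for this example, not details given in the text.

```python
from itertools import combinations

def knn_group(dist_to_initiator, initiator, p):
    """Initiator plus the p-1 candidates with the smallest social distance."""
    others = sorted(
        (v for v in dist_to_initiator if v != initiator),
        key=lambda v: (dist_to_initiator[v], v),  # ties broken by name
    )
    return [initiator] + others[: p - 1]

def dks_group(edges, candidates, initiator, p):
    """Size-p group (containing the initiator) with the most internal edges.

    Exhaustive enumeration, so only suitable for tiny examples.
    """
    edge_set = {frozenset(e) for e in edges}
    best, best_edges = None, -1
    rest_pool = sorted(c for c in candidates if c != initiator)
    for rest in combinations(rest_pool, p - 1):
        group = (initiator,) + rest
        count = sum(
            1 for a, b in combinations(group, 2) if frozenset((a, b)) in edge_set
        )
        if count > best_edges:
            best, best_edges = list(group), count
    return best, best_edges
```

Because `knn_group` never inspects edges among the selected people, it can return a group whose members do not know each other, while `dks_group` ignores social distance entirely.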

Since evaluating STGQ requires daily schedules, we invite 194 people from various communities to join the experiment and use Google Calendar to collect their schedules. The data collection lasts for 5 weeks and returns 6790 days of real schedules. We then randomly select 5 weekday and 2 weekend schedules to form a 7-day schedule for each vertex in the social network. We perform a series of sensitivity tests on a coauthorship network (called Coauthor) of 16,726 people [38]. In addition to the sensitivity tests, we also conduct experiments on a much larger YouTube social network [61], which includes 1,134,890 people. The algorithms are implemented on an HP DL580 server with four Intel E7-4870 2.4 GHz CPUs and 128 GB RAM.
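The schedule assembly step can be pictured with a small sketch. It assumes that each collected day is an opaque record, that days are drawn without replacement, and that the five weekdays are placed before the two weekend days; none of these details is stated in the text, and `build_week` is a hypothetical helper.

```python
import random

def build_week(weekday_pool, weekend_pool, rng):
    """Assemble a 7-day schedule from 5 sampled weekdays and 2 sampled weekend days."""
    weekdays = rng.sample(weekday_pool, 5)  # without replacement (assumption)
    weekend = rng.sample(weekend_pool, 2)
    return weekdays + weekend

# Toy pools standing in for collected calendar days (labels are placeholders).
weekday_pool = [f"weekday-{i}" for i in range(25)]
weekend_pool = [f"weekend-{i}" for i in range(10)]
week = build_week(weekday_pool, weekend_pool, random.Random(0))
print(len(week))
```

As a consistency check on the reported numbers, 194 participants over 5 weeks (35 days each) give 194 × 35 = 6790 days.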

To demonstrate the strength of automatic group recommendation, we implement a social activity planning application on Facebook and conduct a user study. We invite 171 people from various communities to perform manual activity planning. Each user answers 20 activity planning requests with social graphs extracted from their social networks on Facebook. The social distances between users and their friends are specified by the users, and the social distances between their friends are derived according to the number of common friends [9, 37]. These 20 tasks span various network sizes and different numbers of attendees, with different s and k for varying social atmospheres. We then compare the solution quality and the processing time of manual coordination with those of automatic group recommendation.

3.5.2 Performance Analysis of SGQ and STGQ

In this section, we first analyze the proposed strategies for SGQ and its extension in the temporal dimension (i.e., STGQ). We then evaluate the proposed algorithms on the large YouTube dataset.

Analysis of SGQ. We first compare the running time of SGSelect against SGBasic, KNN, and DKS with different numbers of attendees, i.e., p. Figure 3.3(a) presents the experimental results with s = 2 and k = 3. The trends in other parameter settings, such as s = 1, are similar. The results indicate that SGSelect outperforms SGBasic and DKS, and the improvement becomes more significant as p grows because SGBasic and DKS need to carefully examine numerous candidate groups, and the processing effort of each candidate group also increases with p. SGBasic outperforms DKS for a larger p because the constraint with the same k becomes stricter under a larger p. Moreover, SGBasic is likely to detect and then discard an infeasible candidate group in an early stage (i.e., find a candidate group infeasible when checking the first one or two vertices instead of checking all the vertices in the group). However, DKS only focuses on maximizing the density of the candidate group and does not leverage the k constraint. In contrast to the above approaches, SGSelect is able to effectively prune the solution space with the proposed access ordering, distance pruning, and acquaintance pruning strategies.

Figure 3.3: Experimental results of SGQ. Panels: (a) Comparison of running time with different p; (b) Solution quality analysis of KNN; (d) Comparison of running time with different k.

Although KNN is the fastest one in Figure 3.3(a), Figure 3.3(b) shows that many solutions returned by KNN are infeasible for SGQ. More specifically, the feasibility ratio shows the percentage of feasible groups returned by KNN. It drops quickly as p grows because the candidates with small social distances to the initiator do not necessarily know each other. In addition, Figure 3.3(b) compares the total social distances of KNN and SGSelect. The distance ratio represents the total social distance returned from KNN divided by the total social distance returned from SGSelect. Note that the solution of KNN can be regarded as a lower bound on the total social distance of SGQ since the social constraint is relaxed in KNN. Figure 3.3(b) indicates that the ratio remains above 70% even for a large p.
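The two ratios in Figure 3.3(b) can be made concrete with a toy instance. The sketch assumes that the acquaintance constraint allows each member to be unacquainted with at most k other members (which is consistent with the edge count p(p−k−1)/2 quoted later in this section) and that all candidates already lie within the social radius. The graph, the distances, and the helper names are invented for illustration.

```python
def satisfies_acquaintance(group, edge_set, k):
    """True if every member is unacquainted with at most k other members."""
    for v in group:
        strangers = sum(
            1 for u in group if u != v and frozenset((u, v)) not in edge_set
        )
        if strangers > k:
            return False
    return True

def total_distance(group, dist):
    """Sum of social distances from the initiator to the members of the group."""
    return sum(dist[v] for v in group)

# Toy instance: q is the initiator; a and b are closest to q but do not know each other.
dist = {"q": 0, "a": 1, "b": 2, "c": 4}
edge_set = {frozenset(e) for e in [("q", "a"), ("q", "b"), ("q", "c"), ("a", "c")]}

knn_answer = ["q", "a", "b"]      # the p - 1 = 2 nearest people, edges ignored
optimal_answer = ["q", "a", "c"]  # only group in which everyone knows everyone (k = 0)

knn_feasible = satisfies_acquaintance(knn_answer, edge_set, k=0)
distance_ratio = total_distance(knn_answer, dist) / total_distance(optimal_answer, dist)
print(knn_feasible, distance_ratio)
```

In this instance the KNN answer violates the constraint for k = 0, and the distance ratio is 3/5. The feasibility ratio would be obtained by averaging the feasibility flag over many such queries.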

Figure 3.3(c) shows the results with different social radius constraints, i.e., s. As s rises, the number of candidate vertices considered (i.e., friends within s hops) increases quickly, and the running time grows rapidly as a consequence. For example, when s changes from 2 to 3, the running time of SGBasic becomes nearly 1,000 times greater. However, the running time of SGSelect increases only 11 times. This result indicates that the proposed pruning strategies become more and more effective as the number of candidates increases. SGSelect is thereby much more scalable than SGBasic. In addition to the social radius constraint, we also compare the running time of these two approaches under different acquaintance constraints, i.e., k. As shown in Figure 3.3(d), the running time of SGBasic changes only slightly for different k. In contrast, the running time of SGSelect is reduced for a smaller k, since the IU pruning, EE pruning, and acquaintance pruning become more effective under a tighter acquaintance constraint. Finally, Figure 3.3(d) shows that SGSelect consistently outperforms SGBasic by more than one order of magnitude, even under the loosest k.

Detailed Analysis on Proposed Strategies. The above experiment results show that the proposed algorithm requires much less time than the baseline algorithms due to the proposed strategies. In the following, we first investigate the effectiveness of the access ordering strategy, which can guide an efficient exploration of the solution space. Figure 3.4(a) shows that either interior unfamiliarity (IU) or exterior expansibility (EE) can reduce the running time, and the proposed access ordering strategy that combines both of them leads to the greatest improvement.

Afterwards, Figures 3.4(b), 3.4(c), and 3.4(d) analyze the pruning power of acquaintance pruning, distance pruning, interior unfamiliarity condition (IU pruning), and exterior expansibility condition (EE pruning), where a node in Figure 3.4(d) represents a visited state in the branch-and-bound tree. Note that each pruning can remove a branch in the dendrogram (i.e., remove more than one node), and the pruned node count thereby will be larger than the pruning count.

Figure 3.4: Analysis on pruning ability of proposed strategies. Panels: (a) Comparison of running time with different access ordering strategies used; (b) Comparison of running time with different pruning strategies used; (d) Comparison between the pruning count and the pruned node count (the postfix represents the group size p).
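To illustrate why a single pruning can discard many nodes, the following is a minimal branch-and-bound sketch with a distance bound. It is a generic textbook-style search, not the SGSelect algorithm: candidates are simply visited in ascending distance order, the acquaintance constraint (assumed to allow at most k unacquainted members per attendee) is checked only on complete groups, and all names are hypothetical.

```python
def branch_and_bound(dist, edge_set, initiator, p, k):
    """Return (best group, its total distance, pruning statistics)."""
    cands = sorted((v for v in dist if v != initiator), key=lambda v: (dist[v], v))
    best = {"total": float("inf"), "group": None}
    stats = {"distance_prunings": 0}

    def feasible(group):
        return all(
            sum(1 for u in group if u != v and frozenset((u, v)) not in edge_set) <= k
            for v in group
        )

    def extend(group, total, start):
        need = p - len(group)
        if need == 0:
            if feasible(group) and total < best["total"]:
                best["total"], best["group"] = total, list(group)
            return
        for i in range(start, len(cands) - need + 1):
            # Cheapest completion that uses cands[i] next: the `need` nearest
            # remaining candidates, because cands is sorted by distance.
            bound = total + sum(dist[v] for v in cands[i : i + need])
            if bound >= best["total"]:
                stats["distance_prunings"] += 1
                break  # every later i has an equal or larger bound
            group.append(cands[i])
            extend(group, total + dist[cands[i]], i + 1)
            group.pop()

    extend([initiator], 0, 0)
    return best["group"], best["total"], stats

# Toy instance (invented): q is the initiator, k = 0 requires a clique.
dist = {"q": 0, "a": 1, "b": 2, "c": 4, "d": 5}
edge_set = {frozenset(e) for e in
            [("q", "a"), ("q", "b"), ("q", "c"), ("q", "d"), ("a", "c")]}
result = branch_and_bound(dist, edge_set, "q", 3, 0)
print(result)
```

On this instance the search performs two distance prunings. The second one happens directly below the root and discards every group that does not contain a, which is the effect described above: one pruning, many pruned nodes.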

More specifically, Figure 3.4(b) first compares the running time of SGSelect with different pruning strategies. Figure 3.4(c) further analyzes the effectiveness of these pruning strategies by comparing their pruning counts in SGSelect. The distance pruning is the most effective one with the help of the access ordering strategy, i.e., the first feasible solution with a small total social distance returned by access ordering can be exploited to facilitate effective distance pruning. On the other hand, the pruning count of EE pruning exceeds that of IU pruning as p increases. This is because under the same k, the number of edges required inside a size-p feasible group (i.e., p(p−k−1)/2) increases as p grows. Therefore, the exterior expansibility condition is more difficult to satisfy, and EE pruning tends to occur more frequently.

Table 3.1: The percentage of prunings located near the root of the dendrogram (i.e., with |V_S| ≤ ⌊p/2⌋).

Group Size    IUP    EEP    DISP
p=7           0%     44%    61%
p=9           0%     52%    60%
p=11          0%     61%    64%
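The edge count p(p−k−1)/2 can be recovered by a short degree-counting argument, assuming (as the formula suggests) that the acquaintance constraint lets each attendee be unacquainted with at most k of the other p−1 attendees. The symbols \deg_F(v) and E(F), and the value k = 2 in the numerical example, are introduced here for illustration only.

```latex
\begin{align*}
\deg_F(v) &\ge (p-1) - k \quad \text{for every } v \in F, \\
|E(F)| &= \frac{1}{2}\sum_{v \in F} \deg_F(v) \;\ge\; \frac{p\,(p-k-1)}{2}, \\
\frac{|E(F)|}{\binom{p}{2}} &\ge \frac{p-k-1}{p-1}.
\end{align*}
% Example with k = 2:
%   p = 7:  at least 14 of 21 possible edges (about 67%)
%   p = 9:  at least 27 of 36 possible edges (75%)
%   p = 11: at least 44 of 55 possible edges (80%)
```

The required density (p−k−1)/(p−1) grows with p for a fixed k, which matches the observation that the same k is stricter for larger groups.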

Finally, Figure 3.4(d) compares the pruned node count versus the pruning count with different strategies. The pruning ratio (i.e., the pruning count versus the pruned node count) of distance pruning reaches 1 : 90. When a pruning happens in a position closer to the root of the dendrogram, the number of pruned nodes tends to increase because those pruned nodes are all downstream nodes in the dendrogram. Therefore, we further investigate the position of pruning in different strategies. Table 3.1 shows the percentage of prunings that occur when |V_S| ≤ ⌊p/2⌋, where a smaller |V_S| implies that the pruning is closer to the root. The IU pruning usually occurs in a position more distant from the root because the pruning requires that the LHS of Eq. (3.2) exceeds the RHS, and the value of the LHS tends to increase as |V_S| becomes larger. Nevertheless, the IU pruning still plays an important role in SGSelect because it prunes off the infeasible V_S and ensures that the final solution satisfies the acquaintance constraint.

Analysis on Temporal Dimension. Recall that Algorithm STGSelect leverages pivot time slots to explore the temporal dimension and find a suitable activity time efficiently. To evaluate the performance on STGQ, we compare STGSelect with the following three algorithms: MultiSGSelect, MultiKNN, and MultiDKS, i.e., sequentially considering each candidate activity period and solving the corresponding SGQ problem using SGSelect, KNN, and DKS, respectively. Figure 3.5(a) first compares the running time of these algorithms under different activity lengths, i.e., m. Note that the running time of MultiDKS is more than 7 hours and is thereby not shown in this figure. The results show that STGSelect consistently outperforms MultiSGSelect, especially for a larger m, due to a decreasing number of pivot time slots required to be examined in STGSelect.
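The sequential baselines can be pictured as a loop over every window of m consecutive time slots, each of which yields its own SGQ instance restricted to the people who are free throughout the window. The sketch below shows only that enumeration step. It assumes schedules are per-person lists of booleans (True meaning free), and the names `available_candidates` and `candidate_periods` are hypothetical; it does not implement pivot time slots.

```python
def available_candidates(schedules, start, m):
    """People who are free in every slot of [start, start + m)."""
    return {
        person
        for person, slots in schedules.items()
        if all(slots[start : start + m])
    }

def candidate_periods(schedules, m):
    """Yield (start, available people) for each window of m consecutive slots."""
    horizon = min(len(slots) for slots in schedules.values())
    for start in range(horizon - m + 1):
        yield start, available_candidates(schedules, start, m)

# Toy schedules with four time slots (invented data).
schedules = {
    "q": [True, True, True, True],
    "a": [True, True, False, True],
    "b": [False, True, True, True],
}
windows_m2 = list(candidate_periods(schedules, 2))
windows_m3 = list(candidate_periods(schedules, 3))
print(windows_m2, windows_m3)
```

A baseline in the style of MultiSGSelect would then solve one SGQ per yielded window and keep the best answer. Longer activities leave fewer windows and fewer available people per window, as the toy data shows when m goes from 2 to 3.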

Figure 3.5: Experimental results of STGQ. Panels: (a) Comparison of running time with different m; (b) Solution quality analysis of KNN; (d) Comparison of solution quality with different m.

Similar to KNN, although MultiKNN is the fastest one, it is not able to guarantee the solution feasibility for the STGQ problem. Figure 3.5(b) shows that the percentage of feasible groups returned by MultiKNN drops quickly as p grows. Note that the distance ratio in Figure 3.5(b) remains above 85%, which is higher than the 70% in Figure 3.3(b). This is because when solving STGQ, MultiKNN can only choose the candidates that are available in the activity period, rather than all the candidates in the entire social network. Therefore, the difference in solution quality between MultiKNN and STGSelect diminishes in this case.

Figure 3.5(c) further presents the running time of STGSelect and MultiSGSelect with different lengths of schedules provided by users. More time slots need to be examined in a longer schedule. The results manifest that STGSelect consistently outperforms MultiSGSelect for varied lengths of schedules. Finally, Figure 3.5(d) analyzes the solution quality with various m. For each p, the total social distance steadily increases as m becomes larger, because a candidate vertex with a small social distance may not be available in all time slots during the examined period. It is thus necessary to choose other candidates with larger social distances.

Figure 3.6: Experimental results with the YouTube dataset. Panels: (a) Comparison of running time with different p on the YouTube dataset (parameters m and schedule length are for STGSelect); (b) Comparison of running time with different k on the YouTube dataset (parameters m and schedule length are for STGSelect); (d) Comparison of solution quality with different s.

Analysis with Large Dataset. In the following, we compare different algorithms on the large YouTube dataset. Figure 3.6(a) shows that the difference in running time becomes even more significant than in Figure 3.3(a). When p = 12, SGBasic requires more than 3 days to find the optimal solution, while SGSelect needs merely 5 seconds. When the schedules of users are considered, STGSelect finds the optimal group and a suitable activity time efficiently. In Figure 3.6(b), SGSelect still outperforms SGBasic by approximately five orders of magnitude. STGSelect, even with the extra effort of considering the user schedules, outperforms SGBasic by more than three orders of magnitude.

Finally, we compare the solution quality in the two different datasets in Figure 3.6(c) and Figure 3.6(d). For fair comparison, we first normalize the edge weights into the range [0,1]. Figure 3.6(c) manifests that the large YouTube dataset leads to better solution quality, and the difference becomes more significant when p increases. The reason is that more proper candidates are inclined to appear in a larger dataset, and hence there is a higher chance of forming a better group. However, as shown in Figure 3.6(d), the solution quality in the smaller Coauthor dataset can be effectively improved when the number of candidates grows as s increases.

Figure 3.7: Experimental results of the user study. Panels: (a) Comparison of coordination time with different network sizes; (b) Comparison of coordination time with different k; (d) Comparison of solution quality with different s; (e) The percentage of qualified results obtained from manual coordination; (f) The percentage of users that prefer SGSelect or manual coordination.
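The text does not say how the edge weights of the two datasets are mapped into [0,1] before comparing solution quality. One common choice is min-max scaling, sketched below purely as an assumption; `normalize_weights` and the sample weights are hypothetical.

```python
def normalize_weights(weights):
    """Min-max scale edge weights into [0, 1] (assumed normalization method)."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All weights equal: map everything to 0.0 to avoid division by zero.
        return {edge: 0.0 for edge in weights}
    return {edge: (w - lo) / (hi - lo) for edge, w in weights.items()}

# Invented weights for three edges.
raw = {("a", "b"): 2.0, ("b", "c"): 4.0, ("a", "c"): 6.0}
scaled = normalize_weights(raw)
print(scaled)
```

Dividing by the maximum weight would be an equally plausible alternative. Either way, the point is that total social distances from the Coauthor and YouTube graphs become comparable on a common scale.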

3.5.3 User Study of Manual Activity Coordination

In Figures 3.7(a)-3.7(e), we compare manual coordination and SGSelect for activity planning in the user study. Figure 3.7(a) and Figure 3.7(b) first compare the coordination time with different network sizes and k, respectively. When the network size increases, the number of candidate attendees is inclined to become larger, which implies that the number of candidate groups will increase quickly. Therefore, Figure 3.7(a) shows that the elapsed time of manual coordination grows quickly with a larger network size. Although the elapsed time of SGSelect also follows the same trend, it consistently requires less than a few milliseconds. Figure 3.7(b) shows that the elapsed time of manual coordination is more than 60 seconds. Even when k = 6, where constraint k can be ignored and the initiators only need to consider the social distances of the candidates, the manual coordination still requires approximately 40 seconds.

In Figure 3.7(c) and Figure 3.7(d), we compare the solution quality of manual coordination and SGSelect. Note that when the network size or s increases, we are able to find a group with a smaller total social distance, since more candidate attendees appear in V_A. Nevertheless, Figure 3.7(c) manifests that the improvement due to a large network size or s is tiny for manual coordination, because the activity planning problem is difficult for manual coordination even with the network size as small as 10. It is time-consuming and tedious for users to carefully extract a better group when the network size grows. In contrast, our proposed algorithm always finds the optimal one among all possible groups and outperforms the manual coordination by 67% on average. In Figure 3.7(d), the solution quality of manual coordination slightly improves when s increases from 1 to 2. However, the solution quality becomes slightly worse when s further increases from 2 to 3, which shows that the solution quality is not stable in manual coordination.

In addition to the solution quality, manual coordination suffers from the problem of solution feasibility. Given a pair of s and k, manual coordination may not always be able to find a feasible group that satisfies the social constraints even when at least one exists. Figure 3.7(e) shows the percentage of manual coordination answers that satisfy the social constraints. Both the percentage satisfying the social radius constraint s and the percentage satisfying the acquaintance constraint k decrease quickly as p grows, showing that it is much harder for the initiator to ensure the connectivity of all the attendees when p becomes larger. It is more difficult for manual coordination to ensure constraint k because it requires examining the connectivity of all pairs of attendees, while constraint s only requires the distance calculation between each attendee and the initiator herself. In contrast, for all activity planning problems, our proposed algorithm is guaranteed to find a feasible solution (i.e., the percentage is 100%) and ensure that the total social distance is minimized.
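The gap between the two constraints can be quantified with a simple count of the checks a human coordinator would have to make for a group of p attendees (including the initiator). The count assumes that each distance to the initiator and each pairwise acquaintance is one lookup; the example value p = 8 is chosen only for illustration.

```latex
\begin{align*}
\text{checks for } s &: \; p - 1 \quad \text{(one distance per attendee other than the initiator)}, \\
\text{checks for } k &: \; \binom{p}{2} = \frac{p\,(p-1)}{2} \quad \text{(one lookup per pair of attendees)}.
\end{align*}
% Example: p = 8 gives 7 distance checks versus 28 pair checks.
```

The number of pair checks grows quadratically in p while the number of distance checks grows linearly, which is in line with the faster drop of the constraint-k percentage described above.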

In the user study, when we return the recommended groups obtained by SGSelect to the users, Figure 3.7(f) shows that 93% of users agree that these solutions are better than or as good as the groups derived by themselves. According to the feedback from the users, we conclude that an automatic group recommendation system is highly desirable, since it considerably reduces the effort of activity planning. This has a significant impact on increasing users’ willingness to organize social activities with friends.

3.6 Summary

To the best of our knowledge, there is no existing work in the literature that addresses the issues of automatic activity planning based on both the social and temporal relationships of an initiator and attendees. In this study, we first define two useful queries, namely, SGQ and STGQ, to obtain the optimal set of attendees and a suitable activity time. We show that these problems are NP-hard and inapproximable within any ratio. We then devise two algorithms, SGSelect and STGSelect, to find optimal solutions in reasonable time. Effective query processing strategies, including access ordering, distance pruning, acquaintance pruning, pivot time slots, and availability pruning, are explored to prune redundant search space for efficiency. Experimental results indicate that the proposed SGSelect and STGSelect are significantly more efficient and scalable than the baseline approaches. According to a user study we performed, the proposed algorithm obtains higher solution quality with much less coordination effort compared with manual activity coordination, and hence increases users’ willingness to organize activities with friends.

Chapter 4 Efficient Processing of Consecutive Group Queries for Social Activity Planning

4.1 Introduction

In Chapter 3, we introduced the Social Group Query (SGQ) and Social-Temporal Group Query (STGQ) for activity planning. Note that it is difficult for a user to specify all the query parameters at once to find the perfect group of attendees and time. Fortunately, with SGQ and STGQ, it is easy for the user to tune the parameters to find alternative solutions. For example, the initiator may decrease k to tighten the group, or increase s to incorporate more friends of friends. Allowing users to tune parameters and try consecutive queries is a great advantage of the planning service over the current practice of manual planning.

However, this capability has not been explored in related works [2, 6, 18, 24, 27, 41, 43, 55, 59]. Some existing studies [14,34,63,64] on subgraph queries return multiple subgraphs in one single diversified query. However, without feedback and guidance from user-specified parameters, most returned subgraphs are likely to be redundant (i.e., distant from the desired results of users). In reality, users usually review the result obtained with the current parameter setting, and then adjust the parameter setting for succeeding queries.

A straightforward method to support a sequence of SGQs is to answer each individual query with Algorithm SGSelect introduced in Chapter 3. However, anticipating that the users would not adjust the parameters drastically, we envisage that exploiting the intermediate solutions of previous queries may improve processing of succeeding queries.

To realize the above idea, in this chapter, we propose the Consecutive Social Group Query (CSGQ), which aims to efficiently support a sequence of SGQs with varying parameters and can be extended to support a sequence of STGQs. Accordingly, we design a new tree structure, namely, the Accumulative Search Tree, which caches the intermediate solutions of historical queries in a compact form for reuse. To facilitate efficient lookup, we
