
To the best of our knowledge, there is no existing work in the literature that addresses the ensemble of drastic sparsification to facilitate link prediction. Removing some edges before predicting the existence of other edges seems counterintuitive; however, the experimental results show that the proposed DEDS framework not only reduces prediction time considerably, but also preserves high accuracy. When the network is drastically sparsified, DEDS effectively relieves the drop in prediction accuracy and raises the AUC value.

With a larger sparsification ratio, DEDS can even outperform the classifier trained from the original network. In terms of efficiency, the prediction cost is substantially reduced after the network is sparsified. If the original network is disk-resident but can fit into main memory after being sparsified, the improvement is even more significant. Moreover, the DEDS framework places no restrictions on the sparsification ratio and ensemble size, and users can decide the ratio and the size flexibly based on their computational ability and accuracy requirements. Note that all of the individual classifiers in DEDS can be trained and run independently, which makes DEDS easier to parallelize.

Chapter 3

Efficient Social-Temporal Group Query with Acquaintance Constraint

3.1 Introduction

In Chapter 2, we studied the link prediction problem, which focuses on predicting the existence of a link between two particular users. In this chapter, we extend the scope from the relationship between two users to the relationships among a group of users, and study the social-temporal group query problem with its applications in activity planning.

Three essential criteria for social activity planning are (1) finding a group of attendees familiar with the initiator, (2) ensuring most attendees have tight social relations with each other, and (3) selecting an activity period available to all. For example, a person with some free movie tickets to share may like to find a group of mutually close friends and a time available to all. Nowadays, most activities are still initiated manually via phone, e-mail, or texting. However, with a growing number of systems possessing the information needed for activity initiation, new activity planning functions can be provided. For example, social networking websites, such as Facebook, Google+, and LinkedIn, provide the social relations, and web applications, such as Google Calendar, Doodle¹, Timebridge², and Meetup³, allow people to share their available time and activity plans with friends. For manual activity planning, finding socially close participants and a suitable time can be tedious and time-consuming, due to the complexity of social connectivity and the diversity of schedules. Thus, there is demand for an effective activity planning service that automatically suggests socially acquainted attendees and a suitable time for an activity.

¹ The Doodle website. http://doodle.com/.

² The Timebridge website. http://www.timebridge.com/.

To support the aforementioned service, we formulate a new query problem, named Social-Temporal Group Query (STGQ), which considers the available time and relationships of candidate attendees. Given an activity initiator, we consider her social network for candidate attendees. We assume that their schedules are available to the planning service (e.g., via web collaboration tools), and that the closeness between friends is quantitatively captured as social distance [3, 10, 65]. Based on the criteria mentioned earlier, an STGQ comes with the following parameters: (1) a group size p to specify the number of attendees, (2) an availability constraint m to specify the length of the activity period, (3) a social radius constraint s for the scope of candidate attendees, and (4) an acquaintance constraint k to govern the relationships between attendees. STGQ aims to find a group and a time matching the group size in (1) and the activity length in (2), such that the total social distance between the initiator and the attendees is minimized. Additionally, the social radius constraint in (3) specifies that all attendees are located no more than s edges away from the initiator, while the acquaintance constraint in (4) requires that each attendee has at most k unacquainted attendees. As such, by controlling s and k based on the desired social atmosphere, suitable attendees and time are returned. Note that, although they possess the required information, none of the aforementioned web applications (e.g., Doodle) can automatically find suitable attendees and a suitable time for the initiators. To the best of our knowledge, the STGQ problem has not been studied before. In the following, Example 3.1.1 presents an illustrative example of STGQ.
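The four parameters can be restated compactly as an optimization problem. The notation below is an assumption made for illustration and is not taken from the formal definition given later in this chapter: G = (V, E) denotes the social network, q the initiator, d_{v,q} the social distance between v and q (with d_{q,q} = 0), hop(v, q) the number of edges separating v from q, F the selected group (which includes q), and T_v the set of time slots in which v is available.

```latex
\begin{align*}
\min_{F \subseteq V,\; t} \quad & \sum_{v \in F} d_{v,q} \\
\text{subject to} \quad
  & q \in F, \quad |F| = p, \\
  & \mathrm{hop}(v, q) \le s
      && \forall v \in F, \\
  & \bigl|\{\, u \in F \setminus \{v\} : (u, v) \notin E \,\}\bigr| \le k
      && \forall v \in F, \\
  & \{\, t, t+1, \dots, t+m-1 \,\} \subseteq T_v
      && \forall v \in F.
\end{align*}
```

Under this reading, the second and third constraints are the social radius and acquaintance constraints, and the last constraint requires m consecutive time slots, starting at slot t, that are available to every member of F.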

Example 3.1.1. Consider the illustrative example in Figure 3.1, where Casey Affleck would like to invite some friends to dine together. Figure 3.1(a) shows a possible social network of Casey Affleck. Assume that Casey Affleck is trying to find three mutually acquainted friends. Issuing an STGQ with p = 4 and s = 1, which returns {George Clooney (v2), Robert De Niro (v3), Casey Affleck (v7), Michelle Monaghan (v8)}, does not ensure social closeness, since the three close friends of Casey Affleck are not acquainted with each other (as shown in Figure 3.1(a)). Instead, by adding the acquaintance constraint and issuing an STGQ with p = 4, s = 1 and k = 1, a better list of invitees, {George Clooney (v2), Robert De Niro (v3), Julia Roberts (v6), Casey Affleck (v7)}, in which everyone knows at least two other invitees, is obtained. However, Casey Affleck then finds out that these four attendees have no available time in common when he sets the length of the activity to 3, i.e., three consecutive time slots. Figure 3.1(c) shows the schedules of the candidates, with their available time slots marked by circles. Therefore, he instead issues an STGQ with p = 4, s = 1, k = 1 and m = 3, which returns a group of mutually acquainted invitees and a suitable activity period that is available to all invitees, i.e., {George Clooney (v2), Robert De Niro (v3), Brad Pitt (v4), Casey Affleck (v7)} and [ts2, ts4].

³ The Meetup website. http://www.meetup.com/.

Figure 3.1: An illustrative example for STGQ. (a) The sample social network, (b) the dendrogram of candidate group enumeration and (c) schedules of candidate attendees.

An intuitive approach to find the answer is to enumerate and examine all the possible four-person candidate groups that include Casey Affleck himself for each possible activity period of length 3. In this example, the time interval [ts1, ts6] can be divided into four candidate activity periods, [ts1, ts3], [ts2, ts4], [ts3, ts5] and [ts4, ts6]. Since s = 1, all directly connected friends of Casey Affleck, together with Casey Affleck himself, i.e., {George Clooney, Robert De Niro, Brad Pitt, Julia Roberts, Casey Affleck, Michelle Monaghan}, are candidates. Figure 3.1(b) illustrates the enumeration process of the candidate groups in accordance with the increasing order of user IDs. Note that the numbers 64 and 65 indicate the total social distances of qualified candidate groups. These ten non-duplicate candidate groups constitute the solution space in a certain activity period, from which we eliminate the ones disqualified by the acquaintance constraint or not available in this activity period. Finally, the group with the smallest total social distance among qualified candidate groups is returned as the optimal solution.
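The intuitive enumeration described above can be sketched in a few lines of Python. This is only a minimal illustration of the brute-force baseline, not Algorithm STGSelect. The function name `brute_force_stgq`, the six-person toy network, its distances, and its schedules are all hypothetical and do not reproduce the data of Figure 3.1. The sketch also assumes that the candidates have already been filtered by the social radius s, so s does not appear as a parameter.

```python
from itertools import combinations


def brute_force_stgq(q, dist, friends, avail, num_slots, p, k, m):
    """Try every (activity period, p-person group containing q) pair.

    q         : the initiator
    dist      : candidate -> social distance to q (dist[q] == 0); candidates
                are assumed to be pre-filtered by the social radius s
    friends   : vertex -> set of acquainted vertices (symmetric)
    avail     : vertex -> set of available time slots, numbered from 1
    Returns (total distance, group, (first slot, last slot)) or None.
    """
    best = None
    others = sorted(v for v in dist if v != q)
    # Slide a window of m consecutive slots over [1, num_slots].
    for start in range(1, num_slots - m + 2):
        period = set(range(start, start + m))
        for rest in combinations(others, p - 1):
            group = (q,) + rest
            # Availability: every member must be free during the whole period.
            if not all(period <= avail[v] for v in group):
                continue
            # Acquaintance: nobody may have more than k unacquainted members.
            too_many_strangers = any(
                sum(1 for u in group if u != v and u not in friends[v]) > k
                for v in group
            )
            if too_many_strangers:
                continue
            total = sum(dist[v] for v in group)
            if best is None or total < best[0]:
                best = (total, group, (start, start + m - 1))
    return best


# Hypothetical toy instance (NOT the network of Figure 3.1).
EDGES = [("q", v) for v in "abcde"] + [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "d"), ("c", "d"),
]
FRIENDS = {v: set() for v in "qabcde"}
for u, v in EDGES:
    FRIENDS[u].add(v)
    FRIENDS[v].add(u)

DIST = {"q": 0, "a": 10, "b": 12, "c": 15, "d": 20, "e": 25}
AVAIL = {
    "q": {1, 2, 3, 4, 5, 6},
    "a": {2, 3, 4, 5},
    "b": {1, 2, 3, 4},
    "c": {4, 5, 6},
    "d": {2, 3, 4, 6},
    "e": {1, 2, 3, 4, 5, 6},
}

RESULT = brute_force_stgq("q", DIST, FRIENDS, AVAIL, num_slots=6, p=4, k=1, m=3)
print(RESULT)
```

In this toy instance, the socially closest group {q, a, b, c} satisfies the acquaintance constraint but shares no three consecutive slots, so the search returns {q, a, b, d} with total distance 42 and period [2, 4]. With five candidates besides the initiator and p = 4, each period requires examining C(5, 3) = 10 groups, which matches the ten candidate groups counted in the example and hints at why exhaustive enumeration does not scale.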

In situations where the activity time is pre-determined (e.g., a tennis game), STGQ can be simplified as a Social Group Query (SGQ). In this chapter, we first examine the processing strategies for SGQ and then extend our study to the more complex STGQ. Solving an SGQ may take exponential time because SGQ is NP-hard, and processing an STGQ is even more challenging due to the diversity of schedules. When sequentially choosing the attendees at each iteration to form a candidate group, giving priority to close friends of the initiator may lead to a smaller total social distance. However, it may not necessarily satisfy the acquaintance constraint. On the other hand, prioritizing a set of mutually close friends to address the acquaintance constraint does not guarantee the minimum total social distance. Moreover, in processing an STGQ, a group of mutually acquainted friends with a small total social distance still cannot form a solution if their available times do not overlap. Therefore, the challenge comes from the strategic dilemma between reducing the total social distance and ensuring that the solution satisfies the constraints both socially and temporally. Based on the above observations, we propose an algorithm called SGSelect that addresses both the social distance and connectivity, and then extend it to STGSelect by incorporating various strategies for the temporal dimension. Compared with the existing studies that focus on only the social dimension to find densely-connected subgroups (e.g., [6, 18, 43, 55, 59]), STGSelect can process both temporal and social dimensions effectively and efficiently.

Contributions of this chapter are summarized as follows.

• We formulate two useful queries for social activity planning, namely, SGQ and STGQ, to obtain the optimal set of attendees and a suitable activity time. These queries can be used to plan for various activities by specifying the social radius constraint s and the acquaintance constraint k. We also prove these two problems are NP-hard and inapproximable within any ratio. In other words, there exists no approximation algorithm for SGQ and STGQ unless P = NP.

• We propose Algorithms SGSelect and STGSelect to efficiently find the optimal solution to SGQ and STGQ, respectively. Moreover, we devise various strategies, including access ordering, distance pruning, acquaintance pruning, pivot time slots, and availability pruning, to prune redundant search space and improve efficiency. Our research results can support social networking websites and web collaboration tools as a value-added service.

• We conduct a user study to compare the proposed planning service with manual coordination, and collect feedback as guidance to enrich our group query service. The results show that the proposed algorithms are able to obtain higher solution quality with much less coordination effort, increasing users’ willingness to organize activities.

The rest of this chapter is organized as follows. In Section 3.2, we review related work. Section 3.3 formulates SGQ and explains the details of Algorithm SGSelect. Section 3.4 extends our study on SGQ to the more complex STGQ and presents the details of Algorithm STGSelect. Finally, we present the experimental results in Section 3.5 and summarize this chapter in Section 3.6.

3.2 Related Works

Though some web applications have been developed to support activity coordination, they require users to manually assign activity time and participants. For example, with the Events function on Facebook, an activity initiator can specify an activity time and select friends to invite. These friends then reply with whether they can attend or not.

Some event planning websites and apps, such as Doodle, Timebridge, SelectTheDate⁴, and NeedToMeet⁵, have been developed to reduce the initiator’s effort in collecting the available time of potential participants. However, the initiator still needs to manually choose some possible activity times and the participants to invite, and social cohesiveness is thereby not ensured. Such a manual activity coordination process is tedious and time-consuming. In contrast, the proposed STGQ, complementary to the above web applications, can automatically find a group of close friends to get together at a suitable activity time.

By minimizing the total social distance among the attendees, we are actually forming a cohesive subgroup in the social network. In the field of social network analysis, research on finding various kinds of subgroups, such as clique, k-plex and k-truss, has been conducted (e.g., [6, 18, 43, 55, 59]). There is some related work on group formation (e.g., [3, 44, 57]), team formation (e.g., [2, 24, 41]), and group query (e.g., [27, 28, 60]). There is also some related work on community search and social circle discovery (e.g., [35, 52, 61]). While these works focus on different scenarios and aims, none of them simultaneously encompasses the social and temporal objectives to facilitate automatic activity planning. Therefore, the STGQ problem has not been addressed previously.