
4.3 Profiles

4.3.1 Connection Features

Connection features capture users’ interests from the evolving connection network. Traditional link prediction methods often extract features by viewing the evolving connection network as a static network over a fixed time interval, which reflects only the long-term behavior (interest). However, as mentioned in [23]: “overall behavior of a user may be determined by his/her long-term interest, but at any given time, a user is also affected by his/her short-term interest due to transient events, such as new product releases and special personal occasions such as birthdays.” We therefore consider not only the static features, which are extracted from a static view of the network and represent the long-term interest, but also the surprising features, which appear only recently, must be extracted by viewing the connection network as an evolving graph, and represent the short-term interest.

Definition 6. (Static features & surprising features): Static features represent the long-term interests, which can be extracted from the static network view of the connection network over a fixed time interval. Surprising features represent the short-term interests, which appear only recently and should be extracted from the evolving graph view of the connection network.

The sites a user connects to are commonly used to represent that user's interests. To separate long-term interests from short-term interests, we use two sliding windows that capture the connected sites over two time intervals of different lengths. The larger sliding window, of size sl, yields the sites connected in the last sl days as the static features of long-term interest; the smaller sliding window, of size ss, yields the newly connected sites that appeared only in the last ss days as the surprising features of short-term interest. Furthermore, since the number of sites connected by a user is very large, we record the connection count and connection frequency to help us identify the sites that interest the user most. For the newly connected sites, we also record the order in which the user first connected to them.
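The two-window split described above can be sketched as follows. This is a minimal illustration, not the PLPF implementation: the log layout (a list of (day, site) pairs for one user), the function name window_sites, and the parameter names s_l and s_s (standing for sl and ss) are assumptions. It also assumes that a site counts as "newly connected" when it appears in the smaller window but not in the earlier part of the larger window.

```python
from datetime import date, timedelta

def window_sites(log, today, s_l, s_s):
    """Split one user's connections into long-term and newly connected sites.

    log   : iterable of (day, site) pairs (hypothetical log layout)
    s_l   : size in days of the larger sliding window (sl in the text)
    s_s   : size in days of the smaller sliding window (ss in the text)
    """
    long_start = today - timedelta(days=s_l)
    short_start = today - timedelta(days=s_s)

    # Sites connected at any time in the last s_l days (static view).
    in_long = {site for day, site in log if long_start < day <= today}
    # Sites connected in the last s_s days.
    in_short = {site for day, site in log if short_start < day <= today}
    # Sites already seen in the older part of the larger window.
    seen_before = {site for day, site in log if long_start < day <= short_start}

    # Assumption: "newly connected" means seen recently but not earlier
    # within the larger window.
    return in_long, in_short - seen_before

example_log = [
    (date(2023, 12, 1), "c.example"),   # outside both windows
    (date(2024, 1, 5), "a.example"),    # only in the larger window
    (date(2024, 1, 28), "a.example"),   # in both windows, not new
    (date(2024, 1, 29), "b.example"),   # first seen in the smaller window
]
connected, newly_connected = window_sites(
    example_log, today=date(2024, 1, 31), s_l=30, s_s=7
)
```

Here a.example is part of the static features only, because the user already connected to it earlier in the larger window, while b.example is both a connected site and a newly connected site.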

In summary, four types of connection features are used to describe the long-term and/or short-term interests of a user from his/her connection behavior:

1. Connected sites or newly connected sites: the sites connected by a user in a fixed time interval, or connected “only” in that time interval.

2. Connection count for each connected site: the number of connections made to a connected site in a fixed time interval.

3. Connection frequency for each connected site: how often a site is connected in a fixed time interval.

4. Connection order for newly connected sites: the order in which the newly connected sites were connected, i.e., which newly connected site was connected earlier than another.

We explain below why the last three features were selected.

The connection count feature is inspired by recommendation systems.

Since our concept is the same as that of collaborative filtering (CF), we first examine which features CF uses in recommendation systems. A typical recommendation system consists of users and items, along with user feedback on items. This feedback is often expressed as real numbers, for example, 1 or 0 to indicate that a user likes or dislikes an item, or a score from 0 to 10 to measure how much the user favors the item. CF estimates that two users have similar preferences if they give similar feedback on the items they have both accessed. Connection logs contain no explicit user feedback on sites (not even on the connected sites), but we can estimate it implicitly from the connection counts. If a connection to a site means that the user is interested in it, then more connections to the same site suggest that the user is more interested in it.
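As an illustration of treating connection counts as implicit feedback, the sketch below counts connections per site and compares two users on the sites they share. The use of cosine similarity is an assumption made for this example; the text only states that CF compares feedback on commonly accessed items and does not fix a similarity measure. The function name and the sample data are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two users' connection-count vectors.

    Assumption: cosine similarity stands in for whatever CF similarity
    measure is actually used; it is chosen here only for illustration.
    """
    common = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[s] * counts_b[s] for s in common)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Connection count per site, taken as implicit feedback (hypothetical data).
alice = Counter(["news.example", "news.example", "mail.example"])
bob = Counter(["news.example"] * 4 + ["mail.example"] * 2)
carol = Counter(["games.example"])

sim_alice_bob = cosine_similarity(alice, bob)      # same relative preferences
sim_alice_carol = cosine_similarity(alice, carol)  # no site in common
```

Because cosine similarity ignores overall volume, alice and bob come out as similar even though bob connects twice as often; a different measure would be needed if absolute activity should matter.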

The connection count for each site can represent the preference of a user in a fixed time interval, but a site with many connections early in the interval and fewer (or no) connections recently may no longer interest the user. We therefore seek another feature to help us find the more stable interests of a user in a fixed time interval. Regularly connected sites are attractive for this purpose: if a user is interested in something for a long time, he/she may access it continuously, i.e., connect to the site continuously. A regularly connected site is identified by measuring how often the user connects to it. Hence, we record the connection frequency for each connected site in a fixed time interval to know which sites are regularly connected by a user.
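The difference between connection count and connection frequency can be made concrete with a small sketch. The text does not give a formula for "how often", so the definition used here is an assumption: the fraction of days in the window on which the user connected to the site at least once. The function name, log layout, and sample data are likewise hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

def connection_frequency(log, window_days):
    """Fraction of days in the window on which each site was connected.

    Assumption: this day-based ratio is one possible reading of
    "how often a site is connected"; the source gives no formula.
    log is an iterable of (day, site) pairs already limited to the window.
    """
    active_days = defaultdict(set)
    for day, site in log:
        active_days[site].add(day)
    return {site: len(days) / window_days for site, days in active_days.items()}

# Five connections in a single day versus one connection on each of five days.
log = [(date(2024, 1, 1), "burst.example")] * 5
log += [(date(2024, 1, d), "daily.example") for d in range(1, 6)]

counts = Counter(site for _, site in log)
frequency = connection_frequency(log, window_days=5)
```

Both sites have the same connection count, but only daily.example has a high frequency, which is the regular behavior the feature is meant to expose.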

In addition, to capture the short-term interest of a user with surprising features, we consider not only the newly connected sites but also the order in which they were connected. The authors of [20] state: “information will flow from earlier adopters to the late adopters, but may not flow back from late adopters to the earlier adopters.” This viewpoint suggests that two users who recently developed the same new interest may access new items in the same sequence; for example, users who have recently become interested in photography and bought a new digital camera will later buy a tripod and several lenses. The order in which a user connects to new sites can therefore reflect his/her new interest directly. So we also use the connection order of newly connected sites as another surprising feature to describe the short-term interest of a user.
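A minimal sketch of the connection order feature follows. It orders the newly connected sites by the time of the user's first connection to each. The pairwise agreement score that compares two users' orders is an illustrative assumption and is not taken from the text; the function names, integer timestamps, and site names are hypothetical.

```python
from itertools import combinations

def connection_order(log, new_sites):
    """Newly connected sites ordered by the user's first connection to each.

    log is an iterable of (timestamp, site) pairs (hypothetical layout).
    """
    first_seen = {}
    for ts, site in sorted(log):
        if site in new_sites and site not in first_seen:
            first_seen[site] = ts
    return sorted(first_seen, key=first_seen.get)

def order_agreement(order_a, order_b):
    """Share of common site pairs that both users visited in the same order.

    Assumption: this pairwise score is only one way to compare two orders.
    """
    pos_b = {site: i for i, site in enumerate(order_b)}
    common = [site for site in order_a if site in pos_b]
    pairs = list(combinations(common, 2))  # pairs follow order_a
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if pos_b[x] < pos_b[y])
    return same / len(pairs)

new_sites = {"camera.example", "tripod.example", "lens.example"}
user1_log = [(1, "camera.example"), (2, "tripod.example"),
             (3, "lens.example"), (4, "camera.example")]
user2_log = [(5, "camera.example"), (6, "lens.example"), (9, "tripod.example")]

order1 = connection_order(user1_log, new_sites)
order2 = connection_order(user2_log, new_sites)
agreement = order_agreement(order1, order2)
```

Repeat connections do not change the order, since only the first connection to each newly connected site is used.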

Table 4.2: The four features used in a profile.

L/S   Features                Descriptions

L     Connection count        the number of connections made to a connected site
L     Connection frequency    how often a site is connected in a fixed time period
S     Newly connected sites   the sites only connected in a recent time period
S     Connection order        the order that each newly connected site is connected

To conclude, Table 4.2 lists the four features we selected. A feature marked L is a static feature for the long-term interest, derived from the relatively larger sliding window; a feature marked S is a surprising feature for the short-term interest, derived from the relatively smaller sliding window. The larger sliding window, of size sl, captures the static interests of users in the last sl days, and the smaller sliding window, of size ss, captures the surprising interests of users in the last ss days. Both sl and ss are fixed parameters in PLPF.

In the following paragraphs, we will illustrate the implementation of the profile format in PLPF along with its construction/updating algorithm.
