APPLICATIONS OF INTERACTION MATRICES - Mining Hierarchical Interaction Relationships from Social Network User Usage Records and Its Applications

On today's social networks, a connection between two users is typically one of two types: friend or follower. The friend relationship is symmetric, since it is bidirectional on common social networks such as Facebook. Thus, in a matrix representing the friend relationship, if two users i and j are friends, the elements for both i to j and j to i are 1; otherwise, they are 0. Because the friend relationship is bidirectional, a social network of friend relationships can be shown as an undirected graph.

The follower relationship, in contrast, is usually asymmetric, because it can be either unidirectional or bidirectional on Facebook or Twitter. It is common for a user to have thousands of friends or followers on these networks. In this section, two applications are proposed: multiplex networks and an information diffusion strategy. In the first application, we merge the interaction matrix output by the last algorithm, which represents the closeness between nodes, with the adjacency graph formed by merging the friend and follower lists of users on the social network. In the output, each interaction level can be regarded as one of the classes of friend or follower relationships on Facebook. For example, High interaction reveals the closest relationships in the social network, such as Close Friend or See First, and Normal interaction is regarded as the Default when the two users appear to be ordinary friends. If the interaction level is Low interaction, we consider the connection to be Unfollow or Unfriend.

In the second application, an information diffusion strategy based on a cascading model is proposed on the multiplex graph. As preprocessing, instead of a random choice, the seeds are chosen by ranking the weighted sum of the in-degree centralities in the different layers. The weighted sum is defined as formula (6):

CD(i) = wH * indegree(iH) + wN * indegree(iN),    (6)

where indegree(iH) and indegree(iN) are the in-degree centralities of i in the high and normal effect layers, and wH and wN are the corresponding weights. The nodes whose weighted sums rank within the top 10% of all nodes are chosen as seeds and used as the input of a diffusion strategy. In the proposed strategy, each connection (i, j) is assigned an infection rate λ(i, j), representing the rate at which i is successfully infected by j. The infection rate depends on the layer to which the connection belongs and on the interaction rate I(i, j); it is defined by formula (7) for the high effect layer and by formula (8) for the normal effect layer.

λ(i, j) = λH * I(i, j), if (i, j) is in the high effect layer,    (7)

λ(i, j) = λN * I(i, j), if (i, j) is in the normal effect layer,    (8)

where λH and λN are the default infection rates in the high effect and normal effect layers, respectively. The information is diffused from the seeds, which are the first active nodes. Each inactive node turns active once an active node successfully infects it, and an active node cannot attempt to infect the same inactive node twice. The strategy stops when no inactive node can be infected.
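As a rough illustration of formulas (7) and (8), the per-connection infection rate could be computed as in the Python sketch below. The function name infection_rate, the layer codes "H" and "N", and the choice to return 0 for a connection outside both layers are assumptions made for this sketch and are not part of the proposed method.

```python
def infection_rate(layer, interaction, lam_high, lam_normal):
    """Return lambda(i, j) for one connection (i, j).

    layer       -- "H" for the high effect layer, "N" for the normal
                   effect layer (hypothetical encoding)
    interaction -- the interaction rate I(i, j)
    lam_high    -- default infection rate of the high effect layer
    lam_normal  -- default infection rate of the normal effect layer
    """
    if layer == "H":
        return lam_high * interaction      # formula (7)
    if layer == "N":
        return lam_normal * interaction    # formula (8)
    # Assumption: a connection in neither layer cannot transmit information.
    return 0.0
```

With a default high effect rate of 0.7 and an interaction rate of 1, the function gives 0.7 * 1 = 0.7.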

In the following sections, the algorithms are described as taking graphs as inputs and producing a graph as output. In real implementations, however, all graphs in the proposed methods are processed as matrices.

4.1 Multiplex Networks

In this algorithm, we transform the interaction matrix output by the last algorithm into a multiplex network, which represents the closeness between nodes through the structure of the network, and then modify the adjacency graph formed by merging the friend and follower lists of users in the social network.

INPUT: The friend graph F(V, E1), the follower graph L(V, E2) and the interaction matrix I in a social network.

OUTPUT: A multiplex graph AM(V, E4, C).

STEP 1: Merge the friend graph F(V, E1) and the follower graph L(V, E2) into the adjacency graph A(V, E3) by representing E1 as bidirectional edges and setting E3 = E1∪E2

STEP 2: Set EH, EN and EL as empty sets; these are the edge sets of the social network for the high, normal and low interaction labels in the interaction matrix I.

STEP 3: Initially set i = 1 and j = 2, where i and j are used to index the current element to be processed.

STEP 4: Compare the element A(i, j) in the adjacency matrix A and the element I(i, j) in the interaction matrix I with the following cases.

Case 1: If A(i, j) = 1 and

(a) I(i, j) is labeled as high interaction, add the edge e(i, j) to the High effect layer (represented by red) by EH = EH∪e(i, j).

(b) I(i, j) is labeled as normal interaction, add the edge e(i, j) to the normal effect layer (represented by blue) by EN = EN∪e(i, j).

(c) I(i, j) is labeled as low interaction, add the edge e(i, j) to the low effect layer (represented by black) by EL = EL∪e(i, j).

Case 2: If A(i, j) = 0 and

(a) I(i, j) is labeled as high interaction, add a new edge ne(i, j) to the High effect layer by EH = EH∪ne(i, j).

(b) I(i, j) is labeled as normal interaction, add a new edge ne(i, j) to the normal effect layer by EN = EN∪ne(i, j).

STEP 5: Set j = j + 1. If j = n + 1, then set j = 1 and i = i + 1.

STEP 6: If i = n +1, then do the next step, otherwise, go to step 4.

STEP 7: Set a universal edge set E4 = EH∪EN∪EL∪E3

STEP 8: Output the multiplex graph AM(V, E4, C) as the result, where C = {Red, Blue, Black} is the color set.

After Step 8, the multiplex graph of the social network has been found and output as an application.
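The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the thesis implementation (which was written in MATLAB): nodes are indexed from 0, the interaction labels are encoded as the strings "H", "N" and "L", diagonal elements are skipped, and the function name build_multiplex and its return values are hypothetical.

```python
import numpy as np

def build_multiplex(friend, follower, labels):
    """Build the layered edge sets of the multiplex graph.

    friend, follower -- n x n 0/1 matrices
    labels           -- n x n matrix of "H", "N" or "L" (assumed encoding)
    """
    friend = np.asarray(friend, dtype=bool)
    follower = np.asarray(follower, dtype=bool)
    # Step 1: friend edges are bidirectional; E3 = E1 U E2.
    adjacency = friend | friend.T | follower
    # Step 2: one empty edge set per interaction label.
    layers = {"H": set(), "N": set(), "L": set()}
    new_edges = set()
    n = adjacency.shape[0]
    # Steps 3 to 6: visit every ordered pair (i, j).
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # assumption: self-connections are ignored
            label = labels[i][j]
            if adjacency[i, j]:
                layers[label].add((i, j))      # Case 1 (a), (b), (c)
            elif label in ("H", "N"):
                layers[label].add((i, j))      # Case 2 (a), (b)
                new_edges.add((i, j))          # marked as a new edge ne(i, j)
    # Step 7: every edge of E3 already sits in one layer, so the union of
    # the three layers equals EH U EN U EL U E3.
    e4 = layers["H"] | layers["N"] | layers["L"]
    return adjacency, layers, new_edges, e4
```

Keeping the new edges in a separate set makes it possible to draw them differently from the edges that already existed in the adjacency graph.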

Below, an example to explain the proposed approach is given.

4.2 An Example

An example is given below to illustrate the proposed algorithm in Section 4.1.

Following the assumptions in section 3.3, also assume that the friend relationship for the social network is as shown in Table 11, in which each element reveals a friend relationship between two users. In Table 11, the element for user A to user D is 1 and vice versa, representing that A and D are mutual friends. In contrast, the element for A to B is 0 and vice versa, representing that A and B are not friends.

Note that the matrix is symmetric.

Table 11: A symmetric matrix representing friend relationship.

The graph representation of Table 11 is displayed in Figure 5. Each connection in the figure reveals a mutual friend relationship. For example, A and D are connected by an edge, representing that A and D are mutual friends.

Figure 5: The graph F(V, E1) representation of the friend relationship in Table 11.

Table 12 gives an example of the follower relationship, in which each element represents a follower relationship for a user i following another user j. For instance, the element for C to B is 1, representing that C is a follower of B. In contrast, the element for B to C is 0, representing that B is not a follower of C.

Table 12: An example of follower relationship represented as a matrix.

Figure 6 shows the graph representing the follower relationship in Table 12, in which each dotted link reveals a directed follower relationship. For instance, A directly connects to E, representing A is a follower of E.

Figure 6: The graph L(V, E2) representation of the follower relationship in Table 12.

The interaction matrix is the result obtained from the algorithms in chapter 3. Here, the interaction matrix shown in Table 7 in section 3.3 is taken as an input of the multiplex network algorithm.

With the input matrices, the proposed method works step by step as follows.

Step 1: Set EH, EN and EL as empty sets; these are the edge sets of the social network for the high, normal and low interaction labels in the interaction matrix I.

Step 2: The friend graph F(V, E1) and the follower graph L(V, E2) are merged into the adjacency graph A(V, E3) by setting E3 = E1∪E2. In the example, the friend graph in Figure 5 and the follower graph in Figure 6 are merged into the adjacency graph shown in Figure 7.

Figure 7: The merged adjacency graph of the example.

In Figure 7, a solid edge denotes a friend relationship and a dotted edge denotes a follower relationship. Whether an edge is solid or dotted, it represents an adjacency relation from the start node of the edge to its end node.

Note that, in a real implementation, the above concept can be realized with matrices. The matrix for the friend relationship and the one for the follower relationship can be merged directly by a union operation to produce the adjacency matrix. For example, the matrices in Tables 11 and 12 are merged into the adjacency matrix in Table 13, which corresponds to Figure 7. Each element A(i, j) in the adjacency matrix represents a connection from user i to user j in the friend relationship, the follower relationship, or both. Since both the follower and the friend relationships are included in Table 13, the adjacency matrix for this example is asymmetric.
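The union of the two matrices can be illustrated with a short NumPy sketch. The 3-user matrices in the usage note are hypothetical toy data, not the contents of Tables 11 and 12, and the function name merge_adjacency is an assumption of this sketch.

```python
import numpy as np

def merge_adjacency(friend, follower):
    """Element-wise union of two 0/1 relationship matrices."""
    friend = np.asarray(friend)
    follower = np.asarray(follower)
    # A(i, j) = 1 if i is connected to j as a friend, as a follower, or both.
    return np.logical_or(friend, follower).astype(int)

def is_symmetric(matrix):
    """True when the matrix equals its transpose."""
    matrix = np.asarray(matrix)
    return bool(np.array_equal(matrix, matrix.T))
```

For instance, with a symmetric friend matrix [[0, 1, 0], [1, 0, 0], [0, 0, 0]] and a follower matrix whose only non-zero element is the one from user 3 to user 2, the merged matrix gains a one-way connection and is no longer symmetric.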

Table 13: The adjacency matrix of the social network for this example.

Step 3: The index variable i is set to 1 and j is set to 2 at the beginning.

Step 4: Depending on the value in the adjacency matrix A, one of the two cases below applies.

Case 1: Since the connection A(1, 2) = 1 in the adjacency matrix and I(1, 2) is labeled as high interaction, we add an edge e(A, B) to the High effect layer.

Case 2: When i = 4 and j = 6, since the connection A(4, 6) = 0 in the adjacency matrix while I(4, 6) is labeled as high interaction, we add an edge ne(A, I) to the High effect layer and mark it as a new edge.

Step 5: The value of j is incremented to 3. Since it is not equal to n + 1, which is 11, i remains 1.

Step 6: Since i is not equal to n + 1, which is 11, Step 4 is done again for i = 1 and j = 3. Steps 4 to 6 are then repeated until i = n + 1, which is 11 in this example.

Then Step 7 is executed.

Step 7: Set a universal edge set E4 to be the union of all edge sets in the above steps.

Step 8: The multiplex graph is output. In this example, the results are shown in Figures 8, 9 and 10.

In Figures 8, 9 and 10, the different colors {Red, Blue} reveal the different levels of relationship {High effect, Normal effect}, respectively, in the social network. Thus, we define an edge-colored multigraph as a triple AM = (V, E4, C), as shown in Figures 8 and 9, where V and E4 are the node set and the edge set, respectively, and C = {Red, Blue} is the color set, in which each element is the color of a layer. The dashed edges are the new edges added after comparing the elements in the interaction matrix with the adjacency graph.

Figure 8: The High effect layer of the multiplex network in this example

Figure 9: The Normal effect layer of the multiplex network

Figure 10: The Low effect layer of the multiplex network

4.3 An Information Diffusion Strategy in a Multiplex Network

In this section, we propose an information diffusion algorithm that finds a diffusion strategy in a cascading model based on the interaction matrices and the multiplex network. The two interaction matrices ensure that all users in the Red (high effect) layer exceed a certain usage frequency in the social network, which yields a higher infection rate. Before the algorithm is introduced, the notation used in this section is listed below.

n: the number of users;

R: the ranked list of the closeness degree centrality of users;

wH: the weight of the high effect layer;

wN: the weight of the normal effect layer;

indegree(iH): the in-degree centrality of i in the high effect layer;

indegree(iN): the in-degree centrality of i in the normal effect layer;

λH: the default infection rate in the high effect layer;

λN: the default infection rate in the normal effect layer;

λ(i, j): the infection rate at which an inactive node i is infected by an active node j;

nA: the number of nodes in Active;

Active: the set of nodes in active state;

Active(i): the ith node of Active.

4.3.1 Preprocessing

Before an information diffusion strategy executes, it is necessary to find the node set of seeds as an input of the strategy.

INPUT: The multiplex network AM(V, E4, C) of a social network and two weights wH and wN.

OUTPUT: A set of nodes S to be seeds in the cascading model.

STEP 1: Initially set i = 1 and set R to a zero vector, where i is used to index the current element to be processed.

STEP 2: Calculate the closeness degree centrality of node i, denoted as CD(i), by formula (6).

STEP 3: Set i = i + 1. If i = n + 1, then do the next step; otherwise, go to step 2.

STEP 4: For i = 1 to n, sort the closeness degree centralities in descending order, and save them in R.

STEP 5: Choose the nodes whose closeness degree centrality ranks within the top 10% of all nodes, and save them in S.

STEP 6: Output the node set of seeds S as a result.
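The preprocessing steps can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements of the thesis: nodes are indexed from 0, each layer is given as a set of directed edges (i, j) meaning a connection from i to j, the in-degree is used as a raw count without normalization, the top 10% is rounded down with a minimum of one seed, and ties keep their original index order. The name select_seeds is hypothetical.

```python
def select_seeds(high_edges, normal_edges, n, w_high, w_normal, top_percent=10):
    """Rank nodes by formula (6) and return the top-ranked nodes as seeds."""
    indegree_high = [0] * n
    indegree_normal = [0] * n
    for _source, target in high_edges:
        indegree_high[target] += 1       # edges pointing to the node
    for _source, target in normal_edges:
        indegree_normal[target] += 1
    # Steps 2 and 3: CD(i) = wH * indegree(iH) + wN * indegree(iN).
    cd = [w_high * indegree_high[i] + w_normal * indegree_normal[i]
          for i in range(n)]
    # Step 4: ranked list R in descending order of CD(i).
    ranked = sorted(range(n), key=lambda i: cd[i], reverse=True)
    # Step 5: keep the top 10% (assumption: rounded down, at least one node).
    count = max(1, n * top_percent // 100)
    return ranked[:count], cd
```

With n = 10 the function returns exactly one seed, which matches the size of the seed set in the example of this chapter.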

4.3.2 An example

An example is given below to illustrate the proposed algorithm in Section 4.3.1.

Following the assumptions in section 3.3, the results of section 4.2 are taken as the inputs of the preprocessing algorithm to obtain the seeds. The seeds, which are source nodes, are chosen by preprocessing the multiplex graph in Figures 8 and 9. In addition to the multiplex graph, two weights are used in preprocessing. With these inputs, preprocessing works step by step as follows:

Step 1: The index variable i is set to 1 and the ranked list R is set to a zero vector at the beginning.

Step 2: The closeness in-degree centrality CD(i) is calculated from formula (6), using the two layer weights wH and wN for the high effect and normal effect layers. For user 1, the closeness in-degree centrality CD(1) is obtained by substituting the in-degree centralities of user 1 in the two layers into formula (6).

Step 4: For i = 1 to n, sort the closeness in-degree centrality CD(i) in descending order and save the result in ranked list R (Table 14).

Table 14: The ranked list of the closeness in-degree centrality

Step 5: From Table 14, choose the nodes whose closeness degree centrality ranks within the top 10% of all nodes, which is 1 node in the example.

Step 6: The node set of seeds S is the output. In this example, {B} is chosen as a seed.

4.3.3 Information Diffusion Strategy

In this section, an information diffusion strategy is proposed based on all the results obtained so far in this thesis. Based on the cascading model, the following assumptions are made: each inactive node turns active once it is successfully infected by an active node, and an active node cannot attempt to infect the same inactive node twice. The strategy stops when no inactive node can be infected.

INPUT: The multiplex network AM(V, E4, C) of a social network, the node set of seeds S, the infection rate λH and λN, the interaction matrix I of a social network.

OUTPUT: An infection matrix λ containing the infection rate λ(i, j) for every connection (i, j) in the social network.

STEP 1: Initially set i = 1 and j = 1, where i and j are used to index the current element to be processed.

STEP 2: Turn the nodes in S into active state and set Active = S and nA = |S|.

STEP 3: If i is in active state, go to step 5; otherwise, the infection rate is calculated by the formula for the applicable case:

Case 1: If the connection (i, Active(j)) exists and

(a) (i, Active(j)) is in the high effect layer, then from formula (7), λ(i, Active(j)) = λH * I(i, Active(j)).

(b) (i, Active(j)) is in the normal effect layer, then from formula (8), λ(i, Active(j)) = λN * I(i, Active(j)).

Case 2: If the connection (i, Active(j)) does not exist in either the high effect or the normal effect layer, go to step 5 and continue with the next node.

STEP 4: If node i is infected successfully, set node i to active state by adding it to Active and setting nA = nA + 1; then go to step 5.

STEP 5: Set i = i + 1. If i = n + 1, set i = 1 then go to the next step, otherwise, go to step 3.

STEP 6: Set j = j + 1. If j = nA +1, do the next step, otherwise, go to step 3.

STEP 7: Output the infection rate matrix λ as a result.
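A minimal Python sketch of the strategy is given below. It is one possible reading of the steps above, under several assumptions: nodes are indexed from 0, each layer is a set of directed edges (i, j), "infected successfully" is modelled as a single random draw that succeeds with probability λ(i, j), the high effect layer takes precedence if an edge appears in both layers, and the infection matrix is returned as a dictionary keyed by (i, j). The function name diffuse and the rng parameter are hypothetical.

```python
import random

def diffuse(n, high_edges, normal_edges, interaction, seeds,
            lam_high, lam_normal, rng=None):
    """Cascade from the seeds; return (infection rates, activation order)."""
    rng = rng or random.Random()
    default_rate = {}
    for edge in normal_edges:
        default_rate[edge] = lam_normal
    for edge in high_edges:
        default_rate[edge] = lam_high    # assumption: high layer wins
    active = list(seeds)                 # Step 2: Active = S
    is_active = set(seeds)
    rates = {}
    j = 0                                # Step 1 (0-based index into Active)
    while j < len(active):               # Step 6: until j passes nA
        source = active[j]
        for i in range(n):               # Steps 3 to 5: scan all nodes
            if i in is_active or (i, source) not in default_rate:
                continue                 # already active, or Case 2
            # Case 1: formula (7) or (8), depending on the layer.
            rate = default_rate[(i, source)] * interaction[i][source]
            rates[(i, source)] = rate
            if rng.random() < rate:      # Step 4: successful infection
                active.append(i)
                is_active.add(i)
        j += 1
    return rates, active
```

Because each active node is processed once and scans each inactive node once, no active node attempts to infect the same inactive node twice, and the loop ends when every active node has been processed.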

4.3.4 An example

An example is given below to illustrate the proposed algorithm in Section 4.3.3.

With the multiplex network in Figures 8 and 9, the node set of seeds S from section 4.3.2 and the interaction matrix from Table 7, the proposed method works step by step as follows:

Step 1: The index variable i is set to 1 and j is set to 1 at the beginning.

Step 2: Let Active = S to collect the active nodes and let nA be the number of seeds, which is 1 in the example. Then turn each node in S, which is B in the example, to active state (represented by the color white) (Figures 11 and 12).

Figure 11: The diffusion strategy in high effect layer after step 2 is executed.

Figure 12: The diffusion strategy in normal effect layer after step 2 is executed.

Step 3: If user 1 (A) is inactive, the infection rate is calculated by the formula for the applicable case; otherwise, step 5 is executed. Here, the infection rate λ(1, 2) from node Active(1), which is node B (2) in the example, to user 1 (A) is calculated as follows.

Case 1: Since the connection (A, B) exists in the high effect layer, the infection rate λ(1, 2) is 0.7 (= 0.7 * 1).

Step 4: Assume that A is successfully infected by B. A is added to Active and the value of nA is incremented to 2. Thus, Active is updated to {A, B}.

Step 5: The value of i is incremented to 2. Since it is not equal to n + 1, which is 11, steps 3 to 5 are repeated until i = n + 1; then step 6 is executed.

Assume that, after node B has spread the information, the inactive nodes A, C, D, E, and H are infected successfully by B. Thus, A, C, D, E, and H are added to Active, and Active is updated to {A, B, C, D, E, H}; these nodes are shown in orange in Figures 13 and 14.

Step 6: The value of j is incremented to 2. Since it is not equal to nA + 1, which is 7, steps 3 to 5 are repeated; once j = nA + 1, step 7 is executed.

Figure 13: The diffusion strategy in high effect layer after the active node B spread.

Figure 14: The diffusion strategy in normal effect layer after the active node B spread.

Step 7: The infection rate matrix λ is output as a result.

CHAPTER 5

EXPERIMENTAL RESULTS

In this chapter, the experiments conducted for the proposed algorithms are described and discussed. The experiments were implemented in MATLAB on a personal computer with a 3.20 GHz Intel Core i5-4570 CPU and 8 GB of RAM. Random data are used to study the thresholds applied in the proposed methods. The usage records are generated from random numbers but are made similar to real data; e.g., clicking like is the most frequently used action for common users. The experiments in this chapter show the threshold settings required for different numbers of users.

5.1 The Thresholds in Finding Interaction Matrix

In this section, experiments were conducted to show the influence of the number of users on α, the threshold of the algorithm proposed in section 3.2. The datasets used here were produced randomly, with the number of users ranging from 10 to 10000.

The influence of the number of users under α = 0.15 and β = 0.03, the values applied in the example in section 3.3, was measured first. Figures 15 and 16 show the number of connections labeled as high interaction and as low interaction against the number of users with α = 0.15 and β = 0.03.

Figure 15: The amount of high interaction relationship with different number of users when α = 0.15.

Figure 16: The amount of low interaction relationship with different number of users when β = 0.03.

Figure 15 shows that the number of connections labeled as high interaction decreases rapidly as the number of users increases. Figure 16 shows that, as the number of users increases, the number of connections labeled as low interaction increases rapidly. Thus, α is lowered to ensure that high interaction connections can still be recognized. From this viewpoint, Figures 17 and 18 show the number of high interaction connections against the number of users for different values of α; α = 0.0019 and α = 0.0017 are applied when the number of users is 1000.

Figure 17: The amount of high interaction relationship with different number of users when α = 0.0019.

Figure 18: The amount of high interaction relationship with different number of users when α = 0.0017.

Figures 17 and 18 show that the number of connections is around 120000–160000, i.e., each user has on average 120–160 high interaction relationships.

The value of α is chosen based on Dunbar's number, for which the 95% confidence interval is 100–230. Thus, we defined the value of α in terms of the number of high interaction relationships.
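The check behind this choice is simple arithmetic and can be sketched in Python. The function names and the default bounds passed as parameters are assumptions of this sketch; the bounds 100 and 230 are the interval quoted above, and the counts in the usage note are the approximate figures reported for 1000 users.

```python
def mean_high_per_user(num_high_connections, num_users):
    """Average number of high interaction relationships per user."""
    return num_high_connections / num_users

def within_dunbar_interval(mean_per_user, low=100, high=230):
    """True when the per-user average falls inside the quoted interval."""
    return low <= mean_per_user <= high
```

For 1000 users, 120000 high interaction connections give an average of 120 per user and 160000 give 160 per user; both values fall inside the interval, which is the sense in which α is defined in terms of the number of high interaction relationships.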

Figure 19: The amount of low interaction relationship with different number of users when β = 0.0011.

Figure 19 shows that the number of low interaction relationships increases with the number of users. On a social network such as Facebook, however, the number of friends a user can have is limited to 5000. That is, as the number of users grows, there are more strangers on the social network, which produces a large number of low interaction relationships.

Figure 20: The amount of low interaction relationship with different number of users
