
EXTENDING GREEDY METHODS

Since the influence maximization problem is NP-hard [12], existing studies focus on devising approximation algorithms that find the seed nodes efficiently and effectively in a greedy manner. In this section, we extend existing methods, including the general greedy algorithm [12], the CELF algorithm [16], the new greedy algorithm [3], and the degree discount algorithm [3], to solve the labeled influence maximization problem in a social network. We also analyze the properties, complexity, and approximation ratios of these extended greedy methods.

3.1 Labeled General Greedy Algorithm

The earliest solution to maximizing the spread of influence in a network was proposed by Kempe et al. [12]. They propose a general greedy strategy to approximately find the seed nodes. In each round $t$ of the general greedy, the algorithm selects one node $v_t$ and adds it into the seed set $S_{t-1}$ such that the new set maximizes the influence spread. Specifically, the selection of each seed node $v_t$ is based on maximizing the marginal gain of influence spread, i.e.,

$v_t = \arg\max_{v \in V \setminus S_{t-1}} \left( \sigma(S_{t-1} \cup \{v\}) - \sigma(S_{t-1}) \right)$,

in which $\sigma(S)$ is the expected number of activated nodes using the seed set $S$ in the simulation of the independent cascade model. Note that the simulations are usually performed a sufficient number of times (e.g., 10,000), and the average number of activated nodes is taken as $\sigma(S)$.

We extend the general greedy algorithm to solve the labeled influence maximization problem. We consider the targeted labels when estimating the labeled influence spread of a given seed set, and thus the expected number of activated nodes $\sigma(S)$ is replaced by $\sigma_L(S)$, the expected number of activated nodes that possess the targeted labels. The greedy method then selects, in the same manner, the node that maximizes the marginal gain of labeled influence spread,

$v_t = \arg\max_{v \in V \setminus S_{t-1}} \left( \sigma_L(S_{t-1} \cup \{v\}) - \sigma_L(S_{t-1}) \right)$.

In addition, based on Kempe et al.'s study [12], we can further show that $\sigma_L(\cdot)$ is a submodular function:

$\sigma_L(S \cup \{v\}) - \sigma_L(S) \geq \sigma_L(T \cup \{v\}) - \sigma_L(T)$,

where the seed sets satisfy $S \subseteq T \subseteq V$ and $v \in V \setminus T$. The labeled general greedy algorithm is described in Algorithm 1.

Algorithm 1. Labeled General Greedy (G, k)

1: Initialize $S = \emptyset$ and $R = 10000$.

2: for $i = 1$ to $k$ do

3:   for each node $v \in V \setminus S$ do

4:     $s_v = 0$.

5:     for $j = 1$ to $R$ do

6:       $s_v = s_v\, +$ the number of nodes with targeted labels activated in one cascade simulation using the seed set $S \cup \{v\}$.

7:     $s_v = s_v / R$.

8:   $S = S \cup \{\arg\max_{v \in V \setminus S} s_v\}$.

9: Return $S$.
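As a concrete illustration, the Monte-Carlo estimation and seed selection of Algorithm 1 can be sketched as follows. The graph representation (adjacency lists of `(neighbor, probability)` pairs), the `labels` map, and all function names are our own illustrative choices, not the paper's implementation.

```python
import random

def simulate_labeled_spread(graph, labels, seeds, target_labels, runs=1000):
    """Estimate sigma_L(S): the expected number of activated nodes that
    carry a targeted label, under the independent cascade model.
    `graph` maps node -> list of (neighbor, propagation probability)."""
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    # each edge gets exactly one activation attempt
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += sum(1 for v in active if labels[v] in target_labels)
    return total / runs

def labeled_general_greedy(graph, labels, k, target_labels, runs=1000):
    """Pick k seeds, each maximizing the estimated marginal gain of
    the labeled influence spread (the outer loop of Algorithm 1)."""
    seeds = set()
    for _ in range(k):
        base = simulate_labeled_spread(graph, labels, seeds, target_labels, runs)
        best_v, best_gain = None, -1.0
        for v in graph:
            if v in seeds:
                continue
            gain = simulate_labeled_spread(
                graph, labels, seeds | {v}, target_labels, runs) - base
            if gain > best_gain:
                best_v, best_gain = v, gain
        seeds.add(best_v)
    return seeds
```

With deterministic edges (probability 1.0), seeding the head of a labeled chain activates every labeled node downstream, so the greedy choice is easy to verify by hand.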

Since the labeled influence spread $\sigma_L(\cdot)$ is submodular (i.e., the marginal gain diminishes as the round $t$ increases), the labeled general greedy algorithm is guaranteed to reach a $(1 - 1/e) \approx 63\%$ approximation according to the analysis in [20]. In other words, let $S^*$ be the set of $k$ nodes with the optimal labeled influence spread; then $\sigma_L(S) \geq (1 - 1/e) \cdot \sigma_L(S^*)$, where $S$ is the seed set selected using our labeled general greedy algorithm and $|S| = k$.

3.2 Labeled CELF Greedy Algorithm

Following the general greedy, Leskovec et al. [16] aim to solve the original influence maximization problem more efficiently. They exploit the submodularity property and propose the Cost-Effective Lazy Forward (CELF) selection method. Let

$\Delta_v(S) = \sigma_L(S \cup \{v\}) - \sigma_L(S)$

denote the marginal gain of node $v$ with respect to the seed set $S$; submodularity gives $\Delta_v(S) \geq \Delta_v(T)$ for $S \subseteq T$. The central idea of CELF is that it is not necessary to recalculate $\Delta_v(S)$ for every node $v$ when a new seed node has been added into $S$ (such full recalculation is what the labeled general greedy performs). Here we extend the CELF greedy to tackle the labeled influence maximization problem.

The new labeled CELF greedy algorithm is shown in Algorithm 2. First, we set $\Delta_v$ of each node $v$ to be infinite and put all nodes into a priority queue in descending order of $\Delta_v$. We then pop a node $v$ from the priority queue; if $\Delta_v$ is infinite, we calculate $\Delta_v(S)$ and push $v$ back into the queue. Due to the submodularity of $\sigma_L(\cdot)$, $\Delta_v(S)$ exhibits diminishing returns as the seed set $S$ grows. In each round, therefore, a node $v$ is popped, its $\Delta_v(S)$ is recalculated, and it is pushed back. The recalculated value does not drop much, so $v$ still stays near the front of the queue. If a node $v$ is popped again right after its recalculation in the current round, then $v$ is exactly the seed node we want to select in that round.

Algorithm 2. Labeled CELF Greedy (G, k)

1: Initialize $S = \emptyset$ and set $\Delta_v = +\infty$ for each node $v \in V$.
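The lazy-forward selection described above can be sketched with a standard binary heap. The function names and the `spread_fn` callback (any estimator of $\sigma_L(S)$, such as Monte-Carlo simulation) are our own illustrative choices, not the paper's implementation.

```python
import heapq

def labeled_celf(nodes, k, spread_fn):
    """CELF lazy-forward selection. `spread_fn(S)` estimates the labeled
    influence spread sigma_L(S). Marginal gains are re-evaluated lazily,
    which is safe because sigma_L is submodular: a stale (older) gain
    estimate can only overestimate the true current gain."""
    seeds, cur_spread = set(), 0.0
    # heap entries: (negated marginal gain, node, round in which gain was computed)
    heap = [(-float('inf'), v, -1) for v in nodes]
    heapq.heapify(heap)
    for rnd in range(k):
        while True:
            neg_gain, v, last = heapq.heappop(heap)
            if last == rnd:
                # gain is fresh for this round: v maximizes the marginal gain
                seeds.add(v)
                cur_spread -= neg_gain
                break
            # stale estimate: recompute the gain and push v back
            gain = spread_fn(seeds | {v}) - cur_spread
            heapq.heappush(heap, (-gain, v, rnd))
    return seeds
```

Tagging each heap entry with the round in which its gain was computed implements the "popped again after one recalculation" test from the prose: a node whose freshly recomputed gain still tops the queue is the round's seed.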

3.3 Labeled New Greedy Algorithm

In addition to the CELF greedy, Chen et al. [3] propose the new greedy algorithm to improve the efficiency of the general greedy as well. They exploit an assumption of the independent cascade model: each active node $u$ has only one chance to activate an inactive neighbor $v$, and whether or not $u$ successfully activates $v$, $u$ never attempts to activate $v$ again. Since each edge has only one chance to propagate influence, their method determines in advance which edges will propagate, according to the probabilities on the edges. Nodes reachable from one another along the remaining edges are then considered able to influence each other. This strategy significantly reduces the execution time spent on influence simulation. Here we extend the new greedy to solve the labeled influence maximization problem.

Our labeled new greedy algorithm is described in Algorithm 3. First, based on the influence probability on each edge, we determine which edges are retained; edges that are not retained are removed from the graph, yielding a trimmed network $G'$. We use $R_{G'}(S)$ to represent the set of nodes that can be successfully activated by the seed set $S$ in $G'$: the nodes reachable from $S$ in $G'$ are exactly those that can be successfully influenced. Using depth-first search, we can easily derive $R_{G'}(S)$ and, for each node $v$, the set $R_{G'}(v)$ of nodes that can be activated by $v$. The labeled marginal gain of a candidate $v$ in $G'$ is computed in two cases: if $v \in R_{G'}(S)$, the gain is zero; if $v \notin R_{G'}(S)$, the gain is the number of nodes in $R_{G'}(v) \setminus R_{G'}(S)$ that possess the targeted labels. In each round, we generate up to $R$ trimmed networks and select the node with the highest average gain as the seed node.

Algorithm 3. Labeled New Greedy (G, k)

1: Initialize $S = \emptyset$ and $R = 10000$.

2: for $i = 1$ to $k$ do

3:   Set $s_v = 0$ for all $v \in V \setminus S$.

4:   for $j = 1$ to $R$ do

5:     Derive $G'$ by removing each edge $(u, v)$ from $G$ with probability $1 - p(u, v)$.

6:     Compute $R_{G'}(S)$.

7:     Compute $R_{G'}(v)$ for all $v \in V \setminus S$.

8:     for each node $v \in V \setminus S$ do

9:       if $v \notin R_{G'}(S)$ then $s_v = s_v + |\{u \in R_{G'}(v) \setminus R_{G'}(S) : u \text{ possesses a targeted label}\}|$.

10:  Set $s_v = s_v / R$ for all $v \in V \setminus S$.

11:  $S = S \cup \{\arg\max_{v \in V \setminus S} s_v\}$.

12: Return $S$.
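The live-edge sampling and reachability steps of Algorithm 3 can be sketched as follows (breadth-first search is used in place of the paper's depth-first search; either traversal yields the same reachable set). The data layout and function names are our own illustrative choices.

```python
import random
from collections import deque

def sample_live_edge_graph(graph, rng=random):
    """Flip a coin on every edge once; the kept edges form the trimmed
    graph G'. `graph` maps node -> list of (neighbor, probability)."""
    return {u: [v for v, p in nbrs if rng.random() < p]
            for u, nbrs in graph.items()}

def reachable(trimmed, sources):
    """R_{G'}(S): all nodes reachable from `sources` in the trimmed graph."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in trimmed.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def labeled_new_greedy(graph, labels, k, target_labels, runs=200):
    """Average, over `runs` sampled trimmed graphs, the number of labeled
    nodes each candidate adds beyond R_{G'}(S); pick the best per round."""
    seeds = set()
    for _ in range(k):
        scores = {v: 0.0 for v in graph if v not in seeds}
        for _ in range(runs):
            trimmed = sample_live_edge_graph(graph)
            covered = reachable(trimmed, seeds)
            for v in scores:
                if v in covered:
                    continue  # zero marginal gain in this sample
                gain_set = reachable(trimmed, {v}) - covered
                scores[v] += sum(1 for u in gain_set
                                 if labels[u] in target_labels)
        seeds.add(max(scores, key=scores.get))
    return seeds
```

Sampling each edge once per round replaces the per-candidate cascade simulations of Algorithm 1, which is the source of the speed-up described above.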

3.4 Labeled Degree Discount Algorithm

Other than the new greedy algorithm, Chen et al. [3] further propose the degree discount heuristic to efficiently find effective seed nodes. Degree discount assumes that the propagation of influence has low potential to spread globally. It is thus natural to consider only the one-step neighbors of nodes and to select nodes with high degree values, which tend to have higher expected influence, as the seeds. The central idea is to compute and update this expected influence in each round of selection. If one wants to select $k$ seed nodes, DegreeDiscount is performed $k$ times. After selecting a node $w$ as a seed in each round, DegreeDiscount recalculates the expected influence of each of $w$'s neighbors $v$, because $v$'s expected influence is discounted by selecting $w$ as the seed. The formula for recalculating the expected influence is

$dd_v = d_v - 2t_v - (d_v - t_v) \cdot t_v \cdot p$,

where $d_v = |N(v)|$ is the number of neighbors of $v$, and $t_v = |N(v) \cap S|$ is the number of $v$'s neighbors that are contained in the seed set $S$. In other words, $dd_v$ approximates the expectation that $v$ is not only never influenced by the existing seed nodes, but is also able to activate those neighbors that have not yet been selected as seeds.

We consider the targeted labels to modify the degree discount heuristic for addressing the labeled influence maximization problem. Algorithm 4 describes the proposed labeled degree discount algorithm.

The central idea of estimating the expected influence is two-fold. First, we consider not only the probability that $v$ fails to be activated by the existing seed nodes, but also the expected influence profit that $v$ gains by successfully activating its neighbors with targeted labels. Second, $v$'s degree is replaced by the number of its neighbors with targeted labels, $d_v^L = |N_L(v)|$. Depending on whether $v$ itself possesses a targeted label, the calculation of $ldd_v$ can be divided into the following two cases. If $\ell(v) \in L$,

$ldd_v = (1 - p)^{t_v} \cdot \left(1 + (d_v^L - t_v^L) \cdot p\right)$;

if $\ell(v) \notin L$,

$ldd_v = (1 - p)^{t_v} \cdot (d_v^L - t_v^L) \cdot p$,

where $S_v^L = N_L(v) \cap S$ is the set of $v$'s neighbors that have been selected as seed nodes and possess the targeted labels, with $t_v^L = |S_v^L|$, and $S_v = N(v) \cap S$ is the set of $v$'s neighbors that have been selected as seed nodes, with $t_v = |S_v|$. Such two cases can be combined into line 13 in Algorithm 4. Note that $N(v)$ is the set of $v$'s neighbors, and $N_L(v) \subseteq N(v)$ denotes those neighbors possessing the targeted labels.

Algorithm 4. Labeled Degree Discount (G, k)

1: Initialize $S = \emptyset$.
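The labeled discount update can be sketched as below. Note that the exact combined formula of line 13 is not fully recoverable here, so the `ldd` function encodes one plausible reading of the two cases above, namely $(1-p)^{t_v} \cdot (\mathbb{1}[\ell(v) \in L] + p \cdot (d_v^L - t_v^L))$; treat it, along with the data layout and names, as an assumption rather than the paper's exact heuristic.

```python
def labeled_degree_discount(graph, labels, k, target_labels, p=0.01):
    """Degree-discount-style selection restricted to targeted labels.
    `graph`: undirected adjacency lists, e.g. {1: [2, 3], 2: [1], ...}."""
    nbrs = {v: set(graph.get(v, ())) for v in graph}
    # d_v^L: number of v's neighbors carrying a targeted label
    dL = {v: sum(1 for u in nbrs[v] if labels[u] in target_labels)
          for v in graph}
    t = {v: 0 for v in graph}   # t_v: v's neighbors already chosen as seeds
    tL = {v: 0 for v in graph}  # t_v^L: labeled such neighbors

    def ldd(v):
        # ASSUMED combined discount (our reading of the two cases):
        # (1-p)^{t_v} * (1[v has a targeted label] + p * (d_v^L - t_v^L))
        has_label = 1.0 if labels[v] in target_labels else 0.0
        return (1 - p) ** t[v] * (has_label + p * (dL[v] - tL[v]))

    seeds = set()
    for _ in range(k):
        w = max((v for v in graph if v not in seeds), key=ldd)
        seeds.add(w)
        for v in nbrs[w]:  # discount the expected influence of w's neighbors
            t[v] += 1
            if labels[w] in target_labels:
                tL[v] += 1
    return seeds
```

Because only one-step neighborhoods are touched, each round costs time proportional to the selected seed's degree, which is the efficiency argument behind the heuristic.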
