Chapter 5.  Group-based Knowledge Flow Mining Methods

5.4   GKF Mining Algorithm (without considering duplicate topics)

5.4.5   Using the Edge Deletion Procedure to Remove Infrequent Edges

Based on the results of topological sorting of VN, the edge deletion procedure examines the vertices and determines which incoming edges should be removed from them. It removes infrequent edges whose weight is no greater than a user-specified threshold η, as shown in Fig. 23. The inputs of this procedure are the sorted list L derived by topological sorting and the edge set EN of the GKF graph. The algorithm checks the incoming edges of each vertex in ascending order of their weights, and those whose weights are no greater than η are candidates for removal. If an edge is removed, it means that the knowledge referencing behavior between two vertices (topics) is infrequent among the group of workers.

    ...
    Remove the edge ex,y from E and EN;
    If (no path ps,y exists from the start vertex s to vertex vy in GN) or (there exists a vertex vj, vj ∈ Q and no path pj,d exists from vertex vj to the end vertex d)
    ...

Fig. 23: The edge deletion procedure

However, an infrequent edge should only be deleted from the graph if removing it would not make any vertex unreachable. Let Q be the set of vertices that have been checked in topological order to remove their infrequent incoming edges. For a vertex vy, if one of its incoming edges is removed and there is no other path from the start vertex to vy, the removed edge should be returned to the edge sets E and EN. In addition, the vertices checked before vy should be reexamined to ensure that there is a path from a checked vertex vi in Q to the end vertex. If removing an edge violates the above condition, the edge should be returned to the edge sets E and EN.
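The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pseudocode of Fig. 23: the function names, the dictionary-based edge representation, and the sample graph are hypothetical, and a single edge set stands in for both E and EN.

```python
from collections import deque

def has_path(edges, src, dst):
    """Breadth-first search over a set of directed (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def delete_infrequent_edges(order, weights, eta, start, end):
    """order: vertices in topological order (the list L).
    weights: {(u, v): weight} for every edge; eta: the threshold."""
    edges = set(weights)
    checked = []  # plays the role of Q: vertices already examined
    for y in order:
        # Incoming edges of y in ascending order of weight (ties by edge name).
        incoming = sorted((e for e in edges if e[1] == y),
                          key=lambda e: (weights[e], e))
        for e in incoming:
            if weights[e] > eta:
                break  # every remaining incoming edge is frequent
            edges.remove(e)
            unreachable = not has_path(edges, start, y)
            stranded = any(not has_path(edges, j, end) for j in checked)
            if unreachable or stranded:
                edges.add(e)  # return the edge to the edge set
        checked.append(y)
    return edges

# Hypothetical graph (not the graph of Fig. 22).
weights = {("s", "A"): 0.5, ("s", "B"): 0.2, ("A", "B"): 0.6,
           ("A", "C"): 0.1, ("B", "C"): 0.4, ("C", "d"): 0.25}
kept = delete_infrequent_edges(["s", "A", "B", "C", "d"], weights, 0.3, "s", "d")
print(sorted(kept))
```

In this hypothetical graph, the edges (s, B) and (A, C) are removed because alternative paths exist, whereas (C, d) is infrequent but is returned to the edge set, since removing it would disconnect the end vertex.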

Because of the characteristics of topological sorting, the edge deletion procedure ensures that 1) any vertex in the graph GN can be reached from the start vertex; and 2) removing an edge of a vertex does not affect any path from the start vertex to the predecessors of the vertex.

In addition, the reexamination of the checked vertices ensures that at least one path exists from each vertex to the end vertex. As a result, several frequent knowledge paths can be obtained from the GKF graph to help workers learn the group’s knowledge. The following example explains how to remove an edge from the GKF graph.

Example of Removing Infrequent Edges

In Fig. 22, let vertex vE be the examined vertex and let the user-specified threshold be 0.3. The vertex vE has two incoming edges: eρ,E with weight 0.2 and eD,E with weight 0.4. The edge eρ,E qualifies for removal, because its weight is no greater than 0.3 and removing it would not make any vertex unreachable. Fig. 24 shows the resulting graph, which represents the GKF of the group. The graph is used to visualize the knowledge flows among the frequent topics and model the referencing behavior of the group.

Fig. 24: The final graph GN of the GKF model
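The threshold test in this example can be reproduced with a few lines of Python. The sketch covers only the candidate-selection step; the reachability check depends on the full graph of Fig. 22, which is not reproduced here. The variable names and the label "rho" for the vertex ρ are illustrative assumptions.

```python
eta = 0.3  # user-specified threshold from the example

# Incoming edges of vE and their weights, as given in the example.
incoming_vE = {("rho", "E"): 0.2, ("D", "E"): 0.4}

# Examine the edges in ascending order of weight; an edge whose weight
# is no greater than eta is a candidate for removal.
candidates = [edge
              for edge, w in sorted(incoming_vE.items(), key=lambda kv: kv[1])
              if w <= eta]
print(candidates)  # [('rho', 'E')]
```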

The edge deletion procedure has several properties. We define and prove the associated lemmas below.

Lemma 1: Let vs be the start vertex in a graph, GN, of a group-based knowledge flow. For any vertex vh in GN, there exists a path Ps,h from vertex vs to vh.

Proof: In the edge deletion procedure, removal of an incoming edge from a vertex vh depends on the weight of the edge. All vertices in GN are visited in topological order and their incoming edges are examined. For any vertex vh, an incoming edge should be removed if its weight is no greater than a user-specified threshold. However, if removing an edge from vh also removes the path Ps,h from GN, that edge should be returned to the vertex.

When deleting an incoming edge of a vertex, the edge deletion procedure ensures that 1) there is a path Ps,h from the start vertex vs to vertex vh; and 2) removing an incoming edge from a successor of vh does not affect the path Ps,h. The proof is as follows. Let a vertex vk be a succeeding vertex of vh in the topological order. Based on the topological order, the edge deletion procedure processes the vertex vh before vertex vk and there exists a path Ps,h. Assume that a path Ps,h does not exist from vs to vh, because an incoming edge of vk has been deleted. Thus, a path must have existed from vertex vs through vk to vh before the edge was deleted. Consequently, vk must be a predecessor of vh. However, this statement contradicts the algorithm’s processing of vertices in topological order. That is, vk is a succeeding vertex of vh and the path Ps,h exists in GN. Thus, removing an incoming edge from a succeeding vertex of vh does not affect the path Ps,h. According to the algorithm and the above explanation, for any vertex vh in GN, there exists a path Ps,h from vertex vs to vh.

Lemma 2: Let vd be an end vertex in the graph of the group-based knowledge flow GN. For any vertex vh in GN, there exists a path Ph,d from vertex vh to vd.

Proof: Let vertex vk be the succeeding vertex of the vertex vh. Removing an incoming edge of vertex vk will affect the reachability of the end vertex vd from vertex vh. When the edge deletion procedure removes an incoming edge of vertex vk, it has to check whether the path Ph,d from vertex vh to the end vertex vd exists. If it does not exist, the incoming edge should not be removed. Therefore, the procedure ensures that a path Ph,d exists from vertex vh to the end vertex vd.

Lemma 3: Let GN = {VN, EN} be the directed graph of a group-based knowledge flow. All vertices in VN can be visited by traversing vertices from the start vertex vs to the end vertex vd. Then, for any vertex vh in VN, there exists a path from vs to vd through vh.

Proof: According to Lemma 1 and Lemma 2, for any vertex vh in VN, there exists a path Ps,h from the start vertex vs to vh and a path Ph,d from vh to the end vertex vd. Therefore, there exists a path from vs to vd through vh.
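The reachability properties stated in the first three lemmas can be checked mechanically on any concrete graph. The following Python sketch is one way to do so, assuming the graph is given as a list of directed edge pairs; the function names and the sample graph are hypothetical.

```python
def reachable(adj, src):
    """Return the set of vertices reachable from src via adjacency map adj."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def every_vertex_on_start_end_path(vertices, edges, start, end):
    fwd, bwd = {}, {}
    for u, v in edges:
        fwd.setdefault(u, []).append(v)  # original direction
        bwd.setdefault(v, []).append(u)  # reversed direction
    from_start = reachable(fwd, start)   # vertices with a path from the start vertex
    to_end = reachable(bwd, end)         # vertices with a path to the end vertex
    # A vertex lies on some start-to-end path iff it belongs to both sets.
    return set(vertices) <= (from_start & to_end)

# Hypothetical graph for illustration.
edges = [("s", "A"), ("A", "B"), ("B", "C"), ("C", "d")]
print(every_vertex_on_start_end_path("sABCd", edges, "s", "d"))
```

Searching the reversed graph from the end vertex finds all vertices that can reach it in a single traversal, which avoids running one search per vertex.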

Lemma 4: For any infrequent edge eh,k on an infrequent path of GN, either the path from the start vertex vs to vertex vk or the path from the vertex vh to the end vertex vd must pass through the edge eh,k.

Proof: Let vertex vh be a predecessor of vertex vk in the topological order, and let eh,k be an infrequent edge from vertex vh to vertex vk in GN. Assume that there exist two paths, one from start vertex vs to vertex vk and the other from vertex vh to the end vertex vd, neither of which passes through the edge eh,k. Our algorithm removes any infrequent edge if doing so will not make any vertex unreachable. Thus, the algorithm will remove the edge eh,k. However, this contradicts the statement that eh,k exists in GN. Consequently, for any infrequent edge eh,k of an infrequent path of GN, either the path from the start vertex vs to vertex vk or the path from the vertex vh to the end vertex vd must pass through the edge eh,k.
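Lemma 4 can also be tested on a concrete graph: every infrequent edge that remains should be indispensable to either the path from the start vertex to its head or the path from its tail to the end vertex. The Python sketch below lists the infrequent edges that violate this property. The function names, the edge representation, and the sample weights are hypothetical.

```python
def has_path(edges, src, dst):
    """Depth-first search over a set of directed (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def dispensable_infrequent_edges(weights, eta, start, end):
    """Return the infrequent edges (h, k) for which both a path start -> k
    and a path h -> end exist without using (h, k)."""
    violations = []
    for (h, k), w in weights.items():
        if w > eta:
            continue  # frequent edges are outside the scope of the lemma
        rest = set(weights) - {(h, k)}
        if has_path(rest, start, k) and has_path(rest, h, end):
            violations.append((h, k))
    return sorted(violations)

# Hypothetical graph after edge deletion: (C, d) is infrequent but kept.
final = {("s", "A"): 0.5, ("A", "B"): 0.6, ("B", "C"): 0.4, ("C", "d"): 0.25}
print(dispensable_infrequent_edges(final, 0.3, "s", "d"))  # []
```

An empty result is consistent with the lemma; a non-empty result would indicate infrequent edges that the procedure could still have removed.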

The vertex VGS in graph GN represents a corresponding strongly connected component GS in G. All vertices in GS with parallel relations or sequential relations are reachable. Lemmas 1, 2, 3 and 4 also hold for G.