
Chapter 5. Group-based Knowledge Flow Mining Methods

5.5 The GKF Mining Algorithm for Dealing with Topic Loops

5.5.1 Applying the GKF Mining Algorithm for Dealing with Topic Loops

The following example considers a group of four workers with similar KFs. Their topic-level KFs (TKFs) are listed in Table 5. Each element in a TKF is used to represent a topic domain. Thus, the elements in a TKF are arranged as a topic sequence based on the times they were referenced. As a topic may appear more than once in a specific KF, because the worker needs the knowledge at different times, we apply the GKF-TL mining algorithm to deal with topic loops.

Table 5: The TKFs of four workers

Worker | Topic-level KF (TKF) | TKF’
John | <A, B, A, C, D, F> | <A1, B1, A2, C, D, F>
Mary | <B, A, B, C, D> | <B1, A1, B2, C, D>
Lisa | <B, A, D, F> | <B1, A1, D, F>
Tom | <A, B, A, E, G, D> | <A1, B1, A2, E, G, D>

In Table 5, a topic that appears more than once in a TKF is labeled as a different instance of the topic, and a TKF with duplicate topics is transformed into a TKF’. Then, the algorithm uses TKF’ to build the initial graph of the GKF model. In this example, we set the user-specified thresholds for topic relation identification and edge deletion as ε = 1 and θ = 0.3, respectively. The initial graph derived before graph transformation is shown in Fig. 25. A strongly connected component is discovered in the initial graph. To resolve the vertex relation problem in the strongly connected component, the algorithm applies the topic relation identification procedure detailed in Fig. 18. The vertex relation in the strongly connected component is shown in Gs in Fig. 25. The number on each edge represents the edge’s weight. Recall that the weight is derived by Eq. (20) to indicate the importance of the edge.
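The TKF-to-TKF’ labeling step can be illustrated with a minimal Python sketch. It assumes, based on Lisa's row in Table 5, that a topic is numbered in every worker's TKF as soon as it repeats in at least one TKF of the group; the function name label_topic_instances is hypothetical and is not taken from the prototype system.

```python
from collections import Counter


def label_topic_instances(tkfs):
    """Turn each TKF into a TKF' by numbering the occurrences of looped topics."""
    # Assumption: a topic counts as "looped" for the whole group if it
    # occurs more than once in at least one worker's TKF.
    looped = {
        topic
        for seq in tkfs.values()
        for topic, count in Counter(seq).items()
        if count > 1
    }
    labeled = {}
    for worker, seq in tkfs.items():
        seen = Counter()
        out = []
        for topic in seq:
            if topic in looped:
                seen[topic] += 1
                out.append(f"{topic}{seen[topic]}")  # e.g. A -> A1, A2
            else:
                out.append(topic)  # topics without loops keep their name
        labeled[worker] = out
    return labeled


# The four TKFs listed in Table 5.
tkfs = {
    "John": ["A", "B", "A", "C", "D", "F"],
    "Mary": ["B", "A", "B", "C", "D"],
    "Lisa": ["B", "A", "D", "F"],
    "Tom": ["A", "B", "A", "E", "G", "D"],
}
tkf_prime = label_topic_instances(tkfs)
```

Under this assumption, the sketch reproduces the TKF’ column of Table 5 for all four workers.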

Fig. 25: The initial graph of the GKF model with topic loops
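To make the notion of a strongly connected component concrete, the following Python sketch builds a directed graph from the TKF’ sequences in Table 5 and searches it with Kosaraju's algorithm. It is only an illustration: it assumes that the initial graph contains one edge for each pair of consecutive topics in a TKF’, and it does not reproduce the edge weights of Eq. (20) or the topic relation identification procedure of Fig. 18. The helper names are hypothetical.

```python
def consecutive_edges(sequences):
    """Collect a directed edge for every pair of consecutive topics."""
    edges = set()
    for seq in sequences:
        edges.update(zip(seq, seq[1:]))
    return edges


def strongly_connected_components(edges):
    """Kosaraju's algorithm: return the SCCs as sorted lists of vertices."""
    graph, reverse = {}, {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, [])
        reverse.setdefault(v, []).append(u)
        reverse.setdefault(u, [])

    visited = set()

    def dfs(node, adjacency, out):
        visited.add(node)
        for nxt in adjacency[node]:
            if nxt not in visited:
                dfs(nxt, adjacency, out)
        out.append(node)  # appended when the vertex is finished

    # First pass: record the finishing order on the original graph.
    order = []
    for node in graph:
        if node not in visited:
            dfs(node, graph, order)

    # Second pass: explore the reversed graph in decreasing finishing order.
    visited.clear()
    components = []
    for node in reversed(order):
        if node not in visited:
            component = []
            dfs(node, reverse, component)
            components.append(sorted(component))
    return components


# TKF' sequences from Table 5.
tkf_prime = [
    ["A1", "B1", "A2", "C", "D", "F"],
    ["B1", "A1", "B2", "C", "D"],
    ["B1", "A1", "D", "F"],
    ["A1", "B1", "A2", "E", "G", "D"],
]
components = strongly_connected_components(consecutive_edges(tkf_prime))
loops = [c for c in components if len(c) > 1]
```

Under the stated assumption, the only component with more than one vertex contains A1 and B1, because some workers reference A before B and others reference B before A.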

Fig. 26 shows the result of removing the infrequent edges from the graph in Fig. 25. The sub-graph Gs in the initial graph is transformed into a vertex vGs, and the edge that connects a vertex in Gs with another vertex, i.e., eρ,D, is removed because its weight is no greater than 0.3.

Fig. 26: The graph of the GKF model with topic loops
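The edge deletion step amounts to a threshold filter, which can be sketched in Python as follows. The edge weights below are hypothetical placeholders, not the values shown in Fig. 25, and the function name delete_infrequent_edges is an assumption made for illustration.

```python
def delete_infrequent_edges(weights, theta):
    """Keep only the edges whose weight is strictly greater than theta."""
    return {edge: w for edge, w in weights.items() if w > theta}


# Hypothetical weights; the key ("rho", "D") stands for the edge e_rho,D.
weights = {
    ("vGs", "C"): 0.5,
    ("rho", "D"): 0.25,
    ("C", "D"): 0.5,
    ("D", "F"): 0.5,
}
kept = delete_infrequent_edges(weights, theta=0.3)
```

An edge whose weight equals θ is also removed, which matches the "no greater than" condition stated in the text.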

Finally, the algorithm merges vertices that are different instances of the same topic into one vertex. For example, in Fig. 25, vertices vB1 and vB2 are different instances of the same topic, so they are merged to form the vertex vB. Moreover, the edge eρ,B2 is replaced by an edge connecting vρ to vγ; and the edge eB2,C is changed to edge eρ,C. The vertices vA1 and vA2 are two instances of topic A; hence they are merged to form vertex vA, and their edges are changed accordingly. Fig. 27 shows the final GKF graph, which considers the duplicate topics in each worker’s TKF. To illustrate all knowledge paths in the graph, the vertex vGs is converted into the original graph Gs.

Fig. 27: The final GKF graph, which considers the duplicate topics in each worker’s TKF

5.6 Identifying Knowledge Referencing Paths in a GKF Graph

We have developed a method for identifying frequent knowledge paths from the GKF graph to describe the information needs of a group of workers, i.e. their knowledge referencing behavior. A knowledge path, which represents the knowledge referencing behavior of a group of workers, consists of several vertices and edges that can be traversed from the start vertex to the end vertex. To identify a frequent knowledge path, a path score derived from the weights of the edges on a path is used to evaluate each path and indicate its importance, as defined in Eq. (21).

$$\mathit{ps}_i = \min \{\, \mathit{we}_{x,y} \mid \forall\, e_{x,y} \in \mathit{path}_i \,\}, \qquad (21)$$

where $\mathit{ps}_i$ is the path score of path $i$, and $\mathit{we}_{x,y}$ is the weight of edge $e_{x,y}$, which belongs to path $i$ and represents a direct flow relation between vertex $x$ and vertex $y$.

Based on the weights of all the edges on a specific path, a path score is derived from the minimal weight among the edges to indicate the path’s level of importance. Note that the edge weight derived by Eq. (20) denotes the importance of the direct flow in a GKF. A large edge weight means that the referencing flow between topics is highly significant for the group of workers.

Paths with scores higher than a user-specified threshold are regarded as frequent knowledge paths in the GKF and are selected for the group. Specifically, such knowledge paths (patterns) are used to represent the frequent knowledge referencing behavior of workers and important knowledge flows. The discovered paths will be important references for workers, and the frequent knowledge paths will also help novices learn group-related knowledge. The following example illustrates the computation of the path score.
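As a minimal sketch of Eq. (21) and the threshold test, the Python code below enumerates the paths from a start vertex to an end vertex, scores each path by its minimal edge weight, and keeps the paths whose score exceeds the threshold. The graph, its weights, and the function names are hypothetical and serve only to illustrate the computation.

```python
def path_score(path, weights):
    """Eq. (21): the score of a path is the minimal weight among its edges."""
    return min(weights[(x, y)] for x, y in zip(path, path[1:]))


def all_paths(weights, start, end):
    """Enumerate every simple path from start to end with a depth-first search."""
    successors = {}
    for x, y in weights:
        successors.setdefault(x, []).append(y)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in successors.get(path[-1], []):
            if nxt not in path:  # never revisit a vertex on the same path
                stack.append(path + [nxt])
    return paths


def frequent_paths(weights, start, end, threshold):
    """Return (path, score) pairs whose score is higher than the threshold."""
    scored = [(p, path_score(p, weights)) for p in all_paths(weights, start, end)]
    kept = [(p, s) for p, s in scored if s > threshold]
    return sorted(kept, key=lambda item: -item[1])


# Hypothetical GKF graph with start vertex "s" and end vertex "d".
weights = {
    ("s", "A"): 1.0,
    ("A", "C"): 0.5,
    ("A", "E"): 0.25,
    ("C", "D"): 0.5,
    ("E", "D"): 0.25,
    ("D", "d"): 1.0,
}
frequent = frequent_paths(weights, "s", "d", threshold=0.3)
```

In this hypothetical graph, the path s, A, C, D, d has edge weights 1.0, 0.5, 0.5 and 1.0, so its score is 0.5 and it is kept. The path s, A, E, D, d has a minimal edge weight of 0.25 and is discarded at a threshold of 0.3.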

5.7 The Prototype System for Mining Group-based Knowledge Flows

In this section, we develop a prototype system to demonstrate the proposed methods for mining group-based knowledge flows (GKFs), which are generally difficult to formalize. To address the problem, our system provides a mining function and modules to identify GKFs easily and effectively. In addition, a GKF is modeled as a graph to represent the referenced topics, the directions of knowledge flows, and the knowledge referencing paths (patterns) for a group of workers with similar KFs. The referencing paths with scores higher than a user-specified threshold are identified to represent the frequent knowledge referencing patterns of the group. We describe the real-world dataset used in our system in Section 5.7.1, present the implementation of our prototype system in Section 5.7.2, and discuss the contributions of this work in Section 5.7.3.

5.7.1 Dataset

We use a dataset from a research laboratory in a research institute. It contains information about 14 knowledge workers, 424 research documents, and a usage log that records the times documents were accessed and the workers’ document preferences. Each worker may perform a number of tasks, e.g., conducting a research project and writing research papers, and the research documents are the codified knowledge needed to perform the tasks. Because a worker’s information needs may change over time, the access times of documents can be used to track changes in his/her information needs for a specific task and to identify his/her knowledge referencing behavior.

5.7.2 System Implementation

To implement our prototype system for group-based KF mining, we use Microsoft Visual Studio 2005 (with C#) to develop the system and Microsoft SQL Server 2005 as the database system to store the dataset. Because the dataset contains workers’ logs, it should be preprocessed to generate each worker’s codified-level KF and topic-level KF. To obtain the KF, documents in the dataset are grouped into eight clusters by using a single-link clustering method. Based on the clustering results, a topic-level KF is generated by mapping the codified knowledge into its corresponding clusters for each knowledge worker. Then, the two types of KF, the topic-level KF and the codified-level KF, are derived to describe the information needs of a worker. We use such KFs to build a prototype system to demonstrate the method for mining the knowledge flows of a group of workers.
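The preprocessing step described above can be sketched in Python, although the prototype itself is written in C#. The sketch assumes a hypothetical log format of (worker, document, access time) records and a document-to-cluster mapping produced by the clustering step. Collapsing consecutive references to the same topic is also an assumption; it can be switched off with the collapse_repeats flag.

```python
from itertools import groupby


def build_knowledge_flows(access_log, doc_topic, collapse_repeats=True):
    """Derive the codified-level and topic-level KF of every worker."""
    docs_by_worker = {}
    # Order each worker's document accesses by access time.
    for worker, doc_id, _accessed_at in sorted(access_log, key=lambda r: (r[0], r[2])):
        docs_by_worker.setdefault(worker, []).append(doc_id)

    flows = {}
    for worker, docs in docs_by_worker.items():
        # Map each document to the topic (cluster) it belongs to.
        topics = [doc_topic[d] for d in docs]
        if collapse_repeats:
            # Assumption: consecutive accesses to the same topic count once.
            topics = [topic for topic, _ in groupby(topics)]
        flows[worker] = {"codified": docs, "topic": topics}
    return flows


# Hypothetical records: (worker, document, access time).
access_log = [
    ("w1", "d3", 3),
    ("w1", "d1", 1),
    ("w1", "d2", 2),
    ("w2", "d2", 1),
    ("w2", "d4", 2),
]
doc_topic = {"d1": 5, "d2": 5, "d3": 7, "d4": 2}  # document -> cluster number
flows = build_knowledge_flows(access_log, doc_topic)
```

Here worker w1 accesses d1, d2 and d3 in that order, which gives the codified-level KF <d1, d2, d3> and, after collapsing the two consecutive references to topic 5, the topic-level KF <5, 7>.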

Our system has two major functions: worker clustering and group-based knowledge flow mining. The former identifies a group’s knowledge flow, and the latter uses a directed acyclic graph to present the mining results, which requires an interface that can visualize the KF. Note that our system can be applied in any knowledge-intensive organization to help workers obtain and learn knowledge. Next, we describe the system in detail.

Fig. 28: The main frame of the KF mining system

The knowledge flow mining system comprises three modules: the main module, the CLIQUE clustering module and the GKF model. Each module has functions to help the user (a manager/worker) build a knowledge flow easily. Fig. 28 shows the main frame of the system, which provides essential functions for building the GKF model, e.g., the system settings, the KF alignment similarity and clustering functions. The system setting is used to initialize the system environment, e.g., database selection. The KF similarity function calculates the similarity between two workers’ knowledge preferences based on their knowledge flows and creates a similarity matrix of the workers. The parameter alpha, which takes a value between 0 and 1, adjusts the relative importance of the KF alignment similarity and the aggregated profile similarity, as shown in Eq. (8). The user can specify the value of alpha and use the KF similarity function to create a KF similarity matrix based on the specified value. Then, the CLIQUE clustering method uses the similarity matrix to cluster workers who have similar KFs. The system also provides an interface to show the topic-level KFs of all workers and the results of worker clustering. To simplify the presentation of the KFs, we use a number to represent a topic domain that consists of topic-related terms.
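The role of alpha can be illustrated with a short Python sketch. Eq. (8) is not reproduced in this section, so the linear combination below, in which alpha weights the KF alignment similarity and 1 - alpha weights the aggregated profile similarity, is an assumption. The function names, the similarity values, and the self-similarity of 1.0 are hypothetical.

```python
def hybrid_similarity(alignment_sim, profile_sim, alpha):
    """Combine the two similarities; assumed linear form, not Eq. (8) verbatim."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha * alignment_sim + (1.0 - alpha) * profile_sim


def similarity_matrix(workers, alignment, profile, alpha):
    """Build a symmetric worker-by-worker KF similarity matrix."""
    n = len(workers)
    matrix = [[1.0] * n for _ in range(n)]  # assumption: self-similarity is 1
    for i in range(n):
        for j in range(i + 1, n):
            pair = (workers[i], workers[j])
            score = hybrid_similarity(alignment[pair], profile[pair], alpha)
            matrix[i][j] = matrix[j][i] = score
    return matrix


# Hypothetical pairwise similarities for three workers.
workers = ["w1", "w2", "w3"]
alignment = {("w1", "w2"): 0.8, ("w1", "w3"): 0.2, ("w2", "w3"): 0.5}
profile = {("w1", "w2"): 0.4, ("w1", "w3"): 0.6, ("w2", "w3"): 0.5}
matrix = similarity_matrix(workers, alignment, profile, alpha=0.3)
```

With alpha = 0.3, the pair (w1, w2) receives 0.3 × 0.8 + 0.7 × 0.4 = 0.52 under the assumed form. Setting alpha to 1 would use only the KF alignment similarity, and setting it to 0 would use only the aggregated profile similarity.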

Fig. 29: The CLIQUE clustering module

Fig. 29 shows the CLIQUE clustering module. Before using the module, we have to set two parameters: the number of rows in the KF similarity matrix and the clustering threshold. The number of rows is used to determine the number of times clustering is performed using the CLIQUE clustering method, while the threshold is used to cluster workers whose similarity scores are higher than a certain value. Then, the clustering result is displayed on the system interface. For example, to perform clustering, the value of alpha is set at 0.3, the number of rows of the KF similarity matrix is 14 and the similarity threshold is set at 0.4.

Each group comprises several workers, and each worker may belong to several task-based groups based on the KF similarities. After clustering similar workers, the system stores the clustering results in the database for further utilization and analysis.

Next, using the proposed algorithm, the system builds a group-based knowledge flow (GKF) for a group of workers, as shown in Fig. 30. All the workers in a cluster have similar KFs, which are used to generate a GKF graph to characterize the referencing behavior of the group. In the graph, each circle is a topic domain represented by a number, while each directed edge indicates the flow of knowledge between two topics. The topic domain contains a topic profile, which consists of several representative terms and their term weights. Fig. 30 shows the profile of topic domain 53 in a small window. The listed terms represent the knowledge of the topic.

Fig. 30: The GKF graph and knowledge referencing paths for a specific group

In addition, the number on an arrow indicates the importance of a flow relation in this group’s topics. From the GKF graph, we observe that six topics, i.e., 4, 17, 19, 21, 27, and 29, can be referenced in parallel. That is, there is no specific order among the topics accessed by this group of workers. Moreover, the task-related knowledge may flow through two paths from the start vertex to the end vertex. In Fig. 30, the listed paths, which consist of several relevant topics and directed edges, are the knowledge referencing paths of this group. The paths with scores higher than a user-specified threshold are frequent referencing behavior patterns. The paths can be regarded as knowledge references for workers to share needed task knowledge.

5.7.3 Discussion

GKF mining by task-based groups has several advantages in a knowledge-intensive organization. A GKF represents the flow and delivery of knowledge when workers in the same group perform a task. It can be used to identify topics of interest, major referencing behavior patterns, and the long-term evolution of the group’s information needs; and it allows task knowledge to be circulated and delivered efficiently among workers. If a novice joins the group, the GKF can provide a reference for learning group-based knowledge. The frequent knowledge paths in a GKF help a worker learn task-related knowledge, overcome obstacles encountered in a new domain, and enhance his/her learning efficiency. Moreover, based on the GKF, a manager can determine who has task-related knowledge and who satisfies a task’s requirements, and then assign appropriate workers accordingly. In addition, through the GKF, an organization can understand the frequent referencing behavior and the information needs of a group of workers, and actively provide knowledge support for them. The GKF can also enhance organizational learning, as well as facilitate knowledge sharing and reuse in the context of collaboration and teamwork.

In this work, we propose a recommendation framework based on the discovered knowledge flow for each knowledge worker, as described in Chapter 4. This method analyzes workers’ referencing behavior and provides task-related documents to fulfill workers’ tasks. Because teamwork in an organization is common, we also develop a group-based knowledge flow mining algorithm that analyzes workers’ information needs from a group perspective and models the referencing behavior of a group as a knowledge graph. In our future work, we will apply the recommendation techniques to the group-based knowledge flow to provide knowledge support for workers in a teamwork environment.

Chapter 6. Conclusions and Future Works

6.1 Summary

Knowledge is both abstract and dynamic. A worker’s knowledge flow (KF) comprises a great deal of working knowledge that is difficult to acquire from an organizational knowledge base. In this dissertation, we have considered how to identify the knowledge flow of knowledge workers, and how to provide knowledge support based on KFs effectively. To the best of our knowledge, no existing approach focuses on providing relevant knowledge proactively based on KFs.

We propose KF-based recommendation methods, namely the hybrid PCF-KSR, KCF-KSR and ICF-KSR methods, to proactively recommend codified knowledge for knowledge workers and enhance the quality of recommendations. These methods use the KF-based sequential rule (KSR) method to recommend topics by considering workers’ knowledge referencing behavior, and then adjust the predicted ratings of documents belonging to the recommended topic. Moreover, they consider workers’ preferences for codified knowledge, as well as their knowledge referencing behavior, to predict topics of interest and recommend task-related knowledge. In contrast, the collaborative filtering (CF) method, which is widely used to predict a target worker’s preferences based on the opinions of similar workers, only considers workers’ preferences for codified knowledge and neglects their knowledge referencing behavior.

In the experiments, we evaluate the quality of recommendations derived by the proposed methods under various parameters and compare it with that of the traditional user-based/item-based CF method. The experiment results show that the proposed methods improve the quality of document recommendation and outperform the traditional CF methods.

Additionally, using KF mining and sequential rule mining techniques enhances the performance of recommendation methods and increases the accuracy of recommendations.

The KF-based recommendation methods provide knowledge support adaptively based on the referencing behavior of workers with similar KFs, and also facilitate knowledge sharing among such workers.

Furthermore, we have proposed the group-based KF mining method to identify the KFs of groups of workers. Such groups may be interest groups or communities, where the workers have very similar KFs. A group may comprise many workers with similar KFs, and a worker may join many groups simultaneously according to his/her information needs. Even though workers are in the same group, their KFs will differ in some respects. To discover the KF of a group of workers, we design algorithms that can analyze the workers’ information needs in their KFs to generate a GKF model. The model is then used to represent the information needs, the direction of knowledge flows, and possible paths for referencing task knowledge for a group of workers. Based on the model, we can identify representative paths as common behavior patterns for the group. Thus, the patterns can be regarded as learning references to help new members of a group. Finally, we implement a prototype system to demonstrate the efficacy of the proposed algorithms. Our system not only derives the KF for a group of workers, but also visualizes the mining results for further analysis.

6.2 Future Works

In our current work, a KF is simply regarded as a set of topics/codified knowledge objects arranged in a time sequence. However, a KF may have a complicated order structure with AND/OR, JOIN and SPLIT operations. In our future work, we will investigate a complex KF mining technique to model workers’ KFs with an order structure that includes such operations. Moreover, the discovered topic is regarded as an abstraction of topic-related documents. Auto-summarization techniques [45, 49] can be applied to extract the theme of a topic by summarizing the documents’ contents. In future work, we will investigate the use of such techniques to derive knowledge flows based on theme information. In addition, the domain restricted the sample size of the data and the number of participants in the experiments, since it is difficult to obtain a dataset that contains information that can be used for knowledge flow mining. We will evaluate the proposed approach on other application domains involving larger numbers of workers, tasks and documents. Moreover, the method of generating topic subsequences for identifying the target worker’s knowledge referencing behavior is computationally expensive, especially for large datasets. A more efficient method will be investigated in the future.

Additionally, we will develop a recommendation method based on the GKF, so that workers can cooperate and share their knowledge with other group members to accomplish a task. Moreover, different working groups in an organization may provide knowledge support for one another. To facilitate knowledge sharing in a group or among groups, we will investigate recommendation methods that provide task knowledge to workers and groups proactively. The effectiveness of a recommendation method depends to a large extent on how much workers trust one another. This factor is important because the level of trust may determine whether or not a worker is willing to share knowledge with others. Through group recommendation methods, task-related knowledge can be shared effectively to enhance the work efficiency of all knowledge workers.

References

[1] A. Abecker, A. Bernardi, K. Hinkelmann, O. Kuhn, and M. Sintek, “Context-Aware, Proactive Delivery of Task-Specific Information: The KnowMore Project,” Information Systems Frontiers, vol. 2, no. 3, pp. 253-276, 2000.

[2] A. Abecker, A. Bernardi, H. Maus, M. Sintek, and C. Wenzel, “Information Supply for Business Processes: Coupling Workflow with Document Analysis and Information Retrieval,” Knowledge-Based Systems, vol. 13, pp. 271-284, 2000.

[3] R. Agrawal, T. Imielinski, and A. Swami, “Mining Association Rules Between Sets of Items in Large Databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 207-216, 1993.

[4] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” in Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487-499, 1994.

[5] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” in Proceedings of the Eleventh International Conference on Data Engineering, pp. 3-14, 1995.

[6] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications,” in Proceedings of the International Conference on Management of Data (ACM SIGMOD), pp. 94-105, 1998.

[7] R. Agrawal, D. Gunopulos, and F. Leymann, “Mining Process Models from Workflow Logs,” in 6th International Conference on Extending Database Technology (EDBT'98), Valencia, Spain, pp. 469-483, 1998.

[8] A. Anjewierden, R. de Hoog, R. Brussee, and L. Efimova, “Detecting Knowledge Flows in Weblogs,” in 13th International Conference on Conceptual Structures (ICCS), pp. 1-12, 2005.

[9] C. Augusto, F. Maria Grazia, and P. Silvano, “Knowledge-based Document Retrieval in Office Environments: the Kabiria System,” ACM Transactions on Information Systems, vol. 13, no. 3, pp. 237-268, 1995.

[10] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Boston: Addison-Wesley, 1999.

[11] J. S. Breese, D. Heckerman, and C. Kadie, “Empirical Analysis of Predictive Algorithms for Collaborative Filtering,” in Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.

[12] J. S. Brown and P. Duguid, The Social Life of Information, Boston, MA, USA: Harvard Business School Press, 2002.

[13] J. Cardoso and M. Lenic, “Web Process and Workflow Path Mining Using the Multimethod Approach,” International Journal of Business Intelligence and Data Mining, vol. 1, no. 3, pp. 304-328, 2006.

[14] K. Charter, J. Schaeffer, and D. Szafron, “Sequence Alignment Using FastLSA,” in Proceedings of the 2000 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS 2000), pp. 239-245, 2000.

[15] Y. B. Cho, Y. H. Cho, and S. H. Kim, “Mining Changes in Customer Buying Behavior for Collaborative Recommendations,” Expert Systems with Applications, vol. 28, no. 2,
