Chapter 5. Group-based Knowledge Flow Mining Methods
5.7 The Prototype System for Mining Group-based Knowledge Flows
5.7.2 System Implementation
To implement our prototype system for group-based KF mining, we use Microsoft Visual Studio 2005 (with C#) to develop the system and Microsoft SQL Server 2005 as the database system to store the dataset. Because the dataset contains workers’ logs, it must be preprocessed to generate each worker’s codified-level KF and topic-level KF. To obtain the topic-level KF, documents in the dataset are first grouped into eight clusters by using a single-link clustering method. Based on the clustering results, a topic-level KF is generated for each knowledge worker by mapping the codified knowledge into its corresponding clusters. Together, the two types of KF, the topic-level KF and the codified-level KF, describe the information needs of a worker. We use these KFs to build a prototype system that demonstrates the method for mining the knowledge flows of a group of workers.
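The mapping from a codified-level KF to a topic-level KF can be illustrated with a minimal Python sketch. The function name `topic_level_kf`, the document identifiers, and the cluster assignments are hypothetical, and collapsing consecutive documents of the same cluster into one topic step is an assumption made for illustration rather than a detail confirmed by the prototype.

```python
from itertools import groupby

def topic_level_kf(codified_kf, doc_to_cluster):
    """Map a worker's ordered document sequence to a sequence of topic (cluster) ids."""
    topics = [doc_to_cluster[doc] for doc in codified_kf]
    # Assumption: consecutive documents from the same cluster form a single topic step.
    return [topic for topic, _ in groupby(topics)]

# Hypothetical codified-level KF (documents in access order) and cluster assignments.
codified_kf = ["d1", "d4", "d2", "d7", "d3"]
doc_to_cluster = {"d1": 1, "d4": 1, "d2": 3, "d7": 3, "d3": 1}

print(topic_level_kf(codified_kf, doc_to_cluster))  # [1, 3, 1]
```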
Our system has two major functions: worker clustering and group-based knowledge flow mining. The former identifies groups of workers with similar KFs, and the latter mines each group’s knowledge flow and presents the result as a directed acyclic graph. An interface that visualizes the KF is therefore necessary. Note that our system can be applied in any knowledge-intensive organization to help workers obtain and learn knowledge. Next, we describe the system in detail.
Fig. 28: The main frame of the KF mining system
The knowledge flow mining system consists of three modules: the main module, the CLIQUE clustering module, and the GKF module. Each module has functions to help the user (a manager or a worker) build a knowledge flow easily. Fig. 28 shows the main frame of the system, which provides the essential functions for building the GKF model, e.g., the system settings, the KF alignment similarity function, and the clustering function. The system settings are used to initialize the system environment, e.g., database selection. The KF similarity function calculates the similarity between two workers’ knowledge preferences based on their knowledge flows and creates a similarity matrix of the workers. The parameter alpha, which takes a value between 0 and 1, adjusts the relative importance of the KF alignment similarity and the aggregated profile similarity, as shown in Eq. (8). The user can specify the value of alpha and use the KF similarity function to create a KF similarity matrix based on the specified value. Then, the CLIQUE clustering method uses the similarity matrix to cluster workers who have similar KFs. The system also provides an interface to show the topic-level KFs of all workers and the results of worker clustering. To simplify the presentation of the KFs, we use a number to represent a topic domain that consists of topic-related terms.
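The role of alpha can be illustrated with a short Python sketch. It assumes that Eq. (8) is a linear combination in which alpha weights the KF alignment similarity and (1 - alpha) weights the aggregated profile similarity; the exact form of Eq. (8) is not reproduced in this section, so the weighting direction, the function names, and the input values below are illustrative assumptions.

```python
def combined_similarity(align_sim, profile_sim, alpha):
    """Blend two similarity scores; assumes alpha weights the KF alignment similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * align_sim + (1.0 - alpha) * profile_sim

def similarity_matrix(align, profile, alpha):
    """Build the worker-by-worker KF similarity matrix from two square matrices."""
    n = len(align)
    return [
        [combined_similarity(align[i][j], profile[i][j], alpha) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical pairwise similarities for two workers.
align = [[1.0, 0.2], [0.2, 1.0]]
profile = [[1.0, 0.8], [0.8, 1.0]]

matrix = similarity_matrix(align, profile, alpha=0.3)
print(round(matrix[0][1], 2))  # 0.62
```

With alpha = 0.3, as in the clustering example in this section, the aggregated profile similarity would dominate the combined score under this assumed form.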
Fig. 29: The CLIQUE clustering module
Fig. 29 shows the CLIQUE clustering module. Before using the module, we have to set two parameters: the number of rows in the KF similarity matrix and the clustering threshold.
The number of rows determines how many times clustering is performed by the CLIQUE clustering method, while the threshold is used to cluster workers whose similarity scores are higher than the specified value. The clustering result is then displayed on the system interface. For example, to perform clustering, the value of alpha is set to 0.3, the number of rows of the KF similarity matrix is 14, and the similarity threshold is set to 0.4.
Each group consists of several workers, and a worker may belong to several task-based groups, depending on the KF similarities. After clustering similar workers, the system stores the clustering results in the database for further utilization and analysis.
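The thresholded, overlapping grouping described above can be sketched as follows. The sketch assumes that the CLIQUE clustering method links two workers when their similarity exceeds the threshold and reports the maximal cliques of the resulting graph as groups; this interpretation, the function names, and the similarity values are assumptions for illustration, and the procedure used in the prototype may differ.

```python
def threshold_graph(sim, threshold):
    """Link worker i to worker j when their similarity is higher than the threshold."""
    n = len(sim)
    return {i: {j for j in range(n) if j != i and sim[i][j] > threshold}
            for i in range(n)}

def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Hypothetical symmetric KF similarity matrix for four workers.
sim = [
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.7, 0.2],
    [0.5, 0.7, 1.0, 0.45],
    [0.1, 0.2, 0.45, 1.0],
]

groups = maximal_cliques(threshold_graph(sim, threshold=0.4))
print(groups)  # [[0, 1, 2], [2, 3]]
```

In this toy example, worker 2 appears in both groups, which matches the observation that a worker can belong to more than one task-based group.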
Next, using the proposed algorithm, the system builds a group-based knowledge flow (GKF) for a group of workers, as shown in Fig. 30. All the workers in a cluster have similar KFs, which are used to generate a GKF graph to characterize the referencing behavior of the group. In the graph, each circle is a topic domain represented by a number, while each directed edge indicates the flow of knowledge between two topics. The topic domain contains a topic profile, which consists of several representative terms and their term weights. Fig. 30 shows the profile of topic domain 53 in a small window. The listed terms represent the knowledge of the topic.
Fig. 30: The GKF graph and knowledge referencing paths for a specific group
In addition, the number on an arrow indicates the importance of a flow relation among this group’s topics. From the GKF graph, we observe that six topics, i.e., 4, 17, 19, 21, 27, and 29, can be referenced in parallel. That is, there is no specific order among these topics when they are accessed by this group of workers. Moreover, the task-related knowledge may flow through two paths from the start vertex to the end vertex. In Fig. 30, the listed paths, which consist of several relevant topics and directed edges, are the knowledge referencing paths of this group. The paths with scores larger than a user-specified threshold are frequent referencing behavior patterns. The paths can be regarded as knowledge references that help workers share needed task knowledge.
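Extracting knowledge referencing paths from a GKF graph can be sketched in Python as follows. The graph, its edge weights, and the scoring rule are hypothetical: the sketch scores a path by the mean of its edge weights, which is an assumption made for illustration and not necessarily the score used by the proposed algorithm.

```python
def all_paths(graph, start, end):
    """Enumerate every path from start to end in a DAG given as {u: {v: weight}}."""
    if start == end:
        return [[start]]
    paths = []
    for nxt in graph.get(start, {}):
        for tail in all_paths(graph, nxt, end):
            paths.append([start] + tail)
    return paths

def path_score(graph, path):
    """Hypothetical score: the mean weight of the edges along the path."""
    weights = [graph[u][v] for u, v in zip(path, path[1:])]
    return sum(weights) / len(weights) if weights else 0.0

def frequent_paths(graph, start, end, threshold):
    """Keep the paths whose score is larger than the user-specified threshold."""
    return [p for p in all_paths(graph, start, end)
            if path_score(graph, p) > threshold]

# Hypothetical GKF graph: vertices are topic ids, edge values are flow importance.
gkf = {
    "start": {4: 0.5, 17: 0.5},
    4: {53: 0.8},
    17: {53: 0.25},
    53: {"end": 0.5},
}

print(all_paths(gkf, "start", "end"))
# [['start', 4, 53, 'end'], ['start', 17, 53, 'end']]
print(frequent_paths(gkf, "start", "end", threshold=0.5))
# [['start', 4, 53, 'end']]
```

Because the graph is acyclic, the recursive enumeration always terminates; for large graphs, the number of paths can grow quickly, so a practical implementation would likely prune low-scoring prefixes early.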
5.7.3 Discussion
GKF mining by task-based groups has several advantages in a knowledge-intensive organization. A GKF represents the flow and delivery of knowledge when workers in the same group perform a task. It can be used to identify topics of interest, major referencing behavior patterns, and the long-term evolution of the group’s information needs; and it allows task knowledge to be circulated and delivered efficiently among workers. If a novice joins the group, the GKF can provide a reference for learning group-based knowledge. The frequent knowledge paths in a GKF help a worker learn task-related knowledge, overcome obstacles encountered in a new domain, and enhance his/her learning efficiency. Moreover, based on the GKF, a manager can determine who has task-related knowledge and who satisfies a task’s requirements, and then assign appropriate workers accordingly. In addition, through the GKF, an organization can identify the frequent referencing behavior and the information needs of a group of workers, and actively provide knowledge support for them. The GKF can also enhance organizational learning, as well as facilitate knowledge sharing and reuse in the context of collaboration and teamwork.
In this work, we propose a recommendation framework based on the discovered knowledge flow for each knowledge worker, as described in Chapter 4. This method analyzes workers’ referencing behavior and provides task-related documents to support workers’ tasks.
Because teamwork in an organization is common, we also develop a group-based knowledge flow mining algorithm that analyzes workers’ information needs from a group perspective and models the referencing behavior of a group as a knowledge graph. In our future work, we will apply the recommendation techniques to the group-based knowledge flow to provide knowledge support for workers in a teamwork environment.