Task-Oriented Information Repository - A Task-Based Knowledge Support Model and System: Delivering and Sharing Task-Relevant Knowledge

Repository: Managing Codified Knowledge

4.1 Task-oriented information repository

To organize and manage task-relevant information, the repository is constructed with the support of a domain ontology (i.e., a topic taxonomy) so that codified knowledge can be used effectively. This section discusses how codified knowledge is managed with the support of a category scheme.

Categories representing the main subjects of the organization are defined to organize tasks and codified knowledge. The task corpus (a feature vector of weighted terms) describing the key subjects of an existing task is constructed by extracting weighted terms from textual documents. The task categorization database records the relevance degrees between existing tasks and categories, based on the result of the proposed task categorization model. The database supports the identification of referring tasks: existing tasks are selected by their similarity to the executing task, where the similarity is derived from the relevance degrees of tasks to the categories.

Identifying a small subset of existing tasks as referring tasks can help knowledge workers conduct further task-relevance assessment without reviewing all existing tasks. This chapter illustrates two essential phases in constructing a task-oriented information repository: extracting task corpus from textual data gathered during task execution and deriving the relevance degrees between existing tasks and categories.

4.1.1 Extracting task corpus

The task corpus of a task $t_r$ is represented as a feature vector of weighted terms (keywords) derived by analyzing the set of documents generated and accessed by $t_r$. Each document $d_j$ is pre-processed and represented as a feature vector $\vec{d}_j$. The centroid approach is employed to derive the feature vector of a task by averaging the feature vectors of documents generated and accessed by the task. Let $D_{t_r}$ denote the set of documents that are generated and accessed by task $t_r$. The task corpus (feature vector) of task $t_r$ is defined as the centroid vector $\vec{t}_r$, which is the vector obtained by averaging the feature vectors of documents in $D_{t_r}$. Eq. 4.1 defines the centroid vector $\vec{t}_r$. The weight of a term $k_i$ in $\vec{t}_r$ is represented as $w(k_i, t_r)$.

$$\vec{t}_r = \frac{1}{|D_{t_r}|} \sum_{d_j \in D_{t_r}} \vec{d}_j \qquad (4.1)$$
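The centroid computation in Eq. 4.1 can be sketched in a few lines. This is a minimal illustration, assuming each document vector is stored as a sparse dictionary from term to weight; the function name `task_corpus` and the sample weights are hypothetical, and the text does not say how the document term weights themselves are computed.

```python
from collections import defaultdict

def task_corpus(doc_vectors):
    """Average sparse term-weight vectors (dicts) into a centroid, as in Eq. 4.1."""
    if not doc_vectors:
        raise ValueError("a task needs at least one document")
    totals = defaultdict(float)
    for vec in doc_vectors:
        for term, weight in vec.items():
            totals[term] += weight
    n = len(doc_vectors)
    # A term missing from a document contributes a weight of zero to the average.
    return {term: total / n for term, total in totals.items()}

# Hypothetical document vectors for one task (weights are made up).
docs = [
    {"workflow": 0.8, "xml": 0.4},
    {"workflow": 0.6, "rule": 0.2},
]
corpus = task_corpus(docs)
```

Dividing by the number of documents, rather than by the number of documents containing each term, matches the averaging over all of the task's documents that Eq. 4.1 describes.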

4.1.2 Task categorization model

Existing tasks are categorized based on fuzzy classification, and thus they may belong to more than one category. Fuzzy classification extends the traditional notion of crisp classification by associating each object with a membership function over the categories, so that each object can belong to more than one category (Zadeh, 1965). The task categorization database records the relationships between existing tasks and categories, namely, the relevance degrees of each existing task to the categories. The relevance degree between a task and a category indicates how strongly the task belongs to the category. The relevance degrees between categories and existing tasks are calculated based on similarity measures between the feature vectors of categories and existing tasks. The feature vector of a category is also expressed as a vector of weighted terms, which represents the main subjects of the category.

The categorization procedure includes the step of deriving the feature vectors of categories and the step of deriving the relevance degrees between existing tasks and categories.

Deriving the feature vector of each category:

Experts predefined a set of categories to represent the main subjects within the organizational domain, such as “Text Mining”, “Knowledge Management”, etc. The seed-based approach is then applied to generate the feature vectors of categories. Experts select some existing tasks which represent a category. The selected tasks are called the seed tasks of the category. Once the seed tasks have been decided, a centroid vector can be derived from the corpora (feature vectors) of the seed tasks to describe the category. The centroid vector of each category is derived by averaging the feature vectors of corresponding seed tasks.

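Let $X$ denote a set of categories, $X = \{c_1, c_2, \ldots, c_m\}$, and let $T_{c_j}$ represent the set of seed tasks of category $c_j$. Let $\vec{c}_j^{\,c}$ be the centroid vector derived from the task corpora (feature vectors) of the seed tasks of $c_j$. The centroid weight of term $k_i$ in $\vec{c}_j^{\,c}$, $w(k_i, \vec{c}_j^{\,c})$, is derived as Eq. 4.2.

$$w(k_i, \vec{c}_j^{\,c}) = \frac{1}{|T_{c_j}|} \sum_{t_r \in T_{c_j}} w(k_i, t_r) \qquad (4.2)$$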

The centroid vectors are used as the initial feature vectors of weighted terms to represent categories. The initial centroid weight of a term represents the importance of the term within a category, without considering its importance in other categories, that is, its power to discriminate among categories. The weight of a term is therefore further adjusted according to its discriminating power. A term with a higher weight is a more representative and important concept of the category. However, some terms with a high weight in one category may also have high weights in other categories. Such common terms, despite their high weights, are not discriminating enough to represent any single category. To decrease the weight of this kind of common term, we use the probability distribution of terms across categories as a discriminating factor. Consequently, the weight of a term in a category is adjusted by multiplying the initial centroid weight of the term by the probability of the term appearing in the category.

Let $\vec{c}_j$ be the feature vector of category $c_j$, which denotes the key concepts of $c_j$, and let $w(k_i, c_j)$ be the weight of term $k_i$ in category $c_j$. Then $w(k_i, c_j)$, the importance of term $k_i$ in representing category $c_j$, is proportional to the centroid weight of term $k_i$ and the probability distribution of term $k_i$ appearing in category $c_j$, which is expressed as Eq. 4.3. Notably, $P(k_i, c_j)$ is the probability distribution of term $k_i$ appearing in category $c_j$, which is computed according to the distribution of centroid weights of term $k_i$ across categories.
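One form consistent with this description is written out here as an assumption, because the normalization is described only in words. The Euclidean norm over the terms $k_s$ of the category is a guess, and the summation indices $s$ and $l$ are introduced only for this illustration.

```latex
\[
w(k_i, c_j) =
  \frac{w(k_i, \vec{c}_j^{\,c}) \times P(k_i, c_j)}
       {\sqrt{\sum_{s} \left[ w(k_s, \vec{c}_j^{\,c}) \times P(k_s, c_j) \right]^{2}}},
\qquad
P(k_i, c_j) =
  \frac{w(k_i, \vec{c}_j^{\,c})}{\sum_{l=1}^{m} w(k_i, \vec{c}_l^{\,c})}
\qquad \text{(4.3, assumed form)}
\]
```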

where m is the number of categories. Notably, the denominator in the right side of Eq. 4.3 is a normalization factor to normalize the weight of term.
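The seed-based derivation of category feature vectors can be sketched as follows. This is a minimal illustration under stated assumptions: task corpora are sparse dictionaries, the function name `category_vectors` and the sample weights are hypothetical, and the final normalization uses the Euclidean norm, which the text does not confirm.

```python
import math

def category_vectors(seed_corpora):
    """seed_corpora: {category: [task corpus dict, ...]} -> {category: adjusted term weights}."""
    # Step 1 (Eq. 4.2): centroid weight = mean weight over the category's seed tasks.
    centroids = {}
    for cat, corpora in seed_corpora.items():
        terms = {t for corpus in corpora for t in corpus}
        centroids[cat] = {
            t: sum(corpus.get(t, 0.0) for corpus in corpora) / len(corpora)
            for t in terms
        }
    # Step 2: total centroid weight of each term across all categories.
    totals = {}
    for weights in centroids.values():
        for t, w in weights.items():
            totals[t] = totals.get(t, 0.0) + w
    # Step 3: multiply by P(k_i, c_j), then normalize (the L2 norm is an assumption).
    adjusted = {}
    for cat, weights in centroids.items():
        raw = {t: w * (w / totals[t]) for t, w in weights.items() if totals[t] > 0}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        adjusted[cat] = {t: v / norm for t, v in raw.items()} if norm > 0 else raw
    return adjusted

# Hypothetical seed tasks: "system" is common to both categories.
seeds = {
    "Text Mining": [{"mining": 0.9, "system": 0.5}],
    "Knowledge Management": [{"knowledge": 0.8, "system": 0.5}],
}
vectors = category_vectors(seeds)
```

In this toy input the shared term "system" loses half of its weight relative to "mining", because only half of its total centroid weight falls in "Text Mining". The choice of norm rescales each category vector by a positive constant, so it does not change cosine-based relevance degrees.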

Deriving the relevance degrees of existing tasks to categories:

Once the feature vector of weighted terms for each category has been extracted, we can derive the relationship (relevance degree) between categories and existing tasks based on the cosine measure. The membership grade (relevance degree) of task $t_r$ to category $c_j$, $\mu_{c_j}(t_r)$, is calculated as the cosine of the angle between the two vectors $\vec{t}_r$ and $\vec{c}_j$, namely $\mathrm{cosine}(\vec{t}_r, \vec{c}_j)$. The relevance degree between a task and a category indicates how strongly the task belongs to the category. The relevance degrees of task $t_r$ to the $m$ categories can be modeled as a vector $\vec{t}_r^{\,C}$ characterized by the membership grades of $t_r$ to the categories, as expressed in Eq. 4.4.

$$\vec{t}_r^{\,C} = \langle \mu_{c_1}(t_r), \mu_{c_2}(t_r), \ldots, \mu_{c_m}(t_r) \rangle \qquad (4.4)$$
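The membership vector of Eq. 4.4 can be illustrated with a short sketch. It assumes sparse dictionary vectors; the helper names `cosine` and `relevance_vector`, the category labels, and the weights are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as unrelated to everything
    return dot / (norm_u * norm_v)

def relevance_vector(task_vec, category_vecs, order):
    """Membership grades of one task to the categories, in a fixed category order."""
    return [cosine(task_vec, category_vecs[c]) for c in order]

# Hypothetical category and task vectors.
categories = {"c1": {"mining": 1.0}, "c2": {"knowledge": 1.0}}
task = {"mining": 3.0, "knowledge": 4.0}
grades = relevance_vector(task, categories, ["c1", "c2"])
```

Because term weights are non-negative, each grade lies between 0 and 1, which is what allows the grades to be read as fuzzy membership values.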

The task categorization database records the fuzzy classification result. Each task $t_r$ is associated with its membership grades (relevance degrees) to the categories. Notably, the task categorization database is used to support the proposed two-phase task-assessment approach. The details are described in Section 5.1.

4.2 Domain ontology formalization

The domain ontology, a shared conceptualization of a specific domain, is often used to specify the working domain of an organization [57]. Organizing knowledge items into an ontological structure based on the domain ontology is a promising way to support knowledge retrieval in business environments [25]. In this project, the domain ontology refers to a classification structure of the tasks stored in the knowledge repository. Specifically, the domain ontology (DO) is a simple topic taxonomy structured into four levels: categories, fields, tasks, and knowledge items, as shown in Figure 3.

Categories representing the main subjects of the organization are pre-defined to organize tasks and codified knowledge. Tasks with similar subjects are grouped into fields. Field names in this work follow the scheme of the ACM Computing Classification System (1998). Notably, the relevance degrees to categories represent the subjects of a task, as addressed in Section 4.1. The similarity between tasks can thus be calculated based on their relevance degrees to categories. Based on the fuzzy relation matrix $R$, which holds the relevance degrees of tasks to categories, similar tasks are grouped together to form a field, as follows. A threshold value, $\theta_{thres}$, is defined to transform the fuzzy relation matrix $R$ into a binary relation matrix $B$. The threshold value is determined by the max-min operation, as shown in Eq. 4.5.

According to Eq. 4.6, the fuzzy relation matrix $R$ is transformed into a binary relation matrix $B$.

Tasks that have the same relationship with respect to each category in $B$ are similar tasks and are grouped into a field labeled by a field name. The result is an $l$-by-$k$ field-to-task relation matrix $F = [f_j(t_r)]$ such that $f_j(t_r)$ is one if task $t_r$ is grouped into field $f_j$, and zero otherwise, where $l$ denotes the number of fields and $k$ the number of existing tasks.
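The grouping of tasks into fields can be sketched as follows. Two points are assumptions rather than the source's stated equations: the threshold is taken as the smallest of the per-task maximum membership grades (one reading of the max-min operation), and an entry of the binary matrix is set to one when the grade is at least the threshold. The function name `group_into_fields` and the membership grades are hypothetical.

```python
def group_into_fields(R, threshold=None):
    """R: {task: [membership grade per category]} -> (threshold, B, list of fields)."""
    if threshold is None:
        # Assumed max-min rule: smallest of the per-task maximum grades,
        # so every task keeps at least one category after thresholding.
        threshold = min(max(grades) for grades in R.values())
    # Binary relation B: 1 where the grade reaches the threshold, else 0.
    B = {task: tuple(int(g >= threshold) for g in grades) for task, grades in R.items()}
    # Tasks with identical binary patterns across all categories form one field.
    fields = {}
    for task, pattern in B.items():
        fields.setdefault(pattern, []).append(task)
    return threshold, B, list(fields.values())

# Hypothetical membership grades of three tasks to three categories.
R = {
    "t07": [0.9, 0.2, 0.1],
    "t12": [0.8, 0.3, 0.2],
    "t17": [0.1, 0.7, 0.6],
}
theta, B, fields = group_into_fields(R)
# Field-to-task relation matrix F: one row per field, one column per task.
F = [[int(task in field) for task in R] for field in fields]
```

With these grades, t07 and t12 share the same binary pattern and fall into one field, while t17 forms a field of its own.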

Fig. 3. Example of domain ontology. Labels visible in the figure: field f1: Information Retrieval & Organization Impacts; tasks t07: Workflow Modeling Based on XML and Rules Integrating; t12: Implementation of Task/Role Based Access Control Models; t17: Data Warehousing and Data Mining for Web Logs Analysis; t18: Designing Composite E-service Platform; t19: Multi-Criteria Task Assignment in Workflow Management Systems; t22: Discovering Project-based Knowledge Maps; t31: Collaborative Task-driven Recommendation; t50: Business-to-business Workflow Interoperation.

Chapter 5 Collaborative Task-Relevance

In: A Task-Based Knowledge Support Model and System: Delivering and Sharing Task-Relevant Knowledge (pp. 35-40)