Literature review - Providing task-relevant information by revising task profiles through analysis of changes in information needs

2.1. Task-based knowledge management and retrieval

Managing knowledge within and across organizations is considered an important tactic for gaining competitive advantage in today's business environment. Knowledge Management (KM) activities generally include the creation, management, and sharing of knowledge, and together these activities form a repeating cycle [9][24]. With the support of Knowledge Management Systems (KMSs), organizations use the knowledge assets that emerge from their operations and management activities to increase profitability and productivity [16][24].

Contemporary KMSs employ information technology (IT), such as document management and workflow management, to facilitate the access, reuse, and sharing of knowledge assets within and across organizations [9][17].

Generally, IT supports knowledge management activities mainly by addressing the explicit and tacit dimensions of knowledge [17]. Four primary resources are used to manage explicit knowledge: repositories, refineries, organizational roles, and information technologies [42]. A repository of structured, explicit knowledge, especially in document form, is a codified approach to managing knowledge [9][15]. Nevertheless, as the amount of information in an organization grows, KMSs face the challenge of fulfilling users' information needs.

In general, the operations of an organization are planned around tasks. Because knowledge is embedded in the execution of tasks, it is important to provide task-relevant knowledge that fulfills the information needs of knowledge workers as they complete their tasks. Knowledge retrieval is considered a core component in supporting workers who perform knowledge-intensive tasks in a business environment [11]. Recently, Information Retrieval (IR) techniques have been widely exploited in workflow management systems to help knowledge workers obtain task-relevant knowledge. Furthermore, IR has been combined with workflow management systems to proactively deliver task-specific knowledge to users [2][11].

For complex and knowledge-intensive tasks, collaboration among knowledge workers is helpful when several workers are interested in similar problems or share common interests. Sharing knowledge with peer groups is a superior method in knowledge management [13]. Computer-Supported Cooperative Work (CSCW) and recommender systems further support collaboration [28][29]. CSCW emphasizes the power of computer systems to help groups of people perform tasks in a shared environment [29]. Recommender systems employ content-based filtering and collaborative filtering to recommend items such as web pages, movies, and books [14][26][28]. However, it is more difficult to provide task-relevant knowledge during the execution of complex and knowledge-intensive tasks, because such tasks often consist of several smaller tasks.

2.2. Information retrieval in a vector space model

The key contents of a codified knowledge item (document) can be represented as a term vector (i.e., a feature vector of weighted terms) in n-dimensional space, using a term weighting approach that considers the term frequency, inverse document frequency, and normalization factors [32]. The term transformation steps, including case folding, stemming, and stop word removal, are performed during text pre-processing (Salton et al., 1971; Porter, 1980; Witten et al., 1999). Then, term weighting is applied to extract the most discriminating terms [3]. Let d be a codified knowledge item (document), and let $\vec{d} = \langle w(k_1, d), w(k_2, d), \ldots, w(k_n, d) \rangle$ be the term vector of d, where $w(k_i, d)$ is the weight of a term $k_i$ that occurs in d. Note that the weight of a term represents its degree of importance in representing the document (codified knowledge). The well-known tf-idf approach, which is often used for term (keyword) weighting (Porter, 1980), assumes that terms with higher frequency in a document and lower frequency in other documents are better discriminators for representing the document. Let the term frequency $tf(k_i, d)$ be the occurrence frequency of term $k_i$ in d, and let the document frequency $df(k_i)$ represent the number of documents that contain $k_i$. The importance of $k_i$ is proportional to the term frequency and inversely proportional to the document frequency, which is expressed as Eq. 2.1:

$$
w(k_i, d) = \frac{tf(k_i, d) \times \log\left(\frac{N}{df(k_i)} + 1\right)}{\sqrt{\sum_{j=1}^{n} \left( tf(k_j, d) \times \log\left(\frac{N}{df(k_j)} + 1\right) \right)^{2}}} \tag{2.1}
$$

where N is the total number of documents. Note that the denominator on the right-hand side of the equation is a normalization factor that normalizes the weight of a term.
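The weighting in Eq. 2.1 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited work: the function name `tfidf_vectors` and the sample documents are hypothetical, documents are assumed to be already pre-processed into token lists, and the natural logarithm is assumed because the text does not state a base (the base cancels out after normalization in any case).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one normalized term-weight dict per document, following Eq. 2.1.

    docs: list of token lists (assumed already case-folded, stemmed,
    and stripped of stop words).
    """
    n_docs = len(docs)

    # df(k): number of documents that contain term k.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # tf(k, d): occurrences of k in d
        # Numerator of Eq. 2.1 for every term in the document.
        raw = {k: tf[k] * math.log(n_docs / df[k] + 1) for k in tf}
        # Denominator of Eq. 2.1: Euclidean length of the raw weights.
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({k: v / norm for k, v in raw.items()} if norm else {})
    return vectors


# Hypothetical toy corpus of three pre-processed documents.
sample_docs = [
    ["task", "profile", "task"],
    ["task", "retrieval"],
    ["profile", "feedback"],
]
sample_vectors = tfidf_vectors(sample_docs)
```

In the second sample document both terms occur once, so the rarer term ("retrieval", found in one document) receives a higher weight than the more common one ("task", found in two), which is the discriminating behavior the tf-idf assumption describes.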

Similarity measure: The cosine formula is widely used to measure the degree of similarity between two items, x and y, by computing the cosine of the angle between their corresponding term vectors $\vec{x}$ and $\vec{y}$, as given by Eq. 2.2. The closer the cosine similarity is to 1, the higher the degree of similarity.

$$
sim(x, y) = cosine(\vec{x}, \vec{y}) = \frac{\vec{x} \cdot \vec{y}}{|\vec{x}|\,|\vec{y}|} \tag{2.2}
$$
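As a rough illustration of Eq. 2.2, the cosine measure can be computed over sparse term vectors stored as dictionaries. The function name `cosine_similarity` and the dictionary representation are assumptions made for this sketch; returning 0.0 for an all-zero vector is also a convention chosen here, since the equation is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse term vectors (dicts term -> weight)."""
    # Dot product: only terms present in both vectors contribute.
    dot = sum(w * y.get(k, 0.0) for k, w in x.items())
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumed convention for empty or all-zero vectors
    return dot / (norm_x * norm_y)
```

Two vectors with no terms in common yield 0, and a vector compared with itself yields 1 (up to floating-point error), matching the interpretation that values close to 1 indicate higher similarity.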

2.3. User modeling by information filtering technique

Applying information retrieval (IR) and information filtering (IF) technologies to document management is generally the first step of knowledge management activities, since textual data such as articles, reports, manuals, and know-how documents are treated as valuable explicit knowledge within organizations [24]. In addition, IR and IF are considered core technologies that help organizations collect and process documents, reduce information overload, and provide the relevant information that knowledge workers need to accomplish their tasks [6][19][34].

IF systems are similar to conventional IR systems; however, rather than serving users' short-term information needs, IF systems emphasize personalization to support users' long-term information needs [3][5][22][23][39].

Accordingly, maintaining and learning users' profiles is an important issue in order to support long-term information services.

Various approaches for learning users' interests or preferences from textual documents or web pages have been proposed [4][21][22][23][27]. Well-known approaches from information retrieval and information theory, for example the Rocchio algorithm, information gain theory, and the Bayesian classifier, have been modified and employed to model or capture users' dynamically changing interests. Sieg et al. (2004) integrate user profiles and concept hierarchies to infer a user's information context and enhance the original queries. Widyantoro et al. (2001) use a three-descriptor model to learn the dynamics of a user's multiple interests; the model maintains a long-term descriptor to capture the user's general interests and a short-term descriptor to keep track of the user's more recent, faster-changing interests. Moreover, an automatic weight-adjustment mechanism adjusts the weights of the positive and negative descriptors so that the short-term descriptor reacts faster to a drastic change in interest. Notably, all these approaches rely on users' relevance feedback, both explicit (users' linguistic ratings) and implicit (users' access behavior).

Relevance feedback improves search effectiveness through query reformulation. Various studies have demonstrated that relevance feedback applied in the vector model is an effective technique for information retrieval [30][33]. Consequently, IF systems learn users' current information needs from relevance feedback and update the model used for future information filtering. Because such learning approaches update the users' profiles each time the system receives feedback, they are regarded as incremental learning techniques.

In addition to relevance feedback, the characteristics of knowledge retrieval activities in the working environment must also be taken into consideration in order to support workers more precisely. The defining characteristic is that a worker's information needs are always associated with the task at hand. Generally, a worker uses documents to understand a task or to solve a problem encountered during the task, and this use may in turn trigger further searches for a solution. Accordingly, several empirical studies focus on how workers select and use documents during task execution [36][37]. Furthermore, the K-Support System takes the characteristics of the task stage into account and employs a domain ontology to provide dynamic knowledge support [19].

Despite the success of IF techniques, few consider how workers' information needs vary over time or the potential for collaboration between workers whose needs vary in similar ways. An enhanced profile adaptation approach is therefore required: one that takes the variations in workers' topic needs over time into account and retains them, rather than merely reflecting the accumulated information needs. Such an approach can help workers take advantage of the past experience of other workers and give them more chances to retrieve task-relevant knowledge.

2.4. Relevance feedback techniques

Relevance feedback (RF) improves the search effectiveness through query reformulation [33]. The RF technique reformulates or expands the original query based on partial relevance judgments, i.e., feedback on part of the evaluation set.

Relevant documents with positive feedback have a positive influence on the weight of terms, while irrelevant documents with negative feedback have a negative influence on the weight of terms. A refined query vector can be generated by adding the term weights of relevant documents and subtracting the term weights of irrelevant documents. Eq. 2.3 illustrates the Standard_Rocchio method designed by Rocchio. A modified query vector $\vec{q}_m$ is derived by using the relevance of documents (as feedback) to adjust the query vector $\vec{q}$.

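$$
\text{Standard\_Rocchio:}\quad \vec{q}_m = \alpha \vec{q} + \beta \frac{1}{|D_r|} \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_n|} \sum_{\forall \vec{d}_j \in D_n} \vec{d}_j \tag{2.3}
$$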

where $D_r$ denotes the set of relevant documents and $D_n$ represents the set of irrelevant documents according to user assessments; $|D_r|$ and $|D_n|$ represent the number of documents in the sets $D_r$ and $D_n$, respectively; and $\alpha$, $\beta$, and $\gamma$ are tuning constants.
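The Standard_Rocchio update of Eq. 2.3 can be sketched as follows. The function names `centroid` and `rocchio`, the dictionary representation of term vectors, and the default values of the tuning constants are illustrative assumptions; the text does not prescribe particular values for α, β, and γ. When a feedback set is empty, this sketch simply drops the corresponding term of the equation.

```python
from collections import defaultdict


def centroid(docs):
    """Average the term weights over a list of term vectors (empty list -> {})."""
    total = defaultdict(float)
    for doc in docs:
        for term, weight in doc.items():
            total[term] += weight
    return {term: weight / len(docs) for term, weight in total.items()}


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Return the modified query vector q_m of Eq. 2.3.

    query: dict term -> weight (the original query vector q)
    relevant / irrelevant: lists of term vectors (the sets D_r and D_n)
    alpha, beta, gamma: tuning constants (default values are assumptions)
    """
    modified = defaultdict(float)
    for term, weight in query.items():
        modified[term] += alpha * weight
    # Add the averaged weights of the relevant documents.
    for term, weight in centroid(relevant).items():
        modified[term] += beta * weight
    # Subtract the averaged weights of the irrelevant documents.
    for term, weight in centroid(irrelevant).items():
        modified[term] -= gamma * weight
    return dict(modified)
```

Terms that occur only in irrelevant documents end up with negative weights in this sketch. Whether to keep such weights or clip them to zero is a design decision that Eq. 2.3 leaves open.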
