Learning Dynamic Information Needs: A Collaborative Topic Variation Inspection Approach


I-Chin Wu

Department of Information Management, Fu Jen Catholic University, Taipei, 242 Taiwan. E-mail: [email protected]

Duen-Ren Liu and Pei-Cheng Chang

Institute of Information Management, National Chiao Tung University, Hsinchu, 300 Taiwan. E-mail: [email protected]; [email protected]

For projects in knowledge-intensive domains, it is crucially important that knowledge management systems are able to track and infer workers’ up-to-date information needs so that task-relevant information can be delivered in a timely manner. To put a worker’s dynamic information needs into perspective, we propose a topic variation inspection model to facilitate the application of an implicit relevance feedback (IRF) algorithm and collaborative filtering in user modeling. The model analyzes variations in a worker’s task-needs for a topic (i.e., personal topic needs) over time, monitors changes in the topics of collaborative actors, and then adjusts the worker’s profile accordingly. We conducted a number of experiments to evaluate the efficacy of the model in terms of precision, recall, and F-measure. The results suggest that the proposed collaborative topic variation inspection approach can substantially improve the performance of a basic profiling method adapted from the classical RF algorithm. It can also improve the accuracy of other methods when a worker’s information needs are vague or evolving, i.e., when there is a high degree of variation in the worker’s topic-needs. Our findings have implications for the design of an effective collaborative information filtering and retrieval model, which is crucial for reusing an organization’s knowledge assets effectively.

Introduction

Information seeking or searching is regarded as the primary activity of knowledge workers when they execute tasks. The 2004 International Data Corporation (IDC) Report (Feldman, 2004) estimated that 90% of a company’s accessible information is only used once. If knowledge cannot be accessed easily and reused effectively, the accumulated information is essentially useless and the company’s production costs will increase because similar knowledge must be recreated. Thus, successful knowledge management (KM) practices require an understanding of workers’ information needs to ensure effective information-seeking activities when they perform long-term tasks.

Received February 13, 2008; revised December 8, 2008; accepted July 21, 2009

Although some knowledge management systems (KMSs) incorporate information retrieval (IR) functions, workers find it difficult to express their information needs by using short query terms (LaBrie & St. Louis, 2003; Pons-Porrata, Berlanga-Llavori, & Ruiz-Shulcloper, 2007; Ruthven, 2001). In many cases the worker may only have a general idea about a topic and may be uncertain about what information is required to execute the task at hand (Belkin, Oddy, & Brooks, 1982; Jansen, 2005; White, Jose, & Ruthven, 2003; White & Kelly, 2006). The anomalous state of knowledge (ASK) hypothesis posits that a searcher’s information needs arise from an anomaly in the state of knowledge; thus, there is a gap between their knowledge about a task and the perceived requirements of the task. The gap is called the information need and results in information-seeking activities to solve the problem, i.e., satisfy the searcher’s information needs (Belkin et al., 1982; Byström & Järvelin, 1995; Mackay, 1960; Taylor, 1968; White et al., 2004). To address this problem, we propose an effective information-learning method based on a topic variation inspection process. The method considers an individual’s search behavior pattern and the interests of workers with similar information needs to learn the individual’s dynamic information needs precisely and in a timely manner. More specifically, we integrate the traditional implicit relevance feedback (IRF) algorithm (Kelly, 2004; Ruthven, Lalmas, & van Rijsbergen, 2003; White, 2004; White et al., 2004; Widyantoro, Ioerger, & Yen, 2001) with a user information needs profiling process to improve the knowledge retrieval functions in KMSs. Conventional user profiling approaches only reflect the user’s previous information needs based on their personal search behavior for relevant documents. They do not consider possible changes in the topics searched by users with similar information needs. In this work we adopt the concept of collaborative filtering used in recommendation systems to show how other workers’ experiences can enhance the target worker’s collaborative search behavior patterns for the task at hand (Balabanovic & Shoham, 1997; Konstan et al., 1997).

Contemporary KMSs employ information technologies (IT), such as cooperative document management portals, groupware, and workflow management systems, to facilitate access to knowledge assets, as well as the reuse and sharing of knowledge assets within and across organizations (Davenport & Prusak, 1998; Kankanhalli, Tanudidjaja, Sutanto, & Tan, 2003). A repository of structured explicit knowledge, especially in document form, is a codified strategy for managing knowledge (Davenport & Prusak, 1998; Gray, 2001). According to Gray (2001), codified knowledge helps knowledge workers exploit their organization’s resources fully. Kankanhalli et al. (2003) observed that product-based firms, such as Xerox, Microsoft, and Hewlett-Packard, rely on both codification and sharing approaches to keep pace with dynamic and rapid changes in the business environment, i.e., the companies operate in a complex and high-volatility context. However, with the growing amount of information in organizational databases, KMSs face the increasingly difficult challenge of helping users find pertinent information efficiently. Thus, knowledge retrieval is considered a core component of systems that support workers engaged in knowledge-intensive tasks in a business environment (Abecker, Bernardi, Maus, Sintek, & Wenzel, 2000). To resolve the problem of retrieving needed information from a vast amount of codified knowledge, IR techniques coupled with workflow management systems (WfMS) are employed to support proactive delivery of task-specific information based on the context of the tasks within the overall process (Abecker et al., 2000; Fenstermacher, 2002). For example, the KnowMore system maintains task specifications (profiles) that detail the process-context of tasks and associated information items (Abecker et al., 2000). Context-aware, task-specific knowledge can thus be provided based on the task’s specifications and the current execution context of the process. In another approach, a process meta-model that specifies the context of the objects is integrated with workflow systems to capture and retrieve information or codified knowledge within a process context (Kwan & Balasubramanian, 2003). The weakness of the above methods is that creating a task-based profile or specification requires human effort. Moreover, they employ push-based strategies that provide task-relevant information without considering the user’s active search behavior. In other words, they cannot precisely identify and track a worker’s dynamic information needs (task needs) over time. This is a critical issue because a worker’s task-needs can emerge and change in different time frames during a task’s execution.

It is widely agreed that information seeking is a difficult and complex process for workers during the execution of long-term projects/tasks (Kuhlthau, 1993; Spink, Wilson, Ellis, & Ford, 1998; Vakkari, Pennanen, & Serola, 2003).

A number of studies on information search processes observe that users’ information needs and search behavior patterns vary according to the problem stage they are in (Campbell & Van Rijsbergen, 1996; Ingwersen & Järvelin, 2005; Kuhlthau, 1993; Vakkari et al., 2003; Tang & Solomon, 1998). Kuhlthau’s search process model differentiates a task into six stages with their associated characteristics. Specifically, it divides the information search process from the user’s perspective into the following stages: task initiation, topic selection, prefocus exploration, focus formulation, information collection, and search closure. The objective is to observe how users locate and interpret information to form a perspective on a topic in different problem stages. During the search process, thoughts evolve from unclear and vague to a clear, more focused understanding. The user’s search behavior also changes with the formulation of a focus. Existing case studies showed that, in the early stages of a task’s execution, students seek relevant information related to a general topic (Kuhlthau, 1993; Vakkari et al., 2003). Therefore, it should be much easier to analyze users’ dynamic information needs in terms of topic changes, rather than by analyzing changes in the keywords input to the system.

To put a worker’s dynamic information needs into perspective, we propose a topic variation inspection model to facilitate the application of an IRF algorithm and collaborative filtering in user modeling. Relevance feedback (RF) improves the effectiveness of searches by reformulating or expanding the original query based on partial relevance judgments, i.e., feedback on part of the evaluation set (Rocchio, 1971; Salton & Buckley, 1990). By employing the implicit RF algorithm, the system can monitor users’ access behavior unobtrusively to learn their information needs and modify their original queries. The method identifies changes in a worker’s information needs by inferring those needs from documents the worker has browsed, read, or downloaded. Our approach bears some similarity to Campbell and Van Rijsbergen’s (1996) Ostensive Model, which describes how users’ information needs correspond to their knowledge states. The degree of uncertainty influences a user’s perception of the “relevance” of information and results in different information-seeking activities. In addition, following previous studies (Campbell & Van Rijsbergen, 1996; Kuhlthau, 1993; Ruthven et al., 2003; Vakkari, 2000), we assume that a knowledge worker’s uncertainty will decrease as the task progresses. This contrasts with the traditional relevance feedback model, which assumes that all information (i.e., information items that users regard as relevant) is generated by the same knowledge state. We consider that recently accessed documents reflect a worker’s current task needs more accurately than those documents accessed earlier. Thus, a time factor is incorporated into the adapted IRF algorithm to reflect the relevance of the current information. In addition, we try to analyze changes in the worker’s topic-needs based on the task’s performance over time.
For example, if a researcher is seeking relevant knowledge documents for a project, the research topics may vary as follows: “Event detection” => “Mining event change” => “Mining Patent change” => “Patent Mining”, where the symbol “=>” indicates that the event on the left-hand side occurs before the event on the right-hand side. Therefore, we propose a learning model of information needs that can track a worker’s topic-needs in different time frames during a task’s performance. Because each topic in the topic taxonomy is associated with a corpus, we can compile and rank a set of task-relevant topics by calculating the similarity between the corpus of each topic and the worker’s current task profile. The analysis results can be incorporated into the profile adaptation process for query expansion based on the RF algorithm (Kelly, 2004; Rocchio, 1966, 1971; Salton & Buckley, 1990) by means of the domain corpus. Moreover, to determine a worker’s task needs, we try to predict changes in the worker’s information needs by identifying other workers with similar information needs. That is, we consider possible changes in information needs in terms of how other workers’ experiences could enhance the target worker’s search results and satisfy their information needs for the task at hand. In a recent study Zhou, Ji, Zha, and Giles (2006) suggested that social actors can influence the evolutionary trends of topics, i.e., the transition from one topic to another. Accordingly, we propose an information needs learning method based on a collaborative topic variation inspection process. The method adjusts a worker’s task profile according to their knowledge-seeking activities (e.g., implicit document access behavior) so that other workers with similar variations in topic needs over time can be identified. The proposed model not only identifies variations in a worker’s task-needs for topics (i.e., self-topic needs) over time, but also analyzes the collective (i.e., collaborative actors’) topic variation patterns and adjusts the worker’s profile accordingly.
Then, to satisfy the worker’s dynamic task needs, codified knowledge relevant to the current task can be retrieved based on the adjusted task profile.

The remainder of this paper is organized as follows. Literature Review provides a review of related works. In Overview of the Dynamic Information Needs Learning Approach we formulate the problem and discuss our methodology. Measuring Variations in Topic-Needs over Time describes how we use the proposed model to measure variations in topic needs over time. The Topic Variation Inspection Process section explains how we use a personal topic variation inspection process and a collaborative topic inspection process to predict a worker’s task-needs. We also present an algorithm for identifying the task needs of similar social actors via a topic variation matrix. The experiment design and experiment results are presented in the next two sections, followed by Discussion and Implications, then Conclusion and Future Work.

Literature Review

Relevance Feedback in a Vector Space Model

Relevance feedback improves search effectiveness through an automatic query reformulation process (Rocchio, 1966, 1971). The literature on information retrieval shows that relevance feedback applied in a vector model is an effective technique in a retrieval environment (Rocchio, 1966; Salton & Buckley, 1990). In a vector space model the key contents of a codified knowledge item (document) can be represented as a term vector (i.e., a feature vector of weighted terms) in an n-dimensional space, using a term-weighting approach that considers the term frequency, inverse document frequency, and normalization factors (Salton & Buckley, 1988). The term transformation steps, namely, case folding, stemming, and stop word removal, are performed during text preprocessing (Porter, 1980; Salton & Buckley, 1988; Witten, Moffat, & Bell, 1999). Then, term weighting is applied to extract the most discriminating terms (Baeza-Yates & Ribeiro-Neto, 1999). Let $d$ be a codified knowledge item (document), and let $\vec{d} = \langle w(k_1, d), w(k_2, d), \ldots, w(k_n, d) \rangle$ be the term vector of $d$, where $w(k_i, d)$ is the weight of a term $k_i$ that occurs in $d$. Note that the weight of a term represents its degree of importance in representing the document (codified knowledge). The well-known tf-idf approach, which is often used for term (keyword) weighting, assumes that terms with a higher frequency in one document and a lower frequency in other documents are better discriminators for representing the first document. Let the term frequency $tf(k_i, d)$ be the occurrence frequency of term $k_i$ in $d$, and let the document frequency $df(k_i)$ represent the number of documents that contain $k_i$. The importance of $k_i$ is proportional to the term frequency and inversely proportional to the document frequency, which is expressed as follows:

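$$w(k_i, d) = \frac{tf(k_i, d) \times \log\left(\frac{N}{df(k_i)} + 1\right)}{\sqrt{\sum_{j=1}^{n} \left( tf(k_j, d) \times \log\left(\frac{N}{df(k_j)} + 1\right) \right)^{2}}}, \qquad (2.1)$$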

where $N$ is the total number of documents. Note that the denominator on the right-hand side of the equation is a normalization factor that normalizes the weight of a term. The classic relevance feedback method proposed by Rocchio (1971) and the Ide_Dec_Hi (1971) method, which use a vector space model to derive the modified query $\vec{q}_m$, are formulated in Equations 2.2 and 2.3, respectively (Baeza-Yates & Ribeiro-Neto, 1999):

$$\text{Standard\_Rocchio:} \quad \vec{q}_m = \alpha \vec{q} + \beta \frac{1}{|D_r|} \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \gamma \frac{1}{|D_n|} \sum_{\forall \vec{d}_j \in D_n} \vec{d}_j \qquad (2.2)$$

$$\text{Ide\_Dec\_Hi:} \quad \vec{q}_m = \alpha \vec{q} + \beta \sum_{\forall \vec{d}_j \in D_r} \vec{d}_j - \gamma \, \max\nolimits_{\text{non-relevant}}(\vec{d}_j) \qquad (2.3)$$

where $D_r$ is the set of relevant documents and $D_n$ is the set of non-relevant documents, both of which are determined by the user; $|D_r|$ and $|D_n|$ denote the number of documents in the sets $D_r$ and $D_n$, respectively; and $\alpha$, $\beta$, $\gamma$ are tuning constants. In Equation 2.3, $\max_{\text{non-relevant}}(\vec{d}_j)$ denotes the highest-ranked non-relevant document. The two methods yield similar results (Baeza-Yates & Ribeiro-Neto, 1999). The Rocchio method sets α = 1, Ide sets α = β = γ = 1, and Harman (1992) sets β = 0.75, γ = 0.25. Salton and Buckley (1990) showed that the Ide_Dec_Hi algorithm performs slightly better than the Rocchio algorithm; hence, we adopted Ide_Dec_Hi as our baseline method.
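The weighting scheme and the two feedback formulas above can be illustrated with a minimal Python sketch. It is not the authors' implementation: term vectors are represented as plain dictionaries, the function names (`tfidf_vector`, `rocchio`, `ide_dec_hi`) are hypothetical, and the default constants are placeholders that a caller would normally override. Negative weights are left as they are, since the text does not say whether they should be clipped.

```python
import math
from collections import Counter


def tfidf_vector(doc_terms, doc_freq, n_docs):
    """Normalized tf-idf weights for one document, following Equation 2.1.

    doc_terms: list of preprocessed terms in the document.
    doc_freq:  dict mapping each term to its document frequency df(k).
    n_docs:    total number of documents N.
    """
    tf = Counter(doc_terms)
    raw = {k: f * math.log(n_docs / doc_freq[k] + 1) for k, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {k: w / norm for k, w in raw.items()} if norm else raw


def _add(acc, vec, scale):
    """Accumulate scale * vec into the sparse vector acc (in place)."""
    for k, w in vec.items():
        acc[k] = acc.get(k, 0.0) + scale * w


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=1.0, gamma=1.0):
    """Standard Rocchio update (Equation 2.2): centroids of both sets."""
    q_m = {}
    _add(q_m, query, alpha)
    for d in relevant:
        _add(q_m, d, beta / len(relevant))
    for d in non_relevant:
        _add(q_m, d, -gamma / len(non_relevant))
    return q_m


def ide_dec_hi(query, relevant, top_non_relevant=None,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Ide_Dec_Hi update (Equation 2.3): all relevant documents are added,
    and only the highest-ranked non-relevant document is subtracted."""
    q_m = {}
    _add(q_m, query, alpha)
    for d in relevant:
        _add(q_m, d, beta)
    if top_non_relevant is not None:
        _add(q_m, top_non_relevant, -gamma)
    return q_m
```

The only difference between the two updates is how feedback is aggregated: Rocchio averages each set, whereas Ide_Dec_Hi sums the relevant documents and penalizes a single non-relevant one.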

Techniques for Modeling Users’ Information Needs

Textual data, such as articles, reports, manuals, and know-how documents, are treated as valuable and explicit knowledge in organizations (Nonaka, 1994). Therefore, the first step of knowledge management usually involves the application of information retrieval (IR) and information filtering (IF) techniques to solve document management problems. In fact, IR and IF are considered core technologies that help organizations collect and process documents, mitigate the problem of information overload, and provide relevant information for knowledge workers to accomplish their tasks.

IF systems are similar to conventional IR systems; however, rather than focus on supporting users’ short-term information needs, IF systems emphasize the concept of personalization to support users’ long-term information needs (Baeza-Yates & Ribeiro-Neto, 1999; Belkin & Croft, 1992; Elovici, Shapira, & Kantor, 2006; Shapira, Shoval, & Hanani, 1999). Basically, IF systems maintain user profiles and relevant information is delivered to users based on their profile. Thus, learning and maintaining users’ profiles are important aspects of supporting long-term information services. Various approaches for learning users’ interests or preferences from textual documents or Webpages have been proposed (Balabanovic & Shoham, 1997; Pazzani & Billsus, 1997; Middleton, Shadbolt, & Roure, 2004; Mostafa, Mukhopadhyay, Lam, & Palakal, 1997). Well-known approaches in information retrieval and information theory, such as Rocchio’s algorithm, information gain theory, and the Bayesian classifier, have been modified and used to model or capture a user’s dynamically changing interests. Most approaches require users’ relevance feedback (RF) information, as well as explicit feedback (users’ ratings on information items) or implicit feedback (users’ access behavior), to achieve this goal. RF improves search effectiveness through query reformulation. Kelly and Fu’s (2007) study of RF shows that determining a user’s information needs based on their feedback can improve retrieval performance significantly. Moreover, various studies have demonstrated that applying RF in a vector model enhances IF (Middleton et al., 2004; Salton & Buckley, 1990; Widyantoro & Yen, 2005; Yang, Yoo, Zhang, & Kisiel, 2005).

Widyantoro et al. (2001) use a three-descriptor model to learn a user’s multiple interests. The approach maintains a long-term descriptor to capture the user’s general interests and a short-term descriptor to keep track of their more recent interests, which change more rapidly. An automatic weight-adjustment mechanism adjusts the weight of positive and negative descriptors to ensure that the short-term descriptor records major changes in the user’s interests immediately. In addition, a profiling approach has been adopted in the workplace to enhance knowledge retrieval and promote knowledge sharing among project-based or interest-based groups (Abecker et al., 2000; Agostini, Albolino, De Michelis, De Paoli, & Dondi, 2003; Davies, Duke, & Stonkus, 2003); and a cooperative agent architecture has also been developed to facilitate task-based information filtering within a work process (De Bra, Houben, & Dignum, 1997). In the latter type of framework, information filtering techniques combined with an intelligent agent-based architecture are commonly adopted to streamline the delivery of knowledge from internal or external knowledge repositories (Spies, Clayton, & Noormohammadian, 2005; Ye & Fischer, 2002). In recent years, several studies have stressed the importance of modeling users’ interests or information needs for a specific work task incrementally in terms of topics, instead of as a set of weighted keywords or meta-data. Sieg, Mobasher, and Burke (2004) integrate user profiles and concept hierarchies to infer users’ information contexts in order to enhance the original queries. In this way, IF systems learn users’ current information needs from the RF and update the model for subsequent information filtering. Godoy and Amandi (2006) proposed an incremental concept clustering algorithm called WebDCC, which uses intelligent agents to build a profile of the user’s search behavior for Web documents, i.e., it models a user’s preferences and interests based on observations of their behavior.
Learning approaches can maintain users’ profiles adequately once the system receives feedback or observes changes in search behavior patterns; hence, such approaches are regarded as incremental learning techniques. In addition to modeling users’ interests in terms of topics, incorporating the domain ontology (i.e., a hierarchical structure of domain topics/categories) into the profiling process is an effective way of modeling users’ information needs for tasks (Godoy, Schiaffino, & Amandi, 2004; Middleton et al., 2004; O’Leary, 1988; Pons-Porrata et al., 2007).

Overview of the Dynamic Information Needs Learning Approach

Motivation

As mentioned above, we propose an information needs learning model based on a collaborative topic variation inspection process to track and analyze a worker’s information needs for a task, i.e., task-needs. In this subsection we formulate the problem and basic concepts as follows.

• First, we focus on a worker’s search behavior during the execution of knowledge-intensive tasks (tasks for short), such as research projects in academic institutions and product development tasks in R&D departments. The 2004 International Data Corporation (IDC) Report (Feldman, 2004) estimated that knowledge workers spend 15%–35% of their time just searching for information; however, on average, they succeed in finding the desired information less than 50% of the time. Nevertheless, information seeking or searching is regarded as a key activity of knowledge workers during the execution of tasks. Hence, knowledge (information) retrieval is a core component of KMSs.

• Second, a worker’s search behavior generally results from the fact that there is a gap between their knowledge about the task at hand and the perceived requirements of the task. Taylor (1968) described the continuous mental development of a user’s information need as evolving from an “unconscious need” through a “conscious need” to a “compromised need.” Subsequently, Belkin et al. (1982) extended the theory and put forward the hypothesis of the anomalous state of knowledge (ASK). The ASK hypothesis posits that a searcher’s information need arises from an anomaly in the state of knowledge, such that there is a gap between their knowledge about a task and the perceived requirements of the task. The gap is called the information need and results in information-seeking activities to solve the problem (Belkin et al., 1982; Mackay, 1960; Taylor, 1968; White et al., 2004). Recall that, during the search process, the worker’s thoughts evolve from unclear and vague to a clear, more focused understanding (Ingwersen & Järvelin, 2005; Kuhlthau, 1993; Vakkari et al., 2003). Therefore, it should be much easier to analyze users’ dynamic information needs in terms of topic changes, rather than by analyzing changes in the keywords input to the system.

• Third, following Campbell and Van Rijsbergen (1996), who proposed the Ostensive Model to describe how users’ information needs correspond to their knowledge states, we assume that the worker’s uncertainty will decrease as the performance of the task progresses. We consider that recently accessed documents reflect a worker’s current task needs more accurately than documents accessed earlier. Thus, a time factor, i.e., a decay function, is incorporated into the topic variation inspection process to identify the user’s emerging or declining interest in work-task topics at different times.

• Finally, to support the execution of the current task, a worker usually needs to reference previously executed topics (the executed-task set) in the task-based domain ontology (DO). In this context, if the task profiles of two workers have some degree of similarity and they exhibit similar rates of topic change in the constructed task-based DO, it is reasonable to assume that they will have similar information needs for tasks in the near future. Thus, we propose a collaborative topic variation inspection approach to predict a worker’s information needs based on variations in the topic needs of similar workers over time. The idea is similar to that of collaborative filtering techniques used in recommender systems (Balabanovic & Shoham, 1997). Basically, such techniques identify users whose profiles are similar to the profile of the target worker. Then they provide recommendations or predictions based on the other workers’ experiences to improve the incomplete content-based approach (Balabanovic & Shoham, 1997; Konstan et al., 1997). Therefore, we propose a topic variation inspection model to facilitate the application of an implicit relevance feedback (IRF) algorithm and collaborative filtering in user modeling.

Note that, when modeling users’ task needs, we use a task-based topic taxonomy to conceptualize domain information about organizational activities. The topics and their corresponding topic profiles are used as references to adjust the task profiles based on their relevance (similarity) to the documents accessed by workers. Modeling a worker’s dynamic task-needs is performed in two phases: a personal profile adaptation phase and a collaborative adaptation phase. Technically, the adaptation process takes the time factor into consideration. The rationale for our approach is that the more recently a document has been accessed, the more important it should be in reflecting a worker’s current task needs. Thus, the contribution of the user’s previous task needs (profile) to adjusting the task profile should be reduced according to the amount of time that has passed. The personal task profile only reflects previous information needs; that is, it does not consider possible changes in the topic-needs (i.e., information needs for tasks). Thus, we try to measure changes in a worker’s information needs by identifying other workers with similar information needs, i.e., profile adaptation via collaborative filtering techniques. The rationale behind collaborative profile adaptation is that workers with similar changes in previous information needs are likely to have similar changes in future information needs; thus, possible changes in a worker’s information needs can be inferred from changes in the information needs of similar workers.

The Proposed Approach

The personal topic variation inspection phase adjusts task profiles incrementally based on the documents accessed by the worker over time. That is, the documents accessed most recently are deemed more important than those accessed in the early stages of the task’s execution. The collaborative topic variation inspection phase uses variations in the topic patterns of similar workers to predict a target worker’s future task-needs and adjust their task profile accordingly. We also use an event-based technique to trigger the profile adaptation step based on the results of the topic variation process. An event occurs when a worker accesses a document at a specific time, and each event-based topic need is modeled as a weighted topic. The weight of a topic is derived by considering the similarity between the topic profile and the profile of the document accessed by the worker at that time. Variations in a worker’s topic needs can be measured by the difference between the topic’s weights at two time points. Figure 1 provides an overview of the proposed methodology.
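The event-based weighting described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the paragraph does not give the formulas, so cosine similarity is assumed as the similarity measure, the variation is assumed to be the later weight minus the earlier weight, and the exponential decay with a `half_life` parameter is a hypothetical stand-in for the time factor. All function names are invented for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_need_weights(doc_profile, topic_profiles):
    """Weight of every topic for one access event: similarity between the
    topic profile and the profile of the accessed document (assumed cosine)."""
    return {topic: cosine(doc_profile, profile)
            for topic, profile in topic_profiles.items()}


def need_variation(weights_d, weights_e):
    """Per-topic variation between an earlier event d and a later event e,
    assumed here to be the later weight minus the earlier weight."""
    topics = set(weights_d) | set(weights_e)
    return {t: weights_e.get(t, 0.0) - weights_d.get(t, 0.0) for t in topics}


def time_weight(t, T, half_life=5.0):
    """Hypothetical decay: an event at time t counts half as much for every
    half_life time units that separate it from the latest event at time T."""
    return 0.5 ** ((T - t) / half_life)
```

With topic profiles for “Event detection” and “Patent Mining”, a worker who first reads a document about events and later one about patents would show a negative variation for the first topic and a positive variation for the second, which is the kind of shift the approach is meant to detect.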

Phase 1: Personal topic variation inspection process. When a worker accesses a document, the system captures information about their search behavior. The profile adaptation phase considers the effect of the time factor and the worker’s behavior (the document accessed) in order to adjust the corresponding task profile with the help of a task-based topic taxonomy. Note that other workers’ feedback is not considered at this point. Since a current task is often related to some previously executed tasks in the organization, task-based topics play an important role as references that can provide workers with task-relevant documents (Middleton et al., 2004; O’Leary, 1988). In a previous work (Liu & Wu, 2008), we showed that relevant topics are points of reference that are very helpful for generating task profiles, especially when only a few documents are accessed in the early phase of a task’s execution. The topics and their corresponding topic profiles are used as references to adjust task profiles for generating self-adapted profiles according to their relevance (similarity) to the documents accessed by workers. Details of the proposed personal topic variation inspection process are given in the first subsection of The Topic Variation Inspection Process (below).

FIG. 1. Overview of the approach. [Figure not reproduced. Labels: Capturing Users’ Access Behavior; Annotating Events; Phase one: Self-Profile Adaptation (Measuring variations of topic-needs, Self-adapted task profile, Topic-needs variation matrices); Phase two: Collaborative Profile Adaptation (Constructing Topic-needs Matrix, Identifying Similar Workers, Task-needs Prediction via Collaborative Profiling Technique).]

Phase 2: Collaborative actors’ topic variation inspection process. Instead of specifying changes in workers’ information needs explicitly, we try to capture variations in their needs through the documents they access. To this end, we use variations in workers’ topic needs over time to model and predict the target worker’s possible task-needs. Variations in each worker’s topic-needs over time are expressed as a needs variation matrix, i.e., a time-period by topic-needs matrix. A similar approach is used to find workers with similar variations in topic needs based on the derived topic-variation matrix and personal profiles in a time window. After workers with similar needs have been identified, their task needs (at time T + 1) are used to predict the worker’s task needs at time T + 1, which are modeled as a collaborative profile. The derived collaborative profile is then combined with the self-adapted profile to generate a new task profile that represents the target worker’s future task needs at time T + 1. The proposed collaborative topic variation inspection process is described in detail in the second subsection of The Topic Variation Inspection Process (below).
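A rough Python sketch of the collaborative phase follows. It is an illustration rather than the algorithm presented later in the paper: the similarity between two needs variation matrices is assumed to be the cosine of the flattened matrices, the collaborative profile is assumed to be a similarity-weighted average of the neighbours’ profiles at time T + 1, and the neighbourhood size `top_k` and mixing weight `lam` are hypothetical parameters.

```python
import math


def matrix_similarity(m_a, m_x):
    """Cosine similarity between two time-period x topic variation matrices
    of equal shape, compared as flattened vectors (an assumption)."""
    a = [x for row in m_a for x in row]
    b = [x for row in m_x for x in row]
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def collaborative_profile(target_matrix, neighbours, top_k=2):
    """Predict the target worker's profile at T + 1 from similar workers.

    neighbours: list of (variation_matrix, profile_at_T_plus_1) pairs.
    Only positively similar workers contribute, weighted by similarity.
    """
    scored = [(matrix_similarity(target_matrix, m), p) for m, p in neighbours]
    scored = sorted((sp for sp in scored if sp[0] > 0),
                    key=lambda sp: sp[0], reverse=True)[:top_k]
    total = sum(s for s, _ in scored)
    profile = {}
    for s, p in scored:
        for k, w in p.items():
            profile[k] = profile.get(k, 0.0) + (s / total) * w
    return profile


def combine(self_profile, collab_profile, lam=0.5):
    """New task profile: lam * self-adapted + (1 - lam) * collaborative."""
    keys = set(self_profile) | set(collab_profile)
    return {k: lam * self_profile.get(k, 0.0)
               + (1 - lam) * collab_profile.get(k, 0.0) for k in keys}
```

Restricting the prediction to positively similar workers keeps a worker whose topic needs move in the opposite direction from pulling the target profile toward topics the target is abandoning.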

Notations

The notations used in this work are defined in Table 1.

Measuring Variations in Topic-Needs Over Time

In this section we describe the proposed approach for measuring the variations in a worker’s topic needs over time, i.e., the information needs reflected in the topic taxonomy.

Capturing Workers’ Relevance Feedback Behavior

When a worker accesses a document, the proposed system stores the information about the corresponding relevance feedback (judgment) behavior. Information about a worker’s feedback behavior, i.e., the behavior related to documents, is gathered by our system’s online user behavior tracker. The following example explains how the system captures and stores a worker’s access behavior patterns.

Example: Assume that a worker, “Steven,” searches for documents in the K-Support system, and finds a document entitled “Learning User Interest Dynamics with a Three-Descriptor Representation” that may help him with his current task. Steven “reads” the document on “2005-10-31” at “21:05:03” and the system records this information.

TABLE 1. Notations used in this work.

| Notation | Definition | Section |
|---|---|---|
| $T$ | The index of the time that the latest event occurred | Measuring Variations of a Specific Topic |
| $TW_{t,T}$ | The time weight of the event that occurred at time t with respect to time T | Measuring Variations of a Specific Topic |
| $NW_{t}^{i}$ | The topic-need weight of topic i for the event that occurred at time t | Measuring Variations of a Specific Topic |
| $NV_{d,e}^{i}$ | The variation in the worker’s information needs for a specific topic i between time e and time d (d < e) | Measuring Variations of a Specific Topic |
| $\overrightarrow{profile}_{T+1}$ | The personal profile generated at time T | Phase One: Personal Topic Variation Inspection Process |
| $u_a$ / $u_x$ | The target worker / similar worker | Phase Two: Collaborative Topic Variation Inspection Process |
| $Sim_T(u_a, u_x)$ | The similarity between the target worker $u_a$ and the worker $u_x$ at time T, obtained from the SimScore($u_x$) | Phase Two: Collaborative Topic Variation Inspection Process |
| $\delta_V$ / $\delta_D$ | The tuning parameters used to adjust the relative weights of the self-adapted profile and the collaborative profile | Information Needs Modeling Based on the Topic Variation Inspection Process |
| $T_{u_x}$ | The latest time index of the similar candidate variation matrix $WM_T(u_x)$ of worker $u_x$ | |
| $NV^{i}_{T_{u_x},T_{u_x}+1}(u_x)$ | The variation degree of topic i between time $T_{u_x}$ and time $T_{u_x}+1$ of $u_x$ | |
| $\overrightarrow{doc}_{T_{u_x}+1}(u_x)$ | The document profile of a document, doc, accessed at time $T_{u_x}+1$ by the worker $u_x$ | Phase One: Personal Topic Variation Inspection Process & Information Needs Modeling Based on the Topic Variation Inspection Process |
| $\overrightarrow{topic}_{i}$ | The topic profile of topic i in the topic taxonomy | Phase One: Personal Topic Variation Inspection Process & Information Needs Modeling Based on the Topic Variation Inspection Process |

In the above example, the stored information is {“Steven,” “2005-10-31,” “21:05:03,” “reads,” “Learning User Interest Dynamics with a Three-Descriptor Representation”}. Each attribute, except the time attribute, is converted into an identifiable number. Hereafter, we use event to denote the users’ actions when they access a document. In this paper we only consider four kinds of events, namely, “downloading documents,” “downloading reports of documents,” “reading documents online,” and “uploading documents.” Workers upload documents that they regard as relevant or helpful to their research topics. They may also read and download documents or download notes about documents that are of personal interest.
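The stored event record described above can be sketched as follows. This is only an illustrative sketch: the class name `AccessEvent`, the `EVENT_CODES` mapping, and the way numeric identifiers are assigned are assumptions made for the example, not the K-Support implementation.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical numeric codes for the four kinds of events considered in the text.
EVENT_CODES = {"downloads": 0, "downloads report": 1, "reads": 2, "uploads": 3}


@dataclass(frozen=True)
class AccessEvent:
    worker_id: int       # numeric identifier of the worker (assumed scheme)
    timestamp: datetime  # the time attribute is kept as a time value
    event_code: int      # numeric identifier of the kind of event
    document_id: int     # numeric identifier of the document (assumed scheme)


def encode_event(worker, date, time, action, title, worker_ids, doc_ids):
    """Convert every attribute except the time into an identifying number."""
    worker_id = worker_ids.setdefault(worker, len(worker_ids) + 1)
    document_id = doc_ids.setdefault(title, len(doc_ids) + 1)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return AccessEvent(worker_id, timestamp, EVENT_CODES[action], document_id)


workers, documents = {}, {}
event = encode_event(
    "Steven", "2005-10-31", "21:05:03", "reads",
    "Learning User Interest Dynamics with a Three-Descriptor Representation",
    workers, documents,
)
```

Keeping the timestamp as a `datetime` value makes it straightforward to compute elapsed times later, which the time weight requires.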

Measuring Variations of a Speciﬁc Topic

We consider two factors when measuring the variation in information needs for a specific topic i in the topic taxonomy for each event that occurs at time t. The first is the time factor, called the time weight $TW_{t,T}$; and the second is the relevance degree of topic i, called the topic-need weight $NW_{t}^{i}$. In the following we explain the concepts of time weight and topic-need weight over time. Note that $TW_{t,T}$ is the time weight of an event that occurred at time t with respect to time T, as described in Equation 5.2; and $NW_{t}^{i}$ is the topic-need weight of topic i for the event that occurred at time t. The latter is obtained by calculating the similarity (using a vector-based cosine method) between the document accessed at time t and the profile of topic i. When a worker accesses a document, the system calculates the topic-need weight for the corresponding event.
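As an illustration of the topic-need weight, the sketch below computes the cosine similarity between an accessed document and each topic profile. Representing the vectors as term-to-weight dictionaries, and the function names themselves, are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_need_weights(doc_vector, topic_profiles):
    """Topic-need weight of every topic for one event (one accessed document)."""
    return {topic: cosine(doc_vector, profile)
            for topic, profile in topic_profiles.items()}


doc = {"profile": 1.0, "feedback": 1.0}
topics = {
    "topic 1": {"profile": 1.0, "feedback": 1.0},
    "topic 2": {"patent": 1.0},
}
weights = topic_need_weights(doc, topics)
```

A topic whose profile shares no terms with the document receives a weight of zero, so only topics related to the accessed document contribute to the accumulated needs.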

After obtaining the topic-need weights for all of the worker’s events, the variation in topic needs over time can be measured, as shown in Equation 4.1. Given two timepoints, d and e (where d < e), let $NV_{d,e}^{i}$ denote the variation in the worker’s information needs for a specific topic i between time d and time e. The accumulated topic needs at time e equal the summation of $TW_{t,e} \times NW_{t}^{i}$ for t = 1 to e.

$$NV_{d,e}^{i} = \sum_{t=1}^{e} TW_{t,e} \times NW_{t}^{i} - \sum_{t=1}^{d} TW_{t,d} \times NW_{t}^{i} \tag{4.1}$$

The relevance degree of topic i is different at time d and time e; thus, $NW_{t}^{i}$ is exploited to take this factor into consideration. The time weight is used to reflect the effect of time decay on topic needs between time d and e. Since the measurement considers accumulated topic needs over time, the events that occurred before time d and time e are also considered.
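The variation measure of Equation 4.1 can be sketched for a single topic as below. The sketch assumes that event times are plain numbers measured on the same scale as the task's starting time ST, that every event occurs strictly after ST, and that the time weight is the ratio described in Equation 5.2; the function names are hypothetical.

```python
def time_weight(event_time, ref_time, start_time):
    """TW_{t,T}: elapsed time of the event divided by elapsed time of the reference point."""
    return (event_time - start_time) / (ref_time - start_time)


def accumulated_need(times, need_weights, upto, start_time):
    """Sum of TW_{t,upto} * NW_t over the events t = 1..upto (1-based count)."""
    ref_time = times[upto - 1]
    return sum(
        time_weight(times[t], ref_time, start_time) * need_weights[t]
        for t in range(upto)
    )


def need_variation(times, need_weights, d, e, start_time):
    """NV_{d,e} for one topic: accumulated need at e minus accumulated need at d."""
    if not 1 <= d < e <= len(times):
        raise ValueError("expected 1 <= d < e <= number of events")
    return (accumulated_need(times, need_weights, e, start_time)
            - accumulated_need(times, need_weights, d, start_time))


# Three events at times 2, 4, and 8 (ST = 0) with made-up topic-need weights.
times = [2.0, 4.0, 8.0]
weights = [0.5, 0.2, 0.4]
nv = need_variation(times, weights, d=2, e=3, start_time=0.0)
```

With these made-up numbers the accumulated need is 0.45 at the second event and 0.625 at the third, so the variation is 0.175. A positive value indicates growing interest in the topic.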

A Representative Model

A representative model, i.e., a vector-based model, is defined to represent the variations in a worker’s topic-needs over time. Such variations are expressed as a time-period by topic matrix, i.e., a topic-needs variation matrix comprised of several topic-needs variation vectors. Let $NV_{d,e}$, defined in Equation 4.2, denote the variation vector of the worker’s topic needs between time d and time e. The measurement of the variation of a specific topic i, i.e., $NV_{d,e}^{i}$, is defined in Equation 4.1.

| Time period | topic 5 | topic 4 | topic 3 | topic 2 | topic 1 |
|---|---|---|---|---|---|
| 2003-10-13 14:25:00 to 2003-10-14 14:49:30 | 0.064 | 0.033 | 0.019 | −0.019 | 0.05 |
| 2003-10-13 14:18:00 to 2003-10-13 14:25:00 | 0.044 | 0.109 | 0.036 | 0.015 | 0.04 |
| 2003-10-13 14:13:00 to 2003-10-13 14:18:00 | 0.080 | 0.111 | 0.031 | 0.026 | 0.06 |
| 2003-10-07 18:25:42 to 2003-10-13 14:13:00 | 0.078 | 0.074 | 0.014 | 0.328 | 0.447 |
| 2003-09-24 09:57:00 to 2003-10-07 18:25:42 | 0.065 | −0.066 | −0.024 | −0.013 | 0.066 |

FIG. 2. Example of a topic-needs variation matrix (VM).

Variations in topic needs between consecutive timepoints are expressed as a time-period by topic matrix VM. An element $VM_{p,i}$ in the matrix represents the variation of topic i during time-period p (e.g., from time d to e). A row in the matrix, VM[j], denotes a variation vector of topic needs. Figure 2 shows an example of a topic-needs variation matrix.

Example: The variation matrix shown in Figure 2 is a 5 × 5 matrix. The variations in topic needs represented by the matrix cover the period 2003-09-24 (09:57:00) to 2003-10-14 (14:49:30), and the value of each element in the matrix represents the variation of the corresponding topic. For example, the value 0.447 represents the variation in topic needs for topic 1 from 2003-10-07 (18:25:42) to 2003-10-13 (14:13:00). Let t1 denote the timepoint 2003-09-24 09:57:00, and let t2 denote the timepoint 2003-10-07 18:25:42. In this case, $NV_{t1,t2}^{1}$ = 0.066, which represents the variation in topic needs for topic 1 between t1 and t2; and $NV_{t1,t2}$ = <0.066, −0.013, −0.024, −0.066, 0.065>, which represents the variations in topic-needs for all topics between t1 and t2. The variations in topic needs over time are represented as a set of topic-needs variation vectors.
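To make the matrix construction concrete, the following sketch builds a time-period by topic matrix from per-topic need weights, with one row per pair of consecutive timepoints. It assumes numeric event times after the starting time ST and rows ordered from the oldest period to the latest; the function name and the sample weights are hypothetical.

```python
def variation_matrix(times, need_weights_by_topic, start_time):
    """Rows: consecutive periods (t_k, t_{k+1}), oldest first. Columns: topics."""

    def accumulated(weights, upto):
        # Accumulated, time-weighted need of one topic up to event `upto` (1-based).
        ref_time = times[upto - 1]
        return sum(
            (times[t] - start_time) / (ref_time - start_time) * weights[t]
            for t in range(upto)
        )

    matrix = []
    for d in range(1, len(times)):
        row = [accumulated(weights, d + 1) - accumulated(weights, d)
               for weights in need_weights_by_topic]
        matrix.append(row)
    return matrix


# Three events and two topics with made-up topic-need weights.
times = [2.0, 4.0, 8.0]
need_weights_by_topic = [
    [0.5, 0.2, 0.4],  # topic 1
    [0.1, 0.1, 0.1],  # topic 2
]
vm = variation_matrix(times, need_weights_by_topic, start_time=0.0)
```

Each row of `vm` is one topic-needs variation vector, so n events yield n − 1 rows, matching the six timepoints and five rows in the example matrix.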

The Topic Variation Inspection Process

In this section we describe the collaborative topic variation inspection process, which adjusts task profiles based on individual and collective search behavior. The two phases of profile adaptation, which are based on collaboration, are discussed in the following two subsections. In addition, we explain how we integrate the derived collaborative profile with the personal profile to predict the target worker’s task needs.

Phase One: Personal Topic Variation Inspection Process

When the system detects an event related to a worker’s access behavior, it captures and records the document accessed by the worker. The event triggers the personal profile adaptation process, which adjusts the worker’s task profile based on the corresponding event. A modified IRF algorithm, adapted from the techniques applied in the Ide_Dec_Hi algorithm, is used to adjust the workers’ task profiles. The proposed profiling technique is defined in Equations 5.1 and 5.2. Let T denote the timepoint that the worker accessed the last document; and let $\overrightarrow{profile}_{T+1}$ denote the worker’s task profile generated at time T, which can be used to model their task-needs at time T + 1.

$$\overrightarrow{profile}_{T+1} = \alpha \times Decay(\overrightarrow{profile}_{T}) + [\lambda\,\overrightarrow{topic}_{T} + (1-\lambda)\,\overrightarrow{doc}_{T}] \tag{5.1}$$

The task profile $\overrightarrow{profile}_{T+1}$ is generated from the previous task profile $\overrightarrow{profile}_{T}$ applied with a decay function, and refined by using the current information needs derived from the document accessed at time T. The current information needs are divided into a document profile, $\overrightarrow{doc}_{T}$, and an aggregate topic profile, $\overrightarrow{topic}_{T}$. Intuitively, $\overrightarrow{doc}_{T}$ is the profile (feature vector) of the document accessed at time T. The relevance degree of a topic i to the document accessed at time T is obtained by calculating the similarity (cosine measure) between $\overrightarrow{topic}_{i}$ and $\overrightarrow{doc}_{T}$. The $\overrightarrow{topic}_{T}$ profile is derived from the topic profiles of relevant topics in the positive topic set, as well as nonrelevant topics in the negative topic set. We use a parameter, λ, to adjust the weights of the document profile and the aggregate topic profile, as shown in Figure 3.

$$Decay(\overrightarrow{profile}_{T}) = \sum_{t=1}^{T-1} TW_{t,T} \times [\lambda\,\overrightarrow{topic}_{t} + (1-\lambda)\,\overrightarrow{doc}_{t}], \quad \text{where Time Weight: } TW_{t,T} = \frac{\text{the actual time for } t - ST}{\text{the actual time for } T - ST} \tag{5.2}$$

$Decay(\overrightarrow{profile}_{T})$ represents the accumulated task needs from the beginning of the task to the current time T. Thus, in this work we incorporate the time decay function of the previous task-profile, as given in Equation 5.2, where $\overrightarrow{profile}_{T}$ denotes the previous task profile generated at time T − 1, and represents the previous task needs. Specifically, $\overrightarrow{profile}_{T}$ is the aggregate of topic profiles and document profiles derived from the starting time ST to T − 1. Generally, the more recently a document was accessed, the more important it should be in reflecting the worker’s current task needs. $TW_{t,T}$ is the time weight of an event that occurred at time t with respect to T, and is defined as the ratio of the time difference t − ST to T − ST. Thus, $Decay(\overrightarrow{profile}_{T})$ reflects the effect of the previous task profile on the current task profile more accurately with TW than just using $\overrightarrow{profile}_{T}$. Accordingly, to learn the users’ dynamic information needs we propose three self-profile adaptation methods that consider the topic variation factor and the time factor. We call the methods P-Time, P-Topic, and P-Topic&Time.

[Figure 3 diagram: a timeline for one worker from task initialization through Event 1, Event 2, Event 3, ..., Event n-1, Event n, with accessed documents D0000000457 and D0000000046. Under the taxonomy root, the positive and negative topic sets change between events ("Changes of Topic-needs"): Pos. Topics: Topic 11, Topic 4, Topic 6; Neg. Topics: Topic 7; then Pos. Topics: Topic 2, Topic 7, Topic 5; Neg. Topics: Topic 4.]

FIG. 3. Example of modeling a worker’s task-needs.
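The profile update of Equations 5.1 and 5.2 can be sketched as follows. The sketch makes several assumptions: vectors are term-to-weight dictionaries, event times are numbers measured from the same origin as ST, and the aggregate topic profile of each event is supplied as an input (its derivation from the positive and negative topic sets is not modeled here). All function names are hypothetical.

```python
def scale(vec, factor):
    """Multiply every weight of a sparse vector by a constant."""
    return {term: factor * weight for term, weight in vec.items()}


def add(u, v):
    """Add two sparse vectors."""
    out = dict(u)
    for term, weight in v.items():
        out[term] = out.get(term, 0.0) + weight
    return out


def current_need(topic_vec, doc_vec, lam):
    """lambda * topic profile + (1 - lambda) * document profile."""
    return add(scale(topic_vec, lam), scale(doc_vec, 1.0 - lam))


def decay(history, time_T, start_time, lam):
    """Equation 5.2: time-weighted sum over the events before T."""
    decayed = {}
    for time_t, topic_vec, doc_vec in history:
        tw = (time_t - start_time) / (time_T - start_time)
        decayed = add(decayed, scale(current_need(topic_vec, doc_vec, lam), tw))
    return decayed


def next_profile(history, current, start_time, alpha=1.0, lam=0.5):
    """Equation 5.1: decayed previous needs plus the needs of the event at T."""
    time_T, topic_T, doc_T = current
    decayed = decay(history, time_T, start_time, lam)
    return add(scale(decayed, alpha), current_need(topic_T, doc_T, lam))


history = [(5.0, {"x": 1.0}, {"y": 1.0})]  # one earlier event at time 5
current = (10.0, {"x": 1.0}, {"z": 1.0})   # the latest event at time T = 10
profile = next_profile(history, current, start_time=0.0, alpha=1.0, lam=0.5)
```

Because the earlier event happened halfway between ST and T, its contribution is halved, while the latest event enters the profile at full weight.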

Phase Two: Collaborative Topic Variation Inspection Process

The variations in each worker’s topic-needs can be represented by a topic-needs variation matrix (VM), as shown in Figure 2. Specifically, the variations in each worker’s topic needs over time are expressed as a time-period by topic matrix based on topics in the taxonomy, i.e., a topic-needs variation matrix comprised of several topic-needs variation vectors. As a result, workers with similar topic-needs can be identified by their topic-need variation matrices and personal profiles in a time window. That is, to identify similar workers, we consider workers with similar variations in topic needs and similar personal profiles simultaneously. Figure 4 shows the proposed algorithm for identifying workers with similar task needs. We explain the algorithm in detail in the following.

First, the variation matrix $VM_T(u_a)$ of the target worker is trimmed to a w × q variation matrix $WM_T(u_a)$, which only contains the latest w variation vectors used to identify workers with similar variation matrices and task profiles. For each compared worker $u_x$, a sliding window W is used to locate the part of the variation matrix of $u_x$ that is similar to $WM_T(u_a)$. $WM_T(u_x)$ is the variation matrix of $u_x$ generated according to $VM_T(u_x)$ and the sliding window W, and T is the latest time index in the window W. Note that W denotes the sliding window whose size is w, where w ≤ row($VM_T(u_a)$), and q represents the number of topics in the topic taxonomy. An example of the latest w variation vectors of the trimmed variation matrix is shown in Figure 5. In this case, the size of the window, w, is equal to four. The proposed algorithm tries to capture variations in the target worker’s topic-needs for the time period that is closest to the latest time index of the worker’s search activities. We believe that some of the workers who perform similar tasks will have similar variations in topic-needs during that time period.

We employ a trimmed matrix instead of a complete variation matrix because it is not easy to find workers with similar changes in topic-needs for the whole task in the long term. In general, users only have similar topic-needs for a short period of time; therefore, we set a time window to make comparisons among the workers. In addition, the computational cost of comparing the target worker’s complete matrix with those of the other workers would be prohibitive. In Figure 4, lines 3–21 describe the procedure for finding candidate workers with similar topic-needs based on the candidate variation matrix for each compared worker $u_x$. The candidate variation matrix with the highest similarity score among all the candidates of $u_x$ is selected as the most similar variation matrix to that of $u_a$. The calculations (lines 16–19) of the similarity SimScore of $u_x$ and $u_a$ are divided into two parts: 1) calculation of the similarity SimVM based on the topic-needs variation vectors; and 2) calculation of the similarity SimTP based on the personal profiles. A parameter η is used to balance the relative importance of SimVM and SimTP. In our application, we set η = 1/2; that is, the similarity scores of the variation vectors and the personal profiles are weighted equally. The workers with the top-N ranked similarity scores are selected as the similar workers of $u_a$. The value of N should be set according to the application domain. Figure 5a,b illustrate, respectively, the calculation of the similarity scores based on the variation matrices and task profiles in the sliding window.
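The sliding-window comparison described above can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the authors' implementation: vectors are dense lists, cosine similarity stands in for the unspecified similarity function, and each window of w variation vectors is assumed to be paired with w + 1 task profiles (one per timepoint), which matches the division by w + 1 in the algorithm. The names `best_window` and `find_similar_workers` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_window(target_rows, target_profiles, rows, profiles, eta):
    """Slide a window of size w over one worker and keep the best score."""
    w = len(target_rows)
    best_score, best_start = 0.0, None
    for start in range(len(rows) - w + 1):
        sim_vm = sum(cosine(rows[start + j], target_rows[j])
                     for j in range(w)) / w
        sim_tp = sum(cosine(profiles[start + k], target_profiles[k])
                     for k in range(w + 1)) / (w + 1)
        score = eta * sim_vm + (1 - eta) * sim_tp
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


def find_similar_workers(target_vm, target_tp, others, w, eta=0.5, top_n=1):
    """Return the top-N (worker, score, window start) triples."""
    target_rows = target_vm[-w:]             # keep the latest w variation vectors
    target_profiles = target_tp[-(w + 1):]   # and the matching w + 1 profiles
    scored = []
    for worker, (rows, profiles) in others.items():
        score, start = best_window(target_rows, target_profiles, rows, profiles, eta)
        scored.append((worker, score, start))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


target_vm = [[1, 0], [0, 1]]
target_tp = [[1, 0], [1, 0], [0, 1]]
others = {
    "u1": ([[0, 1], [1, 0], [0, 1]], [[0, 1], [1, 0], [1, 0], [0, 1]]),
    "u2": ([[0, 1], [1, 0]], [[0, 1], [0, 1], [1, 0]]),
}
similar = find_similar_workers(target_vm, target_tp, others, w=2)
```

The returned window start identifies the candidate variation matrix of each worker; the row that follows the window is the one later used for prediction.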

Information Needs Modeling Based on the Topic Variation Inspection Process

After identifying workers similar to the target worker, their variation matrices, i.e., the similar candidate variation matrices determined by the algorithm, can be used to predict the target worker’s potential task needs, as shown in Equations 5.3 and 5.4.


Input: VM_T(u_a), W
Output: SimilarWorkerList // the list of similar workers
function FindSimilarWorker(VM_T(u_a), W) {
1   Trim VM_T(u_a) to a w * q variation matrix WM_T(u_a) // which only keeps the last w variation vectors
2   foreach compared worker u_x {
3     Set SimScore(u_x) = 0
4     Slide the window W on VM_T(u_x) to derive WM_T(u_x) from row 1 to row (row(VM_T(u_x)) - w + 1), do {
5       Let WM_T(u_x) be the variation matrix of u_x generated based on VM_T(u_x) and the sliding window W, when W is moving on VM_T(u_x); T is the latest time index in the window W
6       for the variation matrix WM_T(u_x) covered by W, do {
7         Set SimVM = 0, SimTP = 0
8         foreach variation vector WM_T(u_x)[j] of WM_T(u_x) do {
            Let WM_T(u_a)[j] be the corresponding variation vector of WM_T(u_a)
9           SimVM = SimVM + similarity(WM_T(u_x)[j], WM_T(u_a)[j])
10        }
11        SimVM = SimVM / w
12        foreach personal profile TP(u_x)[k] involved in WM_T(u_x) do {
            Let TP(u_a)[k] be the corresponding task profile of WM_T(u_a)
13          SimTP = SimTP + similarity(TP(u_x)[k], TP(u_a)[k])
14        }
15        SimTP = SimTP / (w + 1)
16        if ((η * SimVM + (1 − η) * SimTP) > SimScore(u_x)) then {
17          SimScore(u_x) = η * SimVM + (1 − η) * SimTP
18          Set WM_T(u_x) as the candidate (similar) variation matrix of u_x
19        }
20      }
21    }
22  }
23  Add the workers with top-N SimScore to SimilarWorkerList;
24  return SimilarWorkerList;
25 }

FIG. 4. The algorithm for identifying similar workers.

FIG. 5. (a) Example of calculating SIMV (average value of the similarity of variation vectors). (b) Example of calculating SIMTP (average value of the similarity of self-task profiles).

Note that time $T_{u_x}$ is the latest timepoint in the presented candidate variation matrix. The variation vectors immediately after time $T_{u_x}$ in the matrix can be regarded as the possible changes in topic needs that the target worker will experience in the near future. We propose two approaches for predicting the worker’s potential task needs based on the behavior patterns of similar workers at time $T_{u_x} + 1$.

The first, called the Coll_Topic Variation approach, is based on the variation in topic needs. It generates a collaborative profile in which variations in the topic needs of similar workers from time $T_{u_x}$ to $T_{u_x} + 1$ are used to predict possible variations in the target worker’s topic needs at time T + 1. The second method, called the Coll_Document method, is based on the documents accessed at time $T_{u_x} + 1$. In this case, the documents that similar workers accessed at time $T_{u_x} + 1$ are used to derive the collaborative profile. The linear combination approach detailed below is used to integrate the derived collaborative profile with the personal profile to generate a new task profile $\overrightarrow{coll\_profile}_{T+1}$, which represents the target worker’s future task needs.

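Coll_Topic Variation Method

$$\overrightarrow{coll\_profile}_{T+1} = \delta_V \times \overrightarrow{profile}_{T+1} + (1-\delta_V) \times \frac{\sum_{u_x \in \text{Similar Worker Set of } u_a} \left( Sim_T(u_a, u_x) \times \sum_{i=1}^{q} NV^{i}_{T_{u_x},T_{u_x}+1}(u_x) \times \overrightarrow{topic}_{i} \right)}{\sum_{u_x \in \text{Similar Worker Set of } u_a} Sim_T(u_a, u_x)} \tag{5.3}$$

Coll_Document Method

$$\overrightarrow{coll\_profile}_{T+1} = \delta_D \times \overrightarrow{profile}_{T+1} + (1-\delta_D) \times \frac{\sum_{u_x \in \text{Similar Worker Set of } u_a} \left( Sim_T(u_a, u_x) \times \overrightarrow{doc}_{T_{u_x}+1}(u_x) \right)}{\sum_{u_x \in \text{Similar Worker Set of } u_a} Sim_T(u_a, u_x)} \tag{5.4}$$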

In the Coll_Topic Variation method, the predicted profile of the target worker is the weighted combination of the personal profile, i.e., $\overrightarrow{profile}_{T+1}$ in Equation 5.1, and the accumulated collaborative topic-variation profile derived from variations in the topic needs of similar workers. Here, we use a parameter $\delta_V$/$\delta_D$ to adjust the relative weights of the self-adapted personal profile and the collaborative profile. Each worker’s topic-variation profile is obtained by multiplying the profiles of topics by the corresponding variation degrees $NV^{i}_{T_{u_x},T_{u_x}+1}(u_x)$. The individual topic variation profile represents the variation in the topic needs of the corresponding worker, while the collaborative topic-variation profile represents similar workers’ weighted topic-variation profiles in terms of their similarity to that of the target worker. The difference between the Coll_Document method and the Coll_Topic Variation method is that the documents accessed at time $T_{u_x} + 1$ by similar workers are used to derive the collaborative document profile.
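The two collaborative profiling variants can be sketched as follows. Two points are assumptions of this sketch rather than statements from the text: the linear combination is taken to be delta × personal profile + (1 − delta) × collaborative profile, and all vectors are term-to-weight dictionaries. The function names are hypothetical.

```python
def add_scaled(acc, vec, factor):
    """Add factor * vec to the accumulator dictionary in place."""
    for term, weight in vec.items():
        acc[term] = acc.get(term, 0.0) + factor * weight
    return acc


def collaborative_topic_profile(similar_workers, topic_profiles):
    """Similarity-weighted average of each worker's topic-variation profile.

    similar_workers: list of (similarity, {topic: variation degree NV}).
    """
    total_sim = sum(sim for sim, _ in similar_workers)
    collab = {}
    if total_sim == 0:
        return collab
    for sim, variations in similar_workers:
        for topic, nv in variations.items():
            add_scaled(collab, topic_profiles[topic], sim * nv / total_sim)
    return collab


def collaborative_document_profile(similar_workers):
    """Similarity-weighted average of the documents accessed at T_ux + 1.

    similar_workers: list of (similarity, document vector).
    """
    total_sim = sum(sim for sim, _ in similar_workers)
    collab = {}
    if total_sim == 0:
        return collab
    for sim, doc_vec in similar_workers:
        add_scaled(collab, doc_vec, sim / total_sim)
    return collab


def combine(personal, collaborative, delta):
    """Assumed form: delta * personal + (1 - delta) * collaborative."""
    out = add_scaled({}, personal, delta)
    return add_scaled(out, collaborative, 1.0 - delta)


topic_profiles = {"t1": {"a": 1.0}, "t2": {"b": 1.0}}
similar = [(0.5, {"t1": 0.2, "t2": 0.0}), (0.5, {"t1": 0.0, "t2": 0.4})]
collab = collaborative_topic_profile(similar, topic_profiles)
predicted = combine({"a": 1.0}, collab, delta=0.5)
```

Dividing by the total similarity keeps the collaborative profile on the same scale regardless of how many similar workers are selected.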

Experiments

We conducted three experiments to evaluate the effectiveness of the proposed information needs learning methods. The first subsection provides an overview of the K-Support system and the experiment setup; the second subsection describes the evaluation metrics; and the third subsection details the experiment procedure and methods.

Experiment Setup

We conducted experiments in a real application domain used for research tasks in a research institute’s laboratory. For this investigation context we developed a task-based K-Support portal to deliver relevant documents based on the user’s work task and their current information needs. We describe the proposed “K-Support” system below.

Overview of the K-Support system. In task-based environments, codified knowledge and human resources are important knowledge assets that can be used to accomplish organizational tasks. The K-Support portal is a Web-based application that allows workers to retrieve, organize, and share task-relevant documents (Liu & Wu, 2008). The system architecture comprises four implementation layers: knowledge resource collection, knowledge acquisition, knowledge modeling, and a Web-based front-end application. A user can log into the system to search for or share a task-relevant document. In the knowledge acquisition layer, the user behavior tracker and log-parsing engine analyze log-files to track the user’s interaction with the system. We do not ask users to provide feedback for every document. The system can monitor the user’s search behavior, and the user’s document feedback behavior patterns are gathered by the back-end user behavior tracker. That is, we observe the users’ natural browsing, reading, and access behavior patterns instead of instructing them to follow specific steps in the proposed portal. More details about the task-based K-Support system can be found in the above-mentioned work.

To evaluate the approach proposed in this paper, we use a system with four modules: a user behavior tracker, a domain topic taxonomy, an information-needs variation inspector, and a profile handler. The developed system is an extension of the framework of our previously proposed K-Support system, i.e., we have added the information-needs variation inspector module. To learn a worker’s dynamic information needs, the information-needs variation inspector is designed to fully utilize the information about the work context, i.e., the search behavior of similar workers and changes in the domain topics. The framework can be integrated with a KMS or project management system to design the information retrieval function. In addition, the proposed approach can be generalized via the presented framework to support workers’ information seeking and retrieval activities when executing knowledge-intensive tasks. In the following, we discuss the study setting, dataset, and evaluation metrics.

Study setting and dataset. This work extends the previous framework to improve the most important functions, i.e., the knowledge retrieval functions, based on the user’s work task and their current information needs. We chose tasks performed in the Department of Information Management of a major Taiwanese university for evaluation. The tasks included system development, thesis writing, and project surveys, all of which can be regarded as knowledge-intensive tasks. The subjects were graduate students who were engaged in different tasks. Four research issues were selected as the evaluation targets, namely, information technology service management, patent analysis for business intelligence, product recommendations, and knowledge management systems. The students needed to access documents for a specific task in the proposed digital workspace for use in a regular weekly meeting held in the research institute. As the subjects needed to upload between one and three documents that were relevant to their research every week, we assumed that the system could track changes in each worker’s topics of interest during the task’s performance. Each research project issue covered several related research topics. Although our evaluation tasks were implemented in the same department, they belonged to different projects with their own research topics. Twelve subjects were selected as test workers for the evaluation. The evaluation period for each target task was determined by experts who evaluated the proposed methods. We sampled the evaluation subjects based on certain criteria, one of which was the problem stage of the long-term task that the worker was in. As each subject was in a different problem stage (i.e., cognitive state), the size of the dataset and the number of participants were restricted in the experiments. Basically, we followed Kuhlthau’s (1993) information search process model (ISP model) and Vakkari et al.’s task-based information-seeking theories (2000, 2003). Some of the empirical longitudinal studies conducted by these authors show that users’ information needs vary in different stages. This factor motivated the current research, as mentioned in the Introduction. In this work we divide a user’s search process into three stages: the pre-focus, focus formulation, and post-focus stages. Knowing the worker’s current problem stage while conducting the experiments can help us explain the results of topic variation, which are detailed in Experiment Results (below).

Evaluation Metrics

The goal of the experiments is to evaluate the performance of the proposed information needs learning model via the topic variation inspection process. The IR evaluation methodology focuses on the evaluation of quantitative or qualitative data (Chen, Fan, Chau, & Zeng, 2001). Retrieval effectiveness is the most commonly used criterion for quantitative evaluation, and the effectiveness of information retrieval is normally measured by the precision and recall rates (Chen et al., 2001; Croft, 1995; Salton & McGill, 1983). On the other hand, qualitative evaluation of an IR system can be based on the analysis of questionnaires that request information about various evaluation items, such as user satisfaction, usability, and learning ability. Qualitative evaluation is much more suitable for evaluating the effectiveness of users’ interactive search activities. Our evaluation method focuses on the retrieval results. We compare the performance of various methods in terms of their retrieval effectiveness. Specifically, we use the precision rate, recall rate, and F-measure to compare the methods (Rijsbergen, 1979; Salton & McGill, 1983; Witten et al., 1999).

Precision and recall. Precision is the fraction of retrieved documents that are relevant, while recall is the fraction of known relevant documents that are retrieved. To calculate the precision and recall rates, we asked domain experts and experienced workers to manually label documents that were highly relevant for each task. Although this is very time-consuming, it ensures the quality of our answer set for each evaluation task. The precision rate for an evaluation task $e_r$ is the ratio of the total number of relevant documents retrieved to the number of top-N support documents in the presented system. The recall rate for an evaluation task $e_r$ is the ratio of the total number of relevant documents retrieved to the total number of relevant documents specified by experts.
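The two rates can be computed for one evaluation task as in the sketch below. The function name and the document identifiers are hypothetical; the denominators follow the definitions in the text (the number of top-N documents for precision, the number of expert-labeled relevant documents for recall).

```python
def precision_recall_at_n(ranked_docs, relevant_docs, n):
    """Precision and recall over the top-N documents of a ranked list."""
    top = ranked_docs[:n]
    hits = sum(1 for doc in top if doc in relevant_docs)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant_docs) if relevant_docs else 0.0
    return precision, recall


ranked = ["d1", "d2", "d3", "d4"]   # documents delivered by the system, best first
relevant = {"d1", "d3", "d5"}       # documents labeled relevant by the experts
precision, recall = precision_recall_at_n(ranked, relevant, n=2)
```

In this made-up case one of the two delivered documents is relevant, and one of the three relevant documents is delivered.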

F-measure. To assess the relative importance of the precision and recall rates, a combination metric, the F-measure (Rijsbergen, 1979; Witten et al., 1999), is used to adjust the relative weights of precision and recall to find a trade-off between the two metrics. The function of β is to adjust the importance of the recall rate relative to that of the precision rate. If β = 0, $F_\beta$ coincides with precision, and if β = ∞, $F_\beta$ coincides with recall. To compare the methods in this experiment, we consider the precision (β = 0), recall (β = ∞), and F-measure (β = 0.5), i.e., precision is more important than recall.

$$F\text{-}measure = \frac{(1 + \beta^{2}) \times precision \times recall}{\beta^{2} \times precision + recall} \tag{6.1}$$

In this work, finding relevant documents based on a few retrieval results is much more important than finding all relevant documents. Therefore, we emphasize precision more than recall because we want to determine which method is better able to reject extraneous documents rather than find all relevant documents (Salton & Buckley, 1988). Moreover, the higher precision rate (i.e., β = 0 in the F-measure) reflected in the experiment results shows that the proposed methods are suitable for interactive work environments where workers are under pressure to find task-relevant documents and do not have time to review a large number of retrieved documents.
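Equation 6.1 and its two limiting cases can be checked with a short sketch. The function name is hypothetical, and returning 0 when the denominator vanishes is a convention chosen for this example.

```python
import math


def f_measure(precision, recall, beta=0.5):
    """F-measure of Equation 6.1; beta < 1 puts more emphasis on precision."""
    if math.isinf(beta):
        return recall  # limiting case: the measure coincides with recall
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


f_half = f_measure(0.6, 0.3, beta=0.5)
f_zero = f_measure(0.6, 0.3, beta=0.0)
f_inf = f_measure(0.6, 0.3, beta=math.inf)
```

With precision 0.6 and recall 0.3, β = 0.5 yields 0.5, which lies closer to the precision than the balanced setting β = 1 would (0.4).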

Experiment Procedure and Methods

As mentioned above and illustrated in Figure 1, the model uses an event-based technique (i.e., an event occurs when a worker accesses a document at a specific time) to trigger the information needs learning and profile adaptation process. Recall that we did not ask the subjects to provide explicit feedback on the documents. The system can monitor and record a user’s search behavior with the back-end user behavior tracker. When a user logs into the system for a specific work task, four types of event are recorded, namely, “download document,” “download reports of documents,” “read document online,” and “upload document.” With the proposed topic variation inspection method, the system can deliver task-relevant documents based on the learning results.

The objective of the three experiments was to compare the effectiveness of the proposed adaptive information needs learning methods, namely, the P-Time, P-Topic, P-Topic&Time, Coll_Topic Variation, and Coll_Document methods, with that of the baseline method, S_P (primitive self-profiling). Details of each method are given in Table 2.


In Experiment 1 we evaluate the effectiveness of the base-line method and three of the proposed methods for proﬁle adaptation via the personal topic variation inspection pro-cess, as discussed in Phase 1 of the proposed methodology (see above). The baseline method is the traditional relevance feedback (RF) technique, which is widely used in IF studies to determine users’ dynamic interests, preferences, and infor-mation needs. Most studies in this area adopt the Rocchio method as the baseline to compare the performance of a proposed method (Salton & Buckley, 1990; Widyantoro & Yen, 2005; Yang et al., 2005). We refer to the classical rel-evance feedback method proposed by Rocchio (1971) and the Ide method (1971) for query reformulation, as formu-lated in Equations 2.2 and 2.3. The S_P method is similar to the Rocchio method, except that the nonrelevant feedback part of the equation is removed because most studies suggest that information about relevant documents is more important than the content of nonrelevant documents (Salton & McGill, 1983; Salton & Buckley, 1990; Liu & Wu, 2008).

$$\overrightarrow{\mathit{profile}}_{T+1} = \alpha\,\overrightarrow{\mathit{profile}}_{T} + \left[\lambda\,\overrightarrow{\mathit{topic}}_{T} + (1-\lambda)\,\overrightarrow{\mathit{doc}}_{T}\right] \tag{6.2}$$

$$\overrightarrow{\mathit{profile}}_{T+1} = \alpha\,\mathrm{Decay}\!\left(\overrightarrow{\mathit{profile}}_{T}\right) + \left[\lambda\,\overrightarrow{\mathit{topic}}_{T} + (1-\lambda)\,\overrightarrow{\mathit{doc}}_{T}\right] \tag{6.3}$$

Basically, traditional methods for learning information needs rely on collecting users' feedback on items such as Web pages, texts, and products. They do not analyze users' information on topics, the rate of topic changes, or time effects, all of which may influence the learning results. Therefore, to learn users' dynamic information needs, we propose three self-profile adaptation methods that consider the topic variation factor and the time factor. The methods are called P-Time, P-Topic, and P-Topic & Time. As mentioned above, we consider the effect of the time factor and the user's behavior (documents accessed) to adjust the corresponding task profile with the aid of a task-based topic taxonomy. The P-Time and P-Topic methods are similar to the S_P method, but they consider the effect of the time factor and topic profiles, respectively. The P-Topic & Time method adjusts a worker's task profile based on the documents accessed by the worker and their relevance to the topic taxonomy, as mentioned in the section Phase One: Personal Topic Variation Inspection Process (above). The effect of the time factor is also incorporated into the profile adaptation process.

The methods are formulated in Equations 6.2 and 6.3. In Equation 6.2, when the parameter λ is set to 0, it is the baseline method, i.e., the S_P method; otherwise it is the P-Topic method. In Equation 6.3, when the parameter λ is set to 0, it is the P-Time method; otherwise it is the P-Topic & Time method. Note that in each equation α is set to 1 during the experiment.

Experiments 2 and 3 evaluate the effectiveness of the method for profile adaptation via the collaborative topic variation inspection process. As mentioned above, a worker's information needs can be learned via a personal profile and a collaborative profile (topic-variation profile). The Coll_Topic Variation method uses the collaborative profile adaptation technique defined in Equation 5.3. It adjusts the task profile by a weighted combination of the personal profile and the collaborative profile derived from similar workers. The Coll_Document method is similar to the Coll_Topic Variation method, except that the documents accessed at time T_u_x + 1 by similar workers are used to derive the collaborative profile, as shown in Equation 5.4. In Experiment 2 we evaluate the parameters selected for the two collaborative topic variation inspection methods. The parameters δV in Equation 5.3 and δD in Equation 5.4 are used to adjust the relative weights of the personal profile and the collaborative profile in the Coll_Topic Variation and Coll_Document methods, respectively. From the values of the two parameters determined in Experiment 2, we select the best values for use in our application domain. Finally, in Experiment 3 we compare four methods (see Table 2) to demonstrate the effectiveness of the information learning method based on the collaborative topic variation inspection process.
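To make the relationship among the four self-profile adaptation methods concrete, the update rules in Equations 6.2 and 6.3 can be sketched as a single function. This is a minimal illustration, not the system's implementation: profiles are assumed to be term-weight vectors of equal length, the names `update_profile` and `decay` are hypothetical, and the constant-factor `decay` is only a stand-in, since the form of Decay(·) is not specified in this section.

```python
import numpy as np


def decay(profile, rate=0.9):
    """Hypothetical stand-in for Decay(.): scale every term weight by a
    constant factor. The actual decay function is not given in this section."""
    return rate * profile


def update_profile(profile, doc, topic=None, lam=0.0, alpha=1.0,
                   use_time=False, decay_fn=decay):
    """Return profile_{T+1} following Equation 6.2 (use_time=False)
    or Equation 6.3 (use_time=True).

    lam = 0, no time factor -> S_P (baseline)
    lam > 0, no time factor -> P-Topic
    lam = 0, time factor    -> P-Time
    lam > 0, time factor    -> P-Topic & Time
    """
    profile = np.asarray(profile, dtype=float)
    doc = np.asarray(doc, dtype=float)
    # With lam = 0 the topic term vanishes, so a missing topic profile is harmless.
    topic = np.zeros_like(profile) if topic is None else np.asarray(topic, dtype=float)
    carried = decay_fn(profile) if use_time else profile
    return alpha * carried + (lam * topic + (1.0 - lam) * doc)


# Toy vectors over a three-term vocabulary (illustrative values only).
profile_t = np.array([1.0, 0.0, 0.5])
doc_t = np.array([0.0, 1.0, 0.0])
topic_t = np.array([0.0, 0.0, 1.0])

s_p = update_profile(profile_t, doc_t)                        # baseline
p_topic = update_profile(profile_t, doc_t, topic_t, lam=0.5)  # Equation 6.2
p_time = update_profile(profile_t, doc_t, use_time=True)      # Equation 6.3, lam = 0
p_topic_time = update_profile(profile_t, doc_t, topic_t, lam=0.5, use_time=True)
```

Keeping α as an argument with a default of 1 mirrors the experimental setting stated above while leaving the parameter visible; the value λ = 0.5 in the usage lines is arbitrary and is not a setting reported in the experiments.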

Experiment Results

Experiment One: Effects of Profile Adaptation via Self-Topic Variation Inspection

This experiment compares the performance of four methods, namely, S_P, P-Time, P-Topic, and P-Topic & Time, under various numbers of top-N support documents. The S_P (primitive profiling) method, which is the baseline, learns a user's current information needs from feedback about the recommended information (i.e., documents), and updates the user model for future information filtering. The method only considers a worker's implicit feedback on documents. In contrast, the P-Time and P-Topic methods consider the time factor and topic profiles, respectively. The P-Topic & Time method is a self-profile adaptation method that adjusts the task profile by considering the document profile, the relevant-topic profiles, and the time effect simultaneously. Table 3 shows the performance of the four methods in terms of precision, recall, and the F-measure under various numbers of top-N documents. Since we want to determine whether the system can learn the user's dynamic information needs effectively via the proposed method, we place more emphasis on the precision metric than on the recall metric. For the F-measure metric, we set β = 0.5 (see Equation 6.1), which fixes the relative importance of precision and recall and thus the trade-off between the two metrics. In addition, we conduct statistical tests to determine whether the observed differences are statistically significant.
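The effect of setting β = 0.5 can be illustrated with a short sketch. It assumes that Equation 6.1 takes the standard F-measure form, F = (1 + β²)PR / (β²P + R), which is not reproduced in this section; under that assumption, β < 1 weights precision more heavily than recall. The function name `f_measure` and the input scores are hypothetical and are not results from Table 3.

```python
def f_measure(precision, recall, beta=0.5):
    """Weighted F-measure, assuming the standard form
    (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing relevant is retrieved
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Two hypothetical systems with mirrored precision and recall scores.
precise_system = f_measure(precision=0.8, recall=0.4)     # beta = 0.5
exhaustive_system = f_measure(precision=0.4, recall=0.8)  # beta = 0.5
balanced = f_measure(precision=0.8, recall=0.4, beta=1.0)
```

With β = 0.5 the precision-heavy system scores higher than its mirrored counterpart, whereas with β = 1 the two would tie, which is consistent with the stated emphasis on precision.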

Observation 1: Table 3 shows the precision, recall, and F-measure scores under various numbers of top-N documents. Clearly, the P-Topic & Time method outperforms the other three methods in each scenario. Figure 6 shows the average precision scores for the four methods. Again, the baseline S_P method yields the least effective performance under various numbers of top-N support documents. This result demonstrates that considering topic profiles and the time