
Document recommendation for knowledge sharing in personal folder environments

Duen-Ren Liu*, Chin-Hui Lai, Chiu-Wen Huang

Institute of Information Management, National Chiao Tung University, Hsinchu 300, Taiwan

Received 17 October 2006; received in revised form 12 October 2007; accepted 27 October 2007
Available online 17 November 2007

Abstract

Sharing sustainable and valuable knowledge among knowledge workers is a fundamental aspect of knowledge management. In organizations, knowledge workers usually have personal folders in which they organize and store needed codified knowledge (textual documents) in categories. In such personal folder environments, providing knowledge workers with needed knowledge from other workers’ folders is important because it increases the workers’ productivity and the possibility of reusing and sharing knowledge. Conventional recommendation methods can be used to recommend relevant documents to workers; however, those methods recommend knowledge items without considering whether the items are assigned to the appropriate category in the target user’s personal folders. In this paper, we propose novel document recommendation methods, including content-based filtering and categorization, collaborative filtering and categorization, and hybrid methods, which integrate text categorization techniques, to recommend documents to the target worker’s personalized categories. Our experiment results show that the hybrid methods outperform the pure content-based and the collaborative filtering and categorization methods. The proposed methods not only proactively notify knowledge workers about relevant documents held by their peers, but also facilitate push-mode knowledge sharing.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Document recommendation; Knowledge management; Personal folder; Knowledge sharing; Text classification

1. Introduction

The rapid emergence of Knowledge Management in recent years has played a key role in helping organizations gain and maintain a competitive advantage. Sharing sustainable and valuable knowledge among knowledge workers is a fundamental aspect of knowledge management. Organizational knowledge and expertise are usually codified into textual documents, including forms, letters, papers, manuals and reports, to facilitate knowledge capture, searching, and sharing (Nonaka, 1994).

Knowledge workers tend to keep their codified knowledge in personal folders. Textual documents stored in each worker’s personal folder are usually organized into categories. In such personal folder environments, providing knowledge workers with needed knowledge from other workers’ folders is important to facilitate knowledge sharing. Although conventional knowledge management systems (KMS) provide a search function to help workers find needed knowledge, very few KMS address the issue of proactively providing workers with needed knowledge in personal folder environments. Recommender systems can be adopted to provide an effective means of addressing this shortcoming of KMS.

Conventional application domains of recommender systems cover areas such as "Music", "Movie" and "Product" recommendations. Various recommendation methods have been proposed for such systems (Breese et al., 1998; Burke, 2002; Li and Kim, 2003; Liu and Shih, 2005). For example, Content-based Filtering (CBF) utilizes users’ profiles to determine recommendations for target users. In applications that recommend documents, CBF provides recommendations by matching user profiles (e.g., interests) with content features (e.g., feature vectors of documents). Each user profile is derived by analyzing the content features of documents accessed by the user. Collaborative Filtering (CF), which assumes that items from similar (like-minded) users are often relevant, utilizes preference ratings given by the users to determine recommendations made to a target user. Hybrid recommender systems integrate content-based and collaborative filtering to enhance the quality of recommendations.

* Corresponding author. Fax: +886 3 5723792. E-mail address: dliu@iim.nctu.edu.tw (D.-R. Liu).
The Journal of Systems and Software 81 (2008) 1377–1388. www.elsevier.com/locate/jss
0164-1212/$ - see front matter. doi:10.1016/j.jss.2007.10.027

The LIBRA system (Mooney and Roy, 2000) is an example of a content-based filtering system that recommends books based on information extracted from Web pages. Meanwhile, Siteseer (Rucker and Polanco, 1997) uses collaborative filtering to provide Web page recommendations based on the folders of bookmarks. However, neither method considers recommending Web pages to appropriate categories. Knowledge Pump (Glance et al., 1998) classifies documents into a commonly agreed classification scheme based on the content of documents. However, that scheme is shared by all users rather than personalized. RAAP (Delgado et al., 1998) is an example of a hybrid system developed to recommend a user’s newly classified bookmark (document) to other users with similar interests. A common category schema, rather than a personalized one, is predefined for all users to support classification.

Conventional document recommender systems generally assume a common category schema without considering personalized categories. Since both the source and the target user have the same category schema, such recommender systems are simplified to recommending documents to the target user without considering which category the document belongs to. Although the Siteseer system (Rucker and Polanco, 1997) considers the personalized folders of bookmarks, it simply takes one specific folder (category) of the target user at a time as the target for recommendation, and does not address the issue of recommending items to the target user’s appropriate categories. In this paper, we investigate the issue of recommending textual documents to appropriate categories in personal folder environments. Each knowledge worker has a personal folder for storing documents in user-defined categories. In personal folder environments, knowledge workers can define their own categories, so the recommender system also needs to consider the appropriate category for a recommended document. Generally, text categorization techniques (Langari and Tompa, 2001; Larkey and Croft, 1996) can be used to allocate documents to appropriate categories. We propose novel recommendation methods that incorporate text categorization techniques to recommend documents to the appropriate categories of a target worker’s personal folders. The methods include content-based filtering and categorization, collaborative filtering and categorization, and hybrid methods. The proposed methods can proactively provide knowledge workers with needed textual documents from other workers’ folders. Experiments are conducted to evaluate the performance of various methods using data collected from a research institute laboratory. The experiment results show that the hybrid methods outperform the other methods.

The remainder of the paper is organized as follows. Section 2 reviews the background of this study, including knowledge management, information retrieval, text categorization, and recommender systems. Our proposed method is described in Section 3. Section 4 evaluates the performance of our methods. Finally, Section 5 presents our conclusions and indicates the direction of our future work.

2. Background and related work

In this section, we describe the basic concepts of our research, including knowledge management, information filtering and retrieval, text categorization, and recommender systems.

2.1. Knowledge management

Knowledge management is a systematic process of gathering, organizing, sharing, and analyzing knowledge in terms of resources, documents, and people skills within and across an organization (Davenport and Prusak, 1998; Nonaka, 1994). Textual data, such as articles, reports, manuals, and know-how documents are treated as valuable and explicit knowledge; thus, effective document management is especially important (Nonaka, 1994). Generally, existing knowledge management systems adopt codified approaches (Zack, 1999) or social network dialog (Agostini et al., 2003) to facilitate knowledge-sharing and support.

2.2. Information retrieval and information filtering

Information retrieval (IR) deals with the representation, organization, storage, and access to information items (Baeza-Yates and Ribeiro-Neto, 1999). Essentially, IR focuses on searching for and indexing a large number of documents and then presenting users with data that meets their information needs. One popular IR method uses a vector model, which assigns non-binary weights to index the most discriminating terms in documents based on the tf–idf approach (Salton and Buckley, 1988; Baeza-Yates and Ribeiro-Neto, 1999), where terms with a higher frequency in one document and a lower frequency in other documents are better discriminators for representing the terms of the document. In the tf–idf approach, tf denotes the occurrence frequency of a particular term in a document, while idf denotes the inverse document frequency of a particular term, measured by $\log_2(N/n + 1)$, where $N$ is the number of documents in the collection, and $n$ is the number of documents in which term $i$ occurs at least once. The weight of a term $i$ in a document $j$ is expressed as follows:

$$w_{i,j} = tf_{i,j} \times idf_i = tf_{i,j} \times \log_2\left(\frac{N}{n} + 1\right), \qquad (1)$$

where $tf_{i,j}$ is the frequency of term $i$ in document $j$, and $idf_i$ is the inverse document frequency of term $i$.
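As an illustration of Eq. (1), the following is a minimal Python sketch, not the authors' implementation. It assumes documents are already tokenized into lists of terms; the function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per document, following Eq. (1)."""
    n_docs = len(docs)  # N: number of documents in the collection
    # n: number of documents in which each term occurs at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw occurrence counts of each term in this document
        weights.append(
            {t: tf[t] * math.log2(n_docs / doc_freq[t] + 1) for t in tf}
        )
    return weights

# Hypothetical toy collection of two tokenized documents
docs = [["folder", "document", "document"], ["folder", "rating"]]
w = tfidf_weights(docs)
```

In this toy collection, "document" occurs twice in the first document and nowhere else, so it receives a larger weight there than "folder", which occurs in both documents.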

Information filtering helps maintain users’ personal files by separating relevant and irrelevant documents based on their individual profiles. In this way, only useful information is sent to the user (Baeza-Yates and Ribeiro-Neto, 1999; Chen and Kuo, 2000; Shapira et al., 1999).

2.3. Text categorization

Text categorization or text classification assigns category or class labels to new documents automatically (Langari and Tompa, 2001; Lewis and Ringuette, 1994; Larkey and Croft, 1996). Two kinds of text categorization methods, namely the k-NN and category vector methods, are widely used (Langari and Tompa, 2001). The k-nearest neighbor method (k-NN) tries to find the top-k documents that are most similar to the target (unlabeled) document, and then assigns the target document to the category that has the majority of k-nearest neighbors. Each document can be represented as a term vector in the multi-dimensional vector space, where the weight of a term in a document is usually generated by the tf–idf approach, introduced in Section 2.2. For each unlabeled term vector, we use the cosine similarity measure to find the k nearest training term vectors. The cosine similarity measure is normally used to measure the degree of similarity between two items, x and y, by computing the cosine value of the angle between their respective feature vectors, Q and R, as shown in Eq. (2). The degree of similarity is higher if the cosine value is close to 1.

$$sim(Q, R) = cosine(Q, R) = \frac{Q \cdot R}{|Q|\,|R|}. \qquad (2)$$
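The cosine measure of Eq. (2) and the k-NN assignment described above can be sketched as follows. This is a minimal sketch under stated assumptions: term vectors are sparse dictionaries, an empty vector is treated as having similarity 0, and ties in the majority vote are broken by the order in which labels are first counted. The names `cosine` and `knn_category` and the sample data are hypothetical.

```python
import math
from collections import Counter

def cosine(q, r):
    """Cosine of the angle between two sparse term vectors, as in Eq. (2)."""
    dot = sum(w * r.get(t, 0.0) for t, w in q.items())
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    norm_r = math.sqrt(sum(w * w for w in r.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_q * norm_r)

def knn_category(target, labeled, k):
    """Assign the majority category among the k most similar labeled vectors."""
    ranked = sorted(labeled, key=lambda pair: cosine(target, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled training vectors: (term vector, category label)
labeled = [
    ({"filtering": 1.0, "rating": 1.0}, "recommender"),
    ({"filtering": 1.0, "profile": 1.0}, "recommender"),
    ({"index": 1.0, "query": 1.0}, "retrieval"),
]
predicted = knn_category({"filtering": 2.0, "rating": 1.0}, labeled, k=2)
```

Here the two nearest neighbors of the target vector both carry the label "recommender", so that label is assigned.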

The category vector method, on the other hand, derives the term vector of each category by using the tf–idf or centroid approach based on labeled documents. The tf–idf approach uses a similar process to that described in Section 2.2 to derive the term vector of each category (Langari and Tompa, 2001). The centroid approach derives the term vector of a category $c_r$ by averaging the term vectors of the documents in that category, as shown in Eq. (3). Let $D_{c_r}$ denote the document set of a category $c_r$; let $w_{i,c_r}$ denote the weight of a term $i$ in $c_r$; and let $dw_{i,j}$ denote the weight of a term $i$ in a document $j$. Then, $w_{i,c_r}$ is derived as follows:

$$w_{i,c_r} = \frac{1}{|D_{c_r}|} \sum_{d_j \in D_{c_r}} dw_{i,j}. \qquad (3)$$

The similarity of a category $c_r$ to an unlabeled document $d_x$ is then calculated as $sim(\vec{d}_x, \vec{c}_r)$ using the cosine measure, where $\vec{d}_x$ is a document vector and $\vec{c}_r$ is the category vector. According to the similarities between categories and unlabeled documents, we then classify the unlabeled object by assigning it the label of the most similar category, or the labels of the categories whose similarity is above a certain threshold.
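The centroid approach of Eq. (3) and the two labeling rules (most similar category, or all categories above a threshold) can be sketched as below. This is an illustrative sketch only; the function names, the sparse-dictionary representation, and the toy weights are assumptions rather than part of the original method description.

```python
import math

def centroid(doc_vectors):
    """Average the term weights over a category's documents, as in Eq. (3)."""
    total = {}
    for vec in doc_vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: s / len(doc_vectors) for term, s in total.items()}

def cosine(q, r):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * r.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in r.values())
    )
    return dot / norm if norm else 0.0

def classify(doc, category_vectors, threshold=None):
    """Return the most similar category, or every category above the threshold."""
    sims = {name: cosine(doc, vec) for name, vec in category_vectors.items()}
    if threshold is None:
        return [max(sims, key=sims.get)]
    return sorted(name for name, s in sims.items() if s > threshold)

# Hypothetical categories, each built from already weighted document vectors
categories = {
    "km": centroid([{"knowledge": 2.0, "sharing": 1.0}, {"knowledge": 1.0}]),
    "ir": centroid([{"index": 1.0, "query": 2.0}]),
}
doc = {"knowledge": 1.0, "query": 0.5}
best = classify(doc, categories)
above = classify(doc, categories, threshold=0.3)
```

With these toy weights the document is closest to "km" (cosine of about 0.85 versus 0.40 for "ir"), while a threshold of 0.3 admits both categories.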

2.4. Recommender systems

A recommender system helps users select items of interest from a huge stream of data. As mentioned earlier, three approaches can be used to develop recommender systems: Content-Based Filtering (CBF), Collaborative Filtering (CF), and Hybrid Filtering (Konstan et al., 1997).

Content-based recommender systems (Kamba et al., 1995; Woodruff et al., 2000) assume that if users liked certain items in the past, they will like similar items in the future. CBF systems obtain an item’s characteristics (product features) and compare them with the user’s profile to predict his/her preferences. Various techniques can be employed to compare and match item features with user profiles, the simplest of which is keyword matching (Claypool et al., 1999). Examples of CBF for text recommendation include the newsgroup filtering system NewsWeeder (Lang, 1995) and LIBRA (Mooney and Roy, 2000). The latter uses book information extracted from the web pages to learn a profile with weighted terms using a Bayesian text classifier. The profile is then used to predict the scores of the selected books and those with the top scores are recommended to users.

Collaborative filtering is based on the concept that if like-minded users like an item then the target user will probably like it as well (Breese et al., 1998). In other words, collaborative filtering systems consider the preferences of people who have the same or very similar interests to those of the target user. Well-known collaborative filtering systems include GroupLens (Konstan et al., 1997), Ringo (Shardanand and Maes, 1995), Siteseer (Rucker and Polanco, 1997), and Knowledge Pump (Glance et al., 1998). Many systems apply a neighborhood-based algorithm to choose a group of users based on their similarity to the target user. A weighted aggregate of those users’ ratings is then used to generate predictions for the target user. The steps of the algorithm are as follows:

Step 1: Calculate the similarity between users by computing the Pearson correlation or the cosine measure of the user vectors.

Step 2: To find the neighborhood of the target user, use either the threshold approach or the k-NN (nearest neighbor) approach to select the k users that are most similar (ranked by similarity) to the active user. In this research we use the k-NN approach.

Step 3: Make a prediction based on the aggregated weights of the selected k nearest neighbors’ ratings, as shown in Eq. (4):

$$P_{u,j} = \bar{r}_u + \frac{\sum_{i=1}^{k} w(u,i)\,(r_{i,j} - \bar{r}_i)}{\sum_{i=1}^{k} |w(u,i)|}, \qquad (4)$$

where $P_{u,j}$ denotes the prediction made about item $j$ for the target user $u$; $\bar{r}_u$ and $\bar{r}_i$ are the average ratings of user $u$ and user $i$, respectively; $w(u,i)$ is the similarity between target user $u$ and user $i$; $r_{i,j}$ is the rating of user $i$ for item $j$; and $k$ is the number of users in the neighborhood.
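The three steps above can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: it assumes Pearson correlation over co-rated items for Step 1, restricts neighbors to users who have rated the target item, and falls back to the target user's own average when no neighbor carries any weight. The function names and the ratings are hypothetical.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def pearson(ra, rb):
    """Step 1: Pearson correlation over the items both users rated."""
    common = sorted(set(ra) & set(rb))
    if len(common) < 2:
        return 0.0
    ma = mean(ra[j] for j in common)
    mb = mean(rb[j] for j in common)
    num = sum((ra[j] - ma) * (rb[j] - mb) for j in common)
    den = math.sqrt(
        sum((ra[j] - ma) ** 2 for j in common)
        * sum((rb[j] - mb) ** 2 for j in common)
    )
    return num / den if den else 0.0

def predict(ratings, u, item, k):
    """Predict user u's rating of item from the k nearest neighbors, Eq. (4)."""
    # Step 2: rank the other users who rated the item by similarity, keep top k
    candidates = [
        (pearson(ratings[u], r), v)
        for v, r in ratings.items()
        if v != u and item in r
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    r_u = mean(ratings[u].values())
    norm = sum(abs(w) for w, _ in neighbors)
    if norm == 0.0:
        return r_u  # assumption: fall back to the user's own average rating
    # Step 3: weighted deviation of each neighbor's rating from that neighbor's mean
    dev = sum(
        w * (ratings[v][item] - mean(ratings[v].values())) for w, v in neighbors
    )
    return r_u + dev / norm

# Hypothetical ratings on a 1-5 scale
ratings = {
    "u": {"d1": 4, "d2": 2, "d3": 5},
    "a": {"d1": 5, "d2": 1, "d3": 4, "d4": 5},
    "b": {"d1": 1, "d2": 5, "d3": 2, "d4": 1},
}
p = predict(ratings, "u", "d4", k=2)
```

In this toy data user "a" agrees with "u" and rated "d4" above its own average, while user "b" disagrees with "u" and rated "d4" below its own average, so both neighbors push the prediction above the average rating of "u".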

Collaborative filtering assumes that documents from like-minded users are often relevant, and therefore computes the preference ratings given by various users to make a list of recommendations. Siteseer (Rucker and Polanco, 1997) provides web-page recommendations based on folders containing bookmarks (Web-page URLs), giving preference to pages held in multiple folders in the neighborhood. Recommendations are made for each of the target user’s folders (categories of interests) as follows. A target user’s specific category of interest (folder) is used as the basis to form a virtual community of the target user. Users in the community are virtual neighbors of the target user and are selected based on the user-folder similarity, which is measured by the degree of overlap (such as common URLs) between the neighbor’s folder and the target user’s specific folder. Although Siteseer considers personalized folders of URLs, it does not recommend items (URL bookmarks) to appropriate categories. Instead, it simply takes one specific folder (category) of the target user at a time to make recommendations. In general, folders may have multiple levels with hierarchical relationships that form a hierarchy of categories. Neither our approach nor Siteseer utilizes the hierarchical relationships between folders in the design of recommendation methods. Knowledge Pump (Glance et al., 1998) classifies documents into commonly agreed categories based on the content of the documents. Then, the CF technique is used to recommend documents based on the personal profiles of advisors, i.e., people whose opinions the user trusts. The classification scheme used in the recommender system is commonly agreed, rather than personalized.

Hybrid recommender systems combine content-based filtering and collaborative filtering to improve the accuracy of recommendations. Two such methods, the weighted model and the meta-level model, use different strategies to combine content-based filtering and collaborative filtering (Burke, 2002; Li and Kim, 2003). The weighted model uses linear combinations of the prediction results. For example, the method was applied to recommend news in an on-line newspaper (Claypool et al., 1999). The meta-level model employs a sequential combination of collaborative and content-based filtering, whereby the output generated by content-based filtering is used as the input for collaborative filtering (Burke, 2002). The user profile of the target user contains user preferences for each product’s features (i.e., it describes the user’s interests). The similarity measures of the user profiles and product profiles (features of the products/items) are then derived to predict the target user’s preference ratings on unrated items. This process converts a sparse user-rating matrix into a dense user-rating matrix. Collaborative filtering then uses the dense matrix to provide recommendations. For instance, Melville et al. (2002) proposed a Content-Boosted Collaborative Filtering (CBCF) approach for movie recommendations, where pseudo user-ratings are derived by combining users’ actual ratings and content-based predictions on unrated items. Then, the method applies collaborative filtering based on this dense matrix.
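The densification step of the meta-level model can be illustrated with a short sketch: actual ratings are kept, and missing entries are filled with a content-based prediction. The function `densify` and the stand-in predictor `content_predict` are hypothetical; a real content-based predictor would compare user profiles with item features as described above.

```python
def densify(ratings, items, content_predict):
    """Build pseudo user-ratings: actual ratings where present, predictions elsewhere."""
    dense = {}
    for user, rated in ratings.items():
        dense[user] = {
            item: rated[item] if item in rated else content_predict(user, item)
            for item in items
        }
    return dense

def content_predict(user, item):
    # Hypothetical stand-in for a content-based predictor: a constant neutral score
    return 3.0

# Hypothetical sparse ratings: each user has rated only one of the two items
sparse = {"u1": {"m1": 5.0}, "u2": {"m2": 1.0}}
dense = densify(sparse, ["m1", "m2"], content_predict)
```

Collaborative filtering can then be run on `dense` instead of `sparse`, which is the idea behind using the output of content-based filtering as the input for collaborative filtering.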

RAAP (Delgado et al., 1998) is an example of a hybrid system that can classify and recommend bookmarks retrieved from the Web. A bookmark (document) is classified and stored in a user’s category based on the document’s content and the user’s profile. A common category schema, rather than a personalized one, is predefined for all users to support the classification. The system uses a hybrid approach to recommend a user’s newly classified bookmark to other users with similar interests. The InLinx system (Bighini et al., 2003) also supports the classification and recommendation of bookmarks retrieved from the Web based on content analysis and virtual clusters. However, a detailed description of the approach was not provided by the authors. Middleton et al. (2004) presented an ontological user profiling approach to recommend academic papers. This scheme makes recommendations according to the correlations between the users’ current profiles (topics of interest) and papers classified as belonging to those topics. Users with similar interests are identified by computing the Pearson correlation between the users’ profiles. Recommended papers are those that match the user’s profile and have been read by similar users.

3. Proposed recommendation methods

This section describes the proposed methods, which combine recommendation techniques with text categorization techniques to recommend documents to the appropriate categories of the target user’s personal folders.

In an organization, documents, manuals and reports from people in the same project team or with similar work experience can be useful when executing a new task. One way to reuse knowledge in an enterprise is to share it by an interflow of knowledge documents. However, this can create a problem for knowledge workers because they have to spend time managing the documents they receive. As mentioned earlier, each knowledge worker may organize his/her folders to manage different types of information in different categories that form a personal folder environment, as shown in Fig. 1. Thus, to be effective, a knowledge management system must be able to recommend documents stored in other knowledge workers’ folders to the appropriate category of the target worker’s personal folder automatically.

The proposed recommendation methods can be used to proactively notify knowledge workers about peer-reviewed documents and facilitate push-mode knowledge sharing. Two strategies can be used to share knowledge among workers: a pull strategy and a push strategy (Lei et al., 2000; Meso and Smith, 2000). The pull strategy means that workers have to find and retrieve the knowledge they need, while the push strategy means that knowledge can be delivered to people proactively by KM systems or KM techniques. Knowledge diffusion can be evolved from "Pull" to "Push" by applying our proposed recommendation methods. In this way, explicit knowledge embedded in personal folders can be circulated peer-to-peer to facilitate knowledge sharing and diffusion.

We propose three document recommendation methods for personal folder environments, namely, Content-Based Filtering and Categorization (CBFC), Collaborative Filtering and Categorization (CFC), and Hybrid Filtering and Categorization (HFC). A knowledge worker may create folders with multiple levels to form a hierarchy of categories for classifying and managing his/her documents. In general, documents are stored in the leaf nodes (categories) of the hierarchy. To simplify our research problem, in this paper, a user’s folders are regarded as one level of categories. In the proposed methods, hierarchical folders are translated into one level of categories by taking each node (folder) in the hierarchy as a category. Consequently, a user’s folders with/without a hierarchy are regarded as one level of categories for recommending documents. Instead of using conventional approaches for making a list of documents for recommendation, we construct a list of document–category pairs for recommendation, where a document–category pair $(d_j, c_a)$ indicates that a document $d_j$ is recommended to be placed in the category $c_a$ of the target user’s folder. We discuss the process in detail in the following subsections.

3.1. Content-based filtering and categorization

Content-based filtering and categorization (CBFC) locates candidates (document–category pairs) for recommendation by examining the content of profiles and predicting if they are suitable for recommendation. The method comprises three phases: generating profiles, document filtering, and generating recommendations. Profile generation prepares three profiles: a Document Profile (DP), a Category Classifier (CC), and a User Profile (UP), which are used in the document filtering phase to measure the similarity between a document and a category of the target worker. In the last phase, a list of document–category pairs is generated for recommendation to the target worker(s). We now examine the three phases of CBFC in depth.

3.1.1. Phase 1: profile preparation

As shown in Fig. 2, three kinds of profiles, document profiles, category classifiers, and user profiles, are used to record information about the documents, categories, and users, respectively. A document profile is generated from a specific document, while a category classifier is derived from documents in a specific category. The user profile is evolved from all the documents of interest to the user. In the following, we explain how to generate and denote these profiles.

Fig. 1. Knowledge sharing in a personal folder environment (knowledge workers A, B, ..., X hold categories A1–A2, B1–B3, and X1–X2; documents are recommended across workers).

3.1.1.1. Document profile (DP). A document can be represented as an n-dimensional feature vector of terms and their respective weights, derived from the term frequency and the inverse document frequency (Salton and Buckley, 1988). Let $d_j$ be a document, and let document profile $DP_j = \langle dt_{1,j}:dw_{1,j},\ dt_{2,j}:dw_{2,j},\ \ldots,\ dt_{n,j}:dw_{n,j} \rangle$ be the feature vector of $d_j$, where $dw_{i,j}$ is the weight of $dt_{i,j}$, denoting a term $i$ that occurs in $d_j$. Note that the weight of a term represents its degree of importance in the document. We adopt the tf–idf approach (Eq. (1)) to derive the document profile. Let the term frequency $dtf_{i,j}$ be the occurrence frequency of term $i$ in $d_j$, and let the document frequency $df_i$ represent the number of documents containing term $i$. The importance of a term $i$ to a document $d_j$ is proportional to the term frequency and inversely proportional to the document frequency, expressed as:

$$dw_{i,j} = \frac{1}{\sqrt{\sum_{i} \left( dtf_{i,j} \times \log\left(\frac{N}{df_i} + 1\right) \right)^2}} \; dtf_{i,j} \times \log\left(\frac{N}{df_i} + 1\right), \qquad (5)$$

where $N$ is the total number of documents and the denominator is a normalization factor.
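A minimal sketch of the normalized document profile of Eq. (5) follows. It assumes tokenized documents and uses the natural logarithm; because each profile is normalized to unit length, the choice of logarithm base does not change the resulting weights. The function name `document_profiles` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def document_profiles(docs):
    """Return one unit-length {term: dw} profile per document, following Eq. (5)."""
    n_docs = len(docs)  # N: total number of documents
    # df_i: number of documents containing term i
    df = Counter(t for doc in docs for t in set(doc))
    profiles = []
    for doc in docs:
        # Unnormalized weight: term frequency times log(N / df + 1)
        raw = {t: tf * math.log(n_docs / df[t] + 1) for t, tf in Counter(doc).items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))  # normalization factor
        profiles.append({t: v / norm for t, v in raw.items()} if norm else {})
    return profiles

# Hypothetical tokenized documents
docs = [["km", "km", "folder"], ["folder", "rating"]]
profiles = document_profiles(docs)
```

After normalization, the squared weights of each profile sum to 1, so the cosine similarity between two profiles reduces to their dot product.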

3.1.1.2. Category classifier (CC). A category classifier is constructed by adopting the tf–idf approach (Eq. (1)) to extract the discriminating terms and their weights from the categories of a worker. Let $CC_r = \langle cct_{1,r}:ccw_{1,r},\ cct_{2,r}:ccw_{2,r},\ \ldots,\ cct_{n,r}:ccw_{n,r} \rangle$ be the category classifier of category $c_r$, where $ccw_{i,r}$ is the weight of $cct_{i,r}$, i.e., a term $i$ that occurs in $c_r$. In addition, let the term frequency $ctf_{i,r}$ be the occurrence frequency of term $i$ in $c_r$, and let the category frequency $cf_i$ represent the number of categories of a target user $u$ that contain term $i$. The weight $ccw_{i,r}$ of term $i$ in a category $c_r$ is proportional to the term frequency and inversely proportional to the category frequency, expressed as in the following equation:

$$ccw_{i,r} = \frac{1}{\sqrt{\sum_{i} \left( ctf_{i,r} \times \log\left(\frac{L_u}{cf_i} + 1\right) \right)^2}} \; ctf_{i,r} \times \log\left(\frac{L_u}{cf_i} + 1\right), \qquad (6)$$

where $L_u$ is the total number of categories of user $u$. For hierarchical folders, each node (folder) in the hierarchy is regarded as a category in our methods. All documents stored in a node $c_r$ and the nodes of the sub-trees that have $c_r$ as the root node are used to generate the category classifier of $c_r$.

3.1.1.3. User profile (UP). The user profile $UP_x$ of a user $u_x$ is represented as a feature vector with weighted terms derived by analyzing the document set of $u_x$. After the documents have been pre-processed and represented in the form of term vectors, $UP_x$ is derived by averaging the feature vectors (i.e., using the centroid approach, Eq. (3)) of the documents of $u_x$.

3.1.2. Phase 2: document filtering

This phase computes the similarity between a category and a document. Two similarity measures, the similarity between the category classifier and the document profile and the similarity between the user profile and the document profile, are used for content-based filtering and categorization. We adopt the cosine formula (Eq. (2)) to compute the similarity measures. There may be cases where the folder does not provide enough information due to poor category construction or insufficient documents. To resolve this problem, we consider the similarity between the document profile and the user profile. The predicted rating, $\hat{p}_{a,j}$, of the recommended document $d_j$ ($DP_j$) to the category $c_a$ ($CC_a$) owned by target user $u_x$ ($UP_x$) is expressed as follows:

$$\hat{p}_{a,j} = (1 - \alpha_{CBFC})\, sim(CC_a, DP_j) + \alpha_{CBFC}\, sim(UP_x, DP_j), \qquad (7)$$

where $sim(CC_a, DP_j)$ is the similarity between the category classifier $CC_a$ and the document profile $DP_j$; and $sim(UP_x, DP_j)$ is the similarity between the user profile $UP_x$ and the document profile $DP_j$. Note that user $u_x$ is the owner of category $c_a$. The parameter $\alpha_{CBFC}$ is used to determine the relative influence of the category classifier compared to the user profile. The value of $\alpha_{CBFC}$ ranges from 0 to 1 and is determined experimentally.

3.1.3. Phase 3: recommendation list generation

In this phase, a list of recommended document–category pairs is generated for allocation to categories in the user’s personal folder. The top-N approach can be used to recommend the document–category pairs based on their predicted ratings, i.e., the pairs with the top-N rankings are selected for recommendation. Alternatively, the threshold approach can be used to recommend document–category pairs with a predicted rating higher than a given threshold. Documents that the target user has already stored are not included in the recommendation list. We use the top-N approach to generate a recommendation list in our experiments.
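The filtering and list-generation phases of CBFC can be sketched together: every candidate document–category pair is scored with the weighted combination of Eq. (7), documents the target user already stores are skipped, and the top-N pairs are returned. This is an illustrative sketch; the function names, the parameter names `alpha` and `top_n`, and the sample profiles are assumptions.

```python
import math

def cosine(q, r):
    """Cosine similarity between two sparse term vectors, as in Eq. (2)."""
    dot = sum(w * r.get(t, 0.0) for t, w in q.items())
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(
        sum(w * w for w in r.values())
    )
    return dot / norm if norm else 0.0

def cbfc_recommend(classifiers, user_profile, candidates, owned, alpha, top_n):
    """Rank (document, category) pairs by Eq. (7) and keep the top N."""
    scored = []
    for doc_id, dp in candidates.items():
        if doc_id in owned:
            continue  # documents the target user already stores are excluded
        sim_user = cosine(user_profile, dp)
        for cat, cc in classifiers.items():
            score = (1 - alpha) * cosine(cc, dp) + alpha * sim_user
            scored.append((score, doc_id, cat))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(doc_id, cat, score) for score, doc_id, cat in scored[:top_n]]

# Hypothetical profiles of the target user and of three candidate documents
classifiers = {"recsys": {"rating": 1.0}, "ir": {"index": 1.0}}
user_profile = {"rating": 0.6, "index": 0.8}
candidates = {"d1": {"rating": 1.0}, "d2": {"index": 1.0}, "d3": {"rating": 1.0}}
recs = cbfc_recommend(classifiers, user_profile, candidates, owned={"d3"},
                      alpha=0.5, top_n=2)
```

With these toy profiles, document "d2" is recommended to category "ir" and "d1" to "recsys"; "d3" is never scored because the target user already owns it. A threshold variant would filter `scored` by score instead of slicing the top N.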

3.2. Two collaborative filtering and categorization approaches

Collaborative filtering and categorization makes recommendations based on the opinions of other knowledge workers whose profiles are similar to that of the target user. Two approaches have been developed for this purpose: collaborative filtering and categorization (CFC), and collaborative filtering and categorization based on the joint coefficient (CFC-J). We consider CFC first.

3.2.1. Collaborative filtering and categorization (CFC)

CFC consists of four phases, as illustrated in Fig. 3. Phase 1 generates profiles of categories and users, and Phase 2 finds peers with similar interests. The approach considers neighboring (similar) categories to locate suitable document–category pairs. Phase 3 derives the predicted ratings for document–category pairs. In the final phase, the scheme generates a list of document–category pairs for recommendation.

3.2.1.1. Phase 1: profile preparation. The purpose of this phase is to create profiles of categories and users. To generate the category classifier, CBFC uses the tf–idf approach, which considers the discriminating power of each term to distinguish between categories of a particular user. In other words, the classifier determines which category a document should be allocated to. However, it is not suitable for deriving the neighbors of categories, since the discriminating terms may distort the similarity of categories used by different workers. Therefore, a category profile is constructed to compute the similarity of categories and find their neighbors.

3.2.1.1.1. Category profile (CP). The category profile $CP_a$ of category $c_a$ is defined as the centroid vector obtained by averaging the feature vectors of documents in $c_a$. Similar to the generation of user profiles described in Section 3.1, category profiles are constructed by the centroid approach (Eq. (3)), which does not weight terms by how well they discriminate among the categories of a user. For hierarchical folders, each node (folder) in the hierarchy is regarded as a category in our methods. All documents stored in a node $c_a$ and the nodes of the sub-trees that have $c_a$ as the root node are used to generate the category profile of $c_a$.

3.2.1.2. Phase 2: identifying k-nearest neighbors. This phase finds the neighbors of the target category based on the similarity of category profiles. To recommend a docu-ment djto the target category ca, the neighboring categories

(neighbors) of caare selected from categories that contain

dj.

The cosine formula in Eq. (2) is used to decide the similarity of category profiles. There are two ways to choose neighbors: k-NN-based approaches and threshold-based approaches. The former ranks the similarity measures and chooses the k-nearest neighbors, while the latter chooses neighbors whose similarity measures are above a given threshold. We use the k-NN-based method in this work.
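A minimal sketch of the k-NN-based neighbor selection follows. It assumes sparse dictionary profiles and the standard cosine similarity; the function names and the `candidates` argument (the categories that already contain the document under consideration) are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse term -> weight mapping (assumed representation)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def k_nearest_categories(target: str,
                         profiles: Dict[str, Vector],
                         candidates: List[str],
                         k: int) -> List[Tuple[str, float]]:
    """Rank candidate categories by profile similarity and keep the top k."""
    scored = [(c, cosine(profiles[target], profiles[c]))
              for c in candidates if c != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A threshold-based variant would replace the final slice with a filter on the similarity value.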

3.2.1.3. Phase 3: document rating and filtering. In addition to the above profiles, a Category-Document Rating (CDR) matrix and a User-Document Rating (UDR) matrix are used to record the ratings of categories and users for documents, respectively. The ratings can be derived by a binary approach or a profiling approach. The binary approach derives ratings based on whether the category/user folder contains a document. If a category $c_a$ contains a document $d_j$, the rating value of $c_a$ for $d_j$, $CDR_{a,j}$, is 1; otherwise, it is 0. If a category $c_a$ that contains $d_j$ is used by the user $u_x$, i.e., $u_x$ has document $d_j$, the rating value of $u_x$ for $d_j$, $UDR_{x,j}$, is 1; otherwise, it is 0. The profiling approach, on the other hand, uses the similarity between the category/user profile and the document profile to derive a rating. The rating value of $c_a$ on $d_j$, $CDR_{a,j}$, is equal to $sim(CP_a, DP_j)$, i.e., the similarity of the category profile of $c_a$ and the document profile of $d_j$. The rating value of $u_x$ for $d_j$, $UDR_{x,j}$, is set to $sim(UP_x, DP_j)$, i.e., the similarity of the user profile of $u_x$ and the document profile of $d_j$. The CDR/UDR generated by the binary approach is called a binary CDR/UDR, while the CDR/UDR generated by the profiling approach is called a non-binary CDR/UDR.
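The binary approach can be sketched as follows, assuming each category is given as a set of document identifiers and each category has exactly one owner. The nested-dictionary matrix layout and the names `binary_cdr`, `binary_udr`, and `owner` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

Matrix = Dict[str, Dict[str, int]]  # row id -> document id -> 0/1 rating


def binary_cdr(category_docs: Dict[str, Set[str]],
               all_docs: Iterable[str]) -> Matrix:
    """CDR[a][j] is 1 if category a contains document j, else 0."""
    docs = list(all_docs)
    return {cat: {d: int(d in held) for d in docs}
            for cat, held in category_docs.items()}


def binary_udr(category_docs: Dict[str, Set[str]],
               owner: Dict[str, str],
               all_docs: Iterable[str]) -> Matrix:
    """UDR[x][j] is 1 if any category owned by user x contains document j."""
    docs = list(all_docs)
    user_docs: Dict[str, Set[str]] = {}
    for cat, held in category_docs.items():
        user_docs.setdefault(owner[cat], set()).update(held)
    return {user: {d: int(d in held) for d in docs}
            for user, held in user_docs.items()}
```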

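Eq. (8) computes the predicted rating for a document $d_j$ recommended to a category $c_a$ of the target user $u_x$:

$$\hat{p}_{a,j} = \frac{\sum_{c_b \in c_a\text{'s neighbors}} \left[ (1-\alpha_{\text{CFC}})\, sim(CP_a, CP_b) \times CDR_{b,j} + \alpha_{\text{CFC}}\, sim(UP_x, UP_y) \times UDR_{y,j} \right]}{\text{Number of } c_a\text{'s neighbors}}, \tag{8}$$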


where $sim(UP_x, UP_y)$ is the similarity between $UP_x$ and $UP_y$; $sim(CP_a, CP_b)$ is the similarity between $CP_a$ and $CP_b$; $c_b$ belongs to $c_a$'s neighbors; $u_y$ is the owner of $c_b$; and $\alpha_{\text{CFC}}$ is a parameter used to adjust the relative importance of the category similarity and the user similarity.
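The predicted rating of Eq. (8) can be computed as in the sketch below. It assumes that the similarities are precomputed and stored in dictionaries keyed by (target, neighbor) pairs and that CDR and UDR are nested dictionaries; these data layouts and the name `predict_cfc` are assumptions, not part of the original method description.

```python
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]          # row id -> document id -> rating
PairSim = Dict[Tuple[str, str], float]        # (target, neighbor) -> similarity


def predict_cfc(target_cat: str, target_user: str, doc: str,
                neighbors: List[str], owner: Dict[str, str],
                cat_sim: PairSim, user_sim: PairSim,
                cdr: Matrix, udr: Matrix, alpha: float) -> float:
    """Average, over the neighbor categories, of the weighted category and
    user contributions of Eq. (8)."""
    if not neighbors:
        return 0.0  # assumption: no neighbors means no evidence
    total = 0.0
    for c_b in neighbors:
        u_y = owner[c_b]  # the user who owns the neighbor category
        total += ((1.0 - alpha) * cat_sim[(target_cat, c_b)] * cdr[c_b][doc]
                  + alpha * user_sim[(target_user, u_y)] * udr[u_y][doc])
    return total / len(neighbors)
```

Replacing the two similarity dictionaries with joint coefficients gives the CFC-J form of the same computation.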

3.2.1.4. Phase 4: recommendation list generation. This phase generates a list of document–category pairs to allocate documents to destination categories by using the top-N approach described in Phase 3 of Section 3.1.
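A top-N selection over predicted ratings might look like the following sketch. It assumes the predictions are held in a dictionary keyed by (document, category) pairs; the name `top_n_pairs` is hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (document id, category id)


def top_n_pairs(predicted: Dict[Pair, float], n: int) -> List[Pair]:
    """Return the n document-category pairs with the highest predicted ratings."""
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]
```

Because Python's sort is stable, pairs with equal ratings keep their insertion order in this sketch; the source does not specify a tie-breaking rule.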

3.2.2. Collaborative filtering and categorization based on the joint coefficient (CFC-J)

The difference between CFC and CFC-J is the way the similarity between profiles is computed. CFC calculates the similarity by weighted term vectors, whereas CFC-J uses the joint coefficient, which represents the relationship between two categories/users based on the number of documents they have in common. The more documents they share, the more similar they are. The joint coefficient (Jcof) in CFC-J is computed as follows:

$$Jcof(c_a, c_b) = \frac{2 \times N_{a \cap b}}{N_a + N_b}, \tag{9}$$

where $N_a$ and $N_b$ are the numbers of documents in categories $c_a$ and $c_b$, respectively, and $N_{a \cap b}$ is the number of documents that $c_a$ and $c_b$ have in common. The binary CDR is used to derive $N_a$, $N_b$, and $N_{a \cap b}$.
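As a worked illustration of Eq. (9), the sketch below computes the joint coefficient from two sets of document identifiers. Returning 0.0 when both sets are empty is an assumption made here to avoid division by zero; the source does not discuss that case.

```python
from typing import Set


def jcof(docs_a: Set[str], docs_b: Set[str]) -> float:
    """Joint coefficient: twice the shared documents over the total count."""
    denominator = len(docs_a) + len(docs_b)
    if denominator == 0:
        return 0.0  # assumption: two empty categories are treated as dissimilar
    return 2.0 * len(docs_a & docs_b) / denominator
```

For example, two categories holding 3 and 4 documents with 2 in common give 2 × 2 / (3 + 4) = 4/7.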

CFC-J uses the joint coefficient instead of the profile similarity to derive the predicted rating, as expressed in Eq. (10), where $Jcof(u_x, u_y)$ denotes the joint coefficient between two users $u_x$ and $u_y$.

3.3. Hybrid filtering and categorization

Hybrid filtering and categorization (HFC) combines content-based filtering and categorization (CBFC) and collaborative filtering and categorization (CFC) to improve the quality of recommendations. CBFC and CFC can be combined by linear or sequential combination.

3.3.1. Hybrid filtering and categorization based on linear combination (HFCL)

The hybrid filtering and categorization with linear combination method (HFCL) is a linear combination of the CBFC and CFC results. HFCL derives the predicted ratings of document–category pairs by merging the predicted ratings of CBFC and CFC described in Sections 3.1 and 3.2. The predicted rating for recommending a document $d_j$ to a category $c_a$ is shown in Eq. (11), where $\hat{p}^{\text{CBFC}}_{a,j}$ is the predicted rating derived according to Eq. (7), and $\hat{p}^{\text{CFC}}_{a,j}$ is the predicted rating derived according to Eq. (8). The parameter $\alpha_{\text{HFCL}}$ is used to represent the relative importance of CBFC and CFC. HFCL-J linearly combines the predicted ratings of document–category pairs from CBFC and CFC-J by Eq. (12), where $\hat{p}^{\text{CFC-J}}_{a,j}$ is the predicted rating derived according to Eq. (10):

$$\hat{p}_{a,j} = (1-\alpha_{\text{HFCL}})\, \hat{p}^{\text{CBFC}}_{a,j} + \alpha_{\text{HFCL}}\, \hat{p}^{\text{CFC}}_{a,j}, \tag{11}$$

$$\hat{p}_{a,j} = (1-\alpha_{\text{HFCL-J}})\, \hat{p}^{\text{CBFC}}_{a,j} + \alpha_{\text{HFCL-J}}\, \hat{p}^{\text{CFC-J}}_{a,j}. \tag{12}$$
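The linear combination of Eqs. (11) and (12) can be sketched as below. The sketch assumes that both sets of predictions are dictionaries keyed by (document, category) pairs and, as a further assumption not stated in the source, treats a pair that is missing from one method as having a rating of 0 from that method.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (document id, category id)


def hybrid_linear(p_cbfc: Dict[Pair, float],
                  p_cf: Dict[Pair, float],
                  alpha: float) -> Dict[Pair, float]:
    """Blend content-based and collaborative predictions with weight alpha."""
    combined: Dict[Pair, float] = {}
    for pair in set(p_cbfc) | set(p_cf):
        content_score = p_cbfc.get(pair, 0.0)   # assumed default for missing pairs
        collab_score = p_cf.get(pair, 0.0)      # assumed default for missing pairs
        combined[pair] = (1.0 - alpha) * content_score + alpha * collab_score
    return combined
```

Passing CFC predictions as `p_cf` corresponds to HFCL, and passing CFC-J predictions corresponds to HFCL-J.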

3.3.2. Hybrid filtering and categorization with sequential combination (HFCS)

The hybrid filtering and categorization with sequential combination method (HFCS) tries to compensate for the sparsity of rating information in collaborative filtering by using the predicted scores from the content-based mechanism as the ratings of unrated items. Thus, the rating function (CDR) in CFC is extended to eCDR derived from CBFC. An extended CDR matrix, the eCDR matrix, is generated based on the predicted ratings of unrated documents derived from CBFC (Eq. (7)). For a category $c_a$ containing a document $d_j$, i.e., $CDR_{a,j} = 1$, $eCDR_{a,j}$ is set to 1. For a category $c_a$ that does not contain a document $d_j$, i.e., $CDR_{a,j} = 0$, $eCDR_{a,j}$ is set to 1 if the predicted rating $\hat{p}_{a,j}$ (derived from Eq. (7)) is greater than a predefined threshold; otherwise, $eCDR_{a,j} = 0$. An extended UDR matrix, the eUDR matrix, is generated as follows. If there exists a category $c_a$ owned by $u_x$ such that $eCDR_{a,j}$ equals 1, then $eUDR_{x,j} = 1$; otherwise, $eUDR_{x,j} = 0$.
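The construction of the extended matrices can be sketched as follows, assuming a binary CDR stored as nested dictionaries and CBFC predictions keyed by (category, document) pairs. The names `extend_cdr`, `extend_udr`, and `owner` are hypothetical, and a pair without a CBFC prediction is assumed to score 0.

```python
from typing import Dict, Tuple

Matrix = Dict[str, Dict[str, int]]  # row id -> document id -> 0/1 rating


def extend_cdr(cdr: Matrix,
               cbfc_pred: Dict[Tuple[str, str], float],
               threshold: float) -> Matrix:
    """Keep existing 1s and set unrated entries to 1 when the content-based
    prediction exceeds the threshold."""
    ecdr: Matrix = {}
    for cat, row in cdr.items():
        ecdr[cat] = {}
        for doc, rating in row.items():
            predicted = cbfc_pred.get((cat, doc), 0.0)  # assumed default
            ecdr[cat][doc] = 1 if rating == 1 or predicted > threshold else 0
    return ecdr


def extend_udr(ecdr: Matrix, owner: Dict[str, str]) -> Matrix:
    """eUDR[x][j] is 1 if any category owned by user x has eCDR equal to 1."""
    eudr: Matrix = {}
    for cat, row in ecdr.items():
        user_row = eudr.setdefault(owner[cat], {})
        for doc, rating in row.items():
            user_row[doc] = max(user_row.get(doc, 0), rating)
    return eudr
```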

Moreover, the profiling approach described in Phase 3 of Section 3.2.1 can be used to derive non-binary ratings by using the similarity measures of the category/user profile and the document profile. The category/user profile of each category/user is re-generated according to the binary eCDR/eUDR matrix. The similarity measures derived based on the new category/user profile are used for the non-binary ratings.

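Eq. (10), the predicted rating of CFC-J referred to in Section 3.2.2, is

$$\hat{p}_{a,j} = \frac{\sum_{c_b \in c_a\text{'s neighbors}} \left[ (1-\alpha_{\text{CFC-J}})\, Jcof(c_a, c_b) \times CDR_{b,j} + \alpha_{\text{CFC-J}}\, Jcof(u_x, u_y) \times UDR_{y,j} \right]}{\text{Number of } c_a\text{'s neighbors}}. \tag{10}$$

In the HFCS method, the predicted ratings are derived as follows:

$$\hat{p}_{a,j} = \frac{\sum_{c_b \in c_a\text{'s neighbors}} \left[ (1-\alpha_{\text{HFCS}})\, sim(c_a, c_b) \times eCDR_{b,j} + \alpha_{\text{HFCS}}\, sim(u_x, u_y) \times eUDR_{y,j} \right]}{\text{Number of } c_a\text{'s neighbors}}. \tag{13}$$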


HFCS-J combines CBFC and CFC-J by the sequential approach. The joint coefficient in CFC-J measures similarity by the number of documents that two categories or users have in common. HFCS-J uses the extended CDR and UDR to derive the predictions, as shown in Eq. (14). The binary eCDR and eUDR are used to compute $Jcof(c_a, c_b)$ and $Jcof(u_x, u_y)$, respectively. The eCDR/eUDR is generated by the same approach as described for HFCS.

4. Experiments and evaluations

We applied the CBFC, CFC, and hybrid methods to recommend relevant academic papers to the researchers in a research institute. In this section, we describe the experiment design, evaluation metrics, and experiment results.

4.1. Experiment setup

Since the experiments were conducted in a real application domain, namely, a research institute laboratory, there were few participants; hence, the size of the dataset was small. Knowledge workers have their own folders to store documents (research papers) that assist them in writing theses or accomplishing research projects. There are 11 users, 35 categories and 1062 documents. The sparsity in the data sets is 99.962% (749 non-zero entries in 506 × 35 matrices). Personal folders are translated into one level of categories, as described in Section 3. The data set is divided as follows: 80% for training and 20% for testing. The training set includes documents stored in workers’ personal folders, and is used to generate a recommendation list. The test data is used to verify the recommendation quality of the various methods.

Two metrics, precision and recall, are commonly used to measure the quality of recommendations. These metrics are also used extensively in information retrieval (Salton and McGill, 1983; Van Rijsbergen, 1979). Recall is the ratio of relevant documents that can be located, as shown in the following equation:

$$\text{Recall} = \frac{\text{number of correctly recommended documents}}{\text{number of relevant documents}}. \tag{15}$$

Precision is the ratio of recommended documents (predicted to be relevant) that are actually relevant to workers, as shown in the following equation:

$$\text{Precision} = \frac{\text{number of correctly recommended documents}}{\text{number of recommended documents}}. \tag{16}$$

Documents relevant to a target user u are the documents owned by u in the test set. Each relevant document is associated with its corresponding category owned by u. This is called a relevant document–category pair of u. Correctly recommended documents are those in the recommended document–category pairs that match the relevant document–category pairs of u.

Increasing the number of recommended documents tends to reduce the precision and increase the recall. The F1-metric is used to achieve a trade-off between precision and recall (Van Rijsbergen, 1979) by assigning equal weights to them as follows:

$$F1 = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}. \tag{17}$$

Each metric is computed for each researcher. Then, the average value computed for all researchers is taken as the measure of the recommendation quality.
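The three metrics can be computed per researcher and then averaged, as in the sketch below. It assumes recommendations and relevant items are sets of (document, category) pairs, so that a document recommended to the wrong category does not count as correct; returning 0.0 for empty denominators is an assumption made here.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (document id, category id)


def precision_recall_f1(recommended: Set[Pair],
                        relevant: Set[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over document-category pairs."""
    correct = len(recommended & relevant)
    precision = correct / len(recommended) if recommended else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * recall * precision / (recall + precision)
    return precision, recall, f1


def average_f1(per_user: Dict[str, Tuple[Set[Pair], Set[Pair]]]) -> float:
    """Mean F1 over users; each value is (recommended, relevant)."""
    if not per_user:
        return 0.0
    scores = [precision_recall_f1(rec, rel)[2] for rec, rel in per_user.values()]
    return sum(scores) / len(scores)
```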

4.1.1. Parameter selection

We conduct pilot experiments to determine the parameter values of various methods (equations). In the experiments, we systematically adjust the values of the parameters in increments of 0.1. The F1 metric (given in Eq. (17)) is chosen as the performance measure to evaluate the effectiveness of the methods. The optimal parameter values with the best results (the highest average F1 values computed over various top-N) are chosen as the parameter settings of the proposed equations.
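The parameter sweep described above can be sketched as follows. The callback `evaluate_f1(alpha, n)` is a hypothetical stand-in for running a method with a given weight and top-N setting and returning its average F1; it is not part of the original paper, and ties are resolved here in favor of the smallest value, which is an assumption.

```python
from typing import Callable, Sequence, Tuple


def sweep_alpha(evaluate_f1: Callable[[float, int], float],
                top_n_values: Sequence[int],
                step: float = 0.1) -> Tuple[float, float]:
    """Try alpha = 0, step, ..., 1 and return the value with the highest
    F1 averaged over the given top-N settings."""
    best_alpha, best_avg = 0.0, float("-inf")
    n_steps = int(round(1.0 / step))
    for i in range(n_steps + 1):
        alpha = round(i * step, 10)  # avoid drift such as 0.30000000000000004
        avg = sum(evaluate_f1(alpha, n) for n in top_n_values) / len(top_n_values)
        if avg > best_avg:  # strict comparison keeps the smallest alpha on ties
            best_alpha, best_avg = alpha, avg
    return best_alpha, best_avg
```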

4.2. Experiment results

We perform experiments based on the CBFC, CFC, and hybrid methods, including HFCL and HFCS. The F1 metric is used to compare the recommendation quality of the methods for various values of $\alpha$ and top-N recommendations. The top-N approach recommends the N document–category pairs with the highest predicted ratings.

4.2.1. Experiment one: comparison of CBFC and CBFC-CP methods

To evaluate the effectiveness of CBFC, we compare it with CBFC-CP. The CBFC approach (Eq. (7)) derives recommendations via the category classifier (CC), which uses tf–idf to distinguish between categories, whereas CBFC-CP uses the category profile (CP), which is derived by the centroid approach. Eq. (7) is also used to derive the CBFC-CP method by replacing CC with CP and parameter $\alpha_{\text{CBFC}}$ with $\alpha_{\text{CBFC-CP}}$. The parameter $\alpha_{\text{CBFC}}$ is used to tune the weight of predicted ratings produced by the category classifier and the user profile. We tune $\alpha_{\text{CBFC}}$ between 0 and 1 by systematically adjusting its value in increments of 0.1 and examine its effect on the F1 metric.

Eq. (14), the predicted rating of HFCS-J referred to in Section 3.3.2, is

$$\hat{p}_{a,j} = \frac{\sum_{c_b \in c_a\text{'s neighbors}} \left[ (1-\alpha_{\text{HFCS-J}})\, Jcof(c_a, c_b) \times eCDR_{b,j} + \alpha_{\text{HFCS-J}}\, Jcof(u_x, u_y) \times eUDR_{y,j} \right]}{\text{Number of } c_a\text{'s neighbors}}. \tag{14}$$


The value of $\alpha_{\text{CBFC}}$ is determined according to the highest average F1 value computed over various top-N. The other parameters in the following experiments are decided similarly. The highest average F1 value of CBFC is achieved when $\alpha_{\text{CBFC}} = 0$, while the highest average F1 value of CBFC-CP is achieved when $\alpha_{\text{CBFC-CP}} = 0.1$. Fig. 4 shows the F1 values of CBFC and CBFC-CP for various top-N recommendations by setting $\alpha_{\text{CBFC}}$ of CBFC and $\alpha_{\text{CBFC-CP}}$ of CBFC-CP to 0 and 0.1, respectively. The setting $\alpha_{\text{CBFC}} = 0$ indicates that the category classifier is powerful enough to determine the correct categories for documents. The results show that, in general, CBFC outperforms CBFC-CP. The category classifier provides better quality recommendations than the category profile because it can distinguish between categories.

4.2.2. Experiment two: comparison of CFC-Binary, CFC-Profile and CFC-J methods

This experiment compares different methods of CFC: CFC-Binary, CFC-Profile, and CFC-J. CFC-Binary and CFC-Profile use binary and profiling ratings, respectively, as described in Section 3.2.1, while CFC-J uses the joint coefficient approach described in Section 3.2.2. The parameter $\alpha$ is used to tune the weights of the category similarity and the user similarity. Based on the highest average F1 values computed over various top-N, the $\alpha$ values for CFC-Binary, CFC-Profile, and CFC-J are 0.5, 0.0, and 0.2, respectively. The non-zero values for CFC-Binary and CFC-J indicate that the ratings weighted by the similarity of user profiles improve the recommendation quality of these two methods.

Fig. 5 compares CFC-Binary, CFC-Profile, and CFC-J under different top-N by setting $\alpha_{\text{CFC-Binary}}$ to 0.5, $\alpha_{\text{CFC-Profile}}$ to 0, and $\alpha_{\text{CFC-J}}$ to 0.2. CFC-Binary outperforms CFC-Profile, which indicates that the rating function of the latter cannot provide useful rating information. This may be due to the fact that the similarity rating between a category and a document does not reflect the user’s document ratings accurately. Consequently, we adopt the CFC-Binary method rather than the CFC-Profile method to represent the CFC method in further comparisons and implementations of the hybrid approach. The results also show that CFC-J performs better when top-N is smaller, while CFC-Binary works better when top-N is larger. Since the number of overlapping documents among different categories is usually small, CFC-J’s performance deteriorates as the number of recommended documents increases.

4.2.3. Experiment three: comparison of linear hybrid methods

This experiment compares two hybrid methods with linear combination, HFCL and HFCL-J. The parameters $\alpha_{\text{HFCL}}$ and $\alpha_{\text{HFCL-J}}$ are used to adjust the contribution of the predicted ratings from CBFC and CFC/CFC-J, respectively. Based on the highest average F1 values, these parameters are set to 0.4 and 0.6, respectively. Fig. 6 compares HFCL and HFCL-J under different top-N, by setting $\alpha_{\text{HFCL}}$ to 0.4 and $\alpha_{\text{HFCL-J}}$ to 0.6. The HFCL method performs better than the HFCL-J method.

4.2.4. Experiment four: comparison of sequential hybrid methods

This experiment compares two sequential hybrid methods, HFCS and HFCS-J. Based on the highest average F1 values, the parameters for HFCS and HFCS-J are set to 0.2 and 0.0, respectively. Fig. 7 compares HFCS and HFCS-J under different top-N by setting $\alpha_{\text{HFCS}}$ to 0.2 and $\alpha_{\text{HFCS-J}}$ to 0. HFCS performs better than HFCS-J.

4.3. Comparing all methods

Fig. 8 compares all the methods under different top-N. The results show that CFC (CFC-Binary) and CFC-J outperform CBFC. CFC-J performs better when top-N is smaller, but CFC’s performance is better when top-N is larger. The linear and sequential hybrid methods, HFCL, HFCL-J, and HFCS, achieve relatively satisfactory results because they combine the advantages of CBFC and CFC. In general, hybrid approaches perform better than pure content-based or collaborative filtering and categorization. Both HFCL and HFCL-J outperform all the other approaches. Although HFCS outperforms CFC, HFCS-J does not outperform CFC-J. In fact, HFCS-J performs even worse than the CBFC method. The sequential hybrid approach does not perform as well as expected. This may be due to the poor construction of the extended matrix, which is derived from the predicted ratings of the CBFC method.

Fig. 4. Comparison of CBFC and CBFC-CP for various top-N recommendations (F1 versus top-N = 2, 6, 10, 20, 30, 40, all).

Fig. 5. Comparison of CFC-Binary, CFC-Profile, and CFC-J for various top-N recommendations (F1 versus top-N = 2, 6, 10, 20, 30, 40, all).

Fig. 6. Comparison of HFCL and HFCL-J (F1 versus top-N = 2, 6, 10, 20, 30, 40, all).

5. Conclusion and future work

In this paper, we have investigated the issue of recommending documents to appropriate categories in personal folder environments where knowledge workers use their own folders (categories) to organize and store documents. We propose document recommendation methods that facilitate the recommendation and sharing of explicit codified knowledge within a personal folder environment. Recommendations made to such environments need to consider the appropriate category for a recommended document. Most conventional recommendation methods focus on recommending items to users without addressing the issue of recommending items to the target user’s appropriate document category. Some methods have addressed the issue by assuming a common category schema without considering personalized categories, or by making recommendations to users first and then determining the categories of the recommended documents. The proposed methods combine recommendation and text categorization techniques to recommend documents to a knowledge worker’s personalized categories.

Several existing recommendation methods are adopted and modified by integrating them with text categorization techniques to design the following document recommendation methods: content-based filtering and categorization (CBFC), collaborative filtering and categorization (CFC) and hybrid filtering and categorization (HFC) methods. Experiments were conducted to evaluate and compare the performance of these methods using data collected from a research institute laboratory. The experiment results demonstrate that CBFC outperforms CBFC-CP, while CFC-J achieves the best performance among the CFC methods when top-N is smaller. Moreover, HFCL outperforms HFCL-J, and HFCS performs better than HFCS-J. Among the hybrid methods, HFCL achieves the best recommendation quality. The hybrid methods, including HFCL and HFCL-J, outperform the pure content-based methods as well as the collaborative filtering and categorization methods.

The proposed recommendation methods can be used to proactively notify knowledge workers about relevant documents from peers and to facilitate push-mode knowledge sharing. Consequently, workers can learn from one another and thereby reduce the effort and manpower involved in searching for documents needed to improve productivity and efficiency when performing knowledge-intensive tasks.

In our future work, we will conduct experiments on a larger data set, i.e., more documents, categories, and users. Currently, the lack of rating information means that ratings in the collaborative filtering method must be presented in binary form. The collection of ratings from users should improve the performance of the collaborative filtering and hybrid methods. The adoption of different classifiers, such as probabilistic models, to determine a user’s information needs precisely and route relevant documents to the right folders will also be addressed in our future work. Moreover, document semantics considering the implied meaning of co-occurring keywords in documents will be helpful to facilitate knowledge sharing and document understanding (Zhuge and Luo, 2006). We will adopt document semantics to further improve the recommendation quality in future work. Furthermore, our proposed methods do not utilize the hierarchical relationships between categories to recommend and allocate documents to folders. In the category hierarchy, the lower level of a category contains documents on more specific subjects, while the upper level contains documents on more general subjects covered by the category. Thus, when recommending documents to the appropriate level of a category, the system needs to consider the subject covered by the category. For example, if the recommendation scores of a document for two sibling nodes are both high, it may be more appropriate to allocate the document to their parent node, since allocating the document to either one of the sibling nodes would not really reflect the subject matter of the document. In our future work, we will extend our scheme by considering the hierarchical relationships and the subjects covered by categories to improve the quality of recommendations.

Fig. 7. Comparison of HFCS and HFCS-J for various top-N recommendations (F1 versus top-N = 2, 6, 10, 20, 30, 40, all).

Fig. 8 (plot): F1 versus top-N (2, 6, 10, 20, 30, 40, all) for CBFC, CFC, CFC-J, HFCL, HFCL-J, HFCS, and HFCS-J.

Acknowledgement

This research was supported in part by the National Science Council of Taiwan (Republic of China) under Grant NSC 95-2416-H-009-002.

References

Agostini, A., Albolino, S., Boselli, R., De Michelis, G., De Paoli, F., Dondi, R., 2003. Stimulating knowledge discovery and sharing. In: Proceedings of the International ACM SIGGROUP Conference on Supporting Group Work, pp. 248–257.

Baeza-Yates, R., Ribeiro-Neto, B., 1999. Modern Information Retrieval. Addison-Wesley, Boston, MA.

Bighini, C., Carbonaro, A., Casadei, G., 2003. InLinx for document classification, sharing and recommendation. In: Proceedings of the Third IEEE International Conference on Advanced Learning Technologies, pp. 91–95.

Breese, J.S., Heckerman, D., Kadie, C., 1998. Empirical analysis of predictive algorithms for collaborative filtering. In: Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence, pp. 43–52.

Burke, R., 2002. Hybrid recommender systems: survey and experiments. User Modeling and User-Adapted Interaction 12 (4), 331–370.

Chen, P.-M., Kuo, F.-C., 2000. An information retrieval system based on a user profile. The Journal of Systems and Software 54 (1), 3–8.

Claypool, M., Gokhale, A., Miranda, T., Murnikov, P., Netes, D., Sartin, M., 1999. Combining content-based and collaborative filters in an online newspaper. In: Proceedings of the ACM SIGIR Workshop on Recommender Systems: Algorithms and Evaluation.

Davenport, T.H., Prusak, L., 1998. Working Knowledge: How Organizations Manage What They Know. Harvard Business School Press, Boston, MA.

Delgado, J., Ishii, N., Ura, T., 1998. Intelligent collaborative information retrieval. Lecture Notes in Computer Science 1484, 170–182.

Glance, N., Arregui, D., Dardenne, M., 1998. Knowledge pump: community-centered collaborative filtering. In: Proceedings of the Fifth DELOS Workshop on Filtering and Collaborative Filtering, pp. 83–88.

Kamba, T., Bharat, K., Albers, M.C., 1995. The Krakatoa Chronicle: an interactive personalized newspaper on the web. In: Proceedings of the Fourth International World Wide Web Conference, pp. 159–170.

Konstan, J.A., Miller, B.N., Maltz, D., Herlocker, J.L., Gordon, L.R., Riedl, J., 1997. GroupLens: applying collaborative filtering to usenet news. Communications of the ACM 40 (3), 77–87.

Lang, K., 1995. NewsWeeder: learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning, pp. 331–339.

Langari, Z., Tompa, F.W., 2001. Subject classification in the Oxford English Dictionary. In: Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 329–336.

Larkey, L.S., Croft, W.B., 1996. Combining classifiers in text categorization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 289–297.

Lei, Z., Shouju, R., Xiaodan, J., Zuzhao, L., 2000. Knowledge management and its application model in enterprise information systems. IEEE International Symposium on Technology and Society, 287–292.

Lewis, D., Ringuette, M., 1994. A comparison of two learning algorithms for text categorization. In: Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR’94), pp. 81–93.

Li, Q., Kim, B.M., 2003. An approach for combining content-based and collaborative filters. In: Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages, pp. 17–24.

Liu, D.-R., Shih, Y.-Y., 2005. Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences. The Journal of Systems & Software 77 (2), 181–191.

Melville, P., Mooney, R.J., Nagarajan, R., 2002. Content-boosted collaborative filtering for improved recommendations. In: Proceedings of the 18th National Conference on Artificial Intelligence, pp. 187–192.

Meso, P., Smith, R., 2000. A resource-based view of organizational knowledge management systems. Journal of Knowledge Management 4 (3), 224–234.

Middleton, S.E., Shadbolt, N.R., De Roure, D.C., 2004. Ontological user profiling in recommender systems. ACM Transactions on Information Systems 22 (1), 54–88.

Mooney, R.J., Roy, L., 2000. Content-based book recommending using learning for text categorization. In: Proceedings of the Fifth ACM International Conference on Digital Libraries, pp. 195–204.

Nonaka, I., 1994. A dynamic theory of organizational knowledge creation. Organization Science 5 (1), 14–37.

Rucker, J., Polanco, M.J., 1997. Siteseer: personalized navigation for the web. Communications of the ACM 40 (3), 73–76.

Salton, G., Buckley, C., 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24 (5), 513–523.

Salton, G., McGill, M., 1983. Introduction to Modern Information Retrieval. McGraw-Hill, New York.

Shapira, B., Shoval, P., Hanani, U., 1999. Experimentation with an information filtering system that combines cognitive and sociological filtering integrated with user stereotypes. Decision Support Systems 27 (1/2), 5–24.

Shardanand, U., Maes, P., 1995. Social information filtering: algorithms for automating “Word of Mouth”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI’95), Denver, Colorado, United States, pp. 210–217.

Van Rijsbergen, C.J., 1979. Information Retrieval, second ed. Butterworth, London.

Woodruff, A., Gossweiler, R., Pitkow, J., Chi, E.H., Card, S.K., 2000. Enhancing a digital book with a reading recommender. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 153–160.

Zack, M.H., 1999. Managing codified knowledge. Sloan Management Review 40 (4), 45–58.

Zhuge, H., Luo, X., 2006. Automatic generation of document semantics for the e-science Knowledge Grid. The Journal of Systems & Software 79 (7), 969–983.


Fig. 1. Knowledge sharing in a personal folder environment.
Fig. 3. The collaborative filtering process.
Fig. 8. Comparison of all methods.
