
(FORM 3) STATISTICS ON RESEARCH OUTCOME OF THIS PROGRAM

Columns: LISTING | TOTAL | DOMESTIC | INTERNATIONAL | SIGNIFICANT¹ | CITATIONS² | TECHNOLOGY TRANSFER

PUBLISHED ARTICLES
  JOURNALS: 1
  CONFERENCES: 1
  TECHNOLOGY REPORTS: 2

PATENTS
  PENDING: -
  GRANTED: -

COPYRIGHTED INVENTIONS: ITEM

WORKSHOPS/CONFERENCES³: ITEM, PARTICIPANTS, HOURS

TRAINING COURSES (WORKSHOPS/CONFERENCES): PARTICIPANTS

PERSONAL ACHIEVEMENTS
  HONORS/AWARDS⁴
  KEYNOTES GIVEN BY PIS
  EDITOR FOR JOURNALS

TECHNOLOGY TRANSFERS: ITEM, LICENSING FEE, ROYALTY

INDUSTRY STANDARDS⁵: ITEM

TECHNOLOGICAL SERVICES⁶
  ITEM: - - -
  SERVICE FEE: - - -

1 Indicate the number of items that are significant. The criterion for “significant” is defined by the PIs of the program. For example, it may refer to top journals (i.e., those with impact factors in the upper 15%) in the area of research, or to conferences that are very selective in accepting submitted papers (i.e., with an acceptance rate no greater than 30%). Please specify the criteria in Appendix IV.

2 Indicate the number of citations. “Citations” refers to citations by other research teams, i.e., self-citations are excluded.

3 Refers to the workshops and conferences hosted by the program.

4 Includes Nobel Prizes, membership in Academia Sinica or equivalent, fellowships of major international academic societies, etc.

5 Refers to industry standards proposed by PIs of the program and approved by national or international standardization bodies.

6 Refers to research outcomes used to provide technological services, including research and educational programs, to other ministries of the government or professional societies.

V. (FORM 4) EXECUTIVE SUMMARY ON RESEARCH OUTCOMES OF THIS PROGRAM

Abstract in English

Personalization and customization have proven to be indispensable functions in today’s e-commerce businesses and are highly valued by customers. With the emergence of the concept and practice of Web 2.0, the construction of complex social networks has become possible, and much research has been devoted to turning social networks into actionable knowledge. In the first two years of this sub-project, we studied various approaches that incorporate the social network of scholars and/or article content in a literature digital library to make recommendations that meet users’ short-term interests. We applied the proposed approaches to a set of articles collected from prestigious data mining conference proceedings and journals. We show that the social network-based approach outperforms the content-based approach in scenarios where the selected articles exhibit a high degree of content similarity, and that it is more likely to avoid recommending low-quality articles. We subsequently developed three hybrid approaches, namely switching, proportional, and fusion, that combine the content-based and social network-based approaches. Our experimental results show that the hybrid approaches generally achieve performance higher than or comparable to the best of the primitive methods (those that use only a single source of knowledge) under various scenarios. In addition, the hybrid approach is effective in assigning lower ranks to low-quality articles. A paper describing the purely social network-based recommendation work has been submitted to Information Processing and Management, and another paper reporting the hybrid recommendation results will be submitted to Online Information Review shortly.

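Abstract in Chinese (English translation)

Personalized and customized services have become an indispensable part of today’s e-commerce. In fact, the depth and breadth of the personalized services offered by online stores are often far beyond what physical stores can match, and personalized service has therefore become an important indicator distinguishing physical commerce from e-commerce. In recent years, the concept and practice of Web 2.0 have become very popular, giving rise to many new Web sites such as Youtube, Facebook, and Google Groups; even general e-commerce sites have added functions that let customers interact with one another. All of this has made social network data, which was extremely difficult to obtain in earlier years, relatively easy to acquire, and has attracted many researchers to study methods that use social networks to provide various useful services, often with the ultimate goal of further increasing commercial profit.

In the first year of this sub-project, we proposed several methods that use social network data to make recommendations and applied them to recommendation in a literature database. We use article authors to build a social network of scholars; after a reader selects several articles of interest, we use the social network of these articles’ authors to recommend other articles, thereby satisfying the reader’s immediate information need. We implemented these methods and collected about 6,000 papers from well-known conferences and journals in the data mining field to evaluate them. The experimental results show that when the papers selected by a reader exhibit a high degree of content similarity, social network-based recommendation yields better results than traditional content-based recommendation. Conversely, when the selected papers are highly diverse in content, content-based recommendation performs better; that is, content-based and social network-based recommendation each have their own strengths. Therefore, in the second year of this project, we developed three recommendation methods that combine social network and article content information. Experimental results show that in most settings these hybrid methods achieve better or comparable results than methods that use a single source of information. The research results have been written up as two papers, which have been or will soon be submitted to relevant international journals.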

Keywords

social network, recommender system, social network-based recommendation, hybrid recommendation, literature digital library

1. Introduction

Recent advances in networking and Web technologies have made many genres of information, including various types of audio, video, and textual data, available and accessible. It was estimated that by 2010 the rate of digital data generated worldwide would be close to one zettabyte per year (Gantz et al. 2007). Faced with this enormous amount of data, many people find it difficult to identify the handful of items relevant to their information needs, and traditional search mechanisms that require users to specify keywords are often ineffective and inefficient for this purpose. In the past few years, many Web sites have provided recommendation functions intended to offer personalized recommendations for various types of products and services. As customization and personalization have been widely appreciated, recommender systems have become an indispensable service of many online stores and Web sites. Notable examples include recommending books, CDs, and other products at Amazon.com (Linden et al. 2003) and Epinions.com (Massa & Bhattacharjee 2004), and movies at MovieLens (Miller et al. 2003) and FilmTrust (Golbeck & Hendler 2006).

Recommendation techniques have been extensively researched in the past decade; see Adomavicius & Tuzhilin (2005) for a comprehensive survey. Based on the types of data and techniques used to arrive at recommendation decisions, recommender systems can broadly be classified into the following approaches: popularity-based, content-based, collaborative filtering, association-based, demographics-based, and reputation-based (Wei et al. 2002).

However, most of these recommendation techniques are not suitable for recommending literature in digital libraries because they rely on users’ interests that are either explicitly specified or implicitly derived from the users’ past browsing behavior or transactions. Such techniques require the identification of each user, which is usually not feasible in literature digital libraries because most of them do not require users to identify themselves. Under such circumstances, the task-focused approach is deemed more appropriate for recommendation in literature digital libraries (Hwang, Hsiung & Yang 2003). The task-focused approach recommends articles that resemble the articles the user has selected. Figure 1 shows a snapshot of article selection in Elsevier SDOS. In the following discussion, the set of selected articles is called a session.

Figure 1: A snapshot of article selection in Elsevier SDOS literature digital library

The task-focused approach computes the similarities between candidate articles and the user’s session and then recommends the articles with the highest similarities. Previous work mainly uses content or usage logs to define article similarities (Hwang & Chung 2004). With the emergence of P2P and Web 2.0, a valuable source of information, human relationships, is becoming increasingly available. This report describes our efforts to incorporate human relationships into recommendation techniques for literature digital libraries.
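To make the session-based computation concrete, the following Python sketch ranks candidate articles by their similarity to the articles in a session. It assumes each article is already represented as a sparse term-weight vector and, as a hypothetical choice, aggregates the per-article cosine similarities by taking their mean; the report’s own similarity definitions are given in later sections, so `recommend_for_session` and its parameters are illustrative only.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_for_session(session, candidates, k=3):
    """Rank candidates by their mean cosine similarity to the session articles.

    session:    {article_id: vector} for the articles the user selected
    candidates: {article_id: vector} for the articles that may be recommended
    """
    if not session:
        return []
    scores = {}
    for aid, vec in candidates.items():
        if aid in session:  # do not recommend what the user already selected
            continue
        # Hypothetical aggregation: the mean; max or sum would also be plausible.
        scores[aid] = sum(cosine(vec, s) for s in session.values()) / len(session)
    return sorted(scores, key=scores.get, reverse=True)[:k]


session = {
    "a1": {"mining": 1.0, "graph": 1.0},
    "a2": {"mining": 1.0, "social": 1.0},
}
candidates = {
    "a1": {"mining": 1.0, "graph": 1.0},
    "b1": {"mining": 1.0, "graph": 0.5},
    "b2": {"cooking": 1.0},
}
print(recommend_for_session(session, candidates, k=2))  # ['b1', 'b2']
```

Averaging rewards candidates that resemble the session as a whole; taking the maximum instead would favor candidates that closely match any single selected article.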

Human relationships are embodied by social networks, which are often visualized as graphs. Social networks have long been exploited in marketing because people rely heavily on “word of mouth” from friends and colleagues when making decisions. In recent years, many studies that use social networks to discover actionable knowledge have been reported. These applications include building referral chains to find people with the desired expertise (Kautz 1997), locating customers with the highest network values for marketing (Domingos & Richardson 2001), and predicting whether people will collaborate in the near future (Liben-Nowell & Kleinberg 2003).

There are many ways to form a social network, for example, using email records (Kautz, Selman & Milewski 1995), analyzing responses and citations in newsgroups (Chang, Chen & Chung 2002, Sack 2000), acquaintance through friends (Friend of a Friend Project, Matsuo et al. 2004), coauthoring the same articles (Newman 2001), linking to friends’ homepages (Adamic & Adar 2003), participating in the same projects (Matsuo et al. 2004), and attending the same events (Counts & Geraci 2005). In academic environments, coauthoring relationships between scholars are perhaps the most important type of connection. Thus, much research has been devoted to analyzing coauthor-based social networks and discovering useful knowledge from them (Newman 2001, Oh et al. 2005, Yoshikane & Kageura 2004). Moreover, Lam (2004) proposed a hybrid approach that integrates social networks into traditional collaborative filtering systems, and his experiments showed that collaborative filtering systems with social network elements outperform traditional ones.
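As an illustration of a coauthor-based social network, the sketch below builds an undirected graph in which scholars are nodes and an edge links two scholars who have coauthored at least one article. Weighting each edge by the number of joint papers is an assumption made here for illustration; the construction actually used in this report is described in Section 3. The function name `build_coauthor_network` and the sample author names are hypothetical.

```python
from itertools import combinations


def build_coauthor_network(papers):
    """Build an undirected coauthor graph from per-paper author lists.

    Returns {author: {coauthor: number_of_joint_papers}}; edge weights are
    an illustrative assumption, not necessarily the report's definition.
    """
    graph: dict[str, dict[str, int]] = {}
    for authors in papers:
        names = sorted(set(authors))  # ignore duplicate names within one paper
        for name in names:
            graph.setdefault(name, {})  # single-author papers still add a node
        for a, b in combinations(names, 2):
            # Undirected edge: record it in both adjacency maps.
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


papers = [["Lee", "Chen"], ["Chen", "Lee", "Wang"], ["Wu"]]
net = build_coauthor_network(papers)
print(net["Chen"])  # {'Lee': 2, 'Wang': 1}
```

An adjacency map keeps neighbor lookups cheap, which matters when recommendations later walk outward from the authors of the selected articles.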

In this report, we propose to incorporate social networks into the task-focused approach for literature recommendation in a literature digital library. We then compare these new methods with the traditional content-based approach and find that there is no clear winner: the social network-based approach prevails when a session contains articles of similar content, whereas the content-based approach performs better otherwise. We hence propose to integrate the two kinds of approaches so as to exploit both content and social network features. Experiments using articles collected from prestigious conference proceedings and journals demonstrate that the hybrid approach achieves the best performance in most operating regions.

This report is structured as follows. Related work is described in Section 2. Section 3 discusses how to construct the scholar social network from coauthoring relations. Section 4 describes various recommendation methods that utilize the social network. Section 5 presents several hybrid methods that utilize both content and social networks. Evaluation results are presented and discussed in Section 6. Finally, Section 7 summarizes this report and points out future research directions.

2. Related work

We first review current techniques in recommender systems. Then we introduce social network analysis, which lays the foundation for the study of social networks, and discuss the properties and applications of social networks. Finally, we introduce some recommender systems that utilize social networks to make recommendations.

2.1 Recommender Systems

Recommender systems typically suggest items (information, products, or services) to users based on interest profiles derived from customer demographics, features of items of interest, and/or user preferences (e.g., ratings or purchasing history) (Wei et al. 2002). Users’ interest profiles can be generated using explicit or implicit relevance feedback. Explicit relevance feedback asks users to indicate their preferences on some items explicitly, whereas implicit relevance feedback infers users’ interests by observing their actions. A user’s interest profile then facilitates the estimation of ratings for items the user has not seen, and items with high estimated ratings are subsequently recommended. Many techniques have been proposed for rating estimation; the most widely used are content-based recommendation, collaborative filtering, and hybrid approaches (Balabanovic & Shoham 1997):

Content-based Recommendation

Content-based recommendation establishes a user’s interest profile by analyzing the content features of the user’s preferred items, and represents the profile as a vector in which each element indicates the user’s preference for a selected term. Although several ways have been proposed to determine the importance of a term in the content of an item, the most widely used measure today is TF-IDF (Salton & McGill 1983). The content of an item $d_i$ is defined as $\mathrm{Content}(d_i) = (w_{i,1}, w_{i,2}, \ldots, w_{i,n})$, where $w_{i,j}$ is the TF-IDF weight of the $j$th keyword in item $d_i$.
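A minimal sketch of how the content vectors Content(d_i) might be computed. It uses the raw term frequency multiplied by log(N / df_j), one common TF-IDF variant; Salton & McGill describe several, and the exact variant used in this report is not stated here. Documents are assumed to be already tokenized, and `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per tokenized document.

    Weighting (an assumed variant): w_ij = tf_ij * log(N / df_j), where tf_ij
    is the raw count of term j in document i, N the number of documents, and
    df_j the number of documents containing term j.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # each document counts at most once per term
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: f * math.log(n_docs / df[t]) for t, f in tf.items()})
    return vectors


docs = [["social", "network", "network"], ["content", "network"]]
vectors = tfidf_vectors(docs)
# "network" occurs in every document, so its idf, and hence its weight, is 0.0.
print(vectors[0])
```

The idf factor is what keeps terms shared by the whole collection from dominating the similarity between articles.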

The content-based system recommends items that are similar to those the user liked in the past. Let ContentBasedProfile(c) denote the past tastes of user c, represented as a vector of weights $(w_{c,1}, w_{c,2}, \ldots, w_{c,n})$, which can be computed from the content vectors of individually rated items using a variety of techniques. One widely used technique is to compute ContentBasedProfile(c) as the weighted sum of the content vectors of the items the user has rated (Balabanovic & Shoham 1997). A content-based function may then use a similarity measure to calculate the closeness between an item and the interest profile. Using the cosine similarity function, the closeness between an item s, represented as $\vec{w}_s = (w_{s,1}, \ldots, w_{s,n})$, and the interest profile of user c, represented as $\vec{w}_c$, is

$$u(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\|\vec{w}_c\|_2 \, \|\vec{w}_s\|_2} = \frac{\sum_{j=1}^{n} w_{c,j}\, w_{s,j}}{\sqrt{\sum_{j=1}^{n} w_{c,j}^2}\,\sqrt{\sum_{j=1}^{n} w_{s,j}^2}}$$

where n is the total number of keywords. The items with the top scores are recommended.
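The profile construction and cosine ranking described above can be sketched as follows. As an assumption, the user’s rating of each item serves as the weight in the weighted sum that forms ContentBasedProfile(c); vectors are sparse dictionaries mapping keywords to TF-IDF weights, and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {keyword: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_based_profile(rated_items):
    """ContentBasedProfile(c) as a weighted sum of rated items' content vectors.

    rated_items: list of (content_vector, rating); using the rating as the
    weight is an assumption made for this sketch.
    """
    profile = {}
    for vec, rating in rated_items:
        for term, w in vec.items():
            profile[term] = profile.get(term, 0.0) + rating * w
    return profile


def rank_items(profile, items, k):
    """Return the ids of the k items whose vectors are closest to the profile."""
    scores = {sid: cosine(profile, vec) for sid, vec in items.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


profile = content_based_profile([({"mining": 1.0}, 5), ({"graph": 1.0}, 1)])
items = {"s1": {"mining": 1.0}, "s2": {"graph": 1.0}, "s3": {"cooking": 1.0}}
print(rank_items(profile, items, k=2))  # ['s1', 's2']
```

Because cosine similarity normalizes both vectors, only the relative weights in the profile matter, not how many items the user has rated.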

Collaborative Filtering

Collaborative filtering systems try to predict the utility of an item for a particular user based on the ratings of the item previously given by other users. More formally, the utility $u(c,s)$ of item s for user c is estimated based on the utilities $u(c_j, s)$ assigned to item s by those users $c_j \in C$ who are “similar” to user c. For example, in a movie recommendation application, to recommend movies to user c, the collaborative recommender system tries to find the “peers” of user c, i.e., other users who have similar tastes in movies. Then only the movies that are liked by the “peers” of user c are recommended.

According to Breese et al. (1998), algorithms for collaborative recommendation can be classified into two general classes: memory-based and model-based. Memory-based algorithms are essentially heuristics that make rating predictions based on the entire collection of items previously rated by all users. That is, the value of the unknown rating $r_{c,s}$ for user c and item s is usually computed as an aggregate of the ratings of some other users for the same item s:

$$r_{c,s} = \mathrm{aggr}_{c' \in \hat{C}}\, r_{c',s}$$

where $\hat{C}$ denotes the set of N users that are the most similar to user c and who have rated item s. In the simplest case, the aggregation can be a simple average. However, the most common aggregation approach is to use the weighted sum:

$$r_{c,s} = k \sum_{c' \in \hat{C}} \mathrm{sim}(c, c') \times r_{c',s}$$

where $\mathrm{sim}(c, c')$ is the similarity between users c and c', and $k = 1 / \sum_{c' \in \hat{C}} |\mathrm{sim}(c, c')|$ is a normalizing factor. The two most widely used similarity measures are the Pearson correlation and the cosine function.
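The following sketch illustrates memory-based prediction with the weighted-sum aggregation, using the Pearson correlation over co-rated items as sim(c, c') and the normalizing factor k = 1 / Σ|sim(c, c')|. The neighborhood size `n_neighbors` (N), the function names, and the toy ratings are hypothetical, and refinements such as mean-centering the neighbors’ ratings, which many systems add, are deliberately omitted.

```python
import math


def pearson(ra, rb):
    """Pearson correlation of two users' ratings over their co-rated items."""
    common = ra.keys() & rb.keys()
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    den = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mean_b) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_rating(ratings, c, s, n_neighbors=2):
    """r_{c,s} = k * sum over C_hat of sim(c, c') * r_{c',s}.

    ratings: {user: {item: rating}}. C_hat holds the n_neighbors users most
    similar to c among those who rated s; k = 1 / sum |sim(c, c')|.
    Returns None when no usable neighbor exists.
    """
    candidates = [
        (pearson(ratings[c], ratings[u]), u)
        for u in ratings
        if u != c and s in ratings[u]
    ]
    c_hat = sorted(candidates, reverse=True)[:n_neighbors]
    norm = sum(abs(sim) for sim, _ in c_hat)
    if norm == 0:
        return None
    return sum(sim * ratings[u][s] for sim, u in c_hat) / norm


ratings = {
    "c": {"i1": 5, "i2": 3, "i3": 4},
    "u1": {"i1": 5, "i2": 3, "i3": 4, "s": 4},
    "u2": {"i1": 1, "i2": 5, "i3": 2, "s": 1},
    "u3": {"i1": 4, "i2": 2, "i3": 3, "s": 5},
}
# u1 and u3 rate the common items in the same pattern as c, so they form C_hat.
print(predict_rating(ratings, "c", "s"))  # approximately 4.5
```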

In contrast to memory-based methods, model-based algorithms use the collection of ratings to learn a model, which is then used to make rating predictions. For example, Breese et al. (1998) proposed a probabilistic approach to collaborative filtering, in which the predicted rating of item s for user c is the expected value

$$r_{c,s} = E(r_{c,s}) = \sum_{i=0}^{n} i \times \Pr(r_{c,s} = i \mid r_{c,s'},\, s' \in S_c)$$

where the rating values are integers between 0 and n, and $S_c$ denotes the set of items previously rated by user c. Two probabilistic models, namely cluster models and Bayesian networks, were proposed.
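The expected-value prediction in the probabilistic approach of Breese et al. reduces to a weighted sum of the possible rating values. The sketch below takes the conditional probabilities Pr(r_{c,s} = i | ...) as given; estimating them with a cluster model or a Bayesian network is outside its scope. The function name and the example distribution are hypothetical.

```python
def expected_rating(probs, tol=1e-6):
    """E(r_{c,s}) = sum over i = 0..n of i * Pr(r_{c,s} = i | ...).

    probs[i] is the model's probability that the rating equals i (i = 0..n);
    how those probabilities are estimated is outside this sketch.
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > tol:
        raise ValueError("probs must be a probability distribution over 0..n")
    return sum(i * p for i, p in enumerate(probs))


# Hypothetical distribution over ratings 0..4 (n = 4).
print(expected_rating([0.0, 0.1, 0.2, 0.3, 0.4]))  # approximately 3.0
```

Note that the expected value need not be an integer, so a system may round it or use it directly as a ranking score.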

Hybrid Approaches

Several efforts have been made to combine content-based and collaborative approaches in order to avoid their respective limitations (Balabanovic & Shoham 1997, Pazzani 1999, Soboroff & Nicholas 1999, Torres et al. 2004).

Burke (2002) classified hybrid approaches into several categories, including the following:

• Weighted – The score of a recommended item is the weighted sum of the scores computed by the content-based and collaborative approaches.

• Switching – Based on some criterion, the score of a recommended item is computed by either the content-based or the collaborative approach.

• Mixed – The recommendation list is formed by combining the top items of the content-based and collaborative approaches.

• Feature Combination – Collaborative information is treated as additional features associated with each example, and a content-based approach is applied to this augmented data set.

• Cascade – One recommendation technique is employed first to produce a coarse ranking of candidates, and a second technique refines the recommendation from among the candidate set.
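Three of Burke’s categories (weighted, switching, and mixed) are simple enough to sketch directly over two score dictionaries, one from a content-based recommender and one from a collaborative recommender. The weight `alpha`, the boolean switching criterion, and the interleaving order of the mixed list are illustrative assumptions rather than prescriptions from Burke (2002), and these functions are not the switching, proportional, and fusion methods developed in this report.

```python
from itertools import zip_longest


def weighted_hybrid(content, collab, alpha=0.5):
    """Weighted: alpha * content score + (1 - alpha) * collaborative score."""
    items = content.keys() | collab.keys()
    return {
        i: alpha * content.get(i, 0.0) + (1 - alpha) * collab.get(i, 0.0)
        for i in items
    }


def switching_hybrid(content, collab, use_collab):
    """Switching: take all scores from one recommender, chosen by a criterion."""
    # use_collab stands in for a real criterion, e.g. enough ratings for the user.
    return dict(collab if use_collab else content)


def mixed_hybrid(content, collab, k):
    """Mixed: interleave the two top-k lists, skipping items already listed."""
    def top(scores):
        return sorted(scores, key=scores.get, reverse=True)[:k]

    merged = []
    for pair in zip_longest(top(content), top(collab)):
        for item in pair:
            if item is not None and item not in merged:
                merged.append(item)
    return merged


content = {"a": 1.0, "b": 0.5}
collab = {"b": 1.0, "c": 0.8}
print(weighted_hybrid(content, collab)["b"])  # 0.75
print(mixed_hybrid(content, collab, k=2))     # ['a', 'b', 'c']
```

The weighted form assumes both recommenders produce scores on comparable scales; if they do not, normalizing each score dictionary first is a common precaution.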

Task-Focused Approach

Most of the research efforts described above are devoted to acquiring users’ long-term interests. In contrast, users also have short-term interests, which refer to the immediate information need for the task at hand. Short-term interests may or may not relate to long-term interests, and thus it is inappropriate to derive a user’s task profile from her previous ratings or historical data. Instead, a task profile is dynamically specified by a list of example documents related to the task. When a user chooses to browse a document A, the documents that are either similar to A in content or often accessed together with A by other users are recommended. Many digital libraries (e.g., Google Scholar and the ACM Digital Library) already provide such a function. The task profile of a user can be extended to include a set S of documents that the user recently accessed, and the goal then becomes recommending documents whose contents are similar to, and/or that are often accessed together with, the documents in S. This approach has been widely applied to the recommendation of Web pages (Srivastava et al. 2000). Typical approaches for recommending Web pages make use of Web content or Web usage logs (Yan et al. 1996, Mobasher et al. 2000, Yang et al. 2001).

Hwang et al. (2003, 2004) extended these approaches to literature recommendation in digital libraries.
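The usage-log side of the task-focused approach, recommending documents that are often accessed together with those in S, can be sketched with simple co-occurrence counts. Treating each logged browsing session as a set of documents and scoring a candidate by its total co-access count with the documents in S are assumptions made for illustration; the function names and document identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(sessions):
    """Count, for each unordered document pair, the logged sessions containing both."""
    counts = Counter()
    for docs in sessions:
        for pair in combinations(sorted(set(docs)), 2):
            counts[pair] += 1
    return counts


def recommend_co_accessed(counts, task_docs, k):
    """Rank documents outside S by total co-access count with the documents in S."""
    task = set(task_docs)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in task and b not in task:
            scores[b] += n
        elif b in task and a not in task:
            scores[a] += n
    return [doc for doc, _ in scores.most_common(k)]


logs = [["d1", "d2", "d3"], ["d1", "d2"], ["d2", "d4"], ["d5", "d6"]]
counts = co_access_counts(logs)
print(recommend_co_accessed(counts, {"d1", "d2"}, k=2))  # ['d3', 'd4']
```

Unlike the content-based sketches, this scoring needs no article text at all, but it cannot recommend documents that have never appeared in the logs.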

2.2 Social Network Applications

Social network analysis originated in sociology and has become a major research subject in recent years (Staab et al. 2005). Social networks represent people as nodes and relations as edges. Wasserman & Faust (1994) classified relations into the following eight kinds:

• Kinship: brother of, father of

• Social Roles: boss of, teacher of, friend of

• Affective: likes, respects, hates

• Cognitive: knows, views as similar

• Actions: talks to, has lunch with, attacks

• Flows: number of cars moving between

• Distance: number of miles between

• Co-occurrence: is in the same club as, has the same color hair as

Relations can be directed or undirected. For example, the roommate relationship is undirected, whereas thesis

