
Figure 2.4: The structure of PRES. (Figure not reproduced; recoverable labels: interaction, feedback.)

contains information a user is interested in. Moreover, negative examples are not used, since negative actions are hard to recognize. For example, a short reading time for a page may simply mean that the document is short or that it offers little information new to the user. Links the user does not click are not strong evidence either: the user might visit the document later, or might simply not have noticed the link. Another important factor is that a user's interest in a topic is often short-lived, since the user loses interest once enough information has been provided. In short, the user model has to be dynamic and learned from positive examples only. PRES uses relevance feedback to learn the user profile model: after the user reads a document D, the user profile P is updated as P = αP + βD.
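The update rule P = αP + βD can be sketched as follows. This is a minimal illustration rather than the PRES implementation: it assumes that profiles and documents are sparse term-weight vectors stored as dictionaries, and the function name `update_profile` and the values of α and β are hypothetical.

```python
from collections import defaultdict


def update_profile(profile, document, alpha=0.9, beta=0.1):
    """Return alpha * profile + beta * document for sparse term-weight dicts.

    The dictionary representation and the default alpha/beta values are
    assumptions made for this sketch; the text only gives P = alpha*P + beta*D.
    """
    updated = defaultdict(float)
    # Decay the existing profile so that old interests fade over time.
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Add the terms of the document the user has just read (positive example).
    for term, weight in document.items():
        updated[term] += beta * weight
    return dict(updated)


# Hypothetical toy vectors.
profile = {"python": 1.0, "recsys": 0.5}
doc = {"recsys": 1.0, "entropy": 2.0}
new_profile = update_profile(profile, doc, alpha=0.5, beta=0.5)
```

Choosing α below 1 makes the weight of a topic shrink each time the user reads a document that does not mention it, which matches the requirement that the model be dynamic and learned from positive examples only.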

Although content-based filtering can mitigate the cold start problem of CF by using the content of items, several issues emerge. The major issue is that users are interested in many topics, even topics whose content barely overlaps. Therefore, pure content-based filtering limits the exploration of potential items that are not closely related to those the active user has already rated highly.

2.3 Hybrid Approaches Combining CF and Content-based Filtering

In addition to CF and content-based filtering, approaches that combine the two have been studied and shown to be beneficial [3]. As noted above, CF faces the challenge of cold start, and content-based filtering alone can prove ineffective.

Content-based techniques have difficulty distinguishing between high-quality and low-quality information on the same topic. Content-based filtering also suffers because its recommendations depend too heavily on the active user alone. This section describes several previous works that combine CF and content-based filtering.

In [3], the authors propose a web recommendation system in which the CF and content-based components are kept separate. A page, or item, is represented as a vector whose dimensions are attributes; the number of attributes is the dimensionality of the vector. After defining a set of features and generating a constraint for each feature, it is guaranteed that a unique distribution with maximum entropy exists under all the constraints [12]. Each source of knowledge can be represented as features with associated weights. In their model, two sources of knowledge about Web users’ navigational behavior serve as features, and all the features are combined to provide the recommendation. One source is item-level usage patterns and the other is item content associations. For the usage patterns, the conditional probability of a certain page is used to decide the feature values; these probabilities are determined from the users' rating history. On the other hand, the attribute selection method is adapted from Latent Dirichlet Allocation. They then use a Variational Bayes technique to estimate each item’s association with multiple “classes”, or “topics”. Since they find that each item shows a strong association with one “class”, they assign each item to a single class and then define the feature values.
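The way weighted features from two knowledge sources combine into one distribution can be sketched with the standard conditional maximum entropy form, P(page | history) ∝ exp(Σ_k λ_k f_k(history, page)). The sketch below is not the system of [3]: the toy data, the two feature functions, the 0.5 threshold, and the hand-set weights are all hypothetical, and in a real system the weights would be fitted so that the feature constraints hold.

```python
import math


def maxent_scores(candidates, history, features, weights):
    """Return P(page | history) proportional to exp(sum_k w_k * f_k(history, page))."""
    raw = {
        page: math.exp(sum(w * f(history, page) for f, w in zip(features, weights)))
        for page in candidates
    }
    z = sum(raw.values())  # normalisation constant for this history
    return {page: value / z for page, value in raw.items()}


# Hypothetical toy data: next-page conditional probabilities (usage source)
# and a single class assigned to each page (content source).
NEXT_PAGE_PROB = {("a", "b"): 0.7, ("a", "c"): 0.1}
PAGE_CLASS = {"a": 0, "b": 1, "c": 0}


def usage_feature(history, page):
    # Fires when the page often follows the last visited page (threshold assumed).
    return 1.0 if NEXT_PAGE_PROB.get((history[-1], page), 0.0) > 0.5 else 0.0


def content_feature(history, page):
    # Fires when the page shares its class with the last visited page.
    return 1.0 if PAGE_CLASS[page] == PAGE_CLASS[history[-1]] else 0.0


# Weights are set by hand here purely for illustration.
probs = maxent_scores(["b", "c"], ["a"], [usage_feature, content_feature], [2.0, 1.0])
```

In this toy case page "b" is supported only by the usage feature and page "c" only by the content feature, so the relative weights decide which source dominates the recommendation.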

Latent Dirichlet Allocation is also used in [29] as a topic model. This work addresses the difficulty of finding relevant papers when searching large on-line archives of scientific articles. The authors propose an approach that combines the merits of traditional collaborative filtering and probabilistic topic modeling, making use of the article libraries created by users. Topic modeling is used to generalize to unseen articles by representing each article in terms of latent themes discovered from the collection. This topic representation allows the algorithm to make meaningful recommendations about articles before anyone has rated them. In short, an article that has been seen by few users is recommended based more on its content, while an article that has been widely seen is recommended based more on other users. Users and items share the same low-dimensional latent space, and each is represented by a latent vector. Building on latent factor models, collaborative topic regression represents users by their topic interests and assumes that documents are generated by a topic model. Each document comes with a topic proportion. From the topic proportions and the interests of the user, collaborative topic regression then provides recommendations.
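The prediction step described above can be sketched as an inner product between a user's interest vector and an item vector built from the document's topic proportions. This is a simplified illustration, not the full model of [29]: the additive `offset` term is an assumption used here to show how usage data can shift an item away from its pure content representation, and all names and numbers are hypothetical.

```python
def predict_rating(user_vec, topic_proportions, offset=None):
    """Score an article as the inner product of user interests and an item vector.

    The item vector is the topic proportions plus an optional offset.
    With no offset (an article nobody has rated yet) the score depends on
    content alone. The offset is an assumption of this sketch.
    """
    if offset is None:
        offset = [0.0] * len(topic_proportions)
    item_vec = [t + e for t, e in zip(topic_proportions, offset)]
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Hypothetical three-topic example.
user = [0.8, 0.1, 0.1]   # the user's interest in each topic
theta = [0.7, 0.2, 0.1]  # the article's topic proportions

cold = predict_rating(user, theta)                           # unseen article
warm = predict_rating(user, theta, offset=[0.2, -0.1, 0.0])  # widely seen article
```

With a zero offset the score comes entirely from the topic representation, so an article can be recommended before anyone has rated it; a nonzero offset lets the behavior of other users move the score.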

Although approaches that combine CF with item content have been shown to work to some extent, the rapid growth of information creates a need for recommendations that change with time. A mechanism is needed to adapt recommendations according to the sequence of the history. However, if each new recommendation is simply recomputed from a longer history, the cost is high because much of the work is repeated. Instead, the “new” recommendation should be derived from the “past” recommendation by considering only the more recent history rather than the whole history, which reduces the off-line computation burden.

Chapter 3
