Related Works - A New Data Feature Generation Method Applied to User Clustering for Mobile Video Recommendation

2.1 Collaborative filtering recommendation

The main idea of collaborative filtering (CF) recommendation [16] [11] [25] [10] is that similar users will like similar products. CF approaches try to predict the utility of items for a particular user based on the items previously rated by other users. CF analyzes users' behaviors and then provides the target user with recommendations according to his or her preferences and relationships with other users.
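To make the "similar users like similar products" idea concrete, here is a minimal user-based CF sketch. It assumes cosine similarity computed over co-rated items and a similarity-weighted average of other users' ratings; the toy `RATINGS` data and the function names are hypothetical and are not taken from the thesis.

```python
import math
from typing import Optional

# Hypothetical toy ratings: user -> {item: rating}. Not data from the thesis.
RATINGS = {
    "alice": {"v1": 5, "v2": 3, "v3": 4},
    "bob": {"v1": 4, "v2": 2, "v3": 5, "v4": 4},
    "carol": {"v1": 1, "v2": 5, "v4": 2},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two users, computed over the items both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict_rating(ratings: dict, user: str, item: str) -> Optional[float]:
    """Predict `user`'s rating of `item` as a similarity-weighted average
    of the ratings given by other users who rated that item."""
    num, den = 0.0, 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        num += sim * their[item]
        den += sim
    return num / den if den else None  # None: nobody comparable rated the item


print(predict_rating(RATINGS, "alice", "v4"))
```

Because `bob` is more similar to `alice` than `carol` is, the prediction for `v4` lands closer to bob's rating of 4 than to carol's rating of 2.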

Amazon.com [7] developed a CF recommendation system for its bookstore website, in which similar contents are recommended for each content based on the heuristic that similar contents should have similar buyers.
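The heuristic "similar contents should have similar buyers" can be sketched as item-to-item similarity over buyer sets. This is an illustration of the general idea, not Amazon's actual implementation: it assumes each item is represented by the binary vector of users who bought it and compares items by cosine similarity. The `BUYERS` log and function names are hypothetical.

```python
import math

# Hypothetical purchase log: item -> set of users who bought it.
BUYERS = {
    "book_a": {"u1", "u2", "u3"},
    "book_b": {"u1", "u2"},
    "book_c": {"u4"},
}


def item_similarity(buyers_x: set, buyers_y: set) -> float:
    """Cosine similarity between two items' binary buyer vectors."""
    if not buyers_x or not buyers_y:
        return 0.0
    shared = len(buyers_x & buyers_y)
    return shared / math.sqrt(len(buyers_x) * len(buyers_y))


def similar_items(buyers: dict, item: str, top_n: int = 2) -> list:
    """Rank other items by how strongly their buyers overlap with `item`'s buyers."""
    scores = [
        (other, item_similarity(buyers[item], b))
        for other, b in buyers.items()
        if other != item
    ]
    scores = [s for s in scores if s[1] > 0]  # drop items with no shared buyers
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_n]


print(similar_items(BUYERS, "book_a"))
```

Here `book_b` is recommended alongside `book_a` because two of its buyers also bought `book_a`, while `book_c` shares no buyers and is filtered out.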

However, although the CF approach is effective in many cases, it still has some drawbacks. There are three main problems. The sparsity problem means that the number of items that have already been rated is very small compared with the total number of items, so many items are never recommended to users. The new user problem arises when the system must recommend items to a person with no rating history: because the user has no rating record, the system cannot determine the user's characteristics or identify which cluster the user belongs to, and therefore cannot recommend products that match those characteristics. The new item problem arises when no previous user has purchased an item, so the recommendation system cannot recommend it.
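The three problems can be observed directly in a rating log. The sketch below, under the assumption that ratings are stored as hypothetical `(user, item, rating)` triples, measures the density of the user-item matrix (low density means high sparsity) and lists the users and items that have no ratings at all, which are exactly the new-user and new-item cases.

```python
# Hypothetical rating log as (user, item, rating) triples; not thesis data.
LOG = [("u1", "v1", 5), ("u1", "v2", 3), ("u2", "v1", 4)]
ALL_USERS = {"u1", "u2", "u3"}
ALL_ITEMS = {"v1", "v2", "v3", "v4"}


def density(log, users, items) -> float:
    """Fraction of user-item cells that hold a rating (1 - sparsity)."""
    rated_pairs = {(u, i) for u, i, _ in log}
    return len(rated_pairs) / (len(users) * len(items))


def cold_start(log, users, items):
    """Return (new users, new items): entities that appear in no rating."""
    rated_users = {u for u, _, _ in log}
    rated_items = {i for _, i, _ in log}
    return users - rated_users, items - rated_items


print(density(LOG, ALL_USERS, ALL_ITEMS))
print(cold_start(LOG, ALL_USERS, ALL_ITEMS))
```

In this toy log only 3 of 12 cells are filled, user `u3` has no history, and items `v3` and `v4` have never been rated, so a pure CF method has nothing to base their recommendations on.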

2.2 Content-based filtering recommendation

Content-based filtering (CBF) [17] [3] [13] recommends items based on the contents' features and on users' preferences, which are identified from the items a user has previously chosen and the item currently clicked. The CBF approach to recommendation has its roots in information retrieval [20] and information filtering [18] research. Because the information retrieval and filtering communities made significant early advances, many current content-based systems focus on recommending items that contain textual information. Many studies [24] [8] [22] extract keywords using retrieval mechanisms such as TF/IDF and recommend contents whose keywords are similar to those of the user's previously chosen contents; taking the relations among content features into account in this way can improve recommendation effectiveness.

One of the measures for specifying keyword weights in information retrieval is term frequency/inverse document frequency (TF/IDF). The TF/IDF weight is a statistical measure of how important a word is to a document within a corpus. The weight of a keyword increases with the number of times it occurs in the document and decreases with the number of documents in the corpus that contain it.
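A minimal sketch of keyword-based CBF follows. It assumes the common TF/IDF variant w(t, d) = tf(t, d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing term t (other weighting variants exist), and recommends the content whose TF/IDF vector is most cosine-similar to the clicked one. The tokenized `DOCS` and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical, already tokenized video descriptions; not data from the thesis.
DOCS = {
    "v1": ["funny", "cat", "cat", "video"],
    "v2": ["cat", "music", "video"],
    "v3": ["soccer", "goal", "video"],
}


def tf_idf(docs: dict) -> dict:
    """Weight w(t, d) = tf(t, d) * log(N / df(t)) for every term of every document."""
    n_docs = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        weights[doc_id] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return weights


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(docs: dict, clicked: str, top_n: int = 1) -> list:
    """Rank the other contents by keyword similarity to the clicked content."""
    w = tf_idf(docs)
    scores = [(d, cosine(w[clicked], w[d])) for d in docs if d != clicked]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top_n]


print(recommend(DOCS, "v1"))
```

Note that "video" occurs in every document, so its IDF factor log(3/3) is zero and it contributes nothing to the similarity; the shared, rarer term "cat" is what links `v1` to `v2`.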

Content-based techniques are limited by the features that are explicitly associated with the objects they recommend. Another problem is overspecialization: because the system can only recommend items that score highly against a user's profile, the user is recommended only items that are similar to those already rated.

2.3 Hybrid approach

The hybrid approach [26] [13], which integrates CBF and CF concepts, has become a popular mechanism for overcoming the problems of pure CF and pure CBF. There are several ways to combine collaborative filtering and content-based filtering into a hybrid method.

The content-based collaborative filtering (CBCF) method [1] [14] is one of the most popular hybrid methods of the "collaboration via content" type. Its main idea is to apply traditional collaborative techniques while also maintaining a content-based profile for each user; in other words, it is a collaborative technique that incorporates content-based characteristics. CBCF approaches follow the same principle as the CF approach: similar contents should have similar buyers. In CBCF approaches, users' characteristics are modeled by product attributes.

Therefore, CBCF approaches can overcome the overspecialization problem of CBF and the sparsity problem of CF.
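Modeling users by product attributes can be sketched as turning each user's viewing history into a normalized tag-frequency vector. This is one simple profile representation among many, assumed here for illustration; `ITEM_TAGS`, `VOCAB`, and `build_profile` are hypothetical names, not the thesis's attribute generation process.

```python
from collections import Counter

# Hypothetical tags attached to each video.
ITEM_TAGS = {
    "v1": ["comedy", "animal"],
    "v2": ["animal", "music"],
    "v3": ["sports"],
}
# Fixed tag order so that every profile vector has the same dimensions.
VOCAB = sorted({tag for tags in ITEM_TAGS.values() for tag in tags})


def build_profile(viewed: list, item_tags: dict, vocabulary: list) -> list:
    """Normalized tag-frequency vector summarizing one user's viewing history."""
    counts = Counter(tag for item in viewed for tag in item_tags.get(item, []))
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in vocabulary]


print(VOCAB)
print(build_profile(["v1", "v2"], ITEM_TAGS, VOCAB))
```

Because the profile is expressed in the space of item attributes rather than individual items, two users can be compared even if they never watched the same video, which is how CBCF eases the sparsity problem.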

Traditional CBCF [14] [15] [19] recommendation methods use the tags of contents to model users' profiles, cluster users by the attributes of these profiles, and then recommend products according to the clustering result. The recommendation result therefore depends directly on the user clustering result.
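The cluster-then-recommend pipeline can be sketched as follows, assuming users are already represented by tag-profile vectors. It uses plain k-means with a deterministic initialization (the first k profiles, chosen only so the example is reproducible) and then recommends the items most viewed by the other members of the user's cluster. The profiles, viewing lists, and function names are hypothetical.

```python
import math
from collections import Counter


def kmeans(points: list, k: int, iterations: int = 20) -> list:
    """Plain k-means; initial centroids are the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend_for(user_idx: int, labels: list, viewed: list, top_n: int = 2) -> list:
    """Recommend items popular within the user's cluster that the user has not seen."""
    cluster = labels[user_idx]
    counts = Counter(
        item
        for idx, items in enumerate(viewed)
        if labels[idx] == cluster and idx != user_idx
        for item in items
    )
    seen = set(viewed[user_idx])
    return [item for item, _ in counts.most_common() if item not in seen][:top_n]


# Hypothetical tag profiles (e.g. [animal, comedy, sports]) and viewing lists.
PROFILES = [[0.5, 0.5, 0.0], [0.6, 0.4, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]
VIEWED = [["v1", "v2"], ["v1", "v3"], ["v7"], ["v7", "v8"]]

labels = kmeans(PROFILES, k=2)
print(labels, recommend_for(0, labels, VIEWED))
```

As the section notes, the output depends entirely on how users are grouped: a poor clustering would place user 0 with the sports fans and recommend `v8` instead of `v3`.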

2.4 Feature selection

Feature selection, which selects an appropriate subset of the original feature set, plays an important role in data mining and machine learning. It is widely used in both supervised and unsupervised learning. However, unsupervised feature selection is relatively difficult, and in the unsupervised setting denoising remains a major challenge.

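Unsupervised feature selection algorithms for denoising can be categorized into two frameworks: wrapper and filter [16]. The wrapper framework uses a clustering method to evaluate the quality of features and to obtain implicit class information, whereas the filter framework evaluates features from intrinsic properties of the data without running the clustering algorithm.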

Information gain (IG) is one of the criteria used in the wrapper framework. The information gain of a term measures the number of bits of information obtained for category prediction by knowing whether the term is present in or absent from a document.
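In the standard formulation, IG(t) = H(C) − [P(t) H(C | t) + P(t̄) H(C | t̄)], where H is entropy in bits over the categories C. In a wrapper setting the categories can be the implicit classes produced by a clustering run. The sketch below computes this quantity for one term, assuming documents are sets of terms and that `LABELS` are hypothetical cluster assignments; it is an illustration, not the thesis's feature selection procedure.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy, in bits, of a list of category labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs: list, labels: list, term: str) -> float:
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)], in bits."""
    with_t = [lab for d, lab in zip(docs, labels) if term in d]
    without_t = [lab for d, lab in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy(labels) - conditional


# Hypothetical documents (sets of terms) and cluster labels from a clustering run.
DOCS = [{"cat", "video"}, {"cat", "music"}, {"soccer", "video"}, {"goal", "soccer"}]
LABELS = [0, 0, 1, 1]

for term in ("cat", "video"):
    print(term, information_gain(DOCS, LABELS, term))
```

"cat" perfectly separates the two clusters and carries a full bit of information, while "video" appears equally in both and carries none; a wrapper-style procedure could drop such low-IG terms and re-cluster.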

Although feature selection (FS) for clustering is difficult due to the absence of class labels, FS may lead to more economical clustering algorithms. FS is particularly relevant for data sets with large numbers of features, as in molecular biology [11] and text clustering [25] applications. Other approaches, such as Bayesian approaches for multinomial mixtures, were proposed in [21] and [26]. A genetic algorithm was used in [17] for FS in K-means clustering.

To solve the new item and overspecialization problems and to realize social recommendation, this thesis applies the CBCF recommendation method.

Nevertheless, how to group users with similar characteristics is the most important issue in this thesis. Based on observations of the folksonomy-based tags of contents and the user logs in the database, and in order to reveal users' characteristics, we propose an attribute generation process that solves the attribute dependency problem in the folksonomy-based tag system.
