
Table 4.3: Performance of User Similarity and Item Similarity on the LiveJournal Dataset

Note: For the feature abbreviation, please refer to Table 3.1.

the user context, which might not be easily captured by mood tags only.

Finally, we evaluate the hybrid model that combines the two contextual features and the content-based features (i.e., U + S + Cb + M + VAD). As the last row of Table 4.2 shows, this hybrid model greatly outperforms the content-based method, achieving 0.50 and 0.65 in terms of MAP and recall, respectively. The performance differences between the hybrid model and the CF-based or content-based models are significant under the two-tailed t-test (p-value < 0.001). We also report a result obtained without the user-provided mood tags (i.e., U + S + Au + VAD); compared to the pure hybrid CF+CB method, it still shows a great performance improvement. In sum, the experimental results suggest that the contextual information mined from user-generated articles improves the quality of music recommendation.
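To make this feature encoding concrete, the following minimal sketch shows how the U + S + Cb + M + VAD blocks could be concatenated into one input row for a libFM-style model. The helper name, dimensionalities, and toy values are illustrative assumptions, not the actual implementation.

```python
import numpy as np

def hybrid_feature_vector(user_idx, song_idx, content, mood, vad,
                          n_users, n_songs):
    """Concatenate the U + S + Cb + M + VAD blocks into one input row.

    user_idx / song_idx : integer IDs, one-hot encoded below
    content             : content-based features of the song (e.g., audio)
    mood / vad          : contextual features of the listening event
    All dimensions here are illustrative assumptions.
    """
    u = np.zeros(n_users); u[user_idx] = 1.0   # U: user one-hot indicator
    s = np.zeros(n_songs); s[song_idx] = 1.0   # S: song one-hot indicator
    return np.concatenate([u, s, content, mood, vad])

# Toy usage: 3 users, 4 songs, 2 content dims, 3 mood dims, 3 VAD dims.
x = hybrid_feature_vector(0, 2,
                          content=np.array([0.4, 0.1]),
                          mood=np.array([0.0, 1.0, 0.0]),
                          vad=np.array([0.6, 0.3, 0.7]),
                          n_users=3, n_songs=4)
print(x.shape)  # (15,) = 3 + 4 + 2 + 3 + 3
```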

4.3 Similarity Approach

The similarity indicator can be represented over a categorical set domain, as used in [21].

For instance, suppose that "Alice is similar to Charlie and Sandy"; the corresponding similarity indicator may be the vector z(Bob, Charlie, Sandy) = (0, 0.2, 0.8), i.e., Alice is not similar to Bob at all, and the values sum to 1 according to Equation (3.7). Below we investigate the effectiveness of similarity information under different types of extracted features.
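A minimal sketch of this construction, assuming raw pairwise similarity scores are already available: the normalization mirrors the sum-to-one constraint of Equation (3.7), and the names and raw scores are illustrative.

```python
def similarity_indicator(scores):
    """Normalize raw similarity scores so the vector sums to 1,
    as in Equation (3.7). `scores` maps each other entity to a
    non-negative raw similarity."""
    total = sum(scores.values())
    if total == 0:                       # no similar neighbours at all
        return {k: 0.0 for k in scores}
    return {k: v / total for k, v in scores.items()}

# Alice's raw similarities to the other users (illustrative numbers):
z_alice = similarity_indicator({"Bob": 0.0, "Charlie": 0.5, "Sandy": 2.0})
print(z_alice)  # {'Bob': 0.0, 'Charlie': 0.2, 'Sandy': 0.8}
```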

4.3.1 User Similarity and Item Similarity

Under the CF-based framework, there are two ID indicators: User ID and Song ID. We can obtain the following similarity information according to Equation (3.5):


• User Similarity (US): Two users are similar if they listen to the same songs.

• Song Similarity (SS): Two songs are similar if they are listened to by the same users.

Both are directly mined from the listening history; therefore, they are always available for a standard recommendation problem. US is applied to users, whereas SS is applied to items.
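As an illustration, the sketch below computes US from a toy listening history using a set-based cosine measure. The exact measure is the one defined in Equation (3.5), so the cosine here is a stand-in assumption.

```python
import math
from collections import defaultdict

def user_similarity(history):
    """Cosine similarity between users over the sets of songs they
    listened to; a stand-in for the measure of Equation (3.5).
    `history` maps each user to the set of song IDs they played."""
    sims = defaultdict(dict)
    users = list(history)
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            shared = len(history[u] & history[v])
            denom = math.sqrt(len(history[u]) * len(history[v]))
            sims[u][v] = sims[v][u] = shared / denom if denom else 0.0
    return sims

history = {"Alice": {1, 2, 3}, "Bob": {3, 4}, "Carol": {1, 2}}
print(user_similarity(history)["Alice"]["Carol"])  # 2/sqrt(6) ≈ 0.816
```

SS is obtained in the same way with the roles of users and songs swapped.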

We evaluated the performance of every possible feature combination. As shown in Table 4.3, both the user similarity and the song similarity (U+S+US or U+S+SS) lead to significantly better results compared to the baseline U+S.

We have also implemented the KNN-based FM of [1] by adding the listening history to libFM, as shown in the second row of Table 4.3 (i.e., U+S+H). The incorporation of listening history ('H') generally improves the result as well. Comparing H, US, and SS, SS achieves the highest MAP@10 (0.4635), showing that the similarity approach is more effective than the KNN approach. Unlike the raw listening history, similarity is extracted by a meaningful computation. Moreover, the KNN approach may fail when the amount of listening history is limited.
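For reference, a common way to encode such a history block ('H') in an FM design matrix is a normalized indicator over the songs a user has listened to, sketched below. The equal weighting and function name are assumptions; the exact encoding of [1] may differ.

```python
import numpy as np

def history_block(listened_songs, n_songs):
    """'H' block: normalized indicator of the songs a user has listened
    to, appended to the design matrix. Each listened song gets equal
    weight 1/|history| so the block sums to one (an assumption here)."""
    h = np.zeros(n_songs)
    if listened_songs:
        w = 1.0 / len(listened_songs)
        for s in listened_songs:
            h[s] = w
    return h

print(history_block([0, 2, 3], n_songs=5))  # [1/3, 0, 1/3, 1/3, 0]
```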

By combining all the available information from the listening records (U+S+US+SS+H), we obtained the best result in Table 4.3, 0.5021 in MAP@10, which is significantly better than the baseline of 0.3816. This is an improvement of 0.1205 in MAP@10, which is remarkable. Simple as the idea is, the proposed ID similarity indicators hold the promise of greatly improving the accuracy of recommendation. Moreover, the ID similarity indicators are suitable for other recommendation problems because they share the same problem structure: predicting whether an item would be accepted by a user.

4.3.2 Content-based Feature Similarity

With regard to content-based features, four similarity features were extracted from the dataset:

• Birth Year Similarity (BYS): Two users are similar if they are born in the same year.

• Live Region Similarity (LRS): Two users are similar if they live in the same region geographically.

• Artist Similarity (AS): Two songs are similar if they are sung by the same artist.

• Audio Similarity (AuS): Two songs are similar if they are close in the audio feature space spanned by the 53 audio features considered in this work.
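As an illustration of AuS (the last item above), the sketch below measures closeness in the 53-dimensional audio space with cosine similarity; the text does not fix the metric here, so cosine is an assumption.

```python
import numpy as np

def audio_similarity(a, b):
    """Cosine similarity between two songs' audio feature vectors
    (53-dimensional in this work). Cosine is one plausible notion of
    closeness in the audio space, assumed here for illustration."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

rng = np.random.default_rng(0)
song_a, song_b = rng.random(53), rng.random(53)  # toy audio features
print(audio_similarity(song_a, song_b))
```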



Table 4.4: Performance of Content-based Feature Similarity

Features MAP@10 Recall

Table 4.5: Performance of Context-based Feature Similarity

Features MAP@10 Recall

Note that BYS and LRS are personal information that is not always available for a recommendation problem. Similarly, AS and AuS are musical information that is only available if we have access to the metadata or the audio content of the songs.

Table 4.4 lists the improvement introduced by the use of feature similarity. The results show that the similarity features generally perform well in recommendation; among the four, only Birth Year Similarity fails to obtain a significant improvement in the experiments.

This is possibly due to the incompleteness of the metadata, because only half of the users in our dataset have birth year information. Another interesting observation is that the audio feature greatly enhances the recommendation performance once the audio similarity is added. This result implies that abstract information such as the audio features is hard to exploit directly, but its similarity provides insightful information.

4.3.3 Context-based Feature Similarity

We evaluated context-based recommendation using Mood Tags and Emotional Words. These two features reflect the user's mood when writing an article; we want to carry this emotional information from the user-generated articles and mood tags into the recommendation.

Figure 4.2: An example illustrating different grouping methods

The similarity information can be obtained in the same way:

• Mood Similarity (MS): Two users are similar if they tend to express similar moods in their articles.

• VAD Similarity (VADS): Two users are similar if the affective qualities of the articles they wrote are similar.
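A minimal sketch of one way VADS could be computed, assuming each user is summarized by the mean valence-arousal-dominance vector of their articles and compared by cosine; both choices are illustrative assumptions.

```python
import numpy as np

def vad_user_similarity(articles_u, articles_v):
    """Similarity of two users' affective profiles. Each user is
    summarized by the mean (valence, arousal, dominance) vector of
    their articles, and the means are compared by cosine similarity.
    Both the averaging and the cosine choice are assumptions."""
    mu = np.mean(articles_u, axis=0)
    mv = np.mean(articles_v, axis=0)
    return float(mu @ mv / (np.linalg.norm(mu) * np.linalg.norm(mv)))

# Two articles per user, each a (V, A, D) triple on an arbitrary scale:
u = np.array([[0.8, 0.4, 0.6], [0.7, 0.5, 0.5]])
v = np.array([[0.2, 0.9, 0.3], [0.3, 0.8, 0.4]])
print(vad_user_similarity(u, v))
```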

Please note that contextual information is an advanced feature that is not always available for a recommendation problem. However, whenever it is possible to model context information, it is usually advisable to do so. We only considered context information extracted from mood tags and articles in this work, but the proposed method applies to other contextual information as well.

As the first and third rows of Table 4.5 show, adding the Mood Tags feature yields 0.4134 in terms of MAP@10, which is higher than adding the contextual VAD feature computed from the user-generated articles; the explicit mood tags appear to capture the affective user context more directly. Nevertheless, although the mood similarity does not lead to a remarkable improvement, the VAD similarity feature is still effective.
