A Recommendation System Integrating Social Information into Learning-to-Rank Models


National Science Council (Executive Yuan) Research Project Final Report


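Project type: Individual project

Project number: NSC 101-2221-E-004-017-

Project period: August 1, 2012 to July 31, 2013 (ROC years 101 to 102)

Host institution: Department of Computer Science, National Chengchi University

Principal investigator: 蔡銘峰

Project participants: Master's student, part-time research assistant: 陳志明

Master's student, part-time research assistant: 陳禔多

Master's student, part-time research assistant: 劉澤

Report attachments: report on attending an international conference and the published paper

Public access: this project report is available for public inquiry

October 31, 2013 (ROC year 102)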


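Chinese abstract: In recent years, the popularity of social media websites has caused the amount of information on the Web to grow rapidly, so finding information efficiently has become very important. Social recommendation systems have also become an indispensable tool in today's social networks, and how to efficiently recommend information that interests users has become an important research topic in information science. Research on recommendation systems can also be regarded as a branch of information retrieval research; its main goal is to build models from various aspects of information that can predict users' preferences for, or ratings of, particular items. There are traditionally two mainstream approaches to building recommendation models: (1) collaborative filtering approaches, such as matrix factorization, which filter information based on the behavior of similar users; and (2) content-based approaches, such as the vector space model, which compute the similarity between candidate items in order to make recommendations.

In this project, we therefore use machine learning techniques to study how recommendation models can be learned. The project accomplished two main items. First, we built recommendation models with machine learning methods; these models have been successfully applied to music recommendation and combined with user information to improve recommendation performance, and two related international papers were published during the project period. Second, we built a recommendation platform for mobile applications, which currently has 2,330 users and 1,278 mobile apps; we will continue to collect user data and use this dataset to study and improve social recommendation learning algorithms.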

Chinese keywords: information retrieval, machine learning, natural language processing, social networks and analysis

English abstract: Social networking websites have become more and more popular in recent years. On these websites, hundreds of millions of active users create vast amounts of information that was not available before. Such vast information poses a great challenge in terms of information overload. Social recommendation systems mainly aim to reduce information overload on social networking websites by presenting the most relevant information to users. Recommendation systems can also be considered a subclass of information ranking systems that use a model built from the characteristics of an item (content-based approaches) or from a user's social information (collaborative filtering approaches) to predict the 'rating' or 'preference' that the user would give to the item. Most previous work uses matrix factorization, a typical collaborative filtering technique, to handle users' social information such as behavior and activity.

This work uses machine learning techniques to address this preference prediction problem. The project has accomplished two major contributions. First, we used machine learning techniques to construct recommendation models and applied them to music recommendation, where user information is employed to improve performance. Second, we built a recommendation platform for mobile apps, on which we have so far collected data from 2,330 users and 1,278 mobile apps. In the future, we will continue to collect more data and conduct research on the dataset to enhance our social recommendation algorithms.

English keywords: Information Retrieval, Machine Learning, Natural Language Processing, Social Network and Analysis


1 The Background and Goals of the Research Project


1.1 Background

Social networking websites have become more and more popular in recent years. Well-known examples include photo and video sharing websites such as Flickr and YouTube, and blog and wiki websites such as Blogger and Wikipedia. There are also social tagging websites such as Delicious, social network websites such as Facebook, and micro-blogging websites such as Twitter. On these websites, hundreds of millions of active users create vast amounts of information that was not available before. Such vast information poses a great challenge in terms of information overload.

Social recommendation systems mainly aim to reduce information overload on social networking websites by presenting the most relevant information to users. Resnick and Varian [18] state that in a typical recommender system, people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients. In some cases the primary transformation is in the aggregation; in others the system's value lies in its ability to make good matches between the recommenders and those seeking recommendations. Recommender systems are meant to augment this process in cases where we need help, for example, gathering many like-minded people in a database, gathering information about thousands of books, or combining far more information from guides than we could research ourselves.

For a typical recommender system, there are three steps:

1. The user provides some form of input to the system. These inputs can be explicit or implicit [17]. Ratings submitted by users are examples of explicit input, whereas the URLs a user visits and the time spent reading a web site are examples of implicit input.

2. These inputs are brought together to form a representation of the user's likes and dislikes. This representation could be as simple as an item-rating matrix, or as complex as a data structure combining both content and rating information.

3. The system computes recommendations using these “user profiles.”

Even though these steps are essentially the same for most recommender systems, different approaches have been taken to steps 2 and 3. Two traditional approaches to building a user profile and computing recommendations are collaborative filtering and content-based recommendation. Some researchers have also tried hybrid approaches to improve the quality of the recommendations.

In general, recommendation systems produce a list of recommendations in one of two ways: through collaborative filtering or through a content-based approach. Collaborative filtering techniques construct a model from a user's past behavior (i.e., items previously purchased or selected) and from similar decisions made by other users, and then use that model to recommend items the user may be interested in [19]. Content-based approaches utilize a set of discrete attributes or features of an item to recommend additional items with similar properties [12, 13]. These two approaches can also be combined into a hybrid recommendation system. In the context of content-based approaches, recommendation systems can also be considered a subclass of information ranking systems that use a model built from the characteristics of an item or a user's social information to predict the "rating" or "preference" that the user would give to the item. From the perspective of predicting "rating" or "preference", many machine learning techniques have been studied in recent years for the problem of learning to rank [10, 21]. Below we briefly survey these techniques in terms of their pros and cons.


1.2 Related Work

1.2.1 Collaborative Filtering

Collaborative filtering is one of the most widely used approaches to designing recommender systems. Collaborative filtering approaches collect and analyze a large amount of information on users' behaviors, activities, or preferences, and then predict what users will like based on their similarity to other users [5]; this is called user-based collaborative filtering. User-based collaborative filtering thus mimics the social process of asking a friend for a recommendation. Another well-known example of collaborative filtering is item-based collaborative filtering, an algorithm popularized by the Amazon recommender system [9]. The main idea behind item-based collaborative filtering is that similar items tend to be bought by similar people. For both kinds of collaborative filtering, two metrics are commonly used to measure the similarity between two items or users: cosine-based similarity and Pearson-based similarity, given respectively by the following two equations:

$$w(u,v) = \frac{\sum_{i \in I} r_{u,i}\, r_{v,i}}{\sqrt{\sum_{i \in I} r_{u,i}^{2}}\,\sqrt{\sum_{i \in I} r_{v,i}^{2}}} \tag{1}$$

$$w(u,v) = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^{2}}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^{2}}} \tag{2}$$

where $u$ and $v$ denote two users, $i$ and $j$ two items, $I$ the set of items rated by both $u$ and $v$, $r_{u,i}$ the rating given by user $u$ to item $i$, and $\bar{r}_u$ the average rating given by user $u$.
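Equations 1 and 2 can be sketched in Python for two users as follows. This is a minimal illustration, assuming ratings are stored as dictionaries from item ID to rating, that $I$ is the set of co-rated items, and that $\bar{r}_u$ is averaged over all of a user's ratings; the function names are illustrative.

```python
import math

def cosine_sim(r_u, r_v):
    """Equation 1: cosine similarity over the co-rated item set I."""
    common = r_u.keys() & r_v.keys()
    num = sum(r_u[i] * r_v[i] for i in common)
    den = (math.sqrt(sum(r_u[i] ** 2 for i in common))
           * math.sqrt(sum(r_v[i] ** 2 for i in common)))
    return num / den if den else 0.0

def pearson_sim(r_u, r_v):
    """Equation 2: Pearson correlation over I; each mean uses all of that user's ratings."""
    common = r_u.keys() & r_v.keys()
    mean_u = sum(r_u.values()) / len(r_u)
    mean_v = sum(r_v.values()) / len(r_v)
    num = sum((r_u[i] - mean_u) * (r_v[i] - mean_v) for i in common)
    den = (math.sqrt(sum((r_u[i] - mean_u) ** 2 for i in common))
           * math.sqrt(sum((r_v[i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0

# Hypothetical ratings keyed by item ID.
alice = {"a": 1, "b": 2, "c": 3}
bob = {"a": 2, "b": 4, "c": 6}
print(cosine_sim(alice, bob), pearson_sim(alice, bob))
```

Both scores are close to 1 here because Bob's ratings are a scaled copy of Alice's. Returning 0 when the denominator vanishes (no co-rated items, or constant ratings) is a design choice, not part of the equations.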

The main advantage of the collaborative filtering approach is that it does not rely on machine-analyzable content, and it is therefore capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself. However, because it depends entirely on users' past behavior, the approach often suffers from three problems: cold start, scalability, and sparsity [2, 8].

• Cold Start: The systems often require a large amount of existing data on a user in order to make

accurate recommendations.

• Scalability: In many of the social recommendation environments, there are usually millions of

users and products. Thus, a large amount of computation power is often necessary to calculate

recommendations.

• Sparsity: The number of items on major sites is extremely large, and even the most active users will have rated only a small subset of the overall database. Thus, even the most popular items have very few ratings.

In [16, 20], matrix factorization, a low-rank matrix approximation technique, is used as a particular type of collaborative filtering algorithm to predict recommendations. Most collaborative filtering systems apply the nearest-neighbor model for computing recommendations; systems that use the nearest-neighbor model rely on the assumption that people who agreed in the past are likely to agree in the future as well [17].

1.2.2 Content-based Approach

The content-based approach is another typical approach to designing recommender systems. Content-based filtering methods are based on information about, and characteristics of, the items to be recommended.

recommended. These algorithms try to recommend items that are similar to those that a user liked in

the past. In particular, various candidate items are compared with items previously rated by the user


and the best-matching items are recommended. This approach has its roots in information retrieval

and information ranking research.

Content-based systems recommend items based on items’ content rather than other users’ ratings.

There are essentially four steps for content-based recommendations:

1. The first step is to gather content data about the items. Most systems use Information Extraction

techniques to extract these data, and Information Retrieval techniques to retrieve the relevant

information [3, 12]. Web crawlers collecting data off the web are common tools in this step.

2. The second step is to ask the user to provide some ratings. In this step, the user might be asked to rate random items, or may search for and rate books that he or she likes.

3. The third step is to compile a profile of the user using the content information extracted in

the first step and the rating information provided in the second step. Different information

retrieval or machine learning algorithms can be used to learn a profile.

Term-frequency/inverse-document frequency weighting [11] and the Bayesian learning algorithm [14] are some of the

many techniques that have been tried.

4. The last step is to match the content of unrated items against the user profile compiled in the third step and to assign scores to the items depending on the quality of the match. The items are ranked according to their scores and presented to the user in order [12].
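Steps 3 and 4 can be sketched in Python using the simplest profile-learning strategy, which averages the feature vectors of the items a user liked and then ranks unrated items by cosine similarity to that profile. This is a sketch under assumptions: the dense feature vectors, the genre dimensions, and the function names are hypothetical, and real systems would use TF-IDF or learned weights instead of a plain average.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_profile(item_vecs, liked):
    """Step 3 (simple variant): average the feature vectors of the items the user liked."""
    dim = len(next(iter(item_vecs.values())))
    profile = [0.0] * dim
    for item in liked:
        for d, x in enumerate(item_vecs[item]):
            profile[d] += x / len(liked)
    return profile

def recommend(item_vecs, liked, top_n=3):
    """Step 4: score unrated items against the profile and return the best matches."""
    profile = build_profile(item_vecs, liked)
    candidates = [i for i in item_vecs if i not in liked]
    candidates.sort(key=lambda i: cosine(profile, item_vecs[i]), reverse=True)
    return candidates[:top_n]

# Hypothetical genre features: [rock, jazz, pop]
songs = {"s1": [1, 0, 0], "s2": [0.9, 0.1, 0], "s3": [0, 1, 0],
         "s4": [0, 0, 1], "s5": [0.8, 0, 0.2]}
print(recommend(songs, liked={"s1"}, top_n=2))  # ['s2', 's5']
```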

Basically, these methods use an item profile (i.e., a set of discrete attributes and features) characterizing the item within the system. This part may also require content analysis techniques such as natural language processing. The system then creates a content-based profile of each user based on a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of machine learning techniques. Simple approaches use the average values of the rated item vectors, while more sophisticated methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks to estimate the probability that the user will like the item. However, content-based systems also suffer from the following three problems:

• For some domains, either there is no content information available, or the content is hard to

analyze.

• Formulating taste and quality is not an easy task.

• These systems can suggest only items whose content matches the user's profile. If the user has tastes that are not represented in the profile, items that appeal to those unrepresented tastes will not be recommended.

Since a disadvantage of one approach is often not an issue for the other, some research has been done on combining collaborative filtering with content-based recommendation. The various techniques employed to combine the two are called "hybrid approaches."

1.2.3 Hybrid Approaches

Recent studies show that hybrid methods combining collaborative filtering and content-based approaches can be more effective in some cases. A hybrid approach can be achieved in various ways, such as by combining content-based and collaborative predictions. Some empirical studies show that a hybrid approach can provide more accurate recommendations than a pure content-based or collaborative system. Hybrid methods can also alleviate some problems, such as cold start and sparsity.
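One of the simplest ways to combine content-based and collaborative predictions is a weighted linear blend of the two scores. The Python sketch below illustrates only that idea. The `weight` parameter, the convention of treating a missing score as 0, and all names are assumptions rather than a method taken from the studies cited here.

```python
def hybrid_scores(cf_scores, cb_scores, weight=0.5):
    """Weighted blend: weight * CF score + (1 - weight) * content-based score.
    Items missing from one source get 0 from it (an assumption), so a new item
    with no CF score can still be ranked by its content score."""
    items = cf_scores.keys() | cb_scores.keys()
    return {i: weight * cf_scores.get(i, 0.0) + (1 - weight) * cb_scores.get(i, 0.0)
            for i in items}

cf = {"song_a": 0.9, "song_b": 0.4}                 # hypothetical CF predictions
cb = {"song_a": 0.6, "song_b": 0.7, "song_c": 0.8}  # song_c is a cold-start item
blended = hybrid_scores(cf, cb, weight=0.7)
print(sorted(blended, key=blended.get, reverse=True))
```

The cold-start item `song_c` still receives a score from its content alone, which is the kind of problem a hybrid approach is meant to alleviate. In practice the two score scales would usually be normalized before blending.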


2 Methodology


2.1 Standard FM

Factorization Machines (FMs) can mimic most factorization models through the choice of input features. An FM learns the weights of all pairwise interactions between the features. In general, a two-way factorization machine model is defined as:

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{l=1}^{n} w_l x_l + \sum_{l=1}^{n} \sum_{j=l+1}^{n} \hat{w}_{lj}\, x_l x_j, \tag{3}$$

where $w_0$ is the global bias, $w_l$ is the weight of feature $x_l$, and $\hat{w}_{lj}$ models the interaction of each pair of features. The interaction $\hat{w}_{lj}$ can be factorized into pairs of interaction parameters,

$$\hat{w}_{lj} = \sum_{f=1}^{\kappa} v_{lf}\, v_{jf}. \tag{4}$$

The parameter $\kappa$ determines the model complexity. Rather than using a single independent parameter for each interaction, this factorization allows high-quality interaction parameters to be estimated even under sparsity, because the latent factors of a feature are shared by all interactions that involve it. The Factorization Machine thus provides a promising framework for recommendation problems. Unlike the generic matrix factorization model, it can easily be used for feature engineering. For more details on FMs, please refer to [1].
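Equations 3 and 4 can be evaluated directly with a double loop over feature pairs. The Python sketch below does that and, for comparison, also uses the well-known reformulation from the FM literature, $\sum_{l<j}\hat{w}_{lj} x_l x_j = \frac{1}{2}\sum_{f=1}^{\kappa}\big[(\sum_l v_{lf} x_l)^2 - \sum_l v_{lf}^2 x_l^2\big]$, which reduces the cost from $O(\kappa n^2)$ to $O(\kappa n)$. Dense Python lists, the toy parameter values, and the function names are assumptions for readability; this is not the libFM implementation.

```python
def fm_predict_naive(x, w0, w, V):
    """Equations 3 and 4: explicit loop over all feature pairs, O(kappa * n^2)."""
    n, kappa = len(x), len(V[0])
    y = w0 + sum(w[l] * x[l] for l in range(n))
    for l in range(n):
        for j in range(l + 1, n):
            w_lj = sum(V[l][f] * V[j][f] for f in range(kappa))  # Equation 4
            y += w_lj * x[l] * x[j]
    return y

def fm_predict_fast(x, w0, w, V):
    """Same prediction via the O(kappa * n) reformulation of the pairwise term."""
    n, kappa = len(x), len(V[0])
    y = w0 + sum(w[l] * x[l] for l in range(n))
    for f in range(kappa):
        s = sum(V[l][f] * x[l] for l in range(n))
        s_sq = sum((V[l][f] * x[l]) ** 2 for l in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Toy input: one-hot user (index 0) and one-hot song (index 2); index 1 is inactive.
x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.3, 0.4]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # kappa = 2 latent factors per feature
print(round(fm_predict_naive(x, w0, w, V), 6), round(fm_predict_fast(x, w0, w, V), 6))  # 1.2 1.2
```

With one-hot user and song inputs, the only active pair is (user, song), so the model reduces to a biased matrix factorization prediction; adding further feature groups to `x` is how FMs support feature engineering.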

2.2 Similarity Computation

Motivated by the strength and efficiency of CF methods, we seek to combine their advantages with the factorization model. Since FM provides a good framework for modeling input features, we can directly extract similarity information from the users and items, as CF methods do, and easily embed it into a feature vector. In general, the features used are divided into the following three types, each with its own computation method.

1. ID Domain: An ID variable identifies a target and belongs only to that specific target. For instance, User ID is in the ID domain, which means that each user has his or her own unique ID variable. Technically, a similarity measure is a function that computes the degree of similarity between a pair of targets, e.g., the similarity of the listening histories of two users. Given two sets of attributes, $A$ and $B$, the similarity score is computed by an extended version of cosine similarity:

$$\text{similarity} = \frac{|A \cap B|}{|A|^{1-\alpha}\, |B|^{\alpha}}, \tag{5}$$

where $\alpha \in [0, 1]$ is a tuning parameter; with $\alpha = 0.5$ the score reduces to the standard cosine similarity between the two sets.

2. Categorical Domain: A categorical variable represents features extracted from user and item attributes, such as User Age and Music Genre. The similarity computation is also based on Equation 5.

3. Real Value Domain: If the attribute is already a real number, such as Audio Information, the similarity score is calculated from the Euclidean distance. In general, for an $n$-dimensional space, the distance between feature vectors $\mathbf{p}$ and $\mathbf{q}$ is:

$$d(\mathbf{p}, \mathbf{q}) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^{2}}. \tag{6}$$

For the ID domain, the function $O(\cdot)$ returns the objects referred to by a target, so Equation 5 compares $O(\text{target}_i)$ and $O(\text{target}_j)$. For example, given the listening histories of two users, $\alpha$ determines whether the similarity score takes into account the number of objects referred to by the other target. Take the following three users and their listening records as an example:

$$O(\text{User}_i) = \{1, 2, 3\}, \quad O(\text{User}_j) = \{1, 2, 3\}, \quad O(\text{User}_k) = \{1, 2, 3, 4\}.$$

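Then, based on listening history, $\text{User}_j$ is more similar to $\text{User}_i$ than $\text{User}_k$ is when $\alpha = 1$ (scores of 1 and 3/4, respectively, with $A = O(\text{User}_i)$); when $\alpha = 0$, on the other hand, both obtain the same score of 1.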
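Equation 5 and the three-user example can be checked with a short Python sketch. The function name is illustrative, and the convention of returning 0 for an empty set is an assumption.

```python
def asym_cosine(a, b, alpha=0.5):
    """Equation 5: |A ∩ B| / (|A|^(1 - alpha) * |B|^alpha), with alpha in [0, 1].
    With alpha = 0.5 this is the ordinary cosine similarity between two sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)

# Listening histories from the example, with A = O(User_i).
O = {"User_i": [1, 2, 3], "User_j": [1, 2, 3], "User_k": [1, 2, 3, 4]}
for alpha in (1.0, 0.0):
    print(alpha, asym_cosine(O["User_i"], O["User_j"], alpha),
          asym_cosine(O["User_i"], O["User_k"], alpha))
# 1.0 1.0 0.75
# 0.0 1.0 1.0
```

With $\alpha = 1$ the score is divided by the size of the other user's history, so the extra song in $\text{User}_k$'s history lowers the score; with $\alpha = 0$ only the target's own history size matters.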

For categorical indicators, because this kind of feature is usually shared by many objects, the function $O$ returns the collection of objects referred to by all targets in a category. Taking User Age as an example, to compute the similarity of listening history between 15-year-old users and 30-year-old users, $O$ collects all the songs of the 15-year-old users and, separately, all the songs of the 30-year-old users.
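For a categorical attribute, a minimal Python sketch first pools the songs of every user sharing an attribute value and then applies Equation 5 to the pooled sets. The reading that $O(c)$ is the union of the histories of all users in category $c$, the toy data, and the names are assumptions.

```python
def asym_cosine(a, b, alpha=0.5):
    """Equation 5 on two sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)

def category_objects(histories, categories, value):
    """O(value): union of the songs of every user whose categorical attribute equals value."""
    songs = set()
    for user, items in histories.items():
        if categories.get(user) == value:
            songs.update(items)
    return songs

# Hypothetical users, ages, and listening histories.
histories = {"u1": [1, 2], "u2": [2, 3], "u3": [3, 4]}
age = {"u1": 15, "u2": 15, "u3": 30}
o15, o30 = category_objects(histories, age, 15), category_objects(histories, age, 30)
print(o15, o30, round(asym_cosine(o15, o30), 4))  # {1, 2, 3} {3, 4} 0.4082
```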

For real-value indicators, the feature vector is normalized by the standard score $z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean of the population and $\sigma$ is the standard deviation of the population. The score indicates how many standard deviations an observation is above or below the mean.
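For real-valued features, a minimal Python sketch standardizes each dimension with the population mean and standard deviation and then applies Equation 6. The report does not say how a distance is turned into a similarity score (only that smaller distances mean more similar songs), so that conversion is left out; the two audio dimensions and their values are hypothetical.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each dimension with the population mean and standard deviation."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - mu) / sd if sd else 0.0 for x, (mu, sd) in zip(row, stats)]
            for row in rows]

def euclidean(p, q):
    """Equation 6: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Hypothetical two-dimensional audio features: [tempo in BPM, loudness in dB].
names = ["s1", "s2", "s3"]
raw = [[120.0, -5.0], [125.0, -6.0], [60.0, -20.0]]
z = dict(zip(names, zscore_columns(raw)))
print(euclidean(z["s1"], z["s2"]) < euclidean(z["s1"], z["s3"]))  # True
```

Standardizing first keeps a dimension with a large numeric range, such as tempo, from dominating the distance.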

Finally, suppose we have a set of similarity scores for a specific target and want to embed them into a feature vector. A simple way is to index them directly with the corresponding scores. However, popular objects generally have more similar objects than others do, which may lead to an imbalance problem in which unpopular objects rarely obtain similarity scores. To take this balance issue into account, we keep only the top-$k$ similar objects as the new score basis and normalize the resulting vector of $k$ values so that its entries sum to 1:

$$\bar{s}_{ij} = \frac{s_{ij}}{\sum_{j'=1}^{n} |s_{ij'}|}. \tag{7}$$

The purpose of this step is to avoid imbalanced similarity information. For example, given $s(\text{User}_i) = (0, 0.8, 0.6)$ and $s(\text{User}_j) = (0.1, 0, 0.2)$, $\text{User}_i$ would be more likely to obtain high scores merely because its similarity vector has larger values; after normalization the two vectors become $(0, 4/7, 3/7)$ and $(1/3, 0, 2/3)$, both summing to 1.
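The top-$k$ filtering and the normalization of Equation 7 can be sketched in Python as follows. The function name is illustrative, and breaking ties by original position is an assumption.

```python
def top_k_normalize(scores, k):
    """Keep the k largest similarity scores (zeroing the rest) and rescale the
    kept values so their absolute values sum to 1, as in Equation 7."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = set(order[:k])
    kept = [s if j in keep else 0.0 for j, s in enumerate(scores)]
    total = sum(abs(s) for s in kept)
    return [s / total for s in kept] if total else kept

print(top_k_normalize([0.0, 0.8, 0.6], k=3))        # approximately [0, 4/7, 3/7]
print(top_k_normalize([0.1, 0.0, 0.2], k=3))        # approximately [1/3, 0, 2/3]
print(top_k_normalize([0.1, 0.5, 0.3, 0.2], k=2))   # approximately [0, 0.625, 0.375, 0]
```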

3 Experimental Setup

This section describes the experimental setup we employed to study the influence of different factors

on the performance of music recommendation.

3.1 Evaluation Metric

We employed two metrics to evaluate recommendation performance: the truncated mean average precision at $k$ (MAP@k) and recall. For each user $u$, let $P(p)$ denote the precision at cut-off $p$; the average precision over the top-$k$ positions of the ranked list $o$ is:

$$AP(u, o) = \frac{\sum_{p=1}^{k} P(p) \times r_{u,o(p)}}{I(u)}, \tag{8}$$


Table 1: The feature sets considered in this work.

| Abbr. | Feature | Unique Index | Type |
|---|---|---|---|
| U | User ID | 19,596 | - |
| S | Song ID | 30,260 | - |
| H | Listening History | 30,260 | - |
| BY | Birth Year (of users) | 100 | Cb |
| LR | Live Region (of users) | 208 | Cb |
| M | Mood Tags (of users) | 132 | Cx |
| VAD | VAD values (of articles) | 3 | Cx |
| A | Artists (of songs) | 5,175 | Cb |
| Au | Audio Information | 53 | Cb |
| SR | Social Relation | 674,932 | Cx |

Note: P denotes user-profile features, Cb denotes content-based features extracted from songs, and Cx denotes context-based features extracted from users.

Figure 1: LiveJournal sample posts.

where $o(p) = i$ means that item $i$ is ranked at position $p$ in the ordered list $o$, and $r_{u,i}$ indicates whether user $u$ has listened to song $i$ (1 = yes, 0 = no). MAP@k is the mean of the average precision scores of the top-$k$ results over all target users:

$$MAP@k = \frac{\sum_{u=1}^{U} AP(u, o)}{U}, \tag{9}$$

where $U$ is the total number of target users. Higher MAP@k indicates better recommendation accuracy.

Recall measures how many of the songs the user really likes are recommended by the system. It is computed by:

$$\text{Recall} = \frac{|\{\text{Correct songs}\} \cap \{\text{Returned top-}k\text{ songs}\}|}{|\{\text{Correct songs}\}|}. \tag{10}$$

High recall means that most of songs the user actually likes or listens to are recommended.
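The three metrics can be sketched in Python as follows. The normalizer $I(u)$ in Equation 8 is not defined in this section, so the sketch assumes $I(u) = \min(k, |\text{relevant}|)$, a common convention for truncated MAP. The toy rankings and the function names are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """Equation 8 for one user. ASSUMPTION: I(u) = min(k, |relevant|)."""
    hits, total = 0, 0.0
    for p, item in enumerate(ranked[:k], start=1):
        if item in relevant:      # r_{u,o(p)} = 1
            hits += 1
            total += hits / p     # P(p), the precision at cut-off p
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0

def map_at_k(rankings, relevants, k):
    """Equation 9: mean of the per-user average precision over the U target users."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)

def recall_at_k(ranked, relevant, k):
    """Equation 10: fraction of the correct songs that appear in the top k."""
    return len(set(ranked[:k]) & set(relevant)) / len(relevant) if relevant else 0.0

# Hypothetical ranked lists and ground-truth songs for two users.
rankings = [["a", "x", "b", "y", "c"], ["z", "d"]]
relevants = [{"a", "b", "c"}, {"d"}]
print(round(map_at_k(rankings, relevants, k=5), 4), recall_at_k(rankings[0], relevants[0], k=3))
```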

3.2 Dataset

Our experiments are performed on a real-world dataset collected from a well-known social blogging website, LiveJournal. LiveJournal is unique in that, in addition to the common features of blogging, each post is accompanied by a "Mood" column and a "Music" column in which users can write down


Figure 2: The structure of the LiveJournal dataset

their moods and the songs on their minds while posting, as Figure 1 exemplifies. From LiveJournal, we crawled a total of 1,928,868 listening records covering 674,932 users and 72,913 songs as an initial set. To retain enough data in the training and test sets for this study, we only considered users with more than 10 listening records and discarded the records of the other users. This filtering resulted in a final set of 225,652 listening records (11.7% of the initial set) among 19,596 users and 30,260 songs.

For evaluation, we split the dataset according to the following 80/20 rule: the full listening histories of 80% of the users, together with half of the listening history of each of the remaining 20% of users, form the training data, and the other half of the histories of those remaining 20% of users forms the testing data. For each test record, we randomly add 10 songs as negative records to construct the testing pool.
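The 80/20 split can be sketched in Python as follows. The report does not say how the 20% of users, the held-out half of each history, or the negative songs are chosen, so random choices with a fixed seed, sampling negatives only from songs the user never listened to, and the function name are all assumptions.

```python
import random

def split_80_20(histories, all_songs, n_neg=10, seed=0):
    """ASSUMPTIONS: the 20% test users, the held-out half of each history, and the
    negative songs are drawn at random; negatives come from songs the user never heard."""
    rng = random.Random(seed)
    users = sorted(histories)
    rng.shuffle(users)
    test_users = set(users[: len(users) // 5])  # 20% of the users
    train, test_pool = [], []
    for user, songs in histories.items():
        songs = list(songs)
        if user not in test_users:  # 80% of users: the full history is training data
            train.extend((user, s) for s in songs)
            continue
        rng.shuffle(songs)
        half = len(songs) // 2
        train.extend((user, s) for s in songs[:half])  # first half: training data
        heard = set(songs)
        unheard = [s for s in all_songs if s not in heard]
        for s in songs[half:]:  # second half: test positives, each with random negatives
            test_pool.append((user, s, 1))
            for neg in rng.sample(unheard, min(n_neg, len(unheard))):
                test_pool.append((user, neg, 0))
    return train, test_pool

# Hypothetical data: 10 users with 12 songs each, drawn from 100 songs.
histories = {f"u{i}": list(range(i, i + 12)) for i in range(10)}
train, test_pool = split_80_20(histories, list(range(100)))
print(len(train), len(test_pool))
```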

3.3 Features

The structure of the collected music dataset is depicted in Figure 2, organized by the factors that affect how people choose music. Personal factors are characteristics that people possess for a long period of time, such as age and gender. People with different levels of musical background may appreciate music differently, which in turn affects their music preferences. Musical factors consist of the audio content, its profile, and even the artwork of the CD; people may choose a song because of its melody or its singer. Situational factors are those that persist for a short period of time, such as when and where you listen to music, what you are doing, and what your mood is. People often express their feelings through the music they listen to, and user-generated articles reflect their recent moods.

Table 1 summarizes the features used in the experiments, which are described in detail below.

3.3.1 Content-based Features

Content-based features are features that describe either the user or the item. For describing users, we have Birth Year (BY), Live Region (LR), and Social Relations (SR) features. The birth years of the users in our dataset fall within a window of 100 years, and the users come from 208 regions. We consider users who were born in the same year, or who are from the same region, to be similar. In addition, from LiveJournal we can obtain friendship relations and construct the social network among the users, which gives rise to a social-relation-based similarity matrix. People who are friends with one another are likely to share similar music tastes.

For describing songs, we have Artist (A) and Audio Information (Au) features. The artist feature simply indicates the artist (among the 5,175 possible artists) of each song. If two songs are


Table 2: Affective Norms for English Words

| Description | Valence | Arousal | Dominance |
|---|---|---|---|
| dream | 6.73 | 4.53 | 5.53 |
| eat | 7.47 | 5.69 | 5.60 |
| favor | 6.46 | 4.54 | 5.67 |
| good | 7.47 | 5.43 | 6.41 |
| hate | 2.12 | 6.95 | 5.05 |

Note: five example words from the ANEW dictionary.

sung or performed by the same artist, they are likely to be more similar. The audio features consist of 53 perceptual dimensions of music, including danceability, loudness, mode, and tempo. They are extracted using the EchoNest API¹, a commonly used audio feature extraction tool developed in the field of music information retrieval [6]. We can measure the similarity between two songs in this 53-dimensional feature space.

3.3.2 Context-based Features

The user-generated articles are interesting context-based features in the dataset, but they may contain too many redundant words. Motivated by the idea of emotional matching, we convert the original content of an article into a vector of emotional words by referring to the Affective Norms for English Words (ANEW) dictionary [4], which provides a set of normative emotional ratings for English words. We retain the words that can be found in the ANEW dictionary and weight them by TF-IDF. Specifically, a term $t$ in article $d$ is scored by $\mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$, where

$$\mathrm{tf}(t, d) = \frac{f(t, d)}{\max\{f(w, d) : w \in d\}}, \tag{11}$$

$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}, \tag{12}$$

$f(t, d)$ is the raw frequency of $t$ in $d$, and $D$ is the set of all articles. A term with a higher score has a higher term-frequency weight and a lower document frequency in the whole collection of articles.
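The ANEW filtering and the weighting of Equations 11 and 12 can be sketched in Python as follows. The sketch assumes articles are already tokenized into lowercase words, uses the natural logarithm (the base is not stated), and uses a tiny hypothetical subset of ANEW; the function name is illustrative.

```python
import math
from collections import Counter

def anew_tfidf(articles, anew_words):
    """Keep only ANEW words in each tokenized article, then weight them with
    Equations 11 and 12 (max-normalized tf, natural-log idf)."""
    docs = [[t for t in art if t in anew_words] for art in articles]
    n_docs = len(docs)                              # |D|: all articles
    df = Counter(t for d in docs for t in set(d))   # document frequency of each term
    weights = []
    for d in docs:
        counts = Counter(d)
        if not counts:
            weights.append({})
            continue
        max_f = max(counts.values())
        weights.append({t: (f / max_f) * math.log(n_docs / df[t])
                        for t, f in counts.items()})
    return weights

anew = {"dream", "eat", "good", "hate"}  # a small hypothetical subset of ANEW
articles = [["i", "had", "a", "dream", "dream", "good"], ["good", "eat"], ["hate", "good"]]
print(anew_tfidf(articles, anew)[0])
```

Here "good" occurs in every article and therefore gets weight 0, while "dream" keeps a full tf of 1 and an idf of $\log 3$.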

The emotional words in ANEW are rated on Valence (or pleasantness; positive/negative affective states), Activation (or arousal; energy and stimulation level), and Dominance (or potency; a sense of control or freedom to act), the fundamental emotion dimensions identified by psychologists [7]. Finally, the word vector of each article is converted into valence, arousal, and dominance (VAD) values. For example, for the sentence "I had a dream last night, I was eating a marshmallow," the VAD values would be 14.20, 10.22, and 11.13, respectively, according to Table 2, i.e., the sums of the ratings of "dream" and "eat". Moreover, we also collected the mood tags recently used by each user.
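The VAD example can be reproduced in Python by summing the ANEW ratings of the words that appear in an article, which matches the values obtained from Table 2. This sketch assumes that "eating" has already been mapped to the dictionary form "eat" (the lemmatization step is not described), and it omits any TF-IDF weighting because the example does not use one; the names are illustrative.

```python
# The five ANEW entries of Table 2: word -> (valence, arousal, dominance).
ANEW_SAMPLE = {
    "dream": (6.73, 4.53, 5.53),
    "eat": (7.47, 5.69, 5.60),
    "favor": (6.46, 4.54, 5.67),
    "good": (7.47, 5.43, 6.41),
    "hate": (2.12, 6.95, 5.05),
}

def article_vad(tokens, anew=ANEW_SAMPLE):
    """Sum the valence, arousal, and dominance ratings of every token found in the dictionary."""
    v = a = d = 0.0
    for t in tokens:
        if t in anew:
            dv, da, dd = anew[t]
            v, a, d = v + dv, a + da, d + dd
    return round(v, 2), round(a, 2), round(d, 2)

# "eating" has been mapped to the dictionary form "eat" by hand (lemmatization is assumed).
tokens = "i had a dream last night i was eat a marshmallow".split()
print(article_vad(tokens))  # (14.2, 10.22, 11.13)
```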

We conducted a series of experiments with different settings. First, we attempted to demonstrate that similarity information is effective for most kinds of features under the factorization model. Second, we compared the performance of the standard Factorization Machine with that of the grouping Factorization Machine, and then examined the effects of different feature combinations. Finally, we studied the sensitivity of the proposed method to its parameters.

¹ http://echonest.com/


Table 3: Evaluation results of CF-based algorithms

| Model | MAP@10 | Recall |
|---|---|---|
| Randomize | 0.0578 | 0.1656 |
| User-based CF | 0.3668 | 0.4748 |
| Item-based CF | 0.3093 | 0.5115 |
| SVD++ | 0.3506 | 0.4844 |
| FM | 0.3817 | 0.5216 |

Table 4: Performance of ID Similarity (LiveJournal dataset)

| Features | MAP@10 | Recall |
|---|---|---|
| U + S | 0.3816 | 0.5217 |
| U + S + H | 0.4409 | 0.5821 |
| U + S + US | 0.4310 | 0.5712 |
| U + S + H + US | 0.4427 | 0.5810 |
| U + S + SS | 0.4635 | 0.6194 |
| U + S + H + SS | 0.4897 | 0.6413 |
| U + S + US + SS | 0.4712 | 0.6251 |
| U + S + US + SS + H | 0.5021 | 0.6491 |

Note: For the feature abbreviations, please refer to Table 1.

3.4 Similarity Approach

The similarity indicator can be represented in the categorical set domain, as in [15]. For instance, suppose that "Alice is similar to Charlie and Sandy"; the corresponding similarity indicator of Alice over the users (Bob, Charlie, Sandy) may then be the vector $z = (0, 0.2, 0.8)$, whose values sum to 1 according to Equation 7.
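One way to realize this is to place the one-hot User ID, the one-hot Song ID, and Alice's similarity indicator side by side in one sparse FM input vector, offsetting the indices of each feature group. The helper below and its index layout are hypothetical. It prints a libSVM-style line ("target index:value ..."), the text format that libFM reads, but the exact feature layout used in the experiments is not specified in this report.

```python
def fm_row(target, user, song, n_users, n_songs, user_sim=None):
    """Build one sparse FM input row: [one-hot user | one-hot song | user-similarity block].
    user_sim maps other-user index -> normalized similarity (Equation 7); zeros are dropped."""
    feats = {user: 1.0, n_users + song: 1.0}
    offset = n_users + n_songs
    for other, s in (user_sim or {}).items():
        if s:
            feats[offset + other] = s
    return " ".join([str(target)] + [f"{i}:{v:g}" for i, v in sorted(feats.items())])

# Users: Alice=0, Bob=1, Charlie=2, Sandy=3; Alice listened to song 2 (target 1).
print(fm_row(1, user=0, song=2, n_users=4, n_songs=5, user_sim={1: 0.0, 2: 0.2, 3: 0.8}))
# 1 0:1 6:1 11:0.2 12:0.8
```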

3.4.1 CF-based Recommendation

In the first step, we evaluated several well-known CF-based recommendation algorithms to verify the strength of the factorization machine. We compared FM with user-based CF, item-based CF, and an SVD-based approach (SVD++), using only the user-item matrix, which is the standard input to recommendation models. Note that neither context information nor similarity information is exploited in this comparison. Table 3 lists the results of these methods. As the table shows, the MAP@10 values of all the CF-based approaches fall within 0.30 to 0.38. Among the four methods, FM performs the best, and the performance difference between FM and the other methods is significant under the t-test. This validates the effectiveness of FM; therefore, we employed FM in the subsequent experiments.

Under the CF-based framework, there are two ID indicators: User ID and Song ID. Therefore, we

can obtain the following similarity information according to Equation 5:

• User Similarity (US): Two users are similar if they listen to the same songs.

• Song Similarity (SS): Two songs are similar if they are listened to by the same users.

Both are mined directly from the listening history, so they are always available for a standard recommendation problem. US is applied to users, whereas SS is applied to items.


Table 5: Performance of Feature Similarity

| Features | MAP@10 | Recall |
|---|---|---|
| U + S + BY | 0.4301 | 0.5751 |
| U + S + BY + BYS | 0.4348 | 0.5830 |
| U + S + A | 0.5025 | 0.6538 |
| U + S + A + AS | 0.5125 | 0.6640 |
| U + S + LR | 0.4283 | 0.5723 |
| U + S + LR + LRS | 0.4382 | 0.5834 |
| U + S + Au | 0.4254 | 0.5809 |
| U + S + Au + AuS | 0.4576 | 0.6114 |

We evaluated the performance of every possible feature combination. As shown in Table 4, both user similarity and song similarity (U+S+US and U+S+SS) lead to significantly better results than the baseline U+S.

We also implemented the KNN-based FM of [1] by adding the listening history to libFM, as shown in the second row of Table 4 (i.e., U+S+H). It can be seen that incorporating the listening history ('H') generally improves the result as well. Note that the SS feature consists of the top-k most similar songs rather than the raw listening history itself. Comparing H, US, and SS, SS achieves the highest MAP@10 (0.4635), showing that the similarity approach is more effective than the KNN approach. Moreover, the KNN approach may fail when the listening histories are too limited or too large, whereas with the similarity approach it is easy to determine the number of most similar features used across the whole dataset.

By combining all the available information from the listening records (U+S+US+SS+H), we obtained the best MAP@10 in Table 4, 0.5021, which is significantly better than the baseline of 0.3816. Simple as it is, the proposed ID similarity indicators greatly improve recommendation accuracy. Moreover, the ID similarity indicators are suitable for other recommendation problems because these share the same problem structure: predicting whether an item will be accepted by a user.

3.4.2 Content-based Recommendation

Four similarity features were extracted from the dataset:

• Birth Year Similarity (BYS): Two users are similar if they are born in the same year.

• Live Region Similarity (LRS): Two users are similar if they live in the same geographical region.

• Artist Similarity (AS): Two songs are similar if they are sung by the same artist.

• Audio Similarity (AuS): Two songs are similar if they are close in the audio feature space

spanned by the 53 audio features considered in this work.

Note that BYS and LRS rely on personal information that is not always available in a recommendation problem. Similarly, AS and AuS rely on musical information that is only available if we have access to the metadata or the audio content of the songs.

Table 5 lists the improvements introduced by the use of feature similarity. The results show that all four similarity features improve recommendation performance over the corresponding feature alone. Among the four, Birth Year Similarity does not obtain a significant improvement in the experiments. This is possibly due to the incompleteness of the metadata, because only half of the users have birth year information in our dataset.


Table 6: Performance of Feature Similarity

| Features | MAP@10 | Recall |
|---|---|---|
| U + S + M | 0.4134 | 0.5539 |
| U + S + M + MS | 0.4202 | 0.5652 |
| U + S + VAD | 0.4483 | 0.5905 |
| U + S + VAD + VADS | 0.4511 | 0.5935 |
| U + S + SRS | 0.4213 | 0.5653 |

Table 7: Performance on the complete feature vector.

| Features | MAP@10 | Recall | Note |
|---|---|---|---|
| U + S | 0.3817 | 0.5216 | Baseline |
| U + S + C* | 0.5120 | 0.6614 | |
| U + S + C* + S* | 0.5236 | 0.6684 | Hybrid |

Note: C* denotes all the categorical features, and S* denotes all the extracted similarity features.

Another interesting observation is that the audio features significantly enhance recommendation performance once the audio similarity is added (MAP@10 rises from 0.4254 to 0.4576 in Table 5). This result implies that abstract information such as audio features is hard for the model to exploit directly, but its similarity information is informative.

3.4.3 Context-based Recommendations

Next, we evaluated context-based recommendation using Mood Tags and Emotional Words. These two features reflect the user's mood when writing an article, and we aim to exploit the emotional information in user-generated articles and mood tags. The similarity information is obtained in the same way:

• Mood Similarity (MS): Two users are similar if they tend to express similar moods in their articles.

• VAD Similarity (VADS): Two users are similar if the affective qualities of the articles they

wrote are similar.

Note that contextual information is also not always available for a recommendation problem. In this work we only considered contextual information extracted from mood tags and articles, but the proposed method is applicable to other contextual information as well.

As the first and third rows of Table 6 show, adding the Mood Tags feature yields 0.4134 in terms of MAP@10. This is lower than the 0.4483 obtained with the contextual VAD feature computed from user-generated articles, which indicates that the VAD feature provides more affective information about the user context. The mood similarity does not lead to a remarkable improvement (0.4134 to 0.4202 in MAP@10). The VAD similarity is nevertheless considered effective, because U + S + VAD + VADS achieves the highest MAP@10 in Table 6 (0.4511).

3.4.4 Hybrid Recommendation

Finally, we studied whether accuracy can be further boosted by integrating all the proposed features, including the categorical ones (denoted collectively as C*) and the similarity features (denoted collectively as S*). As Table 7 shows, using more data generally leads to better accuracy. When all the features are considered (U + S + C* + S*), we obtain 0.5251 in MAP@10 and 0.6708 in recall, both of which are the highest in our evaluation. This result again confirms that the proposed method can incorporate multiple kinds of similarity information.

Table 8: Statistics of the collected data.

# Apps                 1,278
# Effective users      2,330
# Commenting users     20

Figure 3: Platform interface: front page

4 Extended Work: Mobile Apps Recommendation Platform

Besides the music recommendation work above, this project also builds a Mobile Apps recommendation platform. The platform is still at a preliminary stage, and its main goal is to collect data from users. Table 8 lists some statistics of the collected data. In the table, an effective user is a user whose Facebook ID exists, and a commenting user is one of the 2,330 effective users who has commented on at least one App. Figures 3-5 show part of the platform's interface.

Figure 4: Platform interface: user page

Figure 5: Platform interface: user comments

5 Conclusions and Future Work

This project proposes a novel approach that incorporates multiple feature similarities into a factorization model through feature engineering. The similarity computation captures similar patterns among the referred objects and improves both the convergence speed and the accuracy of FM. The proposed method is applicable to many kinds of features, so higher-level information can be obtained from multiple aspects. The experimental results show that feature similarity does benefit recommendation performance. In addition, we propose several features, including CF-based, content-based, and context-based ones. Among these features, we try to capture the relationship between users and songs by matching users' emotions. The results show that this idea improves the quality of recommendations.

As extended work, we plan to build recommendation algorithms on the data collected from the Mobile Apps recommendation platform, using machine learning techniques that integrate the social relations between users.

References

[1] Steffen Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May 2012.

[2] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, June 2005.

[3] Marko Balabanović and Yoav Shoham. Fab: content-based, collaborative recommendation. Commun. ACM, 40:66–72, March 1997.

[4] M. Bradley and P. J. Lang. Affective norms for english words ANEW: Instruction manual and

affective ratings. Technical report, The Center for Research in Psychophysiology, Univ. Florida,

1999.

[5] R. Burke. Hybrid recommender systems: Survey and experiments. User modeling and

user-adapted interaction, 12(4):331–370, 2002.

[6] Derek Tingle, Youngmoo E. Kim, and Douglas Turnbull. Exploring automatic music annotation with acoustically-objective tags. Pages 55–61, 2010.

[7] Derek Tingle, Youngmoo E. Kim, and Douglas Turnbull. Exploring automatic music annotation with acoustically-objective tags. Pages 55–61, 2010.

[8] S. Lee, J. Yang, and S.Y. Park. Discovery of hidden similarity on collaborative filtering to

overcome sparsity problem. In Discovery Science, pages 396–402. Springer, 2004.

[9] G.D. Linden, J.A. Jacobi, and E.A. Benson. Collaborative recommendations using item-to-item

similarity mappings, July 24 2001. US Patent 6,266,649.

[10] Tie-Yan Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3:225–331,

March 2009.

[11] P. Melville, R.J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for improved recommendations. In Proceedings of the National Conference on Artificial Intelligence, pages 187–192. AAAI Press, 2002.

[12] Raymond J. Mooney and Loriene Roy. Content-based book recommending using learning for

text categorization. In Proceedings of the fifth ACM conference on Digital libraries, DL ’00,

pages 195–204, New York, NY, USA, 2000. ACM.

[13] M. Pazzani and D. Billsus. Content-based recommendation systems. The adaptive web, pages

325–341, 2007.


[14] M.J. Pazzani, J. Muramatsu, D. Billsus, et al. Syskill & webert: Identifying interesting web

sites. In Proceedings of the national conference on artificial intelligence, pages 54–61, 1996.

[15] Steffen Rendle, Zeno Gantner, et al. Fast context-aware recommendations with factorization

machines. In Proc. ACM SIGIR, pages 635–644, 2011.

[16] Jasson D. M. Rennie and Nathan Srebro. Fast maximum margin matrix factorization for collaborative prediction. In Proceedings of the 22nd International Conference on Machine Learning, ICML ’05, pages 713–719, New York, NY, USA, 2005. ACM.

[17] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. Grouplens:

an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM

conference on Computer supported cooperative work, CSCW ’94, pages 175–186, New York,

NY, USA, 1994. ACM.

[18] Paul Resnick and Hal R. Varian. Recommender systems. Commun. ACM, 40:56–58, March

1997.

[19] C. Sammut and G.I. Webb. Recommender Systems, Encyclopedia of machine learning.

Springer-Verlag New York Inc, 2011.

[20] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. Scalable collaborative

filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623–656, June

2009.

[21] A. Trotman. Learning to rank. Information Retrieval, 8(3):359–381, 2005.


Music Recommendation Based on Multiple Contextual Similarity Information

Chih-Ming Chen∗, Ming-Feng Tsai∗, Jen-Yu Liu†, Yi-Hsuan Yang†

∗Department of Computer Science & Program in Digital Content and Technology
National Chengchi University, Taipei 11605, Taiwan. Email: {g10018, mftsai}@cs.nccu.edu.tw

†Research Center for Information Technology Innovation
Academia Sinica, Taipei 11564, Taiwan. Email: {ciaua, yang}@citi.sinica.edu.tw

Abstract: This paper proposes a music recommendation approach based on various kinds of similarity information via Factorization Machines (FM). We introduce the idea of similarity, which has been widely studied in the field of information retrieval, and incorporate multiple feature similarities into the FM framework, including content-based and context-based similarities. The similarity information captures similar patterns from the referred objects and also improves the convergence speed and accuracy of FM. In addition, to avoid the noise introduced by a large number of similarity features, we adopt grouping FM as an extension to model the problem. In our experiments, we assess the proposed approach on a music-recommendation dataset collected from an online blogging website, which includes user listening histories, user profiles, social information, and music information. Our experimental results show that various types of feature similarities enhance the performance of music recommendation significantly. Furthermore, compared to the traditional collaborative filtering approach, the grouping technique improves performance significantly in terms of Mean Average Precision.

I. INTRODUCTION

Similarity is an important concept in recommendation. Given the favorite items of a user, it is sensible to recommend other items similar to those favorites. Similarity between items can be measured in several ways, and different measures can complement one another in practice. For example, in music recommendation, some users prefer songs with similar melodies, while others prefer songs with similar lyrics. The more information we have about different aspects of similarity, the more likely we are to make successful recommendations.

If similarity is measured in terms of the number of people who share the same taste regarding the items, the resulting model can be considered a collaborative filtering (CF)-based model. On the other hand, if similarity is measured in terms of the affinity of the items in a feature space, the resulting model is usually known as a content-based (CB) model. Hybrid models that blend these two kinds of models have also been studied in the literature. In particular, the Factorization Machine (FM) has emerged in recent years as a promising framework for hybrid recommendation. With proper features, FM is able to mimic many state-of-the-art CF/CB-based algorithms.

Under the FM framework, it is possible to exploit every co-occurrence pattern among items to capture more information. Moreover, representing similarity in the form of a matrix can be more informative than representing each item as a feature vector, because the latter requires an additional process to extract similarity information from the feature vectors, an operation which is performed only implicitly by FM.

Music preference is affected not only by the personal factors of the listener and the musical factors of the music items, but also strongly by the context of music listening. For example, people listen to different music in an office than when exercising, and different music when feeling blue than when in a happy mood. Therefore, contextual information must be considered to achieve better recommendation performance. This study also features the use of multiple similarity information computed from the contextual factors of music listening.

From a technical point of view, FM models the global bias, the feature biases, and the weights of the interactions among all the features, including vector-based and matrix-based ones. Therefore, some noisy information is likely to be mixed into the final prediction model. To remedy this, we propose to adopt a grouping technique that removes unnecessary interactions. In other words, we divide the features into distinct groups and only account for interactions between features from different groups. In this way, noise arising from unnecessary interactions can be largely eliminated. Our evaluation shows that such a grouping technique is particularly important when one considers matrix-based features such as similarity matrices, because these increase the number of features and, accordingly, the number of potentially unnecessary interactions. Although features can be grouped in multiple ways, our results suggest some guidelines for finding a good grouping.

In the experiments, we use a dataset crawled from a real-world social blogging website, LiveJournal¹. This website contains rich contextual information that users enter spontaneously in their day-to-day lives [1], [2]. The features are extracted from user profiles and music characteristics, such as geographic information and audio information, which shows that similarity computation can be easily applied to most kinds of features. Since some interactions between features provide little information, we generate different grouping methods to examine whether the grouping technique can improve the performance.


Finally, we conduct experiments with different parameters. The experimental results show that similarity information significantly enhances recommendation performance. Furthermore, the grouping factorization machine further improves performance to 0.52 in terms of Mean Average Precision, with a p-value less than 0.01.

II. RELATED WORK

Recommender systems are widely deployed in commercial businesses, and collaborative filtering (CF) is one of the most popular models. CF models filter out useless information and keep similar patterns to predict user behavior. More recently, machine-learning techniques have provided a promising way to perform recommendation. In this section, we survey related studies from different perspectives.

A. Contextual Recommender System

Traditional recommendation methods can be divided into two main categories: CF and CB. Many well-known commercial recommendation systems are based on these methods, such as those used by YouTube and Amazon [3], [4]. However, such methods have difficulty incorporating contextual information, which is becoming increasingly important with the rapid growth of information on the Internet.

In light of this, many methods have been proposed for contextual recommendation. For example, Meng et al. [5] investigated individual preference and interpersonal influence on online item adoption and recommendation. Yelong et al. [6] proposed a joint personal and social latent factor (PSLF) model that combines state-of-the-art collaborative filtering and social network modeling approaches for social recommendation. Kailong et al. [7] employed several interesting features from tweets, including social relation features, content-relevance features, tweets' content-based features, and publisher authority features. These prior works show that most studies build their models on various types of features. In the KDD Cup 2012 competition, Tianqi et al. [8] combined a variety of models by incorporating different features, and their results indicate the importance of contextual features. Rather than focusing on the CF method alone, we propose an approach that integrates the advantages of the CF method into the factorization model by incorporating similarity information.

B. Music Recommender System

There are also many studies related to music recommendation. For example, Negar et al. [9] presented a context-aware music recommendation system that uses sequential data mining to infer a user's short-term music preference from the most recent sequence of songs liked by the user. Noam et al. [10] used a hierarchical track-album-artist-genre structure to model the biases of music items and used music sessions to model the session bias of users, which showed the importance of bias modeling. Cai et al. [11] showed that emotion can be useful for matching songs to documents according to their audio and text content. Unlike these existing works, the contextual information considered in this work is mined from user-generated articles. Moreover, we use FM to study the effect of multiple types of features extracted from user profiles, user-generated articles, geographic information, item characteristics, and audio features.

Fig. 1. Illustration of our proposed approach vs. Factorization Machine.

C. Factorization Model

The goal of a recommender system is to predict whether a user would like an object. In recent years, FM models have proven to be competitive and flexible for a variety of recommendation tasks [5], [8]. For example, Jason et al. [12] studied the joint problem of recommending items to a user with respect to a given query and introduced a factorized model for optimization. István et al. [13] took an MF-based approach with a simple rating-based predictor on the Netflix Prize dataset.

A common problem among FM-like models is that the prediction model must be redesigned for each task. To solve this problem, Rendle described a generic FM framework called libFM [14]. Through feature engineering, that is, by using the corresponding features, libFM can simulate many other successful models. As demonstrated in [14], libFM generalizes existing methods such as standard matrix factorization, Pairwise Interaction Tensor Factorization (PITF), and SVD++. Moreover, a system based on libFM won second place in a KDD Cup competition [15]. Liangjie et al. [15] modified the original model to handle multiple aspects of the dataset at the same time. In contrast, this work aims to incorporate similarity information into libFM without major modification of its framework, thereby preserving the advantages of libFM.

III. METHODOLOGY

Figure 1 illustrates the main concept of incorporating similarity information into the FM framework. In general, a traditional CF-based matrix only keeps records of user-to-item information, whereas FM factorizes this form into the product of two factor matrices (i.e., V and V^T in Figure 1). Our proposed approach further integrates similarity information into the framework to capture similar patterns from the referred objects. Below, we describe Factorization Machines and our proposed approaches in more detail.


A. Standard FM

Factorization Machines can act like most factorization models by feeding various types of features. It learns the weights of all interactions between the features. In general, a two-way factorization machine model can be defined as:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=1}^{n} \sum_{j=l+1}^{n} \hat{w}_{lj}\, x_l x_j, \tag{1}$$

where $w_0$ is the global bias, $w_l$ is the weight of feature $x_l$, and $\hat{w}_{lj}$ models the interaction of each pair of features. The interaction $\hat{w}_{lj}$ is factorized into a pair of latent vectors of interaction parameters:

$$\hat{w}_{lj} = \sum_{f=1}^{\kappa} v_{lf}\, v_{jf}. \tag{2}$$

The parameter κ determines the model complexity. Instead of using a single independent parameter for each interaction, this factorization allows high-quality parameter estimates of higher-order interactions under sparsity. Factorization Machines provide a promising framework for recommendation problems. Unlike generic matrix factorization models, they make feature engineering easy. For more details of FM, please refer to [14].
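To make Equations (1) and (2) concrete, the following minimal Python sketch evaluates a two-way FM prediction in two ways. It is our illustration, not libFM's code. The first function takes the direct double sum over feature pairs. The second uses the well-known O(κn) rewriting of the pairwise term from the FM literature. The dense list-based inputs, the function names, and the toy feature layout are assumptions made for illustration.

```python
import random


def fm_predict_naive(x, w0, w, V):
    """Eq. (1) with Eq. (2): explicit double sum over all feature pairs l < j."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for l in range(n):
        for j in range(l + 1, n):
            w_lj = sum(V[l][f] * V[j][f] for f in range(k))  # Eq. (2)
            y += w_lj * x[l] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same value in O(k * n) via the identity
    sum_{l<j} <v_l, v_j> x_l x_j
        = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


if __name__ == "__main__":
    rng = random.Random(0)
    n, kappa = 6, 3
    # Hypothetical input: one-hot user, one-hot song, and one similarity value.
    x = [1.0, 0.0, 0.0, 1.0, 0.0, 0.4]
    w0 = 0.1
    w = [rng.uniform(-1, 1) for _ in range(n)]
    V = [[rng.gauss(0, 0.1) for _ in range(kappa)] for _ in range(n)]
    print(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Both functions return the same prediction up to floating-point error. The second form is what makes FM practical on sparse, high-dimensional inputs.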

B. Grouping FM

Factorization Machines provide a good framework for modeling the interactions between features. However, similar types of features can sometimes cause confusion during learning, especially when the number of features is large. Hence, we apply a bag-of-features concept to the standard factorization machine by grouping features with similar characteristics, which lets the model handle tasks more flexibly with different feature partitions. After the non-informative weights are removed from the FM model, Equation (1) can be rewritten as:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=1}^{n} \sum_{\substack{j=l+1 \\ j \notin G(l)}}^{n} x_l x_j \sum_{f=1}^{\kappa} v_{l,f}\, v_{j,f}, \tag{3}$$

where $x_l$ belongs to the group $G(l)$, and the interaction between $x_l$ and $x_j$ is dropped when they are in the same group.

The grouping technique eliminates unnecessary interactions. For example, the interaction between a user and that user's age is non-informative. Grouping also speeds up the convergence of optimization and provides a flexible way to construct different feature combinations. Note that the modified prediction function is identical to the original one when every feature has its own group.
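The grouping rule of Equation (3) can be sketched as follows. The sketch is illustrative only, not the authors' modified libFM. It assumes dense inputs and a hypothetical feature layout in which the user ID and the user's birth-year bucket share a group. It also lets one check the note above: assigning every feature its own group recovers the standard FM prediction.

```python
def grouped_fm_predict(x, w0, w, V, group):
    """Eq. (3): the pairwise term for l < j is kept only if group[l] != group[j]."""
    n, k = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for l in range(n):
        for j in range(l + 1, n):
            if group[l] == group[j]:
                continue  # same group: the interaction is dropped
            y += x[l] * x[j] * sum(V[l][f] * V[j][f] for f in range(k))
    return y


# Hypothetical layout: [user_0, user_1, birth_year_bucket, song_0, song_1]
x = [1.0, 0.0, 1.0, 0.0, 1.0]
w0, w = 0.0, [0.0] * 5
V = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.1], [0.6, 0.3]]
user_side = ["user", "user", "user", "song", "song"]  # user ID and age share a group
singletons = list(range(5))                           # every feature in its own group
print(grouped_fm_predict(x, w0, w, V, user_side))
print(grouped_fm_predict(x, w0, w, V, singletons))
```

With `user_side`, the user-by-age pair no longer contributes. With `singletons`, the function reduces to Equation (1).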

LibFM provides three major optimization methods: stochastic gradient descent (SGD) [16], alternating least squares (ALS) [17], and Markov Chain Monte Carlo (MCMC) [18]. In our experiments, we choose MCMC because it learns from the data without requiring external parameters such as the learning rate² and the regularization term³. For MCMC, the gradient for the grouping Factorization Machine is derived as follows:

$$h_\theta(x) = \frac{\partial \hat{y}(x)}{\partial \theta} = \begin{cases} 1, & \text{if } \theta = w_0, \\ x_j, & \text{if } \theta = w_j, \\ x_j \sum_{j' \notin G(j)} v_{j',f}\, x_{j'}, & \text{if } \theta = v_{j,f}. \end{cases} \tag{4}$$

² The learning rate is a common parameter that controls the step size of learning.
³ The regularization term is used to prevent the model from overfitting.

C. Similarity Computation

Motivated by the strength and efficiency of CF methods, we seek to combine their advantages with the factorization model. Because FM provides a good framework for modeling input features, we can extract similarity information directly from users and items, much as CF methods do, and embed it easily into a feature vector. The features used are divided into the following three types, each with its own computation method.

1) ID Domain: The ID variable is used to identify a

target, and it only belongs to a specific target. For instance, User ID is in the ID domain, which means that each user has his/her own unique ID variable. Technically a similarity measurement is a function that computes the degree of similarity between a pair of targets, e.g. the similarity of listening histories of two users. Given two vectors of attributes, A and B, the similarity score is computed by the extended version of cosine similarity:

$$\text{similarity}(A, B) = \frac{|A \cap B|}{|A|^{1-\alpha}\, |B|^{\alpha}}, \tag{5}$$

where α ∈ [0, 1] is a tuning parameter.

2) Categorical Domain: The categorical variable

represents the extracted features from the user and item attributes such as the User Age and Music Genre. The similarity computation is also based on Equation 5.

3) Real Value Domain: If the attribute is already a real number (∈ ℝ), as with the Audio Information, the similarity score is computed from the Euclidean distance. For an n-dimensional space, the distance between feature vectors p and q is:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}. \tag{6}$$

For the ID domain, the function O returns the objects referred to by a target. For example, O(User_i) is the listening history of user i, so A = O(target_i) and B = O(target_j) in Equation (5). Given the listening histories of two users, α determines whether the similarity score takes into account the number of objects referred to by the other target. Take the following three users and their listening records as an example:

O(User_i) = [1, 2, 3],
O(User_j) = [1, 2, 3],
O(User_k) = [1, 2, 3, 4].

Based on listening history, User_j is more similar to User_i than User_k is when α = 1 (3/3 = 1 versus 3/4 = 0.75). When α = 0, User_j and User_k receive the same score (3/3 = 1 for both).
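As a quick check of Equation (5) and of the three-user example above, this sketch computes the extended cosine similarity on sets of referred objects. With α = 0.5 the formula reduces to the ordinary cosine similarity between binary vectors. The function name is hypothetical.

```python
def extended_cosine(a, b, alpha):
    """Eq. (5): |A ∩ B| / (|A|^(1 - alpha) * |B|^alpha), with alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # no referred objects: treat as dissimilar
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)


O = {
    "User_i": [1, 2, 3],
    "User_j": [1, 2, 3],
    "User_k": [1, 2, 3, 4],
}

for alpha in (1.0, 0.5, 0.0):
    s_ij = extended_cosine(O["User_i"], O["User_j"], alpha)
    s_ik = extended_cosine(O["User_i"], O["User_k"], alpha)
    print(f"alpha={alpha}: sim(i, j)={s_ij:.4f}, sim(i, k)={s_ik:.4f}")
```

Increasing α penalizes the other target (B) for referring to many objects. That penalty is why User_k falls behind User_j at α = 1.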


For the categorical indicators, the function O is the collection of objects referred to by all targets that share a category, because this kind of feature is usually shared by different objects. Take User Age as an example. To compare the listening histories of 15-year-old users and 30-year-old users, O collects all the songs of the 15-year-old users on one side and all the songs of the 30-year-old users on the other, and Equation 5 is then applied to these two collections.
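For the categorical domain, the same score is applied to per-category collections. The sketch below builds O(c) for each attribute value c by pooling the listening histories of all users with that value. The toy ages, the histories, and the default α = 0.5 are hypothetical.

```python
from collections import defaultdict


def build_category_objects(user_category, user_songs):
    """O(c): union of the songs listened to by all users whose attribute equals c."""
    objects = defaultdict(set)
    for user, category in user_category.items():
        objects[category].update(user_songs.get(user, ()))
    return dict(objects)


def extended_cosine(a, b, alpha=0.5):
    """Eq. (5) applied to two collections of referred objects."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)


# Hypothetical toy data: user -> age, and user -> songs in the listening history.
ages = {"u1": 15, "u2": 15, "u3": 30}
history = {"u1": {"s1", "s2"}, "u2": {"s3"}, "u3": {"s2", "s3", "s4"}}

O = build_category_objects(ages, history)
print(sorted(O[15]), sorted(O[30]), extended_cosine(O[15], O[30]))
```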

For the real-value indicators, the feature vector is normalized by the standard score, $(x - \mu)/\sigma$, where µ is the mean of the population and σ is the standard deviation of the population. The score indicates how many standard deviations an observation is above or below the mean.
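The real-value domain can be sketched as standard-score normalization followed by Equation (6). The text does not specify how a distance is turned into a similarity score, so this sketch only ranks candidates by increasing distance. The three-dimensional audio values below are made-up placeholders, not EchoNest output.

```python
import math


def zscore_columns(rows):
    """Standardize each column with (x - mu) / sigma, using the population std."""
    n = len(rows)
    out = [list(r) for r in rows]
    for c in range(len(rows[0])):
        col = [r[c] for r in rows]
        mu = sum(col) / n
        sigma = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        for r in range(n):
            out[r][c] = 0.0 if sigma == 0 else (col[r] - mu) / sigma
    return out


def euclidean(p, q):
    """Eq. (6): d(p, q) = sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def nearest_songs(features, target, k):
    """Return the k songs closest to `target` in the standardized feature space."""
    ids = list(features)
    z = dict(zip(ids, zscore_columns([features[s] for s in ids])))
    others = [s for s in ids if s != target]
    others.sort(key=lambda s: euclidean(z[target], z[s]))
    return others[:k]


# Hypothetical audio vectors: (tempo, loudness, danceability).
songs = {
    "a": [120.0, -5.0, 0.80],
    "b": [122.0, -6.0, 0.75],
    "c": [70.0, -20.0, 0.20],
}
print(nearest_songs(songs, "a", k=1))
```

Standardizing first keeps large-valued dimensions such as tempo from dominating the distance.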

Finally, suppose we have a set of similarity scores for a specific target and want to embed them into a feature vector. A simple way is to index the similar objects directly with their corresponding scores. However, popular objects generally have more similar objects than others. This may cause an imbalance problem in which unpopular objects rarely receive similarity scores. To address the imbalance, we keep only the top-k similar objects as the new score basis (entries outside the top k are set to zero) and normalize the resulting vector of k values to sum to 1:

$$\bar{s}_{ij} = \frac{s_{ij}}{\sum_{j'=1}^{n} |s_{ij'}|}. \tag{7}$$

The purpose of this step is to avoid imbalanced similarity information. For example, take s(User_i) = (0, 0.8, 0.6) and s(User_j) = (0.1, 0, 0.2). Without normalization, User_i is more likely to receive high scores simply because its similarity vector has larger values.
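The top-k truncation and Equation (7) can be sketched as below, using the two similarity vectors from the example. We assume similarity scores are non-negative, so the top-k entries are simply the k largest values. The function name is hypothetical.

```python
def topk_l1_normalize(scores, k):
    """Keep the k largest similarity scores of one target, set the rest to 0,
    and divide by the L1 norm of what is kept (Eq. 7)."""
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    kept = [0.0] * len(scores)
    for j in top:
        kept[j] = float(scores[j])
    total = sum(abs(s) for s in kept)
    return kept if total == 0 else [s / total for s in kept]


s_i = [0.0, 0.8, 0.6]
s_j = [0.1, 0.0, 0.2]
print(topk_l1_normalize(s_i, k=2))
print(topk_l1_normalize(s_j, k=2))
```

After normalization both users' vectors sum to 1. A target with uniformly high raw scores therefore no longer dominates the similarity feature.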

IV. EXPERIMENTAL SETUP

This section describes the experimental setup we employed to study the influence of different factors on the performance of music recommendation.

A. Evaluation Metric

We employed two metrics to evaluate recommendation performance: the truncated mean average precision at k (MAP@k) and recall. For each user u, let P(k) denote the precision at cut-off k. The average precision of the ranked list o is

$$AP(u, o) = \frac{\sum_{p=1}^{k} P(p) \times r_{u\,o(p)}}{I(u)}, \tag{8}$$

where o(p) = i means that item i is ranked at position p in the ordered list o, and $r_{ui}$ indicates whether user u has listened to song i (1 = yes, 0 = no). MAP@k is the mean of the average precision scores for the top-k results:

$$MAP@k = \frac{\sum_{u=1}^{U} AP(u, o)}{U}, \tag{9}$$

where U is the total number of target users. A higher MAP@k indicates better recommendation accuracy.

Recall measures how many of the songs that the user actually likes are recommended by the system. It is computed by:

$$\text{Recall} = \frac{|\{\text{Correct Songs}\} \cap \{\text{Returned Top-}k\text{ Songs}\}|}{|\{\text{Correct Songs}\}|}. \tag{10}$$

High recall means that most of the songs the user actually likes or listens to are recommended.
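The three evaluation formulas can be sketched as follows. The text does not define the normalizer I(u) in Equation (8). This sketch assumes I(u) = min(k, number of relevant songs), a common choice for truncated MAP, and that assumption should be swapped out if the original experiments used a different normalizer. The rankings below are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """Eq. (8). ASSUMPTION: I(u) = min(k, |relevant|); the text leaves I(u) undefined."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for p, item in enumerate(ranked[:k], start=1):
        if item in relevant:   # r_{u,o(p)} = 1; otherwise the term contributes 0
            hits += 1
            total += hits / p  # P(p): precision at cut-off p
    return total / min(k, len(relevant))


def map_at_k(rankings, relevant_by_user, k):
    """Eq. (9): average of AP(u, o) over the U target users."""
    users = list(rankings)
    return sum(
        average_precision_at_k(rankings[u], relevant_by_user[u], k) for u in users
    ) / len(users)


def recall_at_k(ranked, relevant, k):
    """Eq. (10): |correct ∩ returned top-k| / |correct|."""
    relevant = set(relevant)
    return len(relevant & set(ranked[:k])) / len(relevant) if relevant else 0.0


# Hypothetical ranked test pools and held-out (correct) songs for two users.
rankings = {"u1": ["s1", "s9", "s2", "s8"], "u2": ["s5", "s4"]}
relevant = {"u1": {"s1", "s2", "s3"}, "u2": {"s4"}}
print(map_at_k(rankings, relevant, k=4), recall_at_k(rankings["u1"], relevant["u1"], k=4))
```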

TABLE I. THE FEATURE SETS CONSIDERED IN THIS WORK.

abbr.  Feature                    Unique Index  Type
U      User ID                    19,596        -
S      Song ID                    30,260        -
H      Listening History          30,260        -
BY     Birth Year (of users)      100           Cb
LR     Live Region (of users)     208           Cb
M      Mood Tags (of users)       132           Cx
VAD    VAD values (of articles)   3             Cx
A      Artists (of songs)         5,175         Cb
Au     Audio Information          53            Cb
SR     Social Relation            674,932       Cx

Note: P denotes user-profile features, Cb denotes content-based features extracted from songs, and Cx denotes context-based features extracted from users.

Fig. 2. LiveJournal sample posts.

B. Dataset

Our experiments are performed on a real-world dataset collected from a well-known social blogging website, LiveJournal. LiveJournal is unique in that, in addition to the common blogging features, each post is accompanied by a "Mood" column and a "Music" column, so that users can write down their moods and the songs on their minds while posting, as Figure 2 shows. From LiveJournal, we crawled a total of 1,928,868 listening records covering 674,932 users and 72,913 songs as an initial set. To retain enough data in the training and test sets, we only considered users with more than 10 listening records and discarded the records of the other users. This filtering resulted in a final set of 225,652 listening records (11.7% of the initial set) among 19,596 users and 30,260 songs.

For evaluation, we split the dataset according to the following 80/20 rule. The training data consists of the full listening histories of 80% of the users plus half of the listening history of each remaining 20% of users. The other half of those 20% of users' histories forms the test data. For each test record, we randomly add 10 songs as negative records to construct the test pool.

C. Feature

Figure 3 depicts the structure of the collected music dataset in terms of the factors that affect how people choose music. Personal factors are characteristics that people possess for a long period of time, such as age and gender. People with different levels of musical background may appreciate music differently, which in turn affects their music preference. Musical factors consist of the audio content, its profile, and even the artwork of the CD; people may choose a song because of its melody or its singer. Situational factors are those that persist for a short period of time, such as when and where you listen to music, what you are doing, and what your mood is. People often express their feelings through listening to music, and user-generated articles reflect their recent mood.

Fig. 3. The structure of the LiveJournal dataset.

TABLE II. AFFECTIVE NORMS FOR ENGLISH WORDS

Word    Valence  Arousal  Dominance
dream   6.73     4.53     5.53
eat     7.47     5.69     5.60
favor   6.46     4.54     5.67
good    7.47     5.43     6.41
hate    2.12     6.95     5.05

Note: five example words from the ANEW dictionary.

Table I summarizes the features used in the experiments, which are described in detail below.

1) Content-based Features: Content-based features describe either the user or the item. For describing users, we have Birth Year (BY), Live Region (LR), and Social Relations (SR) features. The birth years of the users in our dataset fall within a window of 100 years, and the users come from 208 regions. We consider users who were born in the same year, or who are from the same region, as similar. In addition, LiveJournal provides friendship links, from which we construct the social network among the users. This gives rise to the social-relation-based similarity matrix. People who are friends with one another are likely to share similar music taste.

For describing songs, we have Artist (A) and Audio Information (Au) features. The artist feature simply indicates the artist of each song, among 5,175 possible artists. If two songs are sung or performed by the same artist, they are likely to be more similar. The audio features consist of 53 perceptual dimensions of music, including danceability, loudness, mode, and tempo. They are extracted using the EchoNest API⁴, a commonly used audio feature extraction tool developed in the field of music information retrieval [19]. We can measure the similarity between two songs in this 53-dimensional feature space.

⁴ http://echonest.com/

TABLE III. EVALUATION RESULTS OF CF-BASED ALGORITHMS

Model           MAP@10   Recall
Randomize       0.0578   0.1656
User-based CF   0.3668   0.4748
Item-based CF   0.3093   0.5115
SVD++           0.3506   0.4844
FM              0.3817   0.5216

2) Context-based Features: User-generated articles are interesting context-based features in the dataset, but they may contain too many redundant words. Motivated by the idea of emotional matching, we convert the original content of an article into a vector of emotional words by referring to the Affective Norms for English Words (ANEW) dictionary [20], which provides a set of normative emotional ratings for English words. We retain the words that can be found in the ANEW dictionary and weight them by TF-IDF. Specifically, a term t in article d is scored by tf(t, d) × idf(t, D), where

$$tf(t, d) = \frac{f(t, d)}{\max\{f(w, d) : w \in d\}}, \tag{11}$$

$$idf(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}, \tag{12}$$

f(t, d) is the number of occurrences of t in d, and D is the set of all articles. A term with a higher score has a higher term-frequency weight and a lower document frequency in the whole collection of articles. The ANEW ratings are given along Valence (or pleasantness; positive/negative affective states), Activation (or arousal; energy and stimulation level), and Dominance (or potency; a sense of control or freedom to act), the fundamental emotion dimensions identified by psychologists [21]. Finally, the word vector of each article is converted into valence, arousal, and dominance (VAD) values. For example, for the sentence "I had a dream last night, I was eating a marshmallow," the VAD values would be 14.20, 10.22, and 11.13, respectively, obtained by summing the ratings of dream and eat in Table II. Moreover, we also collected the mood tags recently used by each user.
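The article-to-VAD conversion can be sketched with the five Table II entries. The tokenizer and the small LEMMA table (which maps "eating" to "eat") are hypothetical stand-ins for a real lemmatizer. The unweighted sum mirrors the worked example. The text does not spell out how the TF-IDF weights of Equations (11) and (12) enter the VAD values, so passing them as optional per-word weights is an assumption.

```python
import math
from collections import Counter

# The five example rows of Table II: word -> (valence, arousal, dominance).
ANEW = {
    "dream": (6.73, 4.53, 5.53),
    "eat": (7.47, 5.69, 5.60),
    "favor": (6.46, 4.54, 5.67),
    "good": (7.47, 5.43, 6.41),
    "hate": (2.12, 6.95, 5.05),
}
# Hypothetical stand-in for a real lemmatizer.
LEMMA = {"eating": "eat", "dreams": "dream", "hated": "hate"}


def tokenize(text):
    """Lower-case, strip surrounding punctuation, and lemmatize via LEMMA."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    return [LEMMA.get(t, t) for t in tokens if t]


def tf(term, doc):
    """Eq. (11): raw count divided by the count of the most frequent word in d."""
    counts = Counter(doc)
    return counts[term] / max(counts.values()) if counts else 0.0


def idf(term, docs):
    """Eq. (12): log(|D| / number of articles containing the term); 0 if unseen."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def vad(doc, weights=None):
    """Sum the ANEW ratings of the article's ANEW words (weight 1 by default)."""
    total = [0.0, 0.0, 0.0]
    for t in doc:
        if t in ANEW:
            w = 1.0 if weights is None else weights.get(t, 0.0)
            for dim in range(3):
                total[dim] += w * ANEW[t][dim]
    return tuple(total)


doc = tokenize("I had a dream last night, I was eating a marshmallow.")
print(vad(doc))
```

A TF-IDF-weighted variant could, for instance, pass `weights = {t: tf(t, doc) * idf(t, docs) for t in set(doc)}` for a collection `docs` of tokenized articles.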

V. EXPERIMENTAL RESULTS

We conducted a series of experiments with different settings. First, we demonstrated that similarity information is effective for most kinds of features under the factorization model. Second, we compared the performance of the standard Factorization Machine with that of the grouping Factorization Machine, and then examined the effects of different feature combinations. Finally, we studied the sensitivity of the proposed method to its parameters.

A. Similarity Approach

The similarity indicator can be represented in the categorical set domain, as in [17]. For instance, suppose that "Alice is similar to Charlie and Sandy". Alice's similarity indicator over the users (Bob, Charlie, Sandy) may then be the vector z = (0, 0.2, 0.8), whose values sum to 1 according to Equation 7.
