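National Science Council, Executive Yuan: Research Project Final Report
A Recommendation System Integrating Social Information into Learning-to-Rank Models
Project Type: Individual
Project Number: NSC 101-2221-E-004-017-
Project Period: August 1, 2012 to July 31, 2013
Institution: Department of Computer Science, National Chengchi University
Principal Investigator: 蔡銘峰
Project Participants: Master's student, part-time assistant: 陳志明
Master's student, part-time assistant: 陳禔多
Master's student, part-time assistant: 劉澤
Attachments: Report on attending an international conference, and published papers
Public Information: This project is open to public inquiry
October 31, 2013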
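Abstract (translated from the Chinese): With the growing popularity of
social media on the Web in recent years, the amount of
information online has increased rapidly, so finding
information efficiently has become very important. Social
recommendation systems have also become an indispensable
tool in today's social networks. How to efficiently
recommend information that interests users has become an
important research topic in information-related sciences in
recent years. Research on recommendation systems can also
be regarded as a branch of information retrieval research.
Its aim is to build models from various aspects of
information, and such models can be used to predict a
user's preference for, or rating of, certain items.
Traditionally there are two mainstream approaches to
building recommendation models: first, collaborative
filtering approaches, such as matrix factorization, which
filter information according to the behavior of similar
users; second, content-based approaches, such as the vector
space model, which compute the similarity between items in
order to make recommendations.
In this project we therefore use machine learning
techniques to study the learning of recommendation models.
The project has two main accomplishments. First, we built
recommendation models with machine learning methods. The
models have been successfully applied to music
recommendation and incorporate user-side information to
improve recommendation performance, and two related
international papers were published during the project
period. Second, we built a mobile app recommendation
platform, which currently has 2330 users and 1278 mobile
apps. We will continue to collect user data and to improve
and study social recommendation learning algorithms on
this dataset.
Keywords (translated from the Chinese): Information Retrieval, Machine
Learning, Natural Language Processing, Social Network and Analysis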
English Abstract: Social networking websites have become more and
more popular in recent years. On these websites,
there are hundreds of millions of active users
creating vast amounts of information that were not
available before. Such vast information, therefore,
poses a great challenge in terms of information
overload. Social recommendation systems mainly aim to
reduce the information overload on social
networking websites by presenting the most relevant
information to users. Recommendation systems can also
be considered a subclass of information ranking
systems that aim to use a model built from the
characteristics of an item (content-based approaches)
or a user's social information (collaborative
filtering approaches) to predict the 'rating'
or 'preference' that the user would give to the
item. Most previous work uses matrix
factorization, a typical collaborative filtering
technique, to handle users' social information such
as behavior and activity.
This work attempts to use machine learning techniques
to deal with such a preference problem. Two major
contributions of this project have been accomplished.
First, we have used machine learning techniques to
construct recommendation models, and applied the
models to music recommendation, in which user
information is employed to improve the performance.
Second, we have built a recommendation platform for
mobile apps. On the platform, we have collected 2330
users with 1278 various mobile apps. In the future,
we will continue to collect more data, and conduct
research on the dataset to enhance our social
recommendation algorithms.
English Keywords: Information Retrieval, Machine Learning, Natural
Language Processing, Social Network and Analysis
1 The Background and Goals of the Research Project
1.1 Background
Social networking websites have become more and more popular in recent years. Well-known
examples include photo and video sharing websites like Flickr and YouTube, and blog and wiki websites
like Blogger and Wikipedia. In addition, there are social tagging websites like Delicious,
social network websites like Facebook, and micro-blogging websites like Twitter. On these websites,
there are hundreds of millions of active users creating vast amounts of information that were not available
before. Such vast information, therefore, poses a great challenge in terms of information overload.
Social recommendation systems mainly aim to reduce the information overload on social
networking websites by presenting the most relevant information to users. In [18], Resnick and Varian
claim that in a typical recommender system, people provide recommendations as inputs, which the
system then aggregates and directs to appropriate recipients. In some cases the primary
transformation is in the aggregation; in others the system’s value lies in its ability to make good matches between
the recommenders and those seeking recommendations. Recommendation systems are systems that
are meant to augment this process in cases where we need help. Examples of such instances include
gathering many like-minded people in a database, gathering information about thousands of books,
and combining information in guides far beyond what we could research ourselves.
For a typical recommender system, there are three steps:
1. The user provides some form of input to the system. These inputs can be both explicit and
implicit [17]. Ratings submitted by users are among explicit inputs whereas the URLs visited
by a user and time spent reading a web site are among possible implicit inputs.
2. These inputs are brought together to form a representation of the user’s likes and dislikes. This
representation could be as simple as a matrix of items-ratings, or as complex as a data structure
combining both content and rating information.
3. The system computes recommendations using these “user profiles.”
Even though the steps are essentially the same for most recommender systems, there have been
different approaches to both steps 2 and 3. Two of the traditional approaches to building a user profile and
computing recommendations are collaborative filtering and content-based recommendation. Some
researchers have also tried hybrid approaches to improve the quality of the recommendations.
In general, recommendation systems produce a list of recommendations in one of two ways:
through collaborative filtering or through a content-based approach. Collaborative filtering techniques construct
a model from a user’s past behavior (i.e., items previously purchased or selected) and similar decisions
made by other users, then use that model to recommend items that the user may have an interest in
[19]. Content-based approaches utilize a set of discrete attributes/features of an item in order to
recommend additional items with similar properties [12, 13]. In addition, these two approaches can
also be combined as a hybrid recommendation system. In the context of content-based approaches,
recommendation systems can also be considered a subclass of information ranking systems that aim
to use a model built from the characteristics of an item or a user’s social information to predict the
“rating” or “preference” that the user would give to the item. From the perspective of predicting
“rating” or “preference”, many machine learning techniques have been studied in recent years for the
problem of Learning to Rank [10, 21]. Below we briefly survey these techniques in terms of
their pros and cons.
1.2 Related Work
1.2.1 Collaborative Filtering
Collaborative filtering is one of the most widely used approaches to the design of recommender
systems. Collaborative filtering approaches are based on collecting and analyzing a large
amount of information on the behaviors, activities or preferences of users, and then predicting
what users will like based on their similarity to other users [5]; this is called user-based
collaborative filtering. User-based collaborative filtering thus attempts to mimic the social process of asking a friend for
a recommendation. Another well-known example of collaborative filtering is item-based
collaborative filtering, an algorithm popularized by the Amazon recommender system [9]. The main idea
behind item-based collaborative filtering is that similar items may be bought by similar people.
Within these two kinds of collaborative filtering techniques, there are two commonly used
metrics to measure the similarity between two items/users: one is cosine-based similarity, and the
other is Pearson-based similarity, as shown in the following two equations, respectively:
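$$w(u, v) = \frac{\sum_{i \in I} r_{u,i} \, r_{v,i}}{\sqrt{\sum_{i \in I} r_{u,i}^{2}} \, \sqrt{\sum_{i \in I} r_{v,i}^{2}}} \tag{1}$$
$$w(u, v) = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_{u})(r_{v,i} - \bar{r}_{v})}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_{u})^{2}} \, \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_{v})^{2}}} \tag{2}$$
where u and v stand for two users, i and j for two items, $r_{u,i}$ for the rating given by user u to item i,
and $\bar{r}_{u}$ for the average rating given by user u.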
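As an illustration of Equations 1 and 2, the following Python sketch computes both similarities from ratings stored as dictionaries. It assumes that the sums run over the items rated by both users and that a user's average rating is taken over all of that user's ratings; neither choice is stated in the text, and the function names are hypothetical.

```python
import math

def cosine_similarity(ru, rv):
    """Equation 1, summing over items rated by both users (an assumption)."""
    common = set(ru) & set(rv)
    num = sum(ru[i] * rv[i] for i in common)
    den = (math.sqrt(sum(ru[i] ** 2 for i in common))
           * math.sqrt(sum(rv[i] ** 2 for i in common)))
    return num / den if den else 0.0

def pearson_similarity(ru, rv):
    """Equation 2, with each mean taken over all of the user's ratings (an assumption)."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mean_u = sum(ru.values()) / len(ru)
    mean_v = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mean_u) * (rv[i] - mean_v) for i in common)
    den = (math.sqrt(sum((ru[i] - mean_u) ** 2 for i in common))
           * math.sqrt(sum((rv[i] - mean_v) ** 2 for i in common)))
    return num / den if den else 0.0

# Toy ratings: item id -> rating.
alice = {"s1": 5, "s2": 3, "s3": 1}
bob = {"s1": 4, "s2": 2, "s3": 0}
carol = {"s1": 1, "s2": 3, "s3": 5}
print(cosine_similarity(alice, bob), pearson_similarity(alice, carol))
```

Subtracting the mean makes the Pearson variant insensitive to users who rate systematically higher or lower than others, which the cosine variant is not.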
The main advantage of the collaborative filtering approach is that it does not rely on machine
analyzable content and therefore it is capable of accurately recommending complex items such as
movies without requiring an “understanding” of the item itself. Despite this advantage, the
approach often suffers from the following three problems: cold start, scalability, and sparsity [2, 8].
• Cold Start: The systems often require a large amount of existing data on a user in order to make
accurate recommendations.
• Scalability: In many of the social recommendation environments, there are usually millions of
users and products. Thus, a large amount of computation power is often necessary to calculate
recommendations.
• Sparsity: The number of items available on major sites is extremely large. Even the most active
users will have rated only a small subset of the overall database. Thus, even the most popular
items have very few ratings.
In [16, 20], a particular type of collaborative filtering algorithm uses matrix factorization, a
low-rank matrix approximation technique, to predict recommendations. Most of the collaborative filtering
systems apply the nearest neighbor model for computing recommendations; the systems that use the
nearest neighbor model rely upon the assumption that people who agreed in the past are likely to
agree in the future as well [17].
1.2.2 Content-based Approach
The content-based approach is another typical approach to designing recommender systems.
Content-based filtering methods are based on information about and characteristics of the items that are going to be
recommended. These algorithms try to recommend items that are similar to those that a user liked in
the past. In particular, various candidate items are compared with items previously rated by the user
and the best-matching items are recommended. This approach has its roots in information retrieval
and information ranking research.
Content-based systems recommend items based on items’ content rather than other users’ ratings.
There are essentially four steps for content-based recommendations:
1. The first step is to gather content data about the items. Most systems use Information Extraction
techniques to extract these data, and Information Retrieval techniques to retrieve the relevant
information [3, 12]. Web crawlers collecting data off the web are common tools in this step.
2. The second step is to ask the user to provide some ratings. In this step, the user might be asked
to rate random items, or can search for and find any items that the user likes.
3. The third step is to compile a profile of the user using the content information extracted in
the first step and the rating information provided in the second step. Different information
retrieval or machine learning algorithms can be used to learn a profile.
Term-frequency/inverse-document-frequency weighting [11] and the Bayesian learning algorithm [14] are some of the
many techniques that have been tried.
4. The last step is to match the contents of unrated items with the user profile compiled in the third step
and assign scores to the items depending on the quality of the match. The items are ranked
according to their scores and presented to the user in order [12].
Basically, these methods use an item profile (i.e., a set of discrete attributes and features)
characterizing the item within the system. This part may also need some content analysis techniques such
as natural language processing. The system then creates a content-based profile of users based on a
weighted vector of item features. The weights denote the importance of each feature to the user and
can be computed from individually rated content vectors using a variety of machine learning
techniques. Simple approaches use the average values of the rated item vectors, while more sophisticated
methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision
trees, and artificial neural networks in order to estimate the probability that the user is going to like
the item. However, content-based systems also suffer from the following three problems:
• For some domains, either there is no content information available, or the content is hard to
analyze.
• Formulating taste and quality is not an easy task.
• These systems can suggest only items whose content matches the user’s profile. If the user
has tastes that are not represented in the user’s profile, items matching the unrepresented tastes will
not be recommended.
Seeing that a disadvantage of one system is not an issue for the other, some research has been done
on combining collaborative filtering with content-based recommendation. Different techniques have been
employed to combine the two, called “hybrid approaches.”
1.2.3 Hybrid Approaches
Recent studies show that hybrid methods consisting of collaborative filtering and content-based
approaches can be more effective in some cases. A hybrid approach can be achieved in various ways,
such as combining content-based and collaborative-based predictions. Some empirical studies show
that a hybrid approach can provide more accurate recommendations than a pure content-based or
collaborative-based system. These methods can also alleviate some problems, such as the cold start
and sparsity problems.
2 Methodology
2.1 Standard FM
Factorization Machines (FM) can act like most factorization models when fed various types of features.
An FM learns the weights of all interactions between the features. In general, a two-way factorization
machine model can be defined as:
$$\hat{y}(x) = w_{0} + \sum_{i=1}^{n} w_{i} x_{i} + \sum_{l=1}^{n} \sum_{j=l+1}^{n} \hat{w}_{lj} x_{l} x_{j}, \tag{3}$$
where $w_{0}$ is the global bias, $w_{i}$ is the weight of feature $x_{i}$, and $\hat{w}_{lj}$ models the interaction of each pair
of features. The interaction $\hat{w}_{lj}$ can be factorized into pairs of interaction parameters,
$$\hat{w}_{lj} = \sum_{f=1}^{\kappa} v_{lf} v_{jf}. \tag{4}$$
The parameter κ determines the model complexity. Rather than using a single independent parameter for each
interaction, this factorization allows high-quality estimates of the interaction parameters even under
sparsity. The Factorization Machine provides a promising framework for the recommendation problem. Unlike
the generic matrix factorization model, it can easily be used to conduct feature engineering. For more
details of FM, please refer to [1].
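To make Equations 3 and 4 concrete, the following Python sketch evaluates the two-way FM prediction for a single feature vector. It is only an illustration under assumed inputs (plain lists for x, w, and the n × κ factor matrix V; the function names are hypothetical) and is not the libFM implementation. The first function follows the double sum literally; the second uses the standard algebraic identity that rewrites the pairwise term as half of the squared sum minus the sum of squares per factor, which costs O(κn) instead of O(κn²).

```python
def fm_predict_naive(x, w0, w, V):
    """Equation 3 with the interaction weights of Equation 4, written as a double sum."""
    n, kappa = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for l in range(n):
        for j in range(l + 1, n):
            w_lj = sum(V[l][f] * V[j][f] for f in range(kappa))
            y += w_lj * x[l] * x[j]
    return y

def fm_predict(x, w0, w, V):
    """Same prediction using the sum-of-squares identity for the pairwise term."""
    n, kappa = len(x), len(V[0])
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(kappa):
        s = sum(V[l][f] * x[l] for l in range(n))
        s_sq = sum((V[l][f] * x[l]) ** 2 for l in range(n))
        y += 0.5 * (s * s - s_sq)
    return y

# Toy example: n = 3 features, kappa = 2 factors.
x = [1.0, 0.0, 2.0]
w0, w = 0.1, [0.2, -0.1, 0.3]
V = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.5]]
print(fm_predict_naive(x, w0, w, V), fm_predict(x, w0, w, V))
```

Both functions return the same value; the second form is the one that makes FM practical on sparse, high-dimensional feature vectors.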
2.2 Similarity Computation
Motivated by the strength and efficiency of CF methods, we seek to combine their advantages with the
factorization model. Since FM provides a good framework for modeling the input features, we can directly
extract the similarity information from the users and items. This is similar to CF methods, and the information can
easily be embedded into a feature vector. In general, the utilized features are divided into the following
three types, and each type has its own computation method.
1. ID Domain: The ID variable is used to identify a target, and it only belongs to a specific target.
For instance, User ID is in the ID domain, which means that each user has his/her own unique
ID variable. Technically a similarity measurement is a function that computes the degree of
similarity between a pair of targets, e.g. the similarity of listening histories of two users. Given
two vectors of attributes, A and B, the similarity score is computed by the extended version of
cosine similarity:
$$similarity = \frac{|A \cap B|}{|A|^{1-\alpha} \, |B|^{\alpha}}, \tag{5}$$
where α ∈ [0, 1] is a tuning parameter.
2. Categorical Domain: The categorical variable represents the extracted features from the user
and item attributes such as the User Age and Music Genre. The similarity computation is also
based on Equation 5.
3. Real Value Domain: If the attribute is already real-valued (∈ R), such as Audio Information, the
similarity score is calculated from the Euclidean distance. In general, for an n-dimensional space,
the distance between feature vector q and feature vector p is:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_{i} - q_{i})^{2}}. \tag{6}$$
For the ID domain, the function O returns the objects referred to by a target, here target i and target j. For
example, given the listening histories of two users, α determines whether the similarity score
takes into account the number of objects referred to by the other target. Take the following three users
with their listening records as an example:
O(User_i) = [1, 2, 3],
O(User_j) = [1, 2, 3],
O(User_k) = [1, 2, 3, 4].
Then User_j is more similar to User_i than User_k is, based on the listening history, when α = 1 (scores of
3/3 = 1 and 3/4 = 0.75, respectively); on the other hand, they get the same score when α = 0.
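The three-user example can be checked with a short Python sketch of Equation 5. It assumes that A and B are treated as sets of referred objects, with A belonging to the target being compared against (User_i here); the function name is hypothetical.

```python
def extended_similarity(a, b, alpha):
    """Equation 5: |A ∩ B| / (|A|^(1 - alpha) * |B|^alpha), with A and B as sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)

user_i = [1, 2, 3]
user_j = [1, 2, 3]
user_k = [1, 2, 3, 4]

for alpha in (0.0, 1.0):
    print(alpha,
          extended_similarity(user_i, user_j, alpha),
          extended_similarity(user_i, user_k, alpha))
```

With α = 1 the size of the other user's history enters the denominator, so User_k is penalized for the extra song; with α = 0 only the size of User_i's own history matters and both comparisons score the same.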
For the categorical indicators, because this kind of feature usually occurs in different objects, the
function O will be the collection of referred objects for a target. Take the User Age as an example, if
we want to know the similarity of listening history between 15-year-old users and 30-year-old users,
the function O will collect all the songs of the users whose age is between 15 and 30.
For the real-value indicators, the feature vector is normalized by the standard score $z = (x - \mu) / \sigma$, where
µ is the mean of the population and σ is the standard deviation of the population. The score indicates
how many standard deviations an observation is above or below the mean.
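A minimal Python sketch of the real-value case follows: one feature dimension is standardized across the population, and two feature vectors are compared with the Euclidean distance of Equation 6. How a distance is turned into a similarity score is not specified in the text, so the sketch stops at the distance; the function names and sample values are hypothetical.

```python
import math
import statistics

def standardize(values):
    """Standard score (x - mu) / sigma using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

def euclidean_distance(p, q):
    """Equation 6 for two vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

tempo = [90.0, 120.0, 150.0]          # one audio dimension over three songs
print(standardize(tempo))
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

Standardizing each dimension first keeps features with large numeric ranges (such as tempo) from dominating the distance.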
Finally, suppose we have a set of similarity scores for a specific target and seek to embed them
into a feature vector. A simple way is to directly index them with the corresponding scores. However,
popular objects generally have more similar objects than the others. This may lead to an imbalance
problem in which unpopular objects rarely obtain similarity scores. In order to take this balance issue
into account, we only keep the top-k similar objects as the new score basis, and normalize the new
vector of k values so that it sums to 1:
$$\bar{s}_{ij} = \frac{s_{ij}}{\sum_{j'=1}^{n} |s_{ij'}|}. \tag{7}$$
The purpose of this step is to avoid the imbalance of similarity information. For example, given
s(User_i) = (0, 0.8, 0.6) and s(User_j) = (0.1, 0, 0.2), without normalization User_i would be more likely
to obtain high scores because of the high values in its similarity vector.
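The top-k selection and the normalization of Equation 7 can be sketched in Python as follows. The sketch assumes that similarity scores are held in a dictionary from object id to score and that ties are broken by the sort order; the function name and the object ids are hypothetical.

```python
def top_k_normalize(scores, k):
    """Keep the k largest scores and rescale them so their absolute values sum to 1."""
    top = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    total = sum(abs(s) for _, s in top)
    if total == 0:
        return {}
    return {obj: s / total for obj, s in top}

s_user_i = {"a": 0.0, "b": 0.8, "c": 0.6}
s_user_j = {"a": 0.1, "b": 0.0, "c": 0.2}

print(top_k_normalize(s_user_i, k=2))
print(top_k_normalize(s_user_j, k=2))
```

After normalization both users contribute similarity features of the same total mass, so the larger raw values of User_i no longer dominate.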
3 Experimental Setup
This section describes the experimental setup we employed to study the influence of different factors
on the performance of music recommendation.
3.1 Evaluation Metric
We employed two metrics to evaluate the recommendation performance: the truncated mean average
precision at k (MAP@k) and recall. For each user u, let P(p) denote the precision at cut-off p; the
average precision is then
$$AP(u, o) = \frac{\sum_{p=1}^{k} P(p) \times r_{u\,o(p)}}{I(u)}, \tag{8}$$
Table 1: The feature sets considered in this work.
Abbr.  Feature                    Unique Index  Type
U      User ID                          19,596  -
S      Song ID                          30,260  -
H      Listening History                30,260  -
BY     Birth Year (of users)               100  Cb
LR     Live Region (of users)              208  Cb
M      Mood Tags (of users)                132  Cx
VAD    VAD values (of articles)              3  Cx
A      Artists (of songs)                5,175  Cb
Au     Audio Information                    53  Cb
SR     Social Relation                 674,932  Cx
Note: P denotes the feature of user profile, Cb denotes the content-based features that are extracted
from songs, and Cx denotes the context-based features that are extracted from users.
Figure 1: Livejournal sample posts.
where o(p) = i denotes that item i is ranked at position p in the ordered list o, and $r_{ui}$ indicates whether
user u has listened to song i (1 = yes, 0 = no). MAP@k is the mean of the average
precision scores for the top-k results:
$$MAP@k = \frac{\sum_{u=1}^{U} AP(u, o)}{U}, \tag{9}$$
where U is the total number of target users. A higher MAP@k indicates better recommendation
accuracy.
Recall measures the fraction of the songs a user really likes that are recommended by the system.
It is computed by:
$$Recall = \frac{|\{\text{Correct Songs}\} \cap \{\text{Returned Top-}k\text{ Songs}\}|}{|\{\text{Correct Songs}\}|}. \tag{10}$$
High recall means that most of the songs the user actually likes or listens to are recommended.
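The metrics of Equations 8 to 10 can be sketched in Python as below. Two assumptions are made that the text does not settle: precision is evaluated at each rank position p up to k, and the normalizer I(u) is taken to be the number of relevant songs of user u. The function names and the toy ranking are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP truncated at k; `ranked` is the ordered list o, `relevant` the songs with r = 1."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for p, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / p          # precision at cut-off p
    return total / len(relevant)       # assumed meaning of I(u)

def mean_average_precision_at_k(rankings, relevants, k):
    """MAP@k: mean of the per-user AP values."""
    aps = [average_precision_at_k(o, r, k) for o, r in zip(rankings, relevants)]
    return sum(aps) / len(aps) if aps else 0.0

def recall_at_k(ranked, relevant, k):
    """Share of the relevant songs that appear in the top-k list."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(average_precision_at_k(ranked, relevant, 3), recall_at_k(ranked, relevant, 3))
```

In the toy ranking the relevant songs sit at positions 1 and 3, so the precision values 1 and 2/3 are averaged over the two relevant songs.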
3.2 Dataset
Our experiments are performed on a real-world dataset collected from a well-known social blogging
website, LiveJournal. LiveJournal is unique in that, in addition to the common feature of blogging,
each post is accompanied by a “Mood” column and a “Music” column so that users can write down
their moods and the songs on their minds while posting, as Figure 1 exemplifies. From LiveJournal, we
crawled a total of 1,928,868 listening records covering 674,932 users and 72,913 songs as an
initial set. To retain enough data in the training and test sets for this
study, we only considered users who have more than 10 listening records and discarded the records
of the other users. This filtering resulted in a final set of 225,652 listening records (11.7% of the
initial set) among 19,596 users and 30,260 songs.
Figure 2: The structure of LiveJournal dataset
For evaluation, we split the dataset by user according to the following 80/20 rule: the full
listening histories of 80% of the users, together with half of the listening history of each of the
remaining 20% of the users, form the training data, and the other half of the listening histories of
those 20% of the users forms the testing data. For each record, we
randomly add 10 songs as negative records to construct the testing pool.
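The split described above can be sketched in Python as follows. Several details are assumptions rather than the procedure actually used: users and the held-out half of each history are chosen at random, and negative songs are drawn from songs the user has not listened to. All function names are hypothetical.

```python
import random

def split_users(histories, train_ratio=0.8, seed=0):
    """Return (train, test) dicts mapping user -> list of songs."""
    rng = random.Random(seed)
    users = sorted(histories)
    rng.shuffle(users)
    cut = int(train_ratio * len(users))
    train, test = {}, {}
    for user in users[:cut]:               # 80% of users: full history for training
        train[user] = list(histories[user])
    for user in users[cut:]:               # 20% of users: half training, half testing
        songs = list(histories[user])
        rng.shuffle(songs)
        half = len(songs) // 2
        train[user] = songs[:half]
        test[user] = songs[half:]
    return train, test

def build_test_pool(positive, listened, all_songs, rng, n_negative=10):
    """One test record: the true song followed by randomly sampled negative songs."""
    candidates = [s for s in all_songs if s not in listened]
    return [positive] + rng.sample(candidates, n_negative)

# Toy data: 10 users with 12 distinct songs each.
histories = {u: [u * 100 + s for s in range(12)] for u in range(10)}
train, test = split_users(histories)
print(len(train), len(test))
```

Each test record then becomes a small ranking task of 11 candidates, which is the pool over which MAP@10 and recall are computed.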
3.3 Feature
The structure of the collected music dataset is depicted in Figure 2; it is organized around the factors that affect how people
choose music. Personal factors indicate the characteristics that people would possess for a long
period of time, such as age and gender. People with different levels of music background may
appreciate music differently, which in turn affects music preference. Musical factors consist of the audio
content, its profile, and even the artwork of the CD. People may choose a song because of its melody or
its singer. Situational factors include those that persist for a short period of time, such as when and
where you listen to music, what you are doing, and what your mood is. People often express their
feelings through listening to music, and the user-generated articles reflect their recent mood.
Table 1 summarizes the features used in the experiments, which are described in detail below.
3.3.1 Content-based Features
Content-based features refer to features that describe either the user or the item. For describing users,
we have Birth Year (BY), Live Region (LR) and Social Relations (SR) features. The birth years for
the users in our dataset fall in a window of 100 years. Moreover, the users are from 208 regions. We
consider users who were born in the same year or users who were from the same region as similar. On
the other hand, from LiveJournal we can obtain friendships and construct the social network among
the users. This gives rise to the social-relation-based similarity matrix. People who are friends with one
another are likely to share similar music taste.
For describing songs, we have Artist (A) and Audio Information (Au) features. The artist
feature simply indicates the artist (among the 5,175 possible artists) of a song. If two songs are
sung/performed by the same artist, they are likely to be more similar. The audio features consist
of 53 perceptual dimensions of music, including danceability, loudness, mode, and tempo. They are
extracted by using the EchoNest API¹, a commonly used audio feature extraction tool developed in
the field of music information retrieval [6]. We can measure the similarity between two songs in this
53-dimensional feature space.
Table 2: Affective Norms for English Words
Description  Valence  Arousal  Dominance
dream           6.73     4.53       5.53
eat             7.47     5.69       5.60
favor           6.46     4.54       5.67
good            7.47     5.43       6.41
hate            2.12     6.95       5.05
Note: 5 example words of ANEW dictionary.
3.3.2 Context-based Features
The user-generated articles are interesting context-based features in the dataset, but they may contain
too many redundant words. Motivated by the idea of emotional matching, we convert the original
content of an article into a vector of emotional words by referring to the dictionary of Affective Norms
for English Words (ANEW) [4], which provides a set of normative emotional ratings for English
words. We retain the words which can be found in the ANEW dictionary and weight them by
TF-IDF weighting. Specifically, a word t in an article d is scored by tf(t, d) × idf(t, D), where
$$tf(t, d) = \frac{f(t, d)}{\max\{f(w, d) : w \in d\}}, \tag{11}$$
$$idf(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}, \tag{12}$$
f(t, d) is the number of occurrences of t in d, and D is the set of all articles. A higher score indicates that the term has a higher term
frequency weight in the article and a lower document frequency in the whole collection of articles.
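The TF-IDF weighting of Equations 11 and 12 can be sketched in Python as follows. The sketch assumes that each article is already tokenized and reduced to ANEW words, and that the logarithm is the natural logarithm (the base is not stated); the function names and the toy articles are hypothetical.

```python
import math
from collections import Counter

def tf(term, doc):
    """Equation 11: count of `term` divided by the count of the most frequent word in `doc`."""
    counts = Counter(doc)
    return counts[term] / max(counts.values()) if counts else 0.0

def idf(term, docs):
    """Equation 12: log of (number of articles / number of articles containing `term`)."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tfidf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# Toy articles, already filtered to words found in the dictionary.
docs = [["dream", "dream", "eat"], ["eat", "good"], ["hate"]]
print(tfidf("dream", docs[0], docs), tfidf("eat", docs[0], docs))
```

In the first article "dream" outscores "eat" on both factors: it occurs more often there and appears in fewer articles overall.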
The emotional words in the ANEW dictionary are rated by Valence (or pleasantness; positive/negative affective states),
Activation (or arousal; energy and stimulation level) and Dominance (or potency; a sense of control
or freedom to act), the fundamental emotion dimensions found by psychologists [7]. Finally, each
word vector of an article is converted to valence, arousal, and dominance (VAD) values. For example,
for the sentence “I had a dream last night, I was eating a marshmallow,” the VAD values would be
14.20, 10.22, and 11.13, respectively, i.e., the sums of the ratings of “dream” and “eat” in Table 2. Moreover, we also collected the
mood tags recently used by each user.
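The worked example can be reproduced with a short Python sketch that sums the Table 2 ratings of the words found in the dictionary. It assumes that the sentence has already been tokenized and lemmatized (so that "eating" becomes "eat") and, as in the example, applies no TF-IDF weights; the lexicon below holds only the five sample words of Table 2, and the function name is hypothetical.

```python
# Sample ANEW entries from Table 2: word -> (valence, arousal, dominance).
ANEW_SAMPLE = {
    "dream": (6.73, 4.53, 5.53),
    "eat": (7.47, 5.69, 5.60),
    "favor": (6.46, 4.54, 5.67),
    "good": (7.47, 5.43, 6.41),
    "hate": (2.12, 6.95, 5.05),
}

def vad_values(tokens, lexicon=ANEW_SAMPLE):
    """Sum valence, arousal, and dominance over the tokens found in the lexicon."""
    valence = arousal = dominance = 0.0
    for token in tokens:
        if token in lexicon:
            v, a, d = lexicon[token]
            valence += v
            arousal += a
            dominance += d
    return valence, arousal, dominance

# "I had a dream last night, I was eating a marshmallow" after lemmatization.
tokens = ["i", "have", "a", "dream", "last", "night",
          "i", "be", "eat", "a", "marshmallow"]
print(tuple(round(x, 2) for x in vad_values(tokens)))
```

Only "dream" and "eat" match the lexicon, so every article is reduced to three numbers regardless of its length.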
We conducted a series of experiments with different settings. First of all, we attempted to
demonstrate that the similarity information is effective for most kinds of features under the factorization model.
Second, we compared the performance of the standard Factorization Machine with that of the
grouping Factorization Machine, and then examined the effects of different feature combinations. Finally,
we studied the sensitivity of the proposed method to the parameters.
¹ http://echonest.com/
Table 3: Evaluation result of CF-based algorithms
Model          MAP@10  Recall
Randomize      0.0578  0.1656
User-based CF  0.3668  0.4748
Item-based CF  0.3093  0.5115
SVD++          0.3506  0.4844
FM             0.3817  0.5216
Table 4: Performance of ID Similarity (LiveJournal Dataset)
Features             MAP@10  Recall
U + S                0.3816  0.5217
U + S + H            0.4409  0.5821
U + S + US           0.4310  0.5712
U + S + H + US       0.4427  0.5810
U + S + SS           0.4635  0.6194
U + S + H + SS       0.4897  0.6413
U + S + US + SS      0.4712  0.6251
U + S + US + SS + H  0.5021  0.6491
Note: For the feature abbreviation, please refer to Table 1.
3.4 Similarity Approach
The similarity indicator can be represented in the categorical set domain as used in [15]. For instance,
suppose that “Alice is similar to Charlie and Sandy”; the corresponding similarity indicator may be
the vector z(Bob, Charlie, Sandy) = (0, 0.2, 0.8), where the sum of all values equals 1 according
to Equation 7.
3.4.1 CF-based Recommendation
In the first step, we evaluated the performance of some well-known CF-based recommendation
algorithms to verify the strength of the factorization machine. We compare FM with user-based CF,
item-based CF, and an SVD-based approach (SVD++) using only the user-item matrix, which is a standard input to
recommendation models. Note that context information or similarity information is not exploited in
this comparison. Table 3 lists the results of these methods. As the table shows, the resulting MAP@10
of all the CF-based approaches falls within 0.30–0.38. Among the four methods, FM performs the
best. The performance difference between FM and the other methods is significant under the t-test. This
validates the effectiveness of FM. Therefore, we employed FM in the subsequent experiments.
Under the CF-based framework, there are two ID indicators: User ID and Song ID. Therefore, we
can obtain the following similarity information according to Equation 5:
• User Similarity (US): Two users are similar if they listen to the same songs.
• Song Similarity (SS): Two songs are similar if they are listened to by the same users.
Both of them are directly mined from the listening history. Therefore, they are always available for a
standard recommendation problem. US is applied to users, whereas the SS is applied to items.
Table 5: Performance of Feature Similarity
Features          MAP@10  Recall
U + S + BY        0.4301  0.5751
U + S + BY + BYS  0.4348  0.5830
U + S + A         0.5025  0.6538
U + S + A + AS    0.5125  0.6640
U + S + LR        0.4283  0.5723
U + S + LR + LRS  0.4382  0.5834
U + S + Au        0.4254  0.5809
U + S + Au + AuS  0.4576  0.6114
We evaluated the performance of every possible feature combination. As shown in Table 4, both
the user similarity and the song similarity (U+S+US or U+S+SS) lead to significantly better results
compared to the baseline U+S.
We have also implemented the KNN-based FM of [1] by adding the listening history to libFM, as
shown in the second row of Table 4 (i.e., U+S+H). It can be seen that the incorporation of
listening history (H) generally improves the result as well. Note that the SS feature consists of the top-k
most similar songs, which is not extracted from the listening history. If we compare H, US, and SS, SS
achieves the highest MAP@10 (0.4635), showing that the similarity approach is more effective than
the KNN approach. Moreover, the KNN approach may fail when the amount of listening history is
limited or overwhelming, whereas it is easy to determine the number of most similar features used over the
whole data.
By combining all the available information from the listening records (U+S+US+SS+H), we
obtained the best result in Table 4, 0.5021 in MAP@10, which is significantly better than the baseline
0.3816. Simple as the idea is, using the proposed ID similarity indicators greatly improves the
accuracy of recommendation. Moreover, the ID similarity indicators are suitable for other
recommendation problems because they share the same problem structure: to predict whether an item would be
accepted by a user.
3.4.2 Content-based Recommendation
Four similarity features were extracted from the dataset:
• Birth Year Similarity (BYS): Two users are similar if they were born in the same year.
• Live Region Similarity (LRS): Two users are similar if they live in the same region
geographically.
• Artist Similarity (AS): Two songs are similar if they are sung by the same artist.
• Audio Similarity (AuS): Two songs are similar if they are close in the audio feature space
spanned by the 53 audio features considered in this work.
Note that BYS and LRS are personal information that is not always available for a recommendation
problem. Similarly, AS and AuS are musical information that is only available if we have access to
the metadata or the audio content of the songs.
Table 5 lists the improvement introduced by the use of feature similarity. The results show that
all four similarities improve the recommendation performance, although Birth Year
Similarity does not yield a significant improvement in the experiments. This is possibly due to the
incompleteness of the metadata, because only half of the users have birth year information in our dataset.
Table 6: Performance of Feature Similarity
Features            MAP@10  Recall
U + S + M           0.4134  0.5539
U + S + M + MS      0.4202  0.5652
U + S + VAD         0.4483  0.5905
U + S + VAD + VADS  0.4511  0.5935
U + S + SRS         0.4213  0.5653
Table 7: Performance on complete feature vector.
Features         MAP@10  Recall  Note
U + S            0.3817  0.5216  Base-line
U + S + C*       0.5120  0.6614
U + S + C* + S*  0.5236  0.6684  Hybrid
Note: C* denotes all the categorical features, and S* denotes all the
extracted similarity features.
Moreover, another interesting observation is that the audio features significantly enhance the
recommendation performance after the audio similarity is added. The result implies that
abstract information such as the audio features is hard to exploit directly, but its similarity information
provides insightful information.
3.4.3 Context-based Recommendations
Next, we evaluated context-based recommendation by using Mood Tag and Emotional Words. These
two features reflect the user’s mood when writing the article. We want to utilize the emotional
infor-mation from user-generated articles and mood tags. The similarity inforinfor-mation can be obtained in the
same way:
• Mood Similarity (MS): Two user are similarly if they tend to express similar moods in their
articles.
• VAD Similarity (VADS): Two users are similar if the affective qualities of the articles they
wrote are similar.
Note that contextual information is also not always available for a recommendation problem. We only
considered context information extracted from mood tags and articles in this work, but the proposed
method is applicable to other contextual information as well.
As the first and third rows of Table 6 show, adding the Mood Tags feature yields
0.4134 in terms of MAP@10, which is lower than the 0.4483 obtained with the contextual VAD feature computed from
user-generated articles. This result indicates that the VAD feature provides more affective information about
the user context. Although the mood similarity does not lead to a remarkable improvement, the VAD
similarity feature is still considered effective.
3.4.4 Hybrid Recommendation
Finally, we studied if we can further boost the accuracy by integrating all the proposed similarity
features, including categorical ones (denoted as C* collectively) and similarity features (denoted as
S* collectively). As Table 7 shows, using more data generally leads to better accuracy. When all
the features are considered (U+S+C*+S*), we are able to obtain 0.5251 in MAP@10 and 0.6708 in
recall, both of which are the highest ones in our evaluation. This result confirms again the ability of
the proposed method in incorporating multiple similarity information.
Table 8: Statistics of the collected data.
# (Apps)                1,278
# (effective users)     2,330
# (commenting users)    20
Figure 3: Platform interface: front page
4 Extended Work: Mobile Apps Recommendation Platform
Besides the above music recommendation work, this project also builds a Mobile Apps
recommendation platform. This platform is still in a preliminary stage, and its main goal is to collect
data from users. Table 8 lists some statistics of the collected data. In the table, an effective user
is a user whose Facebook ID exists, and a commenting user is one of
the 2,330 effective users who has commented on some Apps. Figures 3-5 display part of the interfaces of
our platform.
5 Conclusions and Future Work
This project proposes a novel approach that incorporates multiple feature similarities into the factorization
model via feature engineering. The similarity computation captures the similar patterns from the
objects and enhances the convergence speed and accuracy of FM. The proposed method is applicable
to many kinds of features, which means we can obtain higher-level information from multiple
aspects. The experimental results show that feature similarity indeed benefits the recommendation
performance. In addition, we also propose several features, including CF-based, content-based and
context-based ones. Among these features, we try to capture the relationship between users and
songs by matching users’ emotions. The results show that the idea is able to enhance the quality of
recommendations.
Figure 4: Platform interface: user page
Figure 5: Platform interface: user comments
As for the extended work of this project, we plan to use machine learning techniques
to integrate the social relations between users into recommendation algorithms built on the
data collected from the Mobile Apps recommendation platform.
References
[1] S. Rendle. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol., 3(3):57:1–57:22, May
2012.
[2] G. Adomavicius and A. Tuzhilin. Toward the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions. Knowledge and Data Engineering, IEEE
Transactions on, 17(6):734–749, June 2005.
[3] Marko Balabanović and Yoav Shoham. Fab: content-based, collaborative recommendation.
Commun. ACM, 40:66–72, March 1997.
[4] M. Bradley and P. J. Lang. Affective norms for english words ANEW: Instruction manual and
affective ratings. Technical report, The Center for Research in Psychophysiology, Univ. Florida,
1999.
[5] R. Burke. Hybrid recommender systems: Survey and experiments. User modeling and
user-adapted interaction, 12(4):331–370, 2002.
[6] Douglas Turnbull Derek Tingle, Youngmoo E. Kim. Exploring automatic music annotation with
acoustically-objective tags. pages 55–61, 2010.
[7] Douglas Turnbull Derek Tingle, Youngmoo E. Kim. Exploring automatic music annotation with
acoustically-objective tags. pages 55–61, 2010.
[8] S. Lee, J. Yang, and S.Y. Park. Discovery of hidden similarity on collaborative filtering to
overcome sparsity problem. In Discovery Science, pages 396–402. Springer, 2004.
[9] G.D. Linden, J.A. Jacobi, and E.A. Benson. Collaborative recommendations using item-to-item
similarity mappings, July 24 2001. US Patent 6,266,649.
[10] Tie-Yan Liu. Learning to rank for information retrieval. Found. Trends Inf. Retr., 3:225–331,
March 2009.
[11] P. Melville, R.J. Mooney, and R. Nagarajan. Content-boosted collaborative filtering for
improved recommendations. In Proceedings of the National Conference on Artificial Intelligence,
pages 187–192. AAAI Press, 2002.
[12] Raymond J. Mooney and Loriene Roy. Content-based book recommending using learning for
text categorization. In Proceedings of the fifth ACM conference on Digital libraries, DL ’00,
pages 195–204, New York, NY, USA, 2000. ACM.
[13] M. Pazzani and D. Billsus. Content-based recommendation systems. The adaptive web, pages
325–341, 2007.
[14] M.J. Pazzani, J. Muramatsu, D. Billsus, et al. Syskill & webert: Identifying interesting web
sites. In Proceedings of the national conference on artificial intelligence, pages 54–61, 1996.
[15] Steffen Rendle, Zeno Gantner, et al. Fast context-aware recommendations with factorization
machines. In Proc. ACM SIGIR, pages 635–644, 2011.
[16] Jason D. M. Rennie and Nathan Srebro. Fast maximum margin matrix factorization for
collaborative prediction. In Proceedings of the 22nd international conference on Machine learning,
ICML ’05, pages 713–719, New York, NY, USA, 2005. ACM.
[17] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. Grouplens:
an open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM
conference on Computer supported cooperative work, CSCW ’94, pages 175–186, New York,
NY, USA, 1994. ACM.
[18] Paul Resnick and Hal R. Varian. Recommender systems. Commun. ACM, 40:56–58, March
1997.
[19] C. Sammut and G.I. Webb. Recommender Systems, Encyclopedia of machine learning.
Springer-Verlag New York Inc, 2011.
[20] Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. Scalable collaborative
filtering approaches for large recommender systems. J. Mach. Learn. Res., 10:623–656, June
2009.
[21] A. Trotman. Learning to rank. Information Retrieval, 8(3):359–381, 2005.
Music Recommendation Based on Multiple Contextual Similarity Information
Chih-Ming Chen∗, Ming-Feng Tsai∗, Jen-Yu Liu†, Yi-Hsuan Yang†
∗Department of Computer Science & Program in Digital Content and Technology
National Chengchi University, Taipei 11605, Taiwan Email: {g10018, mftsai}@cs.nccu.edu.tw
†Research Center for Information Technology Innovation
Academia Sinica, Taipei 11564, Taiwan Email: {ciaua, yang}@citi.sinica.edu.tw
Abstract: This paper proposes a music recommendation approach based on various similarity information via Factorization Machines (FM). We introduce the idea of similarity, which has been widely studied in the field of information retrieval, and incorporate multiple feature similarities into the FM framework, including content-based and context-based similarities. The similarity information not only captures the similar patterns from the referred objects, but also enhances the convergence speed and accuracy of FM. In addition, in order to avoid the noise within large similarity of features, we also adopt the grouping FM as an extended method to model the problem. In our experiments, a music-recommendation dataset is used to assess the performance of the proposed approach. The dataset is collected from an online blogging website and includes user listening history, user profiles, social information, and music information. Our experimental results show that, with various types of feature similarities, the performance of music recommendation can be enhanced significantly. Furthermore, via the grouping technique, the performance can be improved significantly in terms of Mean Average Precision, compared to the traditional collaborative filtering approach.
I. INTRODUCTION
Similarity is an important concept in recommendation. Given the favorite items of a user, it is sensible to recommend the other items similar to those favorite ones. Similarity between items can be measured in several ways, and different methods in measuring similarity can be complementary to one another in practice. For example, for music recommendation, some users prefer songs similar in melody, while others prefer songs similar in lyrics. The more information we have regarding different aspects of similarity, the more likely we are able to give successful recommendation.
If similarity is measured in terms of the number of people who share the same taste regarding the items, the resulting model can be considered as a collaborative filtering (CF)-based model. On the other hand, if similarity is measured in terms of the affinity of the items in a feature space, the resulting model is usually known as content-based (CB) model. Hybrid models that blend the aforementioned two models have also been studied in the literature. In particular, Factorization Machine (FM) has emerged in recent years as a promising framework for hybrid recommendation. With proper features, FM is able to mimic many state-of-the-art CF/CB-based algorithms.
Under the FM framework, it is possible to exploit every co-occurrence pattern among items to capture more information. Moreover, representing similarity in the form of a matrix can be more informative than representing each item as a feature vector, because the latter requires an additional process to extract similarity information from the feature vectors, an operation which is performed only implicitly by FM.
Music preference is not only affected by personal factors of the listener and the musical factors of the music items; it is also highly dependent on the context of music listening. For example, people listen to different music when being in an office or when exercising; when feeling blue or when being in a happy mood. Therefore, it is important to consider contextual information for better recommendation performance. This study also features the use of multiple similarity information computed from the contextual factors of music listening.
From a technical point of view, FM models the global bias, feature biases and weights of the interactions among all the features, including vector-based and matrix-based ones. Therefore, it is likely that some noisy information will be mixed into the final prediction model. To remedy this, we propose to adopt a grouping technique to remove unnecessary interactions. In other words, we divide the features into distinct groups and only account for interactions between features in different groups. In this way, the noise arising from unnecessary interactions can be largely eliminated. Our evaluation shows that such a grouping technique is particularly important when one considers matrix-based features such as similarity matrices, due to the increase in the number of features (and accordingly the number of potential unnecessary interactions). Although there are multiple ways features can be grouped, our result shows that there are some guidelines for finding a good grouping.
In the experiments, we use a dataset crawled from a real-world social blogging website, LiveJournal, as it contains rich contextual information that is entered by users spontaneously in their day-to-day lives [1], [2]. The features are extracted from user profiles and music characteristics such as geographic information and audio information, showing that similarity computation can be easily applied to most kinds of features. Since some interactions between features provide little information, we generate different grouping methods to examine whether the grouping technique can improve the performance.
Finally, we conduct experiments with different parameters. The experimental results show that similarity information significantly enhances the recommendation performance. Furthermore, via the grouping factorization machine, the performance can be further improved to 0.52 in terms of Mean Average Precision with p-value less than 0.01.
II. RELATED WORK
Recommender systems are widely deployed in commercial business, with collaborative filtering (CF) being one of the most popular models. CF models filter out the useless information and keep similar patterns to predict user behavior. More recently, machine-learning techniques provide a promising way to perform recommendation. In this section, we survey the related studies from different perspectives.
A. Contextual Recommender System
Traditional recommendation methods can be divided into two main categories: CF and CB. Many famous commercial recommendation systems are based on these methods, such as the ones used by YouTube or Amazon [3], [4]. However, such methods are limited due to the difficulty of incorporating contextual information, which is gaining increasing importance due to the rapid growth of information on the Internet.
In light of this, many methods have been proposed for the problem of contextual recommendation. For example, Meng et al. [5] investigated the individual preference and the interpersonal influence on online item adoption and recommendation. Yelong et al. [6] proposed a joint personal and social latent factor (PSLF) model that combines the state-of-the-art collaborative filtering and the social network modeling approaches for social recommendation. Kailong et al. [7] employed several interesting features from tweets, including social relation features, content-relevance features, tweets’ content-based features and publisher authority features. From these prior works, we can observe that most studies develop their models based on various types of features. In the KDD Cup 2012 competition, Tianqi et al. [8] combined a variety of models by incorporating different features. Their result indicates the importance of the contextual features. Instead of focusing on the CF method, we propose an approach that integrates the advantages from the CF method, that is, incorporates the similarity information into the factorization model.
B. Music Recommender System
There are also many studies related to music recommendation. For example, Negar et al. [9] presented a context-aware music recommendation system that infers user’s short-term music preference based on the most recent sequence of songs liked by the user using sequential data mining. Noam et al. [10] used a hierarchical track-album-artist-genre structure in modeling the biases of music items, and used music sessions to model session bias of users, showing the importance of bias modeling. Cai et al. [11] showed that emotion can be useful for matching songs to documents according to the audio and text content. Unlike these existing works, the contextual information considered in this work is mined from user-generated articles. Moreover, we use FM to study the effect of multiple types of features which are extracted from user profile, user-generated articles, geographic information, item characteristic and audio features.
(Figure 1 panel labels: Similar User, Similar Music, Primitive FM Matrix, CF Matrix, Similarity Matrix, Sparse Vector)
Fig. 1. Illustration of Our Proposed Approach vs. Factorization Machine
C. Factorization Model
The goal of a recommender system is to predict whether a user would like an object. In recent years, FM models have proven to be competitive and flexible for a variety of recommendation tasks [5], [8]. For example, Jason et al. [12] studied the joint problem of recommending items to a user with respect to a given query and introduced a factorized model for optimization. István et al. [13] took an MF-based approach with a simple rating-based predictor on the Netflix Prize Dataset.
A common problem among FM-like models is the need to re-design the prediction model task by task. To solve this problem, Rendle described a generic FM framework called libFM [14], which is able to simulate many other successful models via factorization machine by feature engineering (i.e., by using corresponding features). As demonstrated in [14], libFM generalizes existing methods such as standard matrix factorization, Pairwise Interaction Tensor Factorization (PITF) and SVD++. Moreover, a system based on libFM has won the second title in a KDD Cup competition [15]. Liangjie et al. [15] modified the original model to handle multiple aspects of the dataset at the same time. In contrast, in this work we aim at incorporating similarity information into libFM without major modification of its framework, thereby preserving the advantages of libFM.
III. METHODOLOGY
Figure 1 illustrates the main concept of incorporating similarity information into the FM framework. In general, a traditional CF-based matrix only keeps the records of user-to-item information, but FM factorizes this form to a multiplication of two feature vectors (i.e., $V$ and $V^T$ in Figure 1). Our proposed approach further integrates the similarity information with the framework to capture the similar patterns from the referred objects. Below we further describe the Factorization Machines and our proposed approaches.
A. Standard FM
Factorization Machines can mimic most factorization models when fed with various types of features. The model learns the weights of all interactions between the features. In general, a two-way factorization machine model can be defined as:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=1}^{n} \sum_{j=l+1}^{n} \hat{w}_{lj} x_l x_j, \tag{1}$$
where $w_0$ is the global bias, $w_i$ is the weight of feature $x_i$, and $\hat{w}_{lj}$ models the interaction of each pair of features. The interaction $\hat{w}_{lj}$ can be factorized into pairs of interaction parameters,
$$\hat{w}_{lj} = \sum_{f=1}^{\kappa} v_{lf} v_{jf}. \tag{2}$$
The parameter $\kappa$ determines the model complexity. Rather than using a single parameter for each interaction, this factorization allows high-quality parameters for higher-order interactions to be estimated under sparsity. The Factorization Machine provides a promising framework for the recommendation problem. Unlike the generic matrix factorization model, it can be easily used to conduct feature engineering. For more details of FM, please refer to [14].
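As an illustration of Equations (1) and (2), the following is a minimal Python sketch of the two-way FM prediction for a single dense feature vector. It is not libFM's implementation; the function name `fm_predict` and the toy parameter values are assumptions made only for this example.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Two-way FM prediction for one dense feature vector x.

    w0 is the global bias, w holds the n feature weights, and V holds
    n rows of kappa latent factors (V[l][f] plays the role of v_{lf}).
    All names are illustrative assumptions, not libFM identifiers.
    """
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for l, j in combinations(range(n), 2):  # each pair l < j exactly once
        w_lj = sum(vl * vj for vl, vj in zip(V[l], V[j]))  # Eq. (2)
        pairwise += w_lj * x[l] * x[j]
    return w0 + linear + pairwise  # Eq. (1)


# Toy example: three features, kappa = 2.
y = fm_predict([1, 0, 1], 0.1, [0.2, 0.3, 0.4], [[1, 0], [0, 1], [1, 1]])
```

In the toy call only the pair (0, 2) is active, so the prediction is the bias plus two linear terms plus one factorized interaction.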
B. Grouping FM
Factorization Machines provide a good framework for modeling the interactions between features, but features of similar types may sometimes cause confusion during learning, especially when the number of features is large. Hence we apply the bag-of-features concept to the standard factorization machine by grouping the features with similar characteristics, so that the model can deal with the tasks in a more flexible way with different feature partitions. After removing the non-informative weights from the FM models, the original formula can be rewritten as:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l \in G(l)}^{n} \sum_{j \notin G(l)}^{n} x_l x_j \sum_{f=1}^{\kappa} v_{l,f} v_{j,f}, \tag{3}$$
where $x_l$ belongs to the group $G(l)$, and the mutual effect of $x_l$ and $x_j$ is dropped when they are in the same group.
The grouping technique eliminates unnecessary interactions, such as the non-informative interaction between a user and the user’s age. It not only speeds up the convergence of optimization but also provides a flexible way to construct different feature combinations. Note that the modified prediction function is the same as the original one when every feature has its own group.
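The effect of grouping in Equation (3) can be sketched as follows. This is a hedged illustration, not the libFM grouping code: the function name `grouped_fm_predict` and the `group` list (one group label per feature) are hypothetical, and the sketch assumes each cross-group pair is counted once, as in Equation (1).

```python
from itertools import combinations


def grouped_fm_predict(x, w0, w, V, group):
    """FM prediction that skips interactions inside the same group.

    group[l] is the (hypothetical) group label of feature l; pairs with
    equal labels contribute nothing, mirroring the j not-in G(l) condition.
    """
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for l, j in combinations(range(n), 2):  # assumption: each pair once
        if group[l] == group[j]:
            continue  # same group: interaction dropped
        y += x[l] * x[j] * sum(a * b for a, b in zip(V[l], V[j]))
    return y


x = [1.0, 1.0, 1.0]
V = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Features 0 and 1 share a group (e.g. a user ID and that user's age).
grouped = grouped_fm_predict(x, 0.0, [0.0, 0.0, 0.0], V, [0, 0, 1])
# Every feature in its own group reproduces the standard FM prediction.
ungrouped = grouped_fm_predict(x, 0.0, [0.0, 0.0, 0.0], V, [0, 1, 2])
```

With the shared group, the interaction between features 0 and 1 is removed, which is why the two results differ.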
LibFM provides three major optimization methods for learning from the data: stochastic gradient descent [16] (SGD), alternating least-squares [17] (ALS) and Markov Chain Monte Carlo [18] (MCMC). In our experiments, the MCMC method is chosen because it can learn from the data automatically, without externally specified parameters such as the learning rate (footnote 2) and the regularization term (footnote 3). For MCMC, the gradient for the grouping Factorization Machine is derived as follows:
$$h_\theta(x) = \frac{\partial \hat{y}(x)}{\partial \theta} = \begin{cases} 1, & \text{if } \theta = w_0 \\ x_j, & \text{if } \theta = w_j \\ x_j \sum_{j' \notin G(j)} v_{j',f}\, x_{j'}, & \text{if } \theta = v_{j,f} \end{cases} \tag{4}$$
Footnote 2: The learning rate is a common parameter for controlling the learning steps.
Footnote 3: The regularization term is used to prevent the model from overfitting.
C. Similarity Computation
Motivated by the strength and efficiency of the CF method, we seek to combine its advantages with the factorization model. Since FM has a good framework for modeling the input features, we can directly extract the similarity information from the users and items. This is similar to CF methods, and can be easily embedded into a feature vector. In general, the utilized features are divided into the following three types, and each type has its own computation method.
1) ID Domain: The ID variable is used to identify a target, and it only belongs to a specific target. For instance, User ID is in the ID domain, which means that each user has his/her own unique ID variable. Technically a similarity measurement is a function that computes the degree of similarity between a pair of targets, e.g. the similarity of listening histories of two users. Given two vectors of attributes, A and B, the similarity score is computed by the extended version of cosine similarity:
$$\mathrm{similarity}(A, B) = \frac{|A \cap B|}{|A|^{1-\alpha} |B|^{\alpha}}, \tag{5}$$
where $\alpha \in [0, 1]$ is a tuning parameter.
2) Categorical Domain: The categorical variable
represents the extracted features from the user and item attributes such as the User Age and Music Genre. The similarity computation is also based on Equation 5.
3) Real Value Domain: If the attribute is already a real number ($\in \mathbb{R}$), such as Audio Information, the similarity score is calculated by the Euclidean distance. In general, for an n-dimensional space, the distance between feature vector q and feature vector p is:
$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}. \tag{6}$$
For the ID domain, the function O returns the referred objects of a target, so that A and B in Equation 5 correspond to the referred objects of target i and target j. For example, given the listening histories of two users, α determines whether the similarity score takes into account the number of referred objects of the other target. Take the following three users with their listening records as an example:
$O(\mathrm{User}_i) = [1, 2, 3]$,
$O(\mathrm{User}_j) = [1, 2, 3]$,
$O(\mathrm{User}_k) = [1, 2, 3, 4]$.
Then, based on the listening history, $\mathrm{User}_j$ is more similar to $\mathrm{User}_i$ than $\mathrm{User}_k$ is when α = 1; on the other hand, they get the same score when α = 0.
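The three-user example can be checked with a short Python sketch of Equation 5. The sketch assumes that the numerator is the size of the intersection of the two sets of referred objects and that the first argument is the target being compared against; the function name `set_similarity` is hypothetical.

```python
def set_similarity(a, b, alpha=0.5):
    """Extended cosine similarity of Eq. (5) on two collections of IDs.

    Assumes the numerator is |A intersect B|. With alpha = 0.5 this is the
    ordinary cosine similarity between binary indicator vectors.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / (len(a) ** (1 - alpha) * len(b) ** alpha)


user_i = [1, 2, 3]
user_j = [1, 2, 3]
user_k = [1, 2, 3, 4]

# alpha = 1 penalizes the extra song of user_k; alpha = 0 ignores it.
scores_alpha_1 = (set_similarity(user_i, user_j, 1.0), set_similarity(user_i, user_k, 1.0))
scores_alpha_0 = (set_similarity(user_i, user_j, 0.0), set_similarity(user_i, user_k, 0.0))
```

Under these assumptions the scores are (1.0, 0.75) for α = 1 and (1.0, 1.0) for α = 0, which agrees with the statement in the text.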
For the categorical indicators, because this kind of feature usually occurs in different objects, the function O will be the collection of referred objects for a target. Take the User Age as an example, if we want to know the similarity of listening history between 15-year-old users and 30-year-old users, the function O will collect all the songs of the users whose age is between 15 and 30.
For the real-value indicators, the feature vector is normalized by the standard score $z = (x - \mu)/\sigma$, where µ is the mean of the population and σ is the standard deviation of the population. The score indicates how many standard deviations an observation is above or below the mean.
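For real-valued features, the standard-score normalization and the Euclidean distance of Equation 6 could be combined as in the sketch below. It is only an illustration on toy numbers: the helper names `zscore_columns` and `euclidean` are hypothetical, and the sketch assumes normalization is applied per feature dimension with the population standard deviation.

```python
import math


def zscore_columns(rows):
    """Standardize each column as (x - mean) / std (population std)."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n)
            for d in range(dims)]
    # A constant column carries no information, so it is mapped to 0.
    return [[(r[d] - means[d]) / stds[d] if stds[d] > 0 else 0.0
             for d in range(dims)] for r in rows]


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors, Eq. (6)."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


# Two toy songs with two audio dimensions on very different scales.
songs = [[1.0, 10.0], [3.0, 30.0]]
z = zscore_columns(songs)
distance = euclidean(z[0], z[1])
```

After normalization both dimensions contribute equally to the distance, which is the point of applying the standard score before Equation 6.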
Finally, suppose we have a set of similarity scores for a specific target and seek to embed them into a feature vector. A simple way is to directly index them with the corresponding scores. However, a popular object generally has more similar objects than the others, which may lead to an imbalance problem in which unpopular objects rarely receive similarity scores. To take this imbalance into account, we only keep the top-k similar objects as the new score basis, and normalize the new vector of k values to sum to 1:
$$\bar{s}_{ij} = \frac{s_{ij}}{\sum_{j'=1}^{n} |s_{ij'}|}. \tag{7}$$
The purpose of this step is to avoid the imbalance of similarity information. For example, given $s(\mathrm{User}_i) = (0, 0.8, 0.6)$ and $s(\mathrm{User}_j) = (0.1, 0, 0.2)$, without normalization $\mathrm{User}_i$ would be more likely to get high scores because of the high values of its similarity vector.
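The top-k selection followed by the normalization of Equation 7 can be sketched in a few lines of Python. The function name `top_k_normalize` is hypothetical, and the sketch assumes that scores outside the top k are simply set to zero before normalizing.

```python
def top_k_normalize(scores, k):
    """Keep the k largest-magnitude scores, zero the rest, then divide by
    the sum of absolute values so the kept entries sum to 1 (Eq. 7)."""
    order = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    keep = set(order[:k])
    kept = [scores[j] if j in keep else 0.0 for j in range(len(scores))]
    total = sum(abs(s) for s in kept)
    return kept if total == 0 else [s / total for s in kept]


# The two example similarity vectors from the text, with k = 2.
s_i = top_k_normalize([0, 0.8, 0.6], 2)
s_j = top_k_normalize([0.1, 0, 0.2], 2)
```

After this step both vectors carry the same total weight, so a target with uniformly large raw scores no longer dominates.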
IV. EXPERIMENTAL SETUP
This section describes the experimental setup we employed to study the influence of different factors on the performance of music recommendation.
A. Evaluation Metric
We employed two metrics to evaluate the recommendation performance: the truncated mean average precision at k (MAP@k) and recall. For each user u, let P(p) denote the precision at cut-off p; the average precision is
$$AP(u, o) = \frac{\sum_{p=1}^{k} P(p) \times r_{u\,o(p)}}{I(u)}, \tag{8}$$
where o(p) = i means that item i is ranked at position p in the ordered list o, and $r_{ui}$ indicates whether the user u has listened to song i (1 = yes, 0 = no). MAP@k is the mean of the average precision scores for the top-k results:
$$MAP@k = \frac{\sum_{u=1}^{U} AP(u, o)}{U}, \tag{9}$$
where U is the total number of target users. Higher MAP@k indicates better recommendation accuracy.
Recall measures how many of the songs the user really likes are recommended by the automatic system. It is computed by:
$$\mathrm{Recall} = \frac{|\{\text{Correct Songs}\} \cap \{\text{Returned Top } k \text{ Songs}\}|}{|\{\text{Correct Songs}\}|}. \tag{10}$$
High recall means that most of the songs the user actually likes or listens to are recommended.
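The two metrics can be sketched in Python as follows. The text does not define the normalizer I(u) in Equation 8, so the sketch assumes I(u) = min(k, number of correct songs), which is one common convention; the function names are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k):
    """AP of Eq. (8) for one user; I(u) is ASSUMED to be min(k, |relevant|)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for p, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / p  # precision at cut-off p, counted only on hits
    return total / min(k, len(relevant))


def map_at_k(rankings, relevants, k):
    """MAP@k of Eq. (9): mean AP over all target users."""
    aps = [average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)


def recall_at_k(ranked, relevant, k):
    """Recall of Eq. (10): share of correct songs found in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / len(relevant)


# Toy example with two users and k = 3.
rankings = [["a", "b", "c", "d"], ["x", "y"]]
relevants = [{"a", "c"}, {"y"}]
score = map_at_k(rankings, relevants, 3)
```

For the first toy user the hits at positions 1 and 3 give precisions 1 and 2/3, so the AP is 5/6 under the assumed normalizer.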
TABLE I. THE FEATURE SETS CONSIDERED IN THIS WORK.
abbr.   Feature                     Unique Index   Type
U       User ID                     19,596         -
S       Song ID                     30,260         -
H       Listening History           30,260         -
BY      Birth Year (of users)       100            Cb
LR      Live Region (of users)      208            Cb
M       Mood Tags (of users)        132            Cx
VAD     VAD values (of articles)    3              Cx
A       Artists (of songs)          5,175          Cb
Au      Audio Information           53             Cb
SR      Social Relation             674,932        Cx
Note: P denotes the features of the user profile, Cb denotes the content-based features that are extracted from songs, and Cx denotes the context-based features that are extracted from users.
Fig. 2. LiveJournal sample posts.
B. Dataset
Our experiments are performed on a real-world dataset collected from a well-known social blogging website, LiveJournal. LiveJournal is unique in that, in addition to the common feature of blogging, each post is accompanied with a “Mood” column and a “Music” column so that users can write down their moods and the songs in their minds while posting, as Figure 2 exemplifies. From LiveJournal, we crawled a total of 1,928,868 listening records covering 674,932 users and 72,913 songs as an initial set. To retain enough data in the training and test sets for this study, we only considered users who have more than 10 listening records and discarded the records of the other users. This filtering resulted in the final set of 225,652 listening records (11.7% of the initial set) among 19,596 users and 30,260 songs.
For evaluation, we split the dataset by user according to the following 80/20 rule: for 80% of the users, the full listening history is kept as training data; for the remaining 20% of the users, half of each listening history is kept as training data and the other half is used as testing data. For each record, we randomly add 10 songs as negative records to construct the testing pool.
C. Feature
The structure of the collected music dataset is depicted in Figure 3, which groups the factors that affect how people choose music. Personal factors indicate the characteristics that people would possess for a long period of time, such as age and gender. People with different levels of music background may appreciate music differently, which in turn affects music preference. Musical factors consist of the audio content, its profile, and even the artwork of the CD. People may choose a song because of its melody or the singer. Situational factors include those that persist for a short period of time such as when and where you listen to music, what you are doing and what your mood is. People often express their feelings through listening to music, and the user-generated article reflects their recent mood.
(Figure 3 node labels: Listening History; Musical Factors; Situational Factors; Personal Factors; User Profile; Social Information; User Articles; Location; User Moods; Music Profile; Audio Information; Music Emotion)
Fig. 3. The structure of LiveJournal dataset
TABLE II. AFFECTIVE NORMS FOR ENGLISH WORDS
Description   Valence   Arousal   Dominance
dream         6.73      4.53      5.53
eat           7.47      5.69      5.60
favor         6.46      4.54      5.67
good          7.47      5.43      6.41
hate          2.12      6.95      5.05
Note: 5 example words of ANEW dictionary.
Table I summarizes the features used in the experiments, which are described in detail below.
1) Content-based Features: Content-based features refer to features that describe either the user or the item. For describing users, we have Birth Year (BY), Live Region (LR) and Social Relations (SR) features. The birth years for the users in our dataset fall in a window of 100 years. Moreover, the users are from 208 regions. We consider users who were born in the same year or users who were from the same region as similar. On the other hand, from LiveJournal we can obtain friendship and construct the social network among the users. This gives rise to the social relation based similarity matrix. People who are friends to one another are likely to share similar music taste.
For describing songs, we have Artist (A) and Audio Information (Au) features. The artist feature simply indicates the artist (among the 5,175 possible artists) of the songs. If two songs are sung/performed by the same artist, they are likely to be more similar. The audio features consist of 53 perceptual dimensions of music, including danceability, loudness, mode, and tempo. They are extracted by using the EchoNest API (http://echonest.com/), a commonly used audio feature extraction tool developed in the field of music information retrieval [19]. We can measure the similarity between two songs in this 53-dimensional feature space.
TABLE III. EVALUATION RESULT OF CF-BASED ALGORITHMS
Model            MAP@10   Recall
Randomize        0.0578   0.1656
User-based CF    0.3668   0.4748
Item-based CF    0.3093   0.5115
SVD++            0.3506   0.4844
FM               0.3817   0.5216
2) Context-based Features: The user-generated articles are interesting context-based features in the dataset, but they may contain too many redundant words. Motivated by the idea of emotional matching, we convert the original content of an article into a vector of emotional words by referring to the dictionary of Affective Norms for English Words (ANEW) [20], which provides a set of normative emotional ratings for English words. We retain the words which can be found in the ANEW dictionary and weight them by the TF-IDF weighting. Specifically, a term t in article d is scored by $\mathrm{tf}(t, d) \times \mathrm{idf}(t, D)$, where
$$\mathrm{tf}(t, d) = \frac{f(t, d)}{\max\{f(w, d) : w \in d\}}, \tag{11}$$
$$\mathrm{idf}(t, D) = \log \frac{|D|}{|\{d \in D : t \in d\}|}, \tag{12}$$
f(t, d) is the number of occurrences of term t in article d, and D is the set of all articles. A term with a higher score has a higher term frequency weight and a lower document frequency in the whole collection of articles. The emotional words in ANEW are rated by Valence (or pleasantness; positive/negative active states), Activation (or arousal; energy and stimulation level) and Dominance (or potency; a sense of control or freedom to act), the fundamental emotion dimensions found by psychologists [21]. Finally, each word vector of articles is converted to valence, arousal, and dominance (VAD) values. For example, for the sentence "I had a dream last night, I was eating a marshmallow," the VAD values would be 14.2, 10.22, and 11.13, respectively, which are the sums of the ratings of "dream" and "eat" in Table II. Moreover, we also collected the mood tags recently used by each user.
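The TF-IDF weights of Equations 11 and 12 and the VAD example can be reproduced with the small Python sketch below. It copies the five ANEW entries from Table II; the tokenized sentence is a hypothetical pre-lemmatized form (for instance "eating" mapped to "eat"), and the unweighted sum in `vad` is an assumption that happens to match the numbers in the example, since the text does not state how the TF-IDF weights enter the VAD values.

```python
import math
from collections import Counter

# Five ANEW entries copied from Table II: word -> (valence, arousal, dominance).
ANEW = {
    "dream": (6.73, 4.53, 5.53),
    "eat": (7.47, 5.69, 5.60),
    "favor": (6.46, 4.54, 5.67),
    "good": (7.47, 5.43, 6.41),
    "hate": (2.12, 6.95, 5.05),
}


def tf(term, doc):
    """Eq. (11): count of term divided by the count of the most frequent word."""
    counts = Counter(doc)
    return counts[term] / max(counts.values())


def idf(term, docs):
    """Eq. (12): log of (number of articles / articles containing term)."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0


def vad(doc):
    """Unweighted sum of the VAD ratings of the ANEW words found in doc."""
    rated = [ANEW[t] for t in doc if t in ANEW]
    return tuple(round(sum(r[i] for r in rated), 2) for i in range(3))


# Hypothetical pre-lemmatized tokens of the example sentence.
sentence = ["i", "have", "a", "dream", "last", "night",
            "i", "be", "eat", "a", "marshmallow"]
example_vad = vad(sentence)
```

Only "dream" and "eat" are found in the dictionary, so the result is the element-wise sum of their two rows in Table II.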
V. EXPERIMENTAL RESULTS
We conducted a series of experiments with different settings. First of all, we attempted to demonstrate that the similarity information is effective on most kinds of features under the factorization model. Second, we compared the performance of the standard Factorization Machine with that of the grouping Factorization Machine, and then examined the effects of different feature combinations. Finally, we studied the sensitivity of the proposed method to the parameters.
A. Similarity Approach
The similarity indicator can be represented as the categorical set domain as used in [17]. For instance, suppose that "Alice is similar to Charlie and Sandy"; the corresponding similarity indicator may be the vector z(Bob, Charlie, Sandy) = (0, 0.2, 0.8), where the sum of all values equals 1 according to Equation 7.