
(1)

Large Scale Collaborative Filtering Algorithms

Chih-Chao Ma

Department of Computer Science, National Taiwan University

(2)

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(3)

Introduction

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(4)

Introduction

Introduction

Recommendation systems give people advice
Web shopping
News (Yahoo) and videos (YouTube)
Collaborative Filtering
Make predictions from taste information, e.g. the ratings of products given by users

(5)

Introduction

Problem Definition

Preferences for m objects from n users
V is the sparse matrix of the scores
Some users do not score some objects
V ∈ R^{n×m} with an indicator I ∈ {0, 1}^{n×m}
Predict the unknown scores in the matrix

Represented by another sparse matrix A ∈ R^{n×m}

(6)

Introduction

Evaluation Criteria

Performance is measured by RMSE (Root Mean Square Error)

P ∈ R^{n×m} is the prediction matrix with indicator J

RMSE(P, A) = sqrt[ Σ_{i=1}^{n} Σ_{j=1}^{m} J_ij (A_ij − P_ij)² / Σ_{i=1}^{n} Σ_{j=1}^{m} J_ij ]

Give advice to users equally
Regardless of the number of scores each user has given
Uniform distribution over users for the test data
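As a concrete reading of this formula, a minimal NumPy sketch is given below; the function name rmse and the argument layout are illustrative, mirroring the symbols P, A, and J on this slide.

import numpy as np

def rmse(P, A, J):
    # P: predictions, A: true scores, J: 0/1 indicator of test entries
    # Only entries with J = 1 contribute to the error.
    sq_err = J * (A - P) ** 2
    return np.sqrt(sq_err.sum() / J.sum())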

(7)

Introduction

The Netflix Prize

A contest held by Netflix

17,770 movies and 480,189 users

100,480,507 training scores (values given)
2,817,131 test scores (values unknown)
Select the 9 most recent scores of each user
Divided into probe, quiz, and test sets

Probe set is used for offline validation
Quiz and test sets are the test data

(8)

Introduction

Related Work

KNN (K-Nearest Neighbors)
Define the similarity between users or objects
Predict an unknown score from other “similar” ones
SVD (Singular Value Decomposition)
Find the features of users and objects
Predict the scores by a predefined function
Regression

(9)

Singular Value Decomposition

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(10)

Singular Value Decomposition

Formulation

User and object features
V ∈ R^{n×m} is the training matrix
Find feature matrices U ∈ R^{f×n} and M ∈ R^{f×m}
f is the dimension of SVD
Predict scores by a function p(U_i, M_j)

Objective function

E = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{m} I_ij (V_ij − p(U_i, M_j))² + (k_u/2) Σ_{i=1}^{n} ||U_i||² + (k_m/2) Σ_{j=1}^{m} ||M_j||²
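As an illustration only, a minimal NumPy sketch of evaluating this objective is shown below, assuming the dot-product prediction p(U_i, M_j) = U_i^T M_j introduced on the next slide; the function name and argument layout are not from the original slides.

import numpy as np

def objective(U, M, V, I, ku, km):
    # U: f x n user features, M: f x m object features
    # V: n x m training scores, I: n x m 0/1 indicator of known scores
    P = U.T @ M                                   # predictions p(U_i, M_j) = U_i^T M_j
    err = 0.5 * np.sum(I * (V - P) ** 2)          # squared error on known scores only
    reg = 0.5 * ku * np.sum(U ** 2) + 0.5 * km * np.sum(M ** 2)   # regularization terms
    return err + reg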

(11)

Singular Value Decomposition

Prediction Function

Dot product: p(U_i, M_j) = U_i^T M_j
Matrix factorization problem U^T M ≈ V
The scores often have a range [a, b]: p(U_i, M_j) = a + U_i^T M_j
Linear model

−∂E/∂U_i = Σ_{j=1}^{m} I_ij (V_ij − p(U_i, M_j)) M_j − k_u U_i

−∂E/∂M_j = Σ_{i=1}^{n} I_ij (V_ij − p(U_i, M_j)) U_i − k_m M_j

(12)

Singular Value Decomposition

Learning Types

Batch learning optimizes all variables at a time
Look through all training scores
Incomplete incremental learning
Consider one user (or one object) at a time
Complete incremental learning
Consider one score at a time
Score V_ij → feature vectors U_i, M_j

E_ij = (1/2) (V_ij − p(U_i, M_j))² + (k_u/2) ||U_i||² + (k_m/2) ||M_j||²
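A minimal sketch of one such per-score step is given below, using the bounded dot-product prediction a + U_i^T M_j and a gradient-descent update derived from E_ij; the learning rate lr is an assumed hyper-parameter not specified on this slide.

def update_one_score(U, M, V, i, j, a, ku, km, lr):
    # Complete incremental learning: adjust only U_i and M_j from the single score V_ij.
    Ui, Mj = U[:, i].copy(), M[:, j].copy()
    r = V[i, j] - (a + Ui @ Mj)            # residual of the current prediction
    U[:, i] += lr * (r * Mj - ku * Ui)     # negative gradient of E_ij w.r.t. U_i
    M[:, j] += lr * (r * Ui - km * Mj)     # negative gradient of E_ij w.r.t. M_j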

(13)

Singular Value Decomposition

Variants

Add per-user biases α and per-object biases β
p(U_i, M_j, α_i, β_j) = a + U_i^T M_j + α_i + β_j
Also updated by gradient descent
Add constraints on feature vectors
Constrained SVD [Salakhutdinov and Mnih, 2007]
Import a constraint matrix W ∈ R^{f×m}

U_i = Y_i + ( Σ_{k=1}^{m} I_ik W_k ) / ( Σ_{k=1}^{m} I_ik )

Inappropriate for complete incremental learning
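A minimal sketch of assembling the constrained user feature is shown below, assuming Y (f x n) and W (f x m) are NumPy arrays and I is the 0/1 indicator matrix; the handling of users with no scores is an added assumption.

def constrained_user_feature(Y, W, I, i):
    # U_i = Y_i + (sum_k I_ik W_k) / (sum_k I_ik): shift Y_i by the average
    # constraint vector of the objects that user i has scored.
    rated = I[i] == 1
    if not rated.any():
        return Y[:, i].copy()
    return Y[:, i] + W[:, rated].mean(axis=1)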

(14)

Singular Value Decomposition

Compound SVD

Combine both biases and constraints

Optimize three matrices Y, M, W and biases α, β
Update Y, M, α, β by complete incremental learning
Update W by incomplete incremental learning

−∇Y_i = (V_ij − p(U_i, M_j, α_i, β_j)) M_j − k_y Y_i

−∇W_k = I_ik [ Σ_{j=1}^{m} I_ij (V_ij − p(U_i, M_j, α_i, β_j)) M_j ] / [ Σ_{k=1}^{m} I_ik ] − I_ik k_w W_k
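A minimal sketch of the incomplete incremental step on W for a single user i is shown below; lr and kw are assumed hyper-parameters, and constrained_user_feature is the helper sketched on the previous slide.

def update_W_for_user(W, M, Y, V, I, alpha, beta, a, i, kw, lr):
    # One gradient step on the constraint vectors of every object scored by user i.
    rated = I[i] == 1
    c = int(rated.sum())
    if c == 0:
        return
    Ui = constrained_user_feature(Y, W, I, i)
    r = V[i, rated] - (a + Ui @ M[:, rated] + alpha[i] + beta[rated])   # residuals
    g = (M[:, rated] * r).sum(axis=1) / c          # shared part of the negative gradient
    W[:, rated] += lr * (g[:, None] - kw * W[:, rated])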

(15)

Post-Processing

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(16)

Post-Processing

Post-Processing

An SVD algorithm predicts scores with errors
Estimate the test errors by the training errors
R ∈ R^{n×m} is the residual matrix of the training data
R_ij = V_ij − p(U_i, M_j, α_i, β_j) if V_ij exists
Update the biases by the residuals
R̄_j is the average residual of object j

β_j ← β_j + R̄_j

R_ij ← R_ij − R̄_j if I_ij = 1
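A minimal NumPy sketch of this shift is given below, assuming R holds the training residuals (zero where no score exists), I is the indicator matrix, and beta is the per-object bias vector; the function name is illustrative.

import numpy as np

def shift_object_biases(R, I, beta):
    # Average residual of each object over its known scores.
    counts = I.sum(axis=0)
    avg = np.where(counts > 0, (I * R).sum(axis=0) / np.maximum(counts, 1), 0.0)
    beta += avg          # absorb the average residual into the object bias
    R -= I * avg         # residuals of the known scores shrink accordingly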

(17)

Post-Processing

Regression

Build a model for each user
Ridge regression
Features X ∈ R^{t×f}, target values Y ∈ R^{t×1}
Involve a kernel function K(x_i, x_j)

ŷ = K(x, X) (K(X, X) + λI_t)^{-1} Y

K(x_i, x_j) = (x_i x_j^T)^p if x_i x_j^T ≥ 0, and 0 if x_i x_j^T < 0

p between 5 and 20 works well in experiments
Only trust a neighbor with high similarity
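A minimal sketch of one per-user model is given below. Following the later slide that applies KRR on residuals, it is assumed here that the rows of X are feature vectors of the objects the user has scored and Y holds the corresponding training residuals; p and lam (λ) are hyper-parameters, and the function names are illustrative.

import numpy as np

def kernel(Xa, Xb, p):
    # Truncated polynomial kernel: (x_i x_j^T)^p if the dot product is non-negative, else 0.
    return np.maximum(Xa @ Xb.T, 0.0) ** p

def krr_predict(x, X, Y, p, lam):
    # y_hat = K(x, X) (K(X, X) + lam I_t)^{-1} Y for one query feature vector x.
    t = X.shape[0]
    alpha = np.linalg.solve(kernel(X, X, p) + lam * np.eye(t), Y)
    return kernel(x[None, :], X, p) @ alpha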

(18)

Post-Processing

Acceleration

The computation of (K(X, X) + λI_t)^{-1} is expensive
Kernel function and inversion on a t × t matrix
Use a threshold on t
Acceleration under the polynomial kernel
Replace K(X, X) with I_t:

(K(X, X) + λI_t)^{-1} = (1/(1 + λ)) I_t

Still use the kernel function in prediction

(19)

Post-Processing

Weighted Average

The simplified algorithm is like a weighted sum

ŷ = K(x, X) (1/(1 + λ)) I_t Y = Σ_{a=1}^{t} K(x, x_a) y_a / (1 + λ)

A weighted average is more reliable
Modify the form again

ŷ = Σ_{a=1}^{t} K(x, x_a) y_a / ( Σ_{a=1}^{t} K(x, x_a) + λ )
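A minimal sketch of this final weighted-average predictor, reusing the truncated polynomial kernel from the regression slide; as before, the rows of X and the entries of Y are assumed to be the scored objects' feature vectors and their training residuals.

import numpy as np

def weighted_average_predict(x, X, Y, p, lam):
    # y_hat = sum_a K(x, x_a) y_a / (sum_a K(x, x_a) + lam): no t x t inversion needed.
    w = np.maximum(X @ x, 0.0) ** p        # similarities K(x, x_a)
    return (w @ Y) / (w.sum() + lam)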

(20)

Experiments

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(21)

Experiments

Data Sets

Movielens

6,040 users and 3,706 movies
1,000,209 scores (density = 4.61%)
Select 3 scores of each user as test data

Netflix

480,189 users and 17,770 movies
100,480,507 scores (density = 1.18%)
Use the probe set (1,408,395 scores) as test data

(22)

Experiments

Algorithms in Experiments

Algorithms used for comparison

AVGB: a simple baseline predictor P_ij = µ_j + b_i
SVDNR: SVD without regularization terms
SVD: SVD with complete incremental learning
SVDUSER: SVD with incomplete incremental learning in the order of users
CSVD: the compound SVD algorithm

Dataset     AVGB     SVDNR    SVD      CSVD
Movielens   0.9313   0.8796   0.8713   0.8706
Netflix     0.9879   0.9280   0.9229   0.9178

(23)

Experiments

Performance versus Time

(24)

Experiments

Post-Processing Algorithms

Start from the best algorithm, CSVD

CSVD: compound SVD without post-processing
SHIFT: update biases by training residuals
KNN: K-nearest neighbors on residuals
KRR: kernel ridge regression on residuals
WAVG: weighted average on residuals

Dataset     CSVD     SHIFT    KNN      KRR      WAVG
Movielens   0.8706   0.8677   0.8644   0.8635   0.8621
Netflix     0.9178   0.9175   0.9139   0.9101   0.9097

(25)

Experiments

Competition for Netflix Prize

Use compound SVD with dimension f = 256
Use the probe set for validation
Apply the weighted average algorithm

RMSE = 0.8868, 35th place at the time of submission
6.79% better than the baseline RMSE of 0.9514 (a 10% improvement is required for the Grand Prize)

Ordinary SVD gives results in the range 0.91–0.93

(26)

Conclusions

Outline

Introduction

Singular Value Decomposition
Post-Processing

Experiments
Conclusions

(27)

Conclusions

Conclusions

Complete incremental learning for SVD
Update after looking at a single score

Compound SVD outperforms the original SVD
Combination of biases and constraints

Post-processing algorithms on the residuals
KNN, regression, weighted average
