
IV. Experiments

4.2 Experimental Results

We implement the SVD algorithms in C, since they are the most computationally expensive components, and use MATLAB for data preprocessing and postprocessing, as it is convenient for matrix manipulation. For each SVD algorithm, the test RMSE is plotted against the training time. The algorithms used for comparison are listed below:

AVGB: A simple baseline predictor that predicts the score Pij for user i and object j as µj + bi, where µj is the average score of object j in the training data and bi is the average bias of user i, computed as follows (a short C sketch of this baseline appears after this list):

$$b_i = \frac{\sum_{j=1}^{m} I_{ij}\,(V_{ij} - \mu_j)}{\sum_{j=1}^{m} I_{ij}}. \qquad (4.1)$$

SVDNR: The SVD algorithm without regularization terms. This algorithm serves as another baseline.

SVD: Algorithm 2, which uses incremental learning with regularization terms.

SVDUSER: The SVD algorithm which has the same gradients as SVD, but uses incomplete incremental learning, updating the feature values only after scanning all scores of a user rather than after each single score (a sketch of the per-score updates follows this list).

CSVD: Algorithm 4, the compound SVD algorithm, which incorporates per-user and per-object biases and the constraints on feature vectors.
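To make the baseline and the update style concrete, two minimal C sketches follow. The first computes the AVGB prediction of (4.1); the array names and the indicator convention (I[i][j] = 1 when V[i][j] is observed) are ours, not the thesis code.

```c
/* AVGB: b_i = sum_j I_ij (V_ij - mu_j) / sum_j I_ij, per (4.1).
 * Array names and the indicator convention are illustrative. */
double user_bias(int i, int m, double **V, int **I, const double *mu)
{
    double num = 0.0;
    int cnt = 0;
    for (int j = 0; j < m; j++) {
        if (I[i][j]) {                /* score V[i][j] observed */
            num += V[i][j] - mu[j];
            cnt++;
        }
    }
    return cnt > 0 ? num / cnt : 0.0; /* bias 0 for a user with no scores */
}

/* AVGB prediction for user i and object j: P_ij = mu_j + b_i. */
double avgb_predict(const double *mu, const double *b, int i, int j)
{
    return mu[j] + b[i];
}
```

The second sketches one pass of complete incremental learning for the regularized SVD model, assuming the standard regularized matrix-factorization gradients that the text describes; the (user, object, score) triple layout is again our own, not the thesis code.

```c
/* One epoch of complete incremental (per-score) learning for
 * regularized SVD with f latent dimensions: each observed score
 * immediately updates both the user and the object feature vector. */
void svd_epoch(int n_scores, const int *user, const int *obj,
               const double *score, double **U, double **M,
               int f, double mu_v, double ku, double km)
{
    for (int s = 0; s < n_scores; s++) {
        int i = user[s], j = obj[s];
        double pred = 0.0;
        for (int d = 0; d < f; d++)
            pred += U[i][d] * M[j][d];
        double e = score[s] - pred;               /* prediction error */
        for (int d = 0; d < f; d++) {
            double ui = U[i][d], mj = M[j][d];
            U[i][d] += mu_v * (e * mj - ku * ui); /* user update   */
            M[j][d] += mu_v * (e * ui - km * mj); /* object update */
        }
    }
}
```

SVDUSER would instead accumulate these gradients over all scores of a user and apply them once per user; as the experiments below show, that variant needs a smaller learning rate to remain stable.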

For each data set, the matrices M, U in SVDNR, SVD, and SVDUSER, as well as M, Y in CSVD, are initialized by (2.22). The values of the constraint matrix W and the biases α, β in CSVD are all initialized to 0. The parameter settings are given below, after Table 4.1.

Table 4.1: Best RMSE achieved by each algorithm

Dataset     AVGB     SVDNR    SVD      CSVD
Movielens   0.9313   0.8796   0.8713   0.8706
Netflix     0.9879   0.9280   0.9229   0.9178

Regarding the parameters, the learning rate µv in SVDNR and SVD is 0.003 for the Movielens data set and 0.001 for the Netflix data set, while SVDUSER uses various learning rates for comparison. The regularization coefficients ku, km are 0 in SVDNR, while SVD and SVDUSER use the regularization coefficients that give the best performance, which are both 0.05 for Movielens and 0.005 for Netflix. The algorithm CSVD has more parameters than the other algorithms, but its learning rates µv and µw are set equal to µv in SVDNR and SVD, and its first two regularization coefficients ky, km are likewise equal to ku, km in SVD. The remaining regularization coefficient kw for the constraint matrix W is 0.02 for Movielens and 0.001 for Netflix. For the biases in CSVD, the learning rate and regularization coefficient (µb, kb) are (0.0005, 0.05) for Movielens and (0.0002, 0.005) for Netflix. All these SVD-like algorithms use dimension f = 10. These settings are collected in the sketch below.
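For reference, the settings above can be gathered in one place; the following C struct is purely illustrative and simply mirrors the values quoted in the text.

```c
/* Hyperparameter settings used in the experiments (values from the
 * text above); the struct itself is illustrative, not thesis code. */
struct csvd_params {
    int    f;          /* latent dimension                      */
    double mu_v, mu_w; /* learning rates: features, constraints */
    double mu_b;       /* learning rate: biases                 */
    double ky, km;     /* regularization: user/object features  */
    double kw;         /* regularization: constraint matrix W   */
    double kb;         /* regularization: biases                */
};

static const struct csvd_params MOVIELENS_PARAMS =
    { 10, 0.003, 0.003, 0.0005, 0.05, 0.05, 0.02, 0.05 };
static const struct csvd_params NETFLIX_PARAMS =
    { 10, 0.001, 0.001, 0.0002, 0.005, 0.005, 0.001, 0.005 };
```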

Table 4.1 shows the best RMSE achieved by each of AVGB, SVDNR, SVD, and CSVD, that is, the lowest test RMSE reached before over-fitting sets in. Figure 4.1 plots the RMSEs against training time, including SVDUSER with different learning rates. SVDNR over-fits quickly because it has no regularization term, while SVD performs better with the same learning rate. CSVD is the best algorithm on both data sets, but it does not outperform SVD by much on Movielens, since its assumption on user features is less effective on a denser data set. The behavior of SVDUSER shows that incomplete incremental learning is a poor choice for optimization: it can still reach similar performance when the same gradients are used, but it needs a smaller learning rate and more training time.


Figure 4.1: Plots of RMSE versus training time

If the learning rate is not small enough, the decrease of the RMSE is unstable, which makes it hard to decide the stopping condition of the algorithm. Batch learning is even worse than incomplete incremental learning: in our experiments it requires an extremely small learning rate to reach the same performance as complete incremental learning, and it takes several days to complete the optimization for Netflix. In conclusion, complete incremental learning is the better choice for optimizing SVD algorithms in collaborative filtering, and the compound SVD algorithm (Algorithm 4) can further boost the performance of conventional SVD algorithms on a sparse data set like Netflix.

We apply the post-processing algorithms described in Chapter III to the best algorithm from the experiments above, namely the compound SVD algorithm CSVD. The algorithms compared are:

CSVD: The model (feature matrices and biases) obtained from the compound SVD algorithm that gives the lowest RMSE. No post-processing method is applied.

Table 4.2: RMSE for each post-processing algorithm

Dataset     CSVD     SHIFT    KNN      KRR      WAVG
Movielens   0.8706   0.8677   0.8644   0.8635   0.8621
Netflix     0.9178   0.9175   0.9139   0.9101   0.9097


SHIFT: We update the per-object biases and the residuals by (3.3) and (3.4). This approach modifies the model of CSVD and gives better performance in our experiments, so all of the following post-processing methods use the residuals updated by this approach.

KNN: Algorithm 5, which applies K-nearest neighbors to the residuals.

KRR: Algorithm 6, which applies kernel ridge regression with the polynomial kernel (3.11) to the residuals (a sketch follows this list).

WAVG: Algorithm 7, which uses a weighted average to predict the residuals (sketched after the parameter settings below).
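As referenced above, here is a minimal C sketch of the per-user kernel ridge regression on residuals. It assumes the polynomial kernel (3.11) is the dot product of normalized feature vectors raised to the power p; the function names and the plain Gaussian-elimination solve are ours, not the thesis code.

```c
#include <math.h>
#include <stdlib.h>

/* Polynomial kernel on normalized feature vectors; assumed form of
 * (3.11): K(a, b) = (a . b)^p. */
static double poly_kernel(const double *a, const double *b, int f, double p)
{
    double dot = 0.0;
    for (int d = 0; d < f; d++) dot += a[d] * b[d];
    return pow(dot, p);
}

/* Fit alpha = (K + lambda I)^{-1} r for one user's t rated objects by
 * Gaussian elimination (no pivoting; K + lambda I is symmetric positive
 * definite).  Y[a] holds normalized feature vectors, r the residuals. */
void krr_fit(double **Y, const double *r, int t, int f,
             double p, double lambda, double *alpha)
{
    double *A = malloc((size_t)t * t * sizeof *A);
    for (int a = 0; a < t; a++) {
        for (int b = 0; b < t; b++)
            A[a * t + b] = poly_kernel(Y[a], Y[b], f, p);
        A[a * t + a] += lambda;       /* ridge term on the diagonal */
        alpha[a] = r[a];
    }
    for (int k = 0; k < t; k++) {                 /* forward elimination */
        for (int a = k + 1; a < t; a++) {
            double m = A[a * t + k] / A[k * t + k];
            for (int b = k; b < t; b++) A[a * t + b] -= m * A[k * t + b];
            alpha[a] -= m * alpha[k];
        }
    }
    for (int k = t - 1; k >= 0; k--) {            /* back substitution */
        for (int b = k + 1; b < t; b++) alpha[k] -= A[k * t + b] * alpha[b];
        alpha[k] /= A[k * t + k];
    }
    free(A);
}

/* Predicted residual for a target object with feature vector yq. */
double krr_predict(const double *yq, double **Y, const double *alpha,
                   int t, int f, double p)
{
    double s = 0.0;
    for (int a = 0; a < t; a++)
        s += alpha[a] * poly_kernel(yq, Y[a], f, p);
    return s;
}
```

The O(t^3) solve per user is what makes KRR the slowest of the post-processing methods in the timing comparison below.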

The parameters of KNN, KRR, and WAVG are set as follows. The number of neighbors and the regularization coefficient (k, kr) in KNN are (15, 10) for Movielens and (50, 10) for Netflix, respectively. KRR and WAVG differ in formulation but share the same parameters, the power p of the polynomial kernel and the regularization coefficient λ, which we set to (25.0, 1.0) for Movielens and (20.0, 1.0) for Netflix. These values give the best test result for each algorithm. A further parameter of KRR and WAVG is a threshold on t, the number of objects considered for each user in Algorithms 6 and 7. A larger t leads to better performance but requires more computational time; to show the best performance of each algorithm, we do not restrict t with a threshold.
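A minimal C sketch of the weighted-average prediction of Algorithm 7 follows, based on the description that the weights are polynomial in the dot products of normalized feature vectors. Treating λ as a smoothing term in the denominator is our assumption, made only so the parameter appears somewhere; Algorithm 7 defines the exact formula.

```c
#include <math.h>

/* Predict the residual of a target object (normalized feature vector
 * yq) for one user as a weighted average of the residuals r[0..t-1]
 * of the t objects the user has rated, with weight = (dot product)^p.
 * Placing lambda in the denominator is an assumption, not the thesis
 * formula; consult Algorithm 7 for the exact form. */
double wavg_residual(const double *yq, double **Y, const double *r,
                     int t, int f, double p, double lambda)
{
    double num = 0.0, den = lambda;
    for (int a = 0; a < t; a++) {
        double dot = 0.0;
        for (int d = 0; d < f; d++)
            dot += yq[d] * Y[a][d];
        double w = pow(dot, p);   /* polynomial weight, e.g. p = 25 */
        num += w * r[a];
        den += w;
    }
    return den != 0.0 ? num / den : 0.0;
}
```

Unlike KRR, this requires no linear solve, which is consistent with WAVG taking only minutes on Netflix where KRR takes hours.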

The resulting RMSEs of these post-processing algorithms on the Movielens and Netflix data sets are shown in Table 4.2. SHIFT slightly decreases the RMSEs by updating the biases, and all three of KNN, KRR, and WAVG bring significant improvements on both data sets. The weighted average algorithm WAVG is the best post-processing method on the residuals of SVD-like algorithms. It is also efficient, taking only a few minutes to process the Netflix data set, whereas the kernel ridge regression algorithm KRR requires hours of computation for the same data set.

Next we describe our best submission to the Netflix competition so far. In general, an SVD algorithm with a larger dimension f gives better performance, but the time and memory requirements are also proportional to f. Considering the resources we have, we use Algorithm 4 (CSVD) with dimension f = 256 to train a compound SVD model. The training data are the whole 100,480,507 scores in the Netflix data set, and the parameters as well as the number of iterations are decided by experiments with a training/validation split. The per-object biases in the model are then updated as in SHIFT, and Algorithm 7 (WAVG) is applied for post-processing. The baseline approach provided by Netflix gives an RMSE of 0.9514, and ordinary SVD algorithms usually give results in the range of 0.91 to 0.93. Our procedure results in an RMSE of 0.8868, which is 6.79% better than the baseline approach and ranked 35th at the time of submission.

V. Conclusions

Singular Value Decomposition is an efficient approach for predicting the preferences of users in collaborative filtering. It uses gradient descent to optimize the feature values of users and objects, and has a time complexity linear in the number of training instances. We find that incremental learning, especially complete incremental learning, which updates the feature values after looking at a single training score, performs well in both accuracy and efficiency. The performance of the algorithms can be further improved by using per-user and per-object biases and by imposing additional constraints on the feature values.

Combining these ideas, we propose a compound SVD algorithm that gives significantly better performance with the same time complexity as traditional SVD algorithms.

To further improve the accuracy of the predictions, we apply several post-processing algorithms that estimate the residuals of the scores from an SVD-like algorithm, that is, the errors between the scores given by the algorithm and the ground truth.

Several algorithms, such as K-nearest neighbors and regression, are used, and all of them depend on the feature values obtained by the SVD algorithms. The best post-processing algorithm predicts the residuals by a weighted average, where the weights are polynomial in the dot products of the normalized feature vectors. For the experiments we use two data sets, Movielens and Netflix, both of which collect large numbers of movie ratings.

Our approaches show consistent results on these two data sets.

