Type        training error   testing error
Statical    0.910            1.044
Dynamical   0.914            1.034

Table 5.2: Comparison of statical learning and dynamical learning

for Algorithm 1 and Algorithm 2. Based on the experimental results in the last section, we consider sequentially updated cutpoints initialized by the first 1,000 ratings in the training dataset. The effectiveness of the found range of each setting is verified by training error and testing error. Moreover, we use the Spearman correlation to discuss the sensitivity of each algorithm to the settings of the priors and parameters.

First, define a set S containing some possible values for the selected settings:

S := S_{μ_{β_j}^{(0)}} × S_{(σ_{β_j}^2)^{(0)}} × S_{(σ_{θ_i}^2)^{(0)}} × S_{ξ_γ} × S_{(ξ_User, ξ_Item)},   ξ_γ ≥ (ξ_User, ξ_Item),

where

S_{μ_{β_j}^{(0)}}      := {1, 0.1, 0.01}
S_{(σ_{β_j}^2)^{(0)}}  := {0.1, 0.01, 0.001}
S_{(σ_{θ_i}^2)^{(0)}}  := {0.1, 0.01, 0.001}
S_{ξ_γ}                := {0.001, 0.0001, 0.00001}
S_{(ξ_User, ξ_Item)}   := {0.001, 0.0001, 0.00001}
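To make the size of this search space concrete, here is a minimal sketch that enumerates S in Python; the variable names are illustrative, and setting ξ_User = ξ_Item anticipates the simplification described later in this section.

```python
from itertools import product

# Candidate values for each setting, copied from the definition of S above.
S_mu_beta      = [1, 0.1, 0.01]            # S_{mu_beta_j^(0)}
S_var_beta     = [0.1, 0.01, 0.001]        # S_{(sigma_beta_j^2)^(0)}
S_var_theta    = [0.1, 0.01, 0.001]        # S_{(sigma_theta_i^2)^(0)}
S_xi_gamma     = [0.001, 0.0001, 0.00001]  # S_{xi_gamma}
S_xi_user_item = [0.001, 0.0001, 0.00001]  # S_{(xi_User, xi_Item)}, with xi_User = xi_Item

# Keep only configurations satisfying xi_gamma >= (xi_User, xi_Item).
grid = [
    cfg
    for cfg in product(S_mu_beta, S_var_beta, S_var_theta,
                       S_xi_gamma, S_xi_user_item)
    if cfg[3] >= cfg[4]
]
print(len(grid))  # 27 * 6 = 162 admissible configurations
```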

Then, pick one configuration s_k ∈ S and run both Algorithm 1 and Algorithm 2 to produce a training error and a testing error. In addition, the rank of the estimated central tendencies of items' latent quality, rank(μ̃_θ), under each algorithm is recorded; this is used to calculate the Spearman correlation mentioned at the beginning of this chapter.
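For reference, the Spearman correlation between two such recordings can be computed as below; the two vectors of estimated central tendencies are hypothetical stand-ins for the output of Algorithms 1 and 2.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical estimated central tendencies from two runs.
mu_theta_run1 = np.array([0.8, -0.2, 0.5, 1.1, -0.7])
mu_theta_run2 = np.array([0.7, -0.1, 0.6, 1.0, -0.8])

# Spearman's rho is the Pearson correlation of the ranks, so passing the
# raw estimates is equivalent to ranking them first with rank(mu_theta).
rho, _ = spearmanr(mu_theta_run1, mu_theta_run2)
print(round(rho, 3))  # 1.0 here: both runs rank the five items identically
```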

To simplify the process, we assume that

μ_γ^{(0)} = (−∞, (∗), ∞)′,   (σ_γ^2)^{(0)} = (0, 0, 0.01, 0.01, 0.01, 0)′
μ_{α_j}^{(0)} = mean(∗),   (σ_{α_j}^2)^{(0)} = 1,   μ_{θ_i}^{(0)} = 0,   ∀ i, j,

where (∗) is estimated from the first 1,000 ratings in the training dataset, as usual.
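This section does not restate how (∗) is computed; the following sketch assumes a probit-style initialization, mapping the empirical cumulative rating proportions of the first 1,000 ratings through the inverse normal CDF. The actual estimator used in the thesis may differ.

```python
import numpy as np
from scipy.stats import norm

def init_cutpoints(ratings, n_levels=5):
    """Sketch: probit-style cutpoint initialization from early ratings.

    Assumes the interior cutpoints (*) are the inverse-normal CDF of the
    empirical cumulative rating proportions; the thesis's estimator may differ.
    """
    counts = np.bincount(ratings, minlength=n_levels + 1)[1:]  # levels 1..5
    cum_props = np.cumsum(counts)[:-1] / counts.sum()          # 4 interior points
    interior = norm.ppf(cum_props)                             # this is (*)
    return np.concatenate(([-np.inf], interior, [np.inf]))     # 6 entries, as above

first_1000 = np.random.default_rng(0).integers(1, 6, size=1000)
print(init_cutpoints(first_1000))
```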

The most important setting is (σ_{β_j}^2, σ_{θ_i}^2)^{(0)}. It is the small-variance assumption on both σ_{β_j}^2 and σ_{θ_i}^2 that makes it possible to obtain (4.3)-(4.18), the update formulas in Section 4.1 (for details on the small-variance assumption beyond what Section 4.1 covers, please refer to Appendix A.1). How small these variances have to be is an open question, which is why we start from some small value, say 0.1, and decrease it exponentially to search for the scale at which the results are more reliable.

The sets S_{(ξ_User, ξ_Item)} and S_{ξ_γ} are used to find appropriate values of ξ_User, ξ_Item, and ξ_γ for dynamical learning. To lower the time complexity, we let ξ_User and ξ_Item take the same value in each run.

Remarks

Table 5.3 shows that statical learning tends to work better with μ_{β_j}^{(0)} = 1 and larger (σ_{β_j}^2)^{(0)} and (σ_{θ_i}^2)^{(0)}. In addition, Table 5.5 shows that configurations No. 6, 9, and 10 seem to generate relatively different outcomes, resulting in smaller Spearman correlations; this indicates that setting μ_{β_j}^{(0)} too small, or setting (σ_{β_j}^2)^{(0)} at a significantly different scale from (σ_{θ_i}^2)^{(0)}, is not recommended.

Table 5.4 consists of many similar configurations with different combinations of ξ_γ and (ξ_User, ξ_Item). Clearly, the search set for dynamical learning is much larger than that for statical learning because of these extra parameters. Excluding ξ, the patterns of the configurations for dynamical learning seem consistent with those for statical learning: both tend to work better with μ_{β_j}^{(0)} = 1 and (σ_{β_j}^2)^{(0)} = (σ_{θ_i}^2)^{(0)} ≥ 0.01. However, with some specific combinations of ξ_γ and (ξ_User, ξ_Item), dynamical learning might outperform statical learning (No. 1, 2, and 4). Table 5.6 shows that different combinations of ξ_γ and (ξ_User, ξ_Item) seem to generate similar results: even though (μ̃, σ̃^2) might change with different configurations, producing different training and testing errors, the overall effects (the rank of latent ability or latent quality) are similar, which in turn shows that the algorithm is not very sensitive to the given configurations. As a rule of thumb, we recommend (ξ_User, ξ_Item) ≤ ξ_γ = 0.001.

Table 5.3: Top 6 configurations for statical learning (sorted by training error)

No   μ_{β_j}^{(0)}   (σ_{β_j}^2)^{(0)}   (σ_{θ_i}^2)^{(0)}   ξ_γ     (ξ_User, ξ_Item)   training error   testing error
10   1               0.01                0.1                 1e-05   1e-05              0.922            1.041

Table 5.4: Top 6 configurations for dynamical learning (sorted by training error)

No    2      3      4      5      6      7      8      9      10
1     0.994  0.990  0.958  0.957  0.919  0.936  0.932  0.911  0.887
2            0.999  0.948  0.962  0.909  0.949  0.946  0.915  0.901
3                   0.941  0.960  0.902  0.950  0.948  0.912  0.901
4                          0.977  0.984  0.949  0.943  0.966  0.927
5                                 0.956  0.991  0.988  0.983  0.972
6                                        0.926  0.920  0.967  0.914
7                                               1.000  0.973  0.987
8                                                      0.970  0.987
9                                                             0.976

Table 5.5: Spearman correlation of the pairwise rank(μ̃_θ) under statical learning

No    2      3      4      5      6      7      8      9      10
1     1.000  1.000  0.999  1.000  1.000  0.994  0.994  0.994  0.994
2            1.000  0.999  1.000  1.000  0.994  0.994  0.993  0.994
3                   0.999  0.999  0.999  0.994  0.994  0.994  0.994
4                          1.000  1.000  0.993  0.993  0.993  0.994
5                                 1.000  0.994  0.994  0.993  0.994
6                                        0.994  0.994  0.993  0.994
7                                               1.000  1.000  1.000
8                                                      1.000  1.000
9                                                             0.999

Table 5.6: Spearman correlation of the pairwise rank(μ̃_θ) under dynamical learning

6 Conclusions

Through experiments, we have demonstrated two things. First, rather than fixing the cutpoints after estimating them from a portion of the data, updating them sequentially after setting up their priors seems to produce better results, since it is less sensitive to improper priors (Table 5.1). Second, although statical learning takes less computational time than dynamical learning, under some constraints dynamical learning can outperform statical learning (Table 5.2). The computational time on the MovieLens 100k ratings dataset is about 7 seconds for statical learning (throughput: ∼14,285 ratings/sec) and about 11 seconds for dynamical learning (throughput: ∼9,090 ratings/sec), with the following hardware specification: OS: Windows 8.1 64-bit; CPU: Intel(R) Core(TM) i7-4720HQ @ 2.60GHz; RAM: 12.0 GB.
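The throughput figures follow from dividing the 100,000 ratings of MovieLens 100k by the elapsed wall-clock time:

```python
# MovieLens 100k contains 100,000 ratings.
n_ratings = 100_000
for name, seconds in [("statical", 7), ("dynamical", 11)]:
    print(f"{name}: ~{n_ratings // seconds:,} ratings/sec")
# statical: ~14,285 ratings/sec; dynamical: ~9,090 ratings/sec
```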

Suitable configurations for setting up the priors and parameters were found, and we observed that the two types of learning algorithms prefer similar ones (Tables 5.3-5.4). It is possible that, with rating datasets from other sources, the top 10 configurations we found could become improper, which is worth further verification. For now, we recommend the following configuration to initialize the priors and parameters for both algorithms:

μ_γ^{(0)} = (−∞, (∗), ∞)′,   (σ_γ^2)^{(0)} = (0, 0, 0.01, 0.01, 0.01, 0)′
μ_ℓ^{(0)} = (mean(∗), 1, 0)′,   (σ_ℓ^2)^{(0)} = (1, 0.1, 0.1)′
ξ = (0.001 ∼ 0.00001, 0.001 ∼ 0.00001, 0.001)′,

where (∗) is estimated from a portion of the data at hand.
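As a compact restatement, the recommended initialization might be written as the following Python dictionary. It assumes MovieLens-style 1-5 ratings (so (∗) has four interior cutpoints), reads μ_ℓ^{(0)} and (σ_ℓ^2)^{(0)} as stacking the (α_j, β_j, θ_i) components in line with the per-parameter settings above, and uses purely illustrative placeholder values for (∗).

```python
import numpy as np

# Placeholder for (*), the interior cutpoints estimated from a portion
# of the data at hand; the values below are illustrative only.
cutpoints_star = np.array([-1.5, -0.5, 0.5, 1.5])

recommended = {
    # mu_gamma^(0) and (sigma_gamma^2)^(0): cutpoint priors.
    "mu_gamma": np.concatenate(([-np.inf], cutpoints_star, [np.inf])),
    "var_gamma": np.array([0.0, 0.0, 0.01, 0.01, 0.01, 0.0]),
    # (mu_alpha^(0), mu_beta^(0), mu_theta^(0)) = (mean(*), 1, 0).
    "mu_init": (cutpoints_star.mean(), 1.0, 0.0),
    # ((sigma_alpha^2)^(0), (sigma_beta^2)^(0), (sigma_theta^2)^(0)) = (1, 0.1, 0.1).
    "var_init": (1.0, 0.1, 0.1),
    # xi = (xi_User, xi_Item, xi_gamma); the first two may range from 1e-3 to 1e-5.
    "xi": (1e-3, 1e-3, 1e-3),
}
```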

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In Lord, F. M. and Novick, M. R., editors, Statistical Theories of Mental Test Scores. Addison-Wesley.

Coelho, F. C., Codeço, C. T., and Gomes, M. G. M. (2011). A Bayesian framework for parameter estimation in dynamical models. PLoS ONE, 6(5):e19616.

Graepel, T., Candela, J. Q., Borchert, T., and Herbrich, R. (2010). Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proceedings of the 27th International Conference on Machine Learning (ICML). Omnipress.

Harper, F. M. and Konstan, J. A. (2016). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19.

Ho, D. E. and Quinn, K. M. (2008). Improving the presentation and interpretation of online ratings data with model-based figures. The American Statistician, 62(4):279–288.

McNeish, D. (2016). On using Bayesian methods to address small sample problems. Structural Equation Modeling: A Multidisciplinary Journal, 23(5):750–773.

Moser, J. (2010). The math behind TrueSkill.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14(1):59–71.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 4, pages 321–333.

Samejima, F. (1970). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 35(1):139.

Shane Mac (2016). The pendulum: My attempt at building a diverse company from the start.

Strogatz, S. H. (1994). Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering.

Van De Schoot, R., Broere, J. J., Perryck, K. H., Zondervan-Zwijnenburg, M., and Van Loey, N. E. (2015). Analyzing small data sets using Bayesian estimation: The case of post-traumatic stress symptoms following mechanical ventilation in burn survivors. European Journal of Psychotraumatology, 6(1):25216.

van der Linden, W. J. (2010). Item response theory. In International Encyclopedia of Education, pages 81–88.

Weng, R. C.-H. and Coad, D. S. (2018). Real-time Bayesian parameter estimation for item response models. Bayesian Analysis, 13(1):115–137.

Weng, R. C.-H. and Lin, C.-J. (2011). A Bayesian approximation method for online ranking. Journal of Machine Learning Research, 12(Jan):267–300.
