Type        training error   testing error
Statical    0.910            1.044
Dynamical   0.914            1.034

Table 5.2: Comparison of statical learning and dynamical learning

for Algorithm 1 and Algorithm 2. Based on the experimental results in the last section, we consider sequentially updated cutpoints initialized by the first 1,000 ratings in the training dataset. The effectiveness of the found range of each setting is verified by training error and testing error. Moreover, we use the Spearman correlation to discuss the sensitivity of each algorithm to the settings of the priors and parameters.

First, define a set S containing some possible values for the selected settings:

S := S_{μ_{β_j}^{(0)}} × S_{(σ_{β_j}^2)^{(0)}} × S_{(σ_{θ_i}^2)^{(0)}} × S_{ξ_γ} × S_{(ξ_User, ξ_Item)},   ξ_γ ≥ (ξ_User, ξ_Item),

where

S_{μ_{β_j}^{(0)}}      := {1, 0.1, 0.01}
S_{(σ_{β_j}^2)^{(0)}}  := {0.1, 0.01, 0.001}
S_{(σ_{θ_i}^2)^{(0)}}  := {0.1, 0.01, 0.001}
S_{ξ_γ}                := {0.001, 0.0001, 0.00001}
S_{(ξ_User, ξ_Item)}   := {0.001, 0.0001, 0.00001}
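To make the size of this search space concrete, here is a minimal sketch that enumerates S in Python; the variable names are illustrative, and setting ξ_User = ξ_Item anticipates the simplification described later in this section.

```python
from itertools import product

# Candidate values for each setting, copied from the definition of S above.
S_mu_beta      = [1, 0.1, 0.01]            # S_{mu_beta_j^(0)}
S_var_beta     = [0.1, 0.01, 0.001]        # S_{(sigma_beta_j^2)^(0)}
S_var_theta    = [0.1, 0.01, 0.001]        # S_{(sigma_theta_i^2)^(0)}
S_xi_gamma     = [0.001, 0.0001, 0.00001]  # S_{xi_gamma}
S_xi_user_item = [0.001, 0.0001, 0.00001]  # S_{(xi_User, xi_Item)}, with xi_User = xi_Item

# Keep only configurations satisfying xi_gamma >= (xi_User, xi_Item).
grid = [
    cfg
    for cfg in product(S_mu_beta, S_var_beta, S_var_theta,
                       S_xi_gamma, S_xi_user_item)
    if cfg[3] >= cfg[4]
]
print(len(grid))  # 27 * 6 = 162 admissible configurations
```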

Then, pick one configuration s_k ∈ S and run both Algorithm 1 and Algorithm 2 to produce a training error and a testing error. In addition, the rank of the estimated central tendencies of items' latent quality, rank(μ̃_θ), under each algorithm is recorded; this is used to calculate the Spearman correlation mentioned at the beginning of this chapter.
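For reference, the Spearman correlation between two such recordings can be computed as below; the two vectors of estimated central tendencies are hypothetical stand-ins for the output of Algorithms 1 and 2.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical estimated central tendencies from two runs.
mu_theta_run1 = np.array([0.8, -0.2, 0.5, 1.1, -0.7])
mu_theta_run2 = np.array([0.7, -0.1, 0.6, 1.0, -0.8])

# Spearman's rho is the Pearson correlation of the ranks, so passing the
# raw estimates is equivalent to ranking them first with rank(mu_theta).
rho, _ = spearmanr(mu_theta_run1, mu_theta_run2)
print(round(rho, 3))  # 1.0 here: both runs rank the five items identically
```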

To simplify the process, we assume that

μ_γ^{(0)} = (−∞, (∗), ∞)′,   (σ_γ^2)^{(0)} = (0, 0, 0.01, 0.01, 0.01, 0)′
μ_{α_j}^{(0)} = mean(∗),   (σ_{α_j}^2)^{(0)} = 1,   μ_{θ_i}^{(0)} = 0,   ∀ i, j,

where (∗) is estimated from the first 1,000 ratings in the training dataset, as usual.
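This section does not restate how (∗) is computed; the following sketch assumes a probit-style initialization, mapping the empirical cumulative rating proportions of the first 1,000 ratings through the inverse normal CDF. The actual estimator used in the thesis may differ.

```python
import numpy as np
from scipy.stats import norm

def init_cutpoints(ratings, n_levels=5):
    """Sketch: probit-style cutpoint initialization from early ratings.

    Assumes the interior cutpoints (*) are the inverse-normal CDF of the
    empirical cumulative rating proportions; the thesis's estimator may differ.
    """
    counts = np.bincount(ratings, minlength=n_levels + 1)[1:]  # levels 1..5
    cum_props = np.cumsum(counts)[:-1] / counts.sum()          # 4 interior points
    interior = norm.ppf(cum_props)                             # this is (*)
    return np.concatenate(([-np.inf], interior, [np.inf]))     # 6 entries, as above

first_1000 = np.random.default_rng(0).integers(1, 6, size=1000)
print(init_cutpoints(first_1000))
```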

The most important setting is (σ_{β_j}^2, σ_{θ_i}^2)^{(0)}. It is the small-variance assumption on both σ_{β_j}^2 and σ_{θ_i}^2 that makes it possible to obtain (4.3)-(4.18), the update formulas in Section 4.1 (for details on the small-variance assumption beyond what Section 4.1 covers, please refer to Appendix A.1). How small these variances have to be is an open question, which is why we start from some small value, say 0.1, and decrease it exponentially to search for the scale at which the results are more reliable.

The sets S_{(ξ_User, ξ_Item)} and S_{ξ_γ} are used to find appropriate values of ξ_User, ξ_Item, and ξ_γ for dynamical learning. To lower the time complexity, we let ξ_User and ξ_Item take the same value in each run.

Remarks

Table 5.3 shows that statical learning tends to work better with μ_{β_j}^{(0)} = 1 and larger (σ_{β_j}^2)^{(0)} and (σ_{θ_i}^2)^{(0)}. In addition, Table 5.5 shows that configurations No. 6, 9, and 10 seem to generate relatively different outcomes, resulting in smaller Spearman correlations; this indicates that setting μ_{β_j}^{(0)} too small, or setting (σ_{β_j}^2)^{(0)} at a significantly different scale from (σ_{θ_i}^2)^{(0)}, is not recommended.

Table 5.4 consists of many similar configurations with different combinations of ξ_γ and (ξ_User, ξ_Item). Clearly, the search set for dynamical learning is much larger than that for statical learning because of these extra parameters. Excluding ξ, the patterns of the configurations for dynamical learning seem consistent with those for statical learning: both tend to work better with μ_{β_j}^{(0)} = 1 and (σ_{β_j}^2)^{(0)} = (σ_{θ_i}^2)^{(0)} ≥ 0.01. However, with some specific combinations of ξ_γ and (ξ_User, ξ_Item), dynamical learning might outperform statical learning (No. 1, 2, and 4). Table 5.6 shows that different combinations of ξ_γ and (ξ_User, ξ_Item) seem to generate similar results: even though (μ̃, σ̃^2) might change with different configurations, producing different training and testing errors, the overall effects (the rank of latent ability or latent quality) are similar, which in turn shows that the algorithm is not very sensitive to the given configurations. As a rule of thumb, we recommend (ξ_User, ξ_Item) ≤ ξ_γ = 0.001.

Table 5.3: Top 6 configurations for statical learning (sorted by training error)

No   μ_{β_j}^{(0)}   (σ_{β_j}^2)^{(0)}   (σ_{θ_i}^2)^{(0)}   ξ_γ     (ξ_User, ξ_Item)   training error   testing error
10   1               0.01                0.1                 1e-05   1e-05              0.922            1.041

Table 5.4: Top 6 configurations for dynamical learning (sorted by training error)

No    2      3      4      5      6      7      8      9      10
1     0.994  0.990  0.958  0.957  0.919  0.936  0.932  0.911  0.887
2            0.999  0.948  0.962  0.909  0.949  0.946  0.915  0.901
3                   0.941  0.960  0.902  0.950  0.948  0.912  0.901
4                          0.977  0.984  0.949  0.943  0.966  0.927
5                                 0.956  0.991  0.988  0.983  0.972
6                                        0.926  0.920  0.967  0.914
7                                               1.000  0.973  0.987
8                                                      0.970  0.987
9                                                             0.976

Table 5.5: Spearman correlation of the pairwise rank(μ̃_θ) under statical learning

No    2      3      4      5      6      7      8      9      10
1     1.000  1.000  0.999  1.000  1.000  0.994  0.994  0.994  0.994
2            1.000  0.999  1.000  1.000  0.994  0.994  0.993  0.994
3                   0.999  0.999  0.999  0.994  0.994  0.994  0.994
4                          1.000  1.000  0.993  0.993  0.993  0.994
5                                 1.000  0.994  0.994  0.993  0.994
6                                        0.994  0.994  0.993  0.994
7                                               1.000  1.000  1.000
8                                                      1.000  1.000
9                                                             0.999

Table 5.6: Spearman correlation of the pairwise rank(μ̃_θ) under dynamical learning

6 Conclusions

Through experiments, we have demonstrated two things. First, rather than fixing the cutpoints after estimating them from a portion of the data, updating them sequentially after setting up their priors seems to produce better results, since it is less sensitive to improper priors (Table 5.1). Second, although statical learning takes less computational time than dynamical learning, under some constraints dynamical learning can outperform statical learning (Table 5.2). The computational time on the MovieLens 100k ratings dataset is about 7 seconds for statical learning (throughput: ∼14,285 ratings/sec) and about 11 seconds for dynamical learning (throughput: ∼9,090 ratings/sec), with the following hardware specification: OS: Windows 8.1 64-bit; CPU: Intel(R) Core(TM) i7-4720HQ @ 2.60GHz; RAM: 12.0 GB.
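The throughput figures follow from dividing the 100,000 ratings of MovieLens 100k by the elapsed wall-clock time:

```python
# MovieLens 100k contains 100,000 ratings.
n_ratings = 100_000
for name, seconds in [("statical", 7), ("dynamical", 11)]:
    print(f"{name}: ~{n_ratings // seconds:,} ratings/sec")
# statical: ~14,285 ratings/sec; dynamical: ~9,090 ratings/sec
```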

Suitable configurations for setting up the priors and parameters were found, and we observed that the two types of learning algorithms prefer similar ones (Tables 5.3-5.4). It is possible that, with rating datasets from other sources, the top 10 configurations we found could become improper, which is worth further verification. For now, we recommend the following configuration to initialize the priors and parameters for both algorithms:

μ_γ^{(0)} = (−∞, (∗), ∞)′,   (σ_γ^2)^{(0)} = (0, 0, 0.01, 0.01, 0.01, 0)′
μ_ℓ^{(0)} = (mean(∗), 1, 0)′,   (σ_ℓ^2)^{(0)} = (1, 0.1, 0.1)′
ξ = (0.001 ∼ 0.00001, 0.001 ∼ 0.00001, 0.001)′,

where (∗) is estimated from a portion of the data at hand.
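As a compact restatement, the recommended initialization might be written as the following Python dictionary. It assumes MovieLens-style 1-5 ratings (so (∗) has four interior cutpoints), reads μ_ℓ^{(0)} and (σ_ℓ^2)^{(0)} as stacking the (α_j, β_j, θ_i) components in line with the per-parameter settings above, and uses purely illustrative placeholder values for (∗).

```python
import numpy as np

# Placeholder for (*), the interior cutpoints estimated from a portion
# of the data at hand; the values below are illustrative only.
cutpoints_star = np.array([-1.5, -0.5, 0.5, 1.5])

recommended = {
    # mu_gamma^(0) and (sigma_gamma^2)^(0): cutpoint priors.
    "mu_gamma": np.concatenate(([-np.inf], cutpoints_star, [np.inf])),
    "var_gamma": np.array([0.0, 0.0, 0.01, 0.01, 0.01, 0.0]),
    # (mu_alpha^(0), mu_beta^(0), mu_theta^(0)) = (mean(*), 1, 0).
    "mu_init": (cutpoints_star.mean(), 1.0, 0.0),
    # ((sigma_alpha^2)^(0), (sigma_beta^2)^(0), (sigma_theta^2)^(0)) = (1, 0.1, 0.1).
    "var_init": (1.0, 0.1, 0.1),
    # xi = (xi_User, xi_Item, xi_gamma); the first two may range from 1e-3 to 1e-5.
    "xi": (1e-3, 1e-3, 1e-3),
}
```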

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In Lord, F. M. and Novick, M. R., editors, Statistical Theories of Mental Test Scores. Addison-Wesley.

Coelho, F. C., Codeço, C. T., and Gomes, M. G. M. (2011). A Bayesian framework for parameter estimation in dynamical models. PLoS ONE, 6(5):e19616.

Graepel, T., Candela, J. Q., Borchert, T., and Herbrich, R. (2010). Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In Proceedings of the 27th International Conference on Machine Learning (ICML). Omnipress.

Harper, F. M. and Konstan, J. A. (2016). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19.

Ho, D. E. and Quinn, K. M. (2008). Improving the presentation and interpretation of online ratings data with model-based figures. The American Statistician, 62(4):279–288.

McNeish, D. (2016). On using Bayesian methods to address small sample problems. Structural Equation Modeling: A Multidisciplinary Journal, 23(5):750–773.

Moser, J. (2010). The math behind TrueSkill.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psychological Measurement, 14(1):59–71.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, volume 4, pages 321–333.

Samejima, F. (1970). Estimation of latent ability using a response pattern of graded scores. Psychometrika, 35(1):139.

Shane Mac (2016). The pendulum: My attempt at building a diverse company from the start.

Strogatz, S. H. (1994). Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering.

Van De Schoot, R., Broere, J. J., Perryck, K. H., Zondervan-Zwijnenburg, M., and Van Loey, N. E. (2015). Analyzing small data sets using Bayesian estimation: The case of post-traumatic stress symptoms following mechanical ventilation in burn survivors. European Journal of Psychotraumatology, 6(1):25216.

van der Linden, W. J. (2010). Item response theory. In International Encyclopedia of Education, pages 81–88.

Weng, R. C.-H. and Coad, D. S. (2018). Real-time Bayesian parameter estimation for item response models. Bayesian Analysis, 13(1):115–137.

Weng, R. C.-H. and Lin, C.-J. (2011). A Bayesian approximation method for online ranking. Journal of Machine Learning Research, 12(Jan):267–300.
