
Department of Electrical Engineering

College of Electrical Engineering and Computer Science

National Taiwan University Master Thesis

Encouraging Peer Grading in MOOCs

Shao-Heng Ko

Advisor: Ho-Lin Chen, Ph.D.

May, 2017


Acknowledgments

At the moment of completing this thesis, I must first thank my advisor, Professor Ho-Lin Chen. It is said that an ordinary teacher pulls the student forward, while a good teacher lights a lamp along the road of scholarship and lets the student walk it alone, growing through the process of exploration. My advisor, however, belongs to neither kind. At any moment, he would point out every different road ahead, many of them bearing footprints from his own visits; and when you happened to choose a road he had never taken, he would walk it with you. Throughout my two years in the master's program, he gave me great flexibility and trust, allowing me to devote my interest and attention to many different things and, in the end, to find a new research topic in everyday life; and throughout the conception, writing, and revision of this thesis, he always pinpointed my blind spots precisely.

Next, I thank my oral defense committee members: Professors Ping-Cheng Yeh, Tian-Li Yu, and Ling-Chieh Kung. All three took time out of their busy schedules to review this thesis and serve on the committee, and each offered very practical suggestions. What they may not know, however, is that long before accepting the invitation, they had already profoundly influenced my research. In my sophomore year, I was fortunate to be among the first cohorts of Professor Yeh's flipped-classroom teaching, which let me experience a different kind of joy and achievement in learning and inspired me to step into the world of MOOCs. Professor Yu's Artificial Intelligence was the first course I ever took through a MOOC platform. In Professor Kung's Information Economics course, I learned many modeling and analysis techniques and began to consider finding new problems directly in daily life. The topic of this thesis is a problem I actually encountered while interning with the NTU MOOCs team; besides the professors' close ties to that team, the on-campus courses they taught were among the most rewarding of my undergraduate and graduate years. Being able to invite them to serve on my committee is a most fitting turn of fate.

I thank the instructional designers and fellow interns of the NTU MOOCs team for two years of discussion and exchange, both practical and theoretical; my girlfriend Hsiao-Ting, for her constant companionship during this period and for witnessing the birth of the main results in a coffee shop; and my parents, family, and friends for their support and encouragement, which let me concentrate on my studies. As a master's thesis, I believe this is only a beginning; I hope and look forward to spending this life on thinking about interesting things.


Abstract (Chinese)

In Massive Open Online Courses (MOOCs), the number of learners is so large that high-level work can usually be assessed only through peer grading. When peer grading is used in MOOCs, learners typically lack the motivation to grade others and therefore do not spend sufficient effort. To mitigate this, we consider mechanisms that tie a learner's grade to the accuracy of his assessment of others, and we build a game-theoretic model to analyze learners' rational behavior under such mechanisms. We identify a set of conditions that guarantees the existence of pure-strategy equilibria; under these conditions, the course designer can encourage learners to invest more effort in grading by tuning the mechanism parameters. Furthermore, when the learners are homogeneous, we prove that in every pure equilibrium, all learners grading the same submission spend exactly the same amount of time. This property makes all possible pure equilibria computable. We extend these results to situations where some learners do not act rationally, and discuss how to apply them in practice.

Keywords: Game Theory, Massive Open Online Courses, Peer Grading, Nash Equilibrium, Mechanism Design


Abstract

Due to huge participant sizes in Massive Open Online Courses (MOOCs), peer grading is practically the only existing solution to grading high-level assignments. One of the main issues of utilizing peer grading in MOOCs is that learners are not motivated and do not spend enough effort in grading. To modify the current peer grading mechanism to induce better grading, we focus on the idea of making the learners' grades depend on the accuracy of their grading of others' work. We build a game-theoretic model to characterize the rational behavior of learners under such a mechanism. We find a set of conditions that guarantees the existence of pure-strategy equilibria. When the conditions are satisfied, the course designer can encourage the learners to spend more time on grading by tuning the mechanism parameters. Furthermore, when the learners are assumed to be homogeneous, we prove that in any pure equilibrium, every submitted work is graded with identical effort by all of its graders. With this property, all possible pure equilibria are theoretically computable. We also extend our results to the case where some of the learners are not strategic or rational, and we discuss applications of our results in practical situations.

Keywords: Game Theory, Massive Open Online Courses, Peer Grading, Nash Equilibrium, Mechanism Design


Contents

Acknowledgments

Abstract (Chinese)

Abstract

1 Introduction

2 Model
2.1 Players and Actions
2.2 Grading Mechanism
2.3 Utility
2.4 Decomposition of Model

3 Analysis
3.1 Encouraging Conditions and Existence of Equilibria
3.2 Encouraging Peer Grading
3.3 Settings That Meet the Encouraging Conditions

4 Homogeneous Grading

5 Peer Grading with Irrational Players

6 Discussion and Future Work
6.1 Other Game Settings
6.2 Setting Up the Parameters in Practice
6.3 Average Amount of Graders
6.4 Unbalanced Peer Grading Tasks
6.5 Towards Biased Grading
6.6 Consensus Grading

Bibliography


Chapter 1 Introduction

Online learning is becoming a flourishing industry. In addition to the long-standing OpenCourseWares, learners today have MOOCs, or Massive Open Online Courses, to gain knowledge from. Unlike OpenCourseWares, which simply release course material, MOOCs are “online classes that anyone, anywhere can participate in” [13]. Furthermore, current MOOCs are trending toward formality; they are charging certification fees, offering institutional credits, and even running online programs. Consequently, the task of performance assessment is becoming more and more important.

It is common for a MOOC to have thousands of participants, in which case the load of grading is well beyond what any course staff can afford. While automated grading techniques can easily handle multiple-choice questions and programming tasks, they are nearly useless when it comes to grading more sophisticated assignments, like mathematical proofs, artwork, written pieces, and speeches. Peer grading is practically the only existing solution to grading high-level assignments.

The concept of peer grading comes from traditional pedagogy, where its effects are well studied [1, 8, 9, 20, 21]. In the context of MOOCs, after a learner submits an assignment, it is graded by several peers, and the learner himself is required to grade others' assignments as well. The final score of an assignment is determined by some aggregation of all the scores given to it. Past work [4, 9, 15] has shown that such aggregation is decently close to the instructor's evaluation even with a small number of graders per assignment. Peer grading also has the effects of deepening learners' comprehension [19], building a positive learning environment [21], and even metacognitive benefits [8].

However, there are problems to be addressed when peer grading is applied in MOOCs. Learners in MOOCs, who come from all around the world and from all backgrounds, feature great diversity. Empirical study shows that peer grading is less reliable in MOOCs than in real classes [17], and not all learners are satisfied with the mechanism [4, 17]. This leads to a main question:

How can we modify the peer grading mechanism to induce better grading precision?

The work of Piech et al. [18], with an eye on tuning the mechanism used by Coursera, investigated various probabilistic models of peer grading. A part of their work assigns weights to graders by past performance, exploiting the assumption that more adequate grading is correlated with higher grades. While the assumption is still up for debate [7], Piech et al. claim that peer grading accuracy can be improved simply by measuring the bias and reliability of graders. However, there remains an unaddressed issue: graders in practice do not spend enough time on peer grading, possibly due to a lack of motivation. Thus, Piech et al. called for game-theoretic research on mechanisms that incentivize learners to put more effort into grading.

Following the above work, there have been several attempts to incentivize the learners, through both rewards and punishments. On the rewards side, de Alfaro and Shavlovsky [6] implemented a system named Crowdgrader that lets learners collaboratively review and grade assignments. In this system, the overall grade of a learner depends both on the aggregate grade received and on his “precision” in reviewing his peers' work, which is determined by the average error with respect to the consensus grade. While this ever-evolving platform is primarily used in college classes, the designers reported that “the number of students who complained about mis-gradings was about the same as the one we typically experience using TAs” [6].

On the punishments side, Carbonara et al. [3] tackled the problem using an audit game approach, in which learners are penalized if they are caught misgrading others' work. They studied the problem of allocating limited auditing resources, such as TA hours, to heterogeneous learners. Under the assumption that every learner spends a fixed total amount of time on doing the assignment and peer grading, they gave an algorithm that obtains an approximately optimal allocation. However, this assumption is also the biggest caveat of their model, as it is preferable to motivate learners to spend more time on learning.

Finally, Lu et al. [16] designed a large experiment that motivates peer graders by having their grading performance examined as well, but without any rewards or punishments as a consequence. While this alone seemed not to do the trick, they found that learners improved their own grading accuracy merely by evaluating others' grading performance. They proposed the possibility of motivating graders with incentives other than grades.

Motivated by all the above work, and by the fact that most existing work on peer grading is empirical, we aim to investigate the behavior of peer graders under the rewards approach from a game-theoretic perspective. By building a model of the peer grading mechanism, we ask the following main questions:

• What will be the rational behaviors of graders?

• Under what conditions will such mechanisms work in line with our expectations?

• What can the course provider do to affect the graders’ behaviors?

The overall structure of our model is similar to the design in [6], in which a consensus grade of an assignment comes directly from aggregating the graders' opinions. The accuracy of a grader is then measured by how far his opinions are from the related consensus grades. The final utility of a learner is a linear combination of his assignment score and his grading accuracy. Instead of strategically deciding how many points to give a particular assignment, we assume that each grader only decides the amount of time he puts into every grading task, which is not observed by the course provider. Unlike [3], we do not assume a tradeoff between time spent doing the assignment and time spent peer grading, as these two phases often occupy different time periods in common MOOC settings.

Though we mainly think of the reward for precise grading as grade points, our model can capture other types of rewards such as fame, self-fulfillment, or even monetary rewards and career opportunities. While [6] measures a grader's overall accuracy, we account for every grading attempt separately, broadening the possible strategies of each grader. In general, we assume the peer graders are heterogeneous both in their grading ability and in their valuation of time, which captures a key feature of the MOOC environment.

Similar to [18], we assume that the grade given by a peer grader to an assignment is distributed around the true value of the assignment. Naturally, the degree of dispersion depends on the effort spent in grading. We do not assume a peer grader is capable of perfectly precise grading even with unlimited effort. We further assume that the graders grade without bias, meaning they overgrade and undergrade equally often. While peer graders tend to be biased toward high grades in reality [11, 15], this effect can be neutralized by pretreatment after measuring the bias.

The main result of this paper is a set of sufficient conditions, called the encouraging conditions, which guarantees the existence of pure Nash equilibria. Once the conditions are met, the course designer can encourage or discourage the graders from spending more time on grading by tuning the mechanism parameters. We also prove that the encouraging conditions are satisfied in a concrete family of settings, where the grade a peer grader gives to an assignment follows a normal distribution whose mean is the intrinsic value of the assignment and whose variance is a non-increasing, convex function of the time spent.

Later, we focus on a special environment where all learners are homogeneous. This setting can be related to SPOCs, or Small Private Online Courses, where “MOOCs are used as a supplement to classroom teaching” [10]. We find that under this assumption, every pure Nash equilibrium has the property that each assignment is graded with the same level of effort by all of its graders. We also point out that this pattern of equilibrium behavior remains valid when some of the graders are assumed to be irrational players with predetermined strategies.

The general model we use is introduced in Chapter 2, where we promptly simplify it into more compact models. Our main results lie in Chapter 3, including the encouraging conditions, the consequent existence of pure equilibria, and how the mechanism designer encourages peer grading by tuning the parameters. We also propose a series of practical settings that satisfy the encouraging conditions. In Chapter 4, we present the further restricted homogeneous model and obtain stronger results. In Chapter 5, we extend our analysis to include irrational graders. Finally, discussions of biased grading and practical implications are given in Chapter 6.


Chapter 2 Model

In this chapter, we propose the model that will be used throughout the paper. We characterize assignments that can be graded without subjectivity by assuming an intrinsic value for each submitted work. Intuitively, the more time a learner spends on grading a submission, the closer his grade will be to its intrinsic value. Our model is mainly built on this assumption, while the overall structure is similar to the common mechanism used on MOOC platforms.

2.1 Players and Actions

We assume N learners, {a_1, a_2, ..., a_N}, working on some assignment in a MOOC, have already finished their submissions and are entering the peer grading phase. The submitted work of a_i has intrinsic value v_i ∈ [0, M], where M is the maximum score of this assignment. The grading task is fully described by G, an N × N matrix with boolean elements: G(i,j) equals 1 if a_j is asked to peer grade the submitted work of a_i, and 0 otherwise. Each learner is asked to grade exactly k submissions, and each submission is graded by exactly k learners; hence ∑_i G(i,j) = ∑_i G(j,i) = k for every j. Furthermore, no learner grades his own submitted work; hence G(i,i) = 0. We assume that all feasible grading relations are chosen with equal probability, unknown to the learners. Thus the learners cannot infer any information about who is grading their submissions, or vice versa.
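As an illustration of such a grading assignment, the following Python sketch builds a boolean matrix G with the required row and column sums. It uses a circulant construction plus a random relabeling of the learners; this is just one simple way to realize the constraints, not the uniform sampling over all feasible relations that the model assumes.

```python
import random

def grading_matrix(n, k, rng=random.Random(0)):
    """Build an n-by-n boolean matrix G with G[i][j] = 1 iff learner j
    grades the submission of learner i: every row and every column sums
    to k, and the diagonal is zero (no self-grading)."""
    assert 0 < k < n
    # Circulant construction: learner (i + d) mod n grades submission i.
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for d in range(1, k + 1):
            G[i][(i + d) % n] = 1
    # Randomly relabel the learners so the assignment is not predictable
    # from the indices (a cheap stand-in for sampling all feasible
    # relations uniformly, which the model assumes).
    perm = list(range(n))
    rng.shuffle(perm)
    return [[G[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
```

Relabeling rows and columns by the same permutation preserves the row sums, column sums, and the zero diagonal.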


The vector T_j = [t_1j, t_2j, ..., t_Nj] is the strategy of learner a_j, where t_ij ∈ [0, U] is the amount of time he puts into grading the submission of a_i, and U is an assumed upper bound. This upper bound is natural most of the time: no player will spend more time if the corresponding cost exceeds the maximum possible reward. While no such hard limit exists in practice, peer grading cannot take forever, and there is a stop-loss point beyond which spending more time does no good. In reality, a grader only decides how much time to put into each submission he receives; therefore, t_ij = 0 if G(i,j) = 0. The course provider cannot observe any learner's strategy.

2.2 Grading Mechanism

In this model, we assume that a peer grader does not strategically choose the grades he gives. Instead, his grades are random variables, following distributions determined by the amount of time (effort) he puts into each submitted work and by its intrinsic value. To characterize this, we let f_j(·) be the grading possibility function of grader j, where f_j(x, v, t) is the probability density that grader j gives x points to an assignment with intrinsic value v after spending t units of time grading it. Thus, for fixed v and t, f_j(x, v, t) is a probability density function with corresponding cumulative distribution function F_j(x, v, t). Clearly, f_j(x, v, t) = 0 for all j and all x ∉ [0, M]. Let S_ji be the score a_j gives to the submitted work of a_i; if G(i,j) = 1, then S_ji ∼ F_j(·, v_i, t_ij). Furthermore, we assume unbiased grading: for any v and t, f_j(v − x, v, t) = f_j(v + x, v, t) for all x ∈ R. To simplify notation, we use f_g to represent the collection of functions f_1(·,·,·), f_2(·,·,·), ..., f_N(·,·,·).

Denote S_i = [S_ji | G(i,j) = 1] as the vector of grades given to the submission of a_i. The aggregate score, or consensus grade, of a_i's submission is then Ŝ_i = f_agg(S_i). Naturally, f_agg(S′_i) = f_agg(S_i) whenever S′_i is a permutation of S_i; that is, the consensus grade should not depend on the order of the peer grades. Furthermore, f_agg(·) should be non-decreasing in every element of S_i. Various methods exist for aggregating peer grades; for example, the Crowdgrader platform in [6] uses an Olympian average, where the highest and lowest grades are dropped before averaging, while Coursera uses a median function [5]. We therefore do not specify an exact method here. Since we require f_j(·, v_i, t) to be symmetric with respect to v_i for any t, under any strategy profile the derived probability distribution of Ŝ_i will also be symmetric with respect to v_i.

Next, the accuracy level of a_j in grading the submission of a_i is α_ij = f_accu(|S_ji − Ŝ_i|) ∈ [0, 1], where f_accu(·) must be non-increasing. As above, we only require that the function depend solely on the difference between the single grade and the aggregated grade. Finally, the average peer grading accuracy of a_j is α̂_j = (1/k) ∑_i α_ij. Clearly, since α_ij ∈ [0, 1], α̂_j ∈ [0, 1].
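To make the aggregation and accuracy functions concrete, here is a small Python sketch of the two aggregators mentioned above, the Olympian average of Crowdgrader and a Coursera-style median, together with an illustrative f_accu. The linear-decay f_accu is a hypothetical choice used only for demonstration; the model allows any non-increasing function.

```python
import statistics

def olympian_average(scores):
    """Crowdgrader-style aggregate: drop one highest and one lowest
    grade, then average the rest (needs at least three grades)."""
    s = sorted(scores)
    return sum(s[1:-1]) / (len(s) - 2)

def accuracy(given, consensus, f_accu):
    """alpha_ij = f_accu(|S_ji - S_hat_i|), with f_accu non-increasing."""
    return f_accu(abs(given - consensus))

scores = [70, 74, 75, 76, 95]                       # grades for one submission
s_hat = olympian_average(scores)                    # Olympian average: 75.0
s_med = statistics.median(scores)                   # median aggregate: 75
linear_decay = lambda d: max(0.0, 1.0 - d / 10.0)   # illustrative f_accu
alphas = [accuracy(s, s_hat, linear_decay) for s in scores]
```

Note how the outlying grade 95 barely moves either aggregate, and how its giver receives accuracy 0 under the illustrative f_accu.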

2.3 Utility

We assume that grader a_i has time-to-grade ratio r_i ≥ 0, meaning one unit of his time is worth r_i points of grade to him. All values of r_i are public information. We then define the time-to-grade ratio vector r = [r_i]. Also, we define λ to be the portion that grading performance accounts for in one's final utility. This means a learner can earn up to λM points by peer grading, counting only rewards in grades. The final utility of learner a_i is defined as π_i = λM α̂_i − r_i ∑_j t_ji. All learners are risk-neutral.

Note that a_i should also get (1 − λ) Ŝ_i points from his own work. However, this part is independent of his strategic decisions, so we can drop this term from the utility in our model. A specific game model is defined by specifying all the above parameters, functions, and probability distributions.

Definition 2.1. An unbiased peer grading game, or UPG game, is a tuple G = (N, k, M, r, U, λ, f_g, f_agg(·), f_accu(·)).

Note that the f_j's need not all be the same; hence the graders' behaviors, and consequently the perceived score distributions, are generally heterogeneous. Nowadays, it is common for MOOCs to utilize analytic rubrics in peer-graded assignments, leading to more objective and systematic grading results; however, complete objectivity is impossible to achieve. Also, how learners value their time is generally out of the instructor's control.

The parameters and functions M, k, λ, f_agg(·), and f_accu(·) are chosen by the mechanism designer. This information, together with N, r, f_g, and the whole structure of the mechanism, is public to all learners, while the intrinsic value vector V = {v_i} and the relation matrix G are unobserved. Since the learners are risk-neutral, learner a_i rationally chooses T_i to maximize E[π_i]. Note that T_i is not observed by the designer and is not directly used to determine the reward; the designer can only set the parameters and functions above, seeking to induce better overall effort and/or grading accuracy.

2.4 Decomposition of Model

First and foremost, a simplification can be made directly from our model. We can observe that a UPG game is in fact a series of independent smaller subgames, each containing only one submitted work. This is described in the theorem below.

Definition 2.2. Given the other players' strategy profile T_−j = {T_1, ..., T_{j−1}, T_{j+1}, ..., T_N}, we define learner a_j's best response T*_j to be the choice of T_j that maximizes E[π_j], conditioned on T_−j.

Theorem 2.1. Given two strategy profiles of the other players, T_−j and T′_−j, the corresponding best responses T*_j and T′*_j satisfy t*_ij = t′*_ij whenever t_ij′ = t′_ij′ for all j′ ≠ j.

Proof. Consider player a_j. We first observe that his total expected utility decomposes as

E[π_j] = ∑_i ( (λM/k) E[α_ij] − r_j t_ij ).

Therefore, maximizing the expected utility is equivalent to maximizing the sum of the k nonzero terms, each corresponding to the expected utility from grading one specific submitted work. Given the other players' strategies and fixing i, the term (λM/k) E[α_ij] − r_j t_ij depends only on [t_ij | G(i,j) = 1], the times spent on submission i. Thus, maximizing the sum of all k terms is equivalent to maximizing the k terms independently and simultaneously.

If t_ij′ = t′_ij′ holds for all j′ ≠ j, then the optimal choice of t_ij, which maximizes (λM/k) E[α_ij] − r_j t_ij, also maximizes the corresponding term under T′_−j.

Equivalently, the general UPG game can be decomposed into N subgames, each containing one submitted work and the k relevant peer graders. The equilibrium of a general UPG game, described by every learner's effort on all submitted works he grades, is then composed of their respective efforts in all subgames. Here we emphasize that there is no total time limit for a learner; equivalently, the maximum time available, which corresponds to the length of the peer grading phase, is much longer than kU, the maximum total time spent on grading. Thus, putting more time into grading one submission does not affect the grading of other submissions. With the help of this property, we can separate the subgames and analyze one at a time. For the rest of the paper, we define such a subgame as follows to simplify notation.

Definition 2.3. In a UPG subgame, only k learners are considered, each of whom grades the same submission with value v. t_j denotes the amount of effort learner a_j puts into grading the submitted work, dropping the superscript from t_ij in the general game. S_j, Ŝ, and α_j are defined analogously, inheriting the meanings of their superscripted counterparts. π_j = (λM/k) E[α_j] − r_j t_j is now the utility learner a_j obtains directly from his peer grading effort in this subgame.

Note that N becomes irrelevant once we separate a UPG game into subgames, and we can fully describe such a subgame by specifying (M, k, r, U, λ, f_g, f_agg(·), f_accu(·)). We call this tuple the setting of a UPG subgame. Note that here f_g represents only k functions.
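A subgame of this kind can be simulated directly. The Python sketch below gives a Monte Carlo estimate of one grader's expected accuracy and utility under an illustrative instantiation: normally distributed grades with effort-dependent spread, median aggregation, and threshold accuracy. Every concrete choice here (the noise function g, the median, the threshold, all parameter values) is an assumption for demonstration, not part of the model.

```python
import random
import statistics

def g(t):
    """Illustrative grading-noise std: non-increasing and convex in t."""
    return 3.0 / (1.0 + t)

def estimate_utility(j, times, v, M, k, lam, r_j, h,
                     n_trials=20000, rng=random.Random(1)):
    """Monte Carlo estimate of (E[alpha_j], pi_j) in one UPG subgame,
    under an assumed model: grades ~ Normal(v, g(t)^2) clipped to
    [0, M], median aggregation, and h-threshold accuracy."""
    hits = 0
    for _ in range(n_trials):
        S = [min(M, max(0.0, rng.gauss(v, g(t)))) for t in times]
        if abs(S[j] - statistics.median(S)) <= h:
            hits += 1
    e_alpha = hits / n_trials
    # pi_j = (lam * M / k) * E[alpha_j] - r_j * t_j
    return e_alpha, lam * M / k * e_alpha - r_j * times[j]
```

For instance, `estimate_utility(0, [2.0, 2.0, 2.0], v=80.0, M=100.0, k=3, lam=0.15, r_j=0.5, h=2.0)` estimates grader 0's accuracy and utility when all three graders spend two units of time.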


Chapter 3 Analysis

3.1 Encouraging Conditions and Existence of Equilibria

In this section, we define the following encouraging conditions and prove that they lead to the existence of pure Nash equilibria in a UPG subgame.

Definition 3.1. A setting of a UPG subgame (M, k, r, U, λ, f_g, f_agg(·), f_accu(·)) satisfies the Encouraging Conditions if:

• (EC-1) Given the other players' strategies T_−j = [t_1, t_2, ..., t_{j−1}, t_{j+1}, ..., t_k], E[α_j](T_−j, t_j) is non-decreasing and strictly concave in t_j ∈ [0, U].

• (EC-2) Let T_−j and T′_−j be two strategy profiles such that t_p < t′_p for some p and t_q = t′_q for all q ≠ p. Then ∂/∂t_j E[α_j](T_−j, t_j) < ∂/∂t_j E[α_j](T′_−j, t_j) for all t_j.

Intuitively, EC-1 says that if a grader puts more effort into grading, his accuracy improves, with a diminishing marginal effect. The following lemma says that, combined with a linear cost of time, EC-1 makes the grader's decision problem straightforward.


Lemma 3.1. If a setting of a UPG subgame satisfies EC-1, then for any T_−j, π_j(T_−j, t_j) is strictly concave in t_j ∈ [0, U], and the best response t*_j is unique. Moreover, either

∂/∂t_j E[α_j](T_−j, t_j) |_{t_j = t*_j} = k r_j / (λM),

or t*_j ∈ {0, U}.

Proof. Observe that

∂²π_j/∂t_j² = ∂²/∂t_j² [ (λM/k) E[α_j](T_−j, t_j) ] − ∂²(r_j t_j)/∂t_j²
            = (λM/k) ∂²/∂t_j² E[α_j](T_−j, t_j)
            < 0,

which gives the first statement. The last inequality follows from the strict concavity of E[α_j] required in EC-1. Since the partial utility function is concave, its global maximum either lies on the boundary or satisfies the first-order condition

∂π_j/∂t_j |_{t_j = t*_j} = ∂/∂t_j [ (λM/k) E[α_j](T_−j, t_j) − r_j t_j ] |_{t_j = t*_j}
                        = (λM/k) ∂/∂t_j E[α_j](T_−j, t_j) |_{t_j = t*_j} − r_j
                        = 0,

which gives the second statement.

Note that in the boundary cases,

∂/∂t_j E[α_j](T_−j, t_j) > k r_j/(λM) for all t_j ∈ [0, U], if t*_j = U;
∂/∂t_j E[α_j](T_−j, t_j) < k r_j/(λM) for all t_j ∈ [0, U], if t*_j = 0.

On the other hand, EC-2 states that a grader unilaterally increasing his own grading effort raises the marginal utility of every other grader. Intuitively, although we make no specific assumption, we generally expect the grading possibility functions to concentrate closer to the intrinsic value as effort increases. Therefore, a unilateral increase in effort also makes the consensus grade concentrate closer to the intrinsic value. This effect is generally good for the graders, since they can only invest effort to stand closer to the intrinsic value, not to the consensus itself. If this effect encourages the other graders to invest more effort, we can expect a positive reinforcement process. The following theorem shows that pure Nash equilibria always exist when both conditions are satisfied; the positive reinforcement process can be seen in the proof.

Theorem 3.2. In any UPG subgame that satisfies both encouraging conditions, there exists at least one pure Nash equilibrium.

Proof. We prove the existence of pure Nash equilibria by describing a virtual algorithm that “computes” one.

Given the setting of the UPG subgame, we initialize t_i = 0 for all i; equivalently, no grader puts in any effort. We then modify the effort levels one by one, cycling through the graders in order of i, as long as an equilibrium has not been reached. When modifying t_i, we fix all the other effort levels T_−i and move t_i to the best response of grader i with respect to T_−i.

Trivially, since all effort levels are initialized to zero, each t_i either stays at zero or is raised upward when it is modified for the first time. Also, by EC-2, whenever one effort level is raised, the marginal utilities of all other graders increase. Suppose the effort level of grader j, currently set to t_j, is being modified for the second or later time, and no effort level was decreased in the previous k − 1 modifications. By the previous lemma, if the marginal utility ∂/∂t_j E[α_j](T_−j, t_j) − k r_j/(λM) is negative on the whole interval [0, U], then the best response of grader j remains zero regardless of the others' strategies. Otherwise, in the previous round t_j was set to make the marginal utility zero, and by EC-2 the marginal utility at t_j is now non-negative after a whole round of modifications, i.e., ∂/∂t_j E[α_j](T′_−j, t_j) ≥ k r_j/(λM), where T′_−j is the other graders' current profile. By the concavity in EC-1, the new best response t′_j, at which this derivative equals k r_j/(λM) (or which lies at the boundary U), satisfies t′_j ≥ t_j. Combined with the fact that all effort levels can only increase in the first round, induction shows that no effort level ever decreases.

Since the effort levels cannot exceed U, the algorithm eventually converges to a stable state in which k consecutive effort levels are left unmodified in their turns, which gives a pure Nash equilibrium. However, the algorithm does not converge within any guaranteed time bound, so it is of little use for computing equilibria in practice.
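The virtual algorithm in the proof can be run concretely once E[α_j] is given in closed form. The sketch below uses a hypothetical accuracy function E[α](T_−j, t) = 1 − B(mean(T_−j)) · e^(−t), with B increasing, chosen so that EC-1 and EC-2 hold, and iterates best responses from zero effort. The functional form and all parameter values are illustrative assumptions, not part of the thesis's model.

```python
import math

def accuracy_slope_base(s):
    """B(s) in the toy accuracy model: increases with the peers' mean
    effort s, so raising any other grader's effort raises grader j's
    marginal accuracy reward (EC-2)."""
    return 0.5 + 0.4 * s / (1.0 + s)

def best_response(others, M, k, lam, r, U):
    """Unique maximizer of (lam*M/k)*E[alpha] - r*t for the toy model
    E[alpha](T_-j, t) = 1 - B(mean(T_-j)) * exp(-t), which is
    non-decreasing and strictly concave in t (EC-1)."""
    B = accuracy_slope_base(sum(others) / len(others))
    t_star = math.log(B * lam * M / (k * r))  # interior first-order condition
    return min(U, max(0.0, t_star))

def find_equilibrium(k, M, lam, rs, U, tol=1e-10, max_sweeps=10000):
    """The 'virtual algorithm' of Theorem 3.2: start from zero effort
    and repeatedly move each grader to his best response; under
    EC-1/EC-2 the efforts never decrease and converge to a pure
    Nash equilibrium."""
    T = [0.0] * k
    for _ in range(max_sweeps):
        stable = True
        for j in range(k):
            t_new = best_response(T[:j] + T[j + 1:], M, k, lam, rs[j], U)
            if abs(t_new - T[j]) > tol:
                stable = False
            T[j] = t_new
        if stable:
            return T
    return T
```

Running `find_equilibrium(k=4, M=100.0, lam=0.2, rs=[1.0] * 4, U=5.0)` shows the monotone upward convergence described in the proof: starting from zero, each sweep only raises the efforts until a fixed point of the best-response map is reached.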

3.2 Encouraging Peer Grading

In the previous section we encountered the ratio k r_i/(λM), a grader's marginal accuracy reward in equilibrium. Any marginal accuracy reward at least this large justifies putting in more time; hence it acts as the effective time-to-grade ratio. While r_i is an exogenous parameter, k, M, and λ are specified by the mechanism designer. Thus the designer can raise or lower this ratio to any extent. Below we describe its effect on the possible equilibria.

Definition 3.2. The effective time-to-grade ratio for grader i in a UPG subgame is r̄_i = k r_i/(λM) = r̄ r_i, where r̄ = k/(λM).

Theorem 3.3. Assume k is fixed. Suppose both EC-1 and EC-2 are satisfied in a UPG subgame G1 with r̄ = r̄1, and let G2, G3 be UPG subgames that differ from G1 only in r̄, with r̄2 > r̄1 > r̄3. If T1 = [t1_i] is an equilibrium in G1, then:

• There exists an equilibrium T2 = [t2_i] in G2 with t2_i ≤ t1_i for all i; if the i-th inequality holds with equality, then t1_i = 0.

• There exists an equilibrium T3 = [t3_i] in G3 with t3_i ≥ t1_i for all i; if the i-th inequality holds with equality, then t1_i = U.

Proof. We first prove the first statement, assuming t1_i > 0 for all i.

For convenience, denote by

δ_i(X_−i, x_i) = ∂/∂t_i E[α_i](T_−i, t_i) |_{T_−i = X_−i, t_i = x_i}

the slope of the expected reward of player i when player i spends x_i and the strategy profile of all other players is X_−i. We know that δ_i is non-increasing in x_i (from EC-1) and increasing in every element of X_−i (from EC-2), and that δ_i((T1)_−i, t1_i) = r̄1 r_i for all i.

Suppose δ_i(0, 0) ≤ r̄2 r_i for all i. Then δ_i(0, y) ≤ δ_i(0, 0) ≤ r̄2 r_i for all i, which means T = 0 is an equilibrium in G2. Otherwise, there exists j such that δ_j(0, 0) > r̄2 r_j, which means there exists z > 0 such that δ_j(0, z) = r̄2 r_j. Since r̄2 > r̄1, there exists y < t1_j such that δ_j((T1)_−j, y) = r̄2 r_j. Consequently, δ_j(0, y) < δ_j((T1)_−j, y) = r̄2 r_j, so z < y. Let Br be the best response function of grader j in G2; Br is continuous, and we have 0 < z = Br(0) < y = Br((T1)_−j) < t1_j.

Though the best response function takes k − 1 arguments, we now restrict its degree of freedom to 1 by requiring the input vector to be parallel to (T1)_−j. For w ∈ [0, t1_j], let (w/t1_j · T1)_−j be the strategy profile of all graders except j whose entries are (w/t1_j) t1_i, and denote B̃r(w) = Br((w/t1_j · T1)_−j). Then 0 < z = B̃r(0) < y = B̃r(t1_j) < t1_j ≤ U. Since B̃r is continuous, by the fixed-point theorem there exists at least one w ∈ (z, y) such that B̃r(w) = w. This means there exists an equilibrium in G2 where t_j = w < t1_j and t_i = (w/t1_j) t1_i < t1_i for all i ≠ j.

The equality cases can be proved by the same method, and the second statement can be proved analogously by comparing δ_i(U, U) to r̄3 r_i.

Equivalently, increasing the effective time-to-grade ratio distorts the equilibria downward, and vice versa; extreme equilibria are preserved when the environment is made even more extreme. If all functions are smooth, the equilibria move continuously with the fluctuation of r̄. In fact, r̄ is what the mechanism designer can really control. Decreasing λ or M increases r̄, which “discourages” grading behavior and moves all equilibria downward, while increasing either λ or M “encourages” grading behavior and moves all equilibria upward. Of course, in practice λ and M cannot be raised without limit, which is discussed in Chapter 6.

The above implications may seem trivial at first glance: learners invest effort and should be rewarded for it. However, we should point out again that our mechanism works under a subtle limitation: the instructors cannot perceive or evaluate the learners' effort, only the grading outcomes. Therefore, the whole mechanism employs a “reward comes from outcomes” approach. We have shown that this approach is as effective as an ideal “reward comes from effort” scenario, and that the idea of encouraging learners only by rewarding their closeness to the consensus is, game-theoretically, indeed effective.
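These comparative statics can be checked numerically on a toy subgame. The sketch below assumes a hypothetical accuracy function E[α](s, t) = 1 − (0.5 + 0.4·s/(1+s)) · e^(−t) satisfying EC-1 and EC-2, where s is the peers' common effort, solves the symmetric interior first-order condition by fixed-point iteration, and compares the equilibrium effort at two values of λ. The functional form and all parameter values are illustrative assumptions.

```python
import math

def symmetric_equilibrium(k, M, lam, r, U, sweeps=200):
    """Symmetric pure equilibrium of a toy UPG subgame with
    E[alpha](s, t) = 1 - (0.5 + 0.4 * s / (1 + s)) * exp(-t).
    The interior first-order condition gives
    t = ln(B(s) * lam * M / (k * r)); iterating the best response
    from zero effort converges upward to the equilibrium."""
    t = 0.0
    for _ in range(sweeps):
        B = 0.5 + 0.4 * t / (1.0 + t)
        t = min(U, max(0.0, math.log(B * lam * M / (k * r))))
    return t

low = symmetric_equilibrium(k=4, M=100.0, lam=0.10, r=1.0, U=5.0)
high = symmetric_equilibrium(k=4, M=100.0, lam=0.30, r=1.0, U=5.0)
# Raising lam lowers r_bar = k / (lam * M) and moves the equilibrium up.
```

In this toy instance, tripling λ roughly quadruples the equilibrium grading effort, illustrating the "encouraging" direction of Theorem 3.3.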

Lastly, we argue that all the results in Sections 3.1 and 3.2 still hold even if the expected accuracy level in EC-1 is only weakly concave. While this means that graders generally have multiple best responses to choose from, choosing the highest effort level among them does not violate rationality. Therefore the positive reinforcement still exists, and all the other results follow; the only difference is that there will be far more possible equilibria.

3.3 Settings That Meet the Encouraging Conditions

So far we still do not know under what settings of a UPG subgame that the encouraging conditions will hold. In this section, we describe a series of practical settings of a UPG subgame that satisfy both encouraging conditions. While the encouraging conditions are rather strong, we show that they can be satisfied with some rational assumptions and a simple mechanism.

The first question is how to characterize the peer-assessed grades. We assume each grade to follow a normal distribution, which is a common way to characterize an observation. The mean is simply the intrinsic value, since we assume unbiased grading. The variance depends on the amount of time spent: intuitively, the more effort the grader puts into grading, the closer the grade will be to the intrinsic value. Furthermore, the marginal effect should be diminishing. Thus we assume the variance to be non-increasing and convex in the time spent on grading. Notice this does not mean a grader can become arbitrarily precise by spending a very large amount of time on grading, since the above function need not be strictly decreasing. For the mechanism, we use an averaging method to aggregate the grades. We prove that, combined with any non-increasing piecewise continuous $f_{accu}(\cdot)$, this setting satisfies both encouraging conditions.
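Before the formal statement, the qualitative content of this setting can be checked numerically. The Monte Carlo sketch below uses an illustrative variance function $g(t) = 1/(1+t)$ and hypothetical values of $k$, $h$, $v$ and the effort levels (none prescribed by the thesis) to estimate a grader's expected accuracy under averaging aggregation with a threshold awarding rule, and checks that it grows in the grader's own time with diminishing gains.

```python
import random

# Monte Carlo sketch of the setting above: k unbiased graders draw normally
# distributed grades around the intrinsic value v; the aggregate is their
# average; a grader is "accurate" when he lands within h of the aggregate
# (an h-threshold awarding function). The variance function g(t) = 1/(1+t)
# is an illustrative non-increasing convex choice; k, h, v and the effort
# levels are hypothetical values, not taken from the thesis.

def g(t):
    return 1.0 / (1.0 + t)

def expected_accuracy(t_j, t_others, k=5, h=0.5, v=70.0, trials=100_000):
    rng = random.Random(0)
    hits = 0
    for _ in range(trials):
        grades = [rng.gauss(v, g(t_j))]                   # grader j's grade
        grades += [rng.gauss(v, g(t)) for t in t_others]  # the k-1 peers
        agg = sum(grades) / k                             # averaging function
        hits += abs(grades[0] - agg) <= h
    return hits / trials

others = [1.0] * 4
a0, a1, a4 = (expected_accuracy(t, others) for t in (0.0, 1.0, 4.0))

# Expected accuracy is increasing in the grader's own time ...
assert a0 < a1 < a4
# ... with diminishing marginal gains (a loose check of the EC-1 flavor).
assert (a1 - a0) > (a4 - a1) / 3
```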

Definition 3.3. We define the averaging function to be $f_{avg}(S) = \frac{1}{k}\sum_{j} S_j$.

Proposition 3.4. Suppose that for every $j$, $F_j(\cdot, v, t_j) \sim N(v, g_j^2)$, where $g_j = g_j(t_j)$ is a non-increasing convex function. If $f_{agg}(\cdot) = f_{avg}(\cdot)$ and $f_{accu}(\cdot)$ is non-increasing piecewise continuous, then for any values of $(M, k, r, U, \lambda)$, the subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies both EC-1 and EC-2.


To prove this proposition, we start by proving a weaker proposition in which $f_{accu}(\cdot)$ is restricted to a threshold function, i.e., full marks are given whenever a learner's grade is not too far from the consensus. Then we extend the proposition to the general case.

Definition 3.4. We define the $h$-threshold awarding function to be $f_h(x) = 1$ if $x \le h$, for some threshold value $0 \le h < \infty$, and $f_h(x) = 0$ otherwise.

Proposition 3.5. Suppose that for every $j$, $F_j(\cdot, v, t_j) \sim N(v, g_j^2)$, where $g_j = g_j(t_j)$ is a non-increasing convex function. If $f_{agg}(\cdot) = f_{avg}(\cdot)$ and $f_{accu}(\cdot) = f_h(\cdot)$ for some $0 \le h < \infty$, then for any values of $(M, k, r, U, \lambda)$, the subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies both EC-1 and EC-2.

Proof. To simplify notation, let $g(\cdot) = g_j(\cdot)$. Since each $S_j$ follows a normal distribution, $\hat{S} = \frac{1}{k}\sum_j S_j$ also follows a normal distribution. Consider player $j$. Let the distribution of the other graders' average score be $F_o = \frac{1}{k-1}\sum_{i \neq j} S_i \sim N(v, x^2)$. Then the aggregated score is given by

$$\hat{S} = f_{avg}(S) = \frac{1}{k}\sum_i S_i = \frac{1}{k}F_j + \frac{k-1}{k}F_o,$$

the distance to the aggregated score is distributed as

$$S_j - \hat{S} = F_j - \hat{S} = \frac{k-1}{k}(F_j - F_o) \sim N\!\left(0,\ \frac{k-1}{k}(x^2 + g^2)\right),$$

and the expected accuracy is given by

$$E[\alpha_j](T_{-j}, t) = E[f_h(|S_j - \hat{S}|)] = \Pr[|S_j - \hat{S}| \le h] = \operatorname{erf}\!\left(\frac{h}{\sqrt{\frac{(k-1)(x^2+g^2)}{k}}}\right) = \operatorname{erf}\!\left(\frac{h\sqrt{k}}{\sqrt{(k-1)(x^2+g^2)}}\right).$$

By definition, $g'(t_j) \le 0$ and $g''(t_j) \ge 0$.


Let the effective error threshold be $p = \frac{h\sqrt{k}}{\sqrt{(k-1)(x^2+g^2)}}$. Then

$$\frac{\partial p}{\partial t} = h\sqrt{\frac{k}{k-1}} \cdot \frac{g(-g')}{(x^2+g^2)^{3/2}} \ge 0,$$

$$\frac{\partial}{\partial t}E[\alpha_j](T_{-j}, t) = \frac{\partial}{\partial t}\operatorname{erf}(p) = \frac{2}{\sqrt{\pi}}e^{-p^2}\,\frac{\partial p}{\partial t} \ge 0,$$

$$\frac{\partial^2}{\partial t^2}E[\alpha_j](T_{-j}, t) = \frac{\partial}{\partial t}\!\left(\frac{2}{\sqrt{\pi}}e^{-p^2}\,\frac{\partial p}{\partial t}\right) = \frac{2}{\sqrt{\pi}}e^{-p^2}\!\left((-2p)\!\left(\frac{\partial p}{\partial t}\right)^{\!2} + h\sqrt{\frac{k}{k-1}}\left(\frac{-\left((g')^2 + g g''\right)}{(x^2+g^2)^{3/2}} + \frac{3g^2(g')^2}{(x^2+g^2)^{5/2}}\right)\right) \le 0.$$

The last inequality follows from the fact that $g'(t_j)$ is non-positive and $g''(t_j)$ is non-negative. Let $T_{-j}$ and $T'_{-j}$ be two strategy profiles satisfying $t_p < t'_p$ for some $p$, and $t_q = t'_q$ for all $q \neq p$. Let $F_{o1} = \frac{1}{k-1}\sum_{i \neq j} S_i \sim N(v, x_1^2)$ and $F_{o2} = \frac{1}{k-1}\sum_{i \neq j} S'_i \sim N(v, x_2^2)$ be respectively the other graders' average score distributions in the two cases. Also let the corresponding effective error thresholds be $p_1 = \frac{h\sqrt{k}}{\sqrt{(k-1)(x_1^2+g^2)}}$ and $p_2 = \frac{h\sqrt{k}}{\sqrt{(k-1)(x_2^2+g^2)}}$. Clearly, $x_1 > x_2$, which implies $p_1 < p_2$. Then we have

$$\frac{\partial}{\partial t}\left(E[\alpha_j](T_{-j}, t) - E[\alpha_j](T'_{-j}, t)\right) = \frac{\partial}{\partial t}\left(\operatorname{erf}(p_1) - \operatorname{erf}(p_2)\right) = \frac{2}{\sqrt{\pi}}e^{-p_1^2}\frac{\partial p_1}{\partial t} - \frac{2}{\sqrt{\pi}}e^{-p_2^2}\frac{\partial p_2}{\partial t} = \left(\frac{2}{\sqrt{\pi}} \cdot \frac{(k-1)g(-g')}{h^2 k}\right)\left(\frac{e^{-p_1^2}}{p_1^3} - \frac{e^{-p_2^2}}{p_2^3}\right) \le 0.$$

The first term, $\frac{2}{\sqrt{\pi}} \cdot \frac{(k-1)g(-g')}{h^2 k}$, is non-negative since $g$ is a non-increasing non-negative function. Since $\frac{\partial}{\partial p}\left(\frac{e^{-p^2}}{p^3}\right) = -\frac{e^{-p^2}(2p^2+3)}{p^4} \le 0$, the second term is a definite integral of a non-positive function, so it is non-positive. Hence the product of the two terms is non-positive.

Any non-increasing piecewise continuous function can be decomposed into a combination of integrals of threshold functions. For each threshold function, the portion of expected utility it contributes satisfies EC-1. The expected utility is now a combination of integrals of non-decreasing concave functions, and is therefore non-decreasing and concave itself [2]. Similarly, for the condition in EC-2, the combination of integrals of error functions from the latter strategy profile always increases more rapidly than that from the former profile.
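The decomposition step can be illustrated numerically. Assuming $f_{accu}$ is differentiable, non-increasing, and vanishes at infinity, it equals a mixture of $h$-threshold functions with weight $-f'(h)$; the sketch below checks this for the hypothetical choice $f(x) = e^{-x}$ (not a function prescribed by the thesis).

```python
import math

# Numeric sketch of the decomposition used above. Assuming f_accu is
# differentiable, non-increasing, and vanishes at infinity, it can be
# written as a mixture of h-threshold functions with weight -f'(h):
#     f(x) = integral over h >= 0 of 1[x <= h] * (-f'(h)) dh.
# We check this for the hypothetical choice f(x) = exp(-x), where
# -f'(h) = exp(-h).

def f(x):
    return math.exp(-x)

def threshold(x, h):
    # h-threshold awarding function: full award iff the distance x <= h
    return 1.0 if x <= h else 0.0

def mixture_of_thresholds(x, dh=1e-3, h_max=40.0):
    # Riemann-sum approximation of the integral above
    total, h = 0.0, 0.0
    while h < h_max:
        total += threshold(x, h) * math.exp(-h) * dh
        h += dh
    return total

for x in (0.0, 0.5, 1.0, 2.0):
    assert abs(mixture_of_thresholds(x) - f(x)) < 1e-2
```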

Again, the grading distributions need not all be identical. We showed that as long as every grader's grade follows a normal distribution whose variance is non-increasing in his effort, both encouraging conditions are satisfied.


Chapter 4

Homogeneous Grading

Along with the development of MOOCs, another form of online learning environment, known as SPOCs, has also emerged [10]. A Small Private Online Course has at most several hundred qualified learners. With an eye on better learning experiences, a SPOC may require learners to pass tests or complete prerequisites, or even require student status at the related institution when used as material in a blended-learning course. The diversity of learners is drastically decreased in this type of learning environment. While grading several hundred copies of assignments may not be impossible, it is still tedious and requires massive effort. Given all its merits, peer grading can still be used in SPOCs.

Corresponding to this type of environment, where the learners are more homogeneous than before, in this chapter we focus on a further restricted case of our model, in which the peer graders not only share identical grading abilities but also value their time identically. We show that all possible pure Nash equilibria in this scenario share a common property: every submission is graded with the same level of effort by all of its graders. While this is still an idealized scenario, it reveals what homogeneity brings to our model.

Definition 4.1. A homogeneous unbiased peer grading subgame, or HUPG subgame, is a UPG subgame satisfying the following properties:

• f1 = f2 = ... = fk = f , where f is the shared grading probability distribution.

• r1 = r2 = ... = rk = r, where r is the shared time-to-grade ratio.


When both encouraging conditions are met in an HUPG subgame, we find that in equilibrium it is impossible for two learners to exert different levels of effort. Informally, in such an equilibrium the low-effort learner's peers put in more effort in total than the high-effort learner's peers. This means the low-effort learner would want to put in more effort than the high-effort learner as well, which leads to a contradiction. Consequently, every submission is graded with the same level of effort by all of its graders, as described in the following theorem.

Theorem 4.1. Suppose that EC-1 and EC-2 are both satisfied in an HUPG subgame. If T ={t1, t2, ..., tk} is a pure Nash equilibrium, then t1 = t2 = ... = tk = t.

Proof. Assume the contrary. Then there must exist $t_i < t_j$ for some $i \neq j$. Considering the strategy profiles $T_{-i}$ and $T_{-j}$, we have

$$\left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_i} \ \ge\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_j} \qquad \text{(EC-1)}$$

$$\ge\ \frac{kr}{\lambda M}\ \ge\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-i}, t)\right|_{t_i} \qquad \text{(Lemma 3.1)}$$

$$>\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_i}, \qquad \text{(EC-2)}$$

which is a contradiction.
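To make the symmetric equilibrium concrete, the following sketch computes one numerically in the setting of Proposition 3.5, using the first-order condition suggested by Lemma 3.1 (marginal expected accuracy equals $kr/(\lambda M)$ at an interior best response). The variance function $g(t) = 1/(1+t)$ and all parameter values are illustrative assumptions, not taken from the thesis.

```python
import math

# Numeric sketch of a symmetric equilibrium in an HUPG subgame, in the
# normal-grades + averaging + h-threshold setting of Proposition 3.5. We
# use the first-order condition suggested by Lemma 3.1: at an interior
# best response, the marginal expected accuracy equals kr/(lambda*M).
# The variance function g(t) = 1/(1+t) and all parameter values are
# illustrative assumptions, not taken from the thesis.

K, H, R, M = 5, 0.5, 1.0, 100.0
C = H * math.sqrt(K / (K - 1))

def g(t):
    # non-increasing convex standard deviation of a grade
    return 1.0 / (1.0 + t)

def accuracy(s, t):
    # expected accuracy of a grader spending s while every peer spends t
    # (the erf expression from the proof of Proposition 3.5)
    x2 = g(t) ** 2 / (K - 1)        # variance of the peers' average
    return math.erf(C / math.sqrt(x2 + g(s) ** 2))

def marginal(s, t, eps=1e-6):
    return (accuracy(s + eps, t) - accuracy(s - eps, t)) / (2 * eps)

def best_response(t, target, lo=0.0, hi=10.0):
    # marginal(., t) is decreasing in own time (EC-1), so bisect
    if marginal(lo, t) <= target:
        return lo
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if marginal(mid, t) > target else (lo, mid)
    return (lo + hi) / 2

def symmetric_equilibrium(lam):
    target = K * R / (lam * M)      # the kr/(lambda*M) threshold
    t = 0.0
    for _ in range(500):            # positive-reinforcement iteration
        t_new = best_response(t, target)
        if abs(t_new - t) < 1e-9:
            break
        t = t_new
    return t

t_low, t_high = symmetric_equilibrium(0.5), symmetric_equilibrium(1.0)
# Raising lambda lowers kr/(lambda*M) and moves the equilibrium upwards.
assert 0.0 < t_low < t_high
```

Consistent with the discussion in Section 3.2, doubling $\lambda$ lowers the threshold $kr/(\lambda M)$ and raises the common equilibrium effort level.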

Combining with the previous result that a pure NE always exists, we immediately obtain the following corollary. Nevertheless, we also give a direct proof here.

Corollary 4.1.1. In any HUPG subgame that satisfies both encouraging conditions, there exists at least one equilibrium where ti = t,∀i.

Proof. Let $T_{-i}(p)$ be a strategy profile with $t_j = p$ for all $j \neq i$. By applying property EC-2, we have $\frac{\partial}{\partial t}E[\alpha_i](T_{-i}(p), t) < \frac{\partial}{\partial t}E[\alpha_i](T_{-i}(q), t)$ iff $p < q$. Denote $t_i(p)$ to be player $i$'s
