
Department of Electrical Engineering

College of Electrical Engineering and Computer Science

National Taiwan University Master Thesis

Encouraging Peer Grading in MOOCs

Shao-Heng Ko

Advisor: Ho-Lin Chen, Ph.D.

May, 2017


Acknowledgments

At the moment of completing this thesis, I must first thank my advisor, Professor Ho-Lin Chen. It is said that an ordinary teacher pulls the student forward, while a good teacher lights a lamp along the road of scholarship and lets the student walk it alone, growing through the process of exploration. My advisor, however, belongs to neither kind. At any moment, he would point out every different road ahead, many of them bearing footprints from his own visits; and when you happened to choose a road he had never taken, he would walk it with you. Throughout my two years in the master's program, he gave me great flexibility and trust, allowing me to devote my interest and attention to many different things and, in the end, to find a new research topic in everyday life; and throughout the conception, writing, and revision of this thesis, he always pinpointed my blind spots precisely.

Next, I thank my oral defense committee members: Professors Ping-Cheng Yeh, Tian-Li Yu, and Ling-Chieh Kung. All three took time out of their busy schedules to review this thesis and serve on the committee, and each offered very practical suggestions. What they may not know, however, is that long before accepting the invitation, they had already profoundly influenced my research. In my sophomore year, I was fortunate to be among the first cohorts of Professor Yeh's flipped-classroom teaching, which let me experience a different kind of joy and achievement in learning and inspired me to step into the world of MOOCs. Professor Yu's Artificial Intelligence was the first course I ever took through a MOOC platform. In Professor Kung's Information Economics course, I learned many modeling and analysis techniques and began to consider finding new problems directly in daily life. The topic of this thesis is a problem I actually encountered while interning with the NTU MOOCs team; besides the professors' close ties to that team, the on-campus courses they taught were among the most rewarding of my undergraduate and graduate years. Being able to invite them to serve on my committee is a most fitting turn of fate.

I thank the instructional designers and fellow interns of the NTU MOOCs team for two years of discussion and exchange, both practical and theoretical; my girlfriend Hsiao-Ting, for her constant companionship during this period and for witnessing the birth of the main results in a coffee shop; and my parents, family, and friends for their support and encouragement, which let me concentrate on my studies. As a master's thesis, I believe this is only a beginning; I hope and look forward to spending this life on thinking about interesting things.


Abstract (Chinese)

In Massive Open Online Courses (MOOCs), the number of learners is so large that high-level work can usually be assessed only through peer grading. When peer grading is used in MOOCs, learners typically lack the motivation to grade others and therefore do not spend sufficient effort. To mitigate this, we consider mechanisms that tie a learner's grade to the accuracy of his assessment of others, and we build a game-theoretic model to analyze learners' rational behavior under such mechanisms. We identify a set of conditions that guarantees the existence of pure-strategy equilibria; under these conditions, the course designer can encourage learners to invest more effort in grading by tuning the mechanism parameters. Furthermore, when the learners are homogeneous, we prove that in every pure equilibrium, all learners grading the same submission spend exactly the same amount of time. This property makes all possible pure equilibria computable. We extend these results to situations where some learners do not act rationally, and discuss how to apply them in practice.

Keywords: Game Theory, Massive Open Online Courses, Peer Grading, Nash Equilibrium, Mechanism Design


Abstract

Due to huge participant sizes in Massive Open Online Courses (MOOCs), peer grading is practically the only existing solution to grading high-level assignments. One of the main issues of utilizing peer grading in MOOCs is that learners are not motivated and do not spend enough effort in grading. To modify the current peer grading mechanism to induce better grading, we focus on the idea of making the learners' grades depend on the accuracy of their grading of others' work. We build a game-theoretic model to characterize the rational behavior of learners under such a mechanism. We find a set of conditions that guarantees the existence of pure-strategy equilibria. When the conditions are satisfied, the course designer can encourage the learners to spend more time on grading by tuning the mechanism parameters. Furthermore, when the learners are assumed to be homogeneous, we prove that in any pure equilibrium, every submitted work is graded with identical effort by all of its graders. With this property, all possible pure equilibria are theoretically computable. We also extend our results to the case where some of the learners are not strategic or rational, and we discuss applications of our results in practical situations.

Keywords: Game Theory, Massive Open Online Courses, Peer Grading, Nash Equilibrium, Mechanism Design


Contents

Acknowledgments

Abstract (Chinese)

Abstract

1 Introduction

2 Model
2.1 Players and Actions
2.2 Grading Mechanism
2.3 Utility
2.4 Decomposition of Model

3 Analysis
3.1 Encouraging Conditions and Existence of Equilibria
3.2 Encouraging Peer Grading
3.3 Settings That Meet the Encouraging Conditions

4 Homogeneous Grading

5 Peer Grading with Irrational Players

6 Discussion and Future Work
6.1 Other Game Settings
6.2 Setting Up the Parameters in Practice
6.3 Average Amount of Graders
6.4 Unbalanced Peer Grading Tasks
6.5 Towards Biased Grading
6.6 Consensus Grading

Bibliography


Chapter 1 Introduction

Online learning is becoming a flourishing industry. In addition to the long-standing OpenCourseWares, learners today have MOOCs, or Massive Open Online Courses, to gain knowledge from. Unlike OpenCourseWares, which simply release course material, MOOCs are “online classes that anyone, anywhere can participate in” [13]. Furthermore, current MOOCs are trending toward formality; they are charging certification fees, offering institutional credits, and even running online programs. Consequently, the task of performance assessment is becoming more and more important.

It is common for a MOOC to have thousands of participants, in which case the load of grading is well beyond what any course staff can afford. While automated grading techniques can easily handle multiple-choice questions and programming tasks, they are nearly useless when it comes to grading more sophisticated assignments, like mathematical proofs, artwork, written pieces, and speeches. Peer grading is practically the only existing solution to grading high-level assignments.

The concept of peer grading comes from traditional pedagogy, where its effects are well studied [1, 8, 9, 20, 21]. In the context of MOOCs, after a learner submits an assignment, it is graded by several peers, and the learner himself is required to grade others' assignments as well. The final score of an assignment is determined by some aggregation of all the scores given to it. Past work [4, 9, 15] has shown that such aggregation is decently close to the instructor's evaluation even with a small number of graders per assignment. Peer grading also has the effects of deepening learners' comprehension [19], building a positive learning environment [21], and even metacognitive benefits [8].

However, there are problems to be addressed when peer grading is applied in MOOCs. Learners in MOOCs, who come from all around the world and from all backgrounds, feature great diversity. Empirical study shows that peer grading is less reliable in MOOCs than in real classes [17], and not all learners are satisfied with the mechanism [4, 17]. This leads to a main question:

How can we modify the peer grading mechanism to induce better grading precision?

The work of Piech et al. [18], with an eye on tuning the mechanism used by Coursera, investigated various probabilistic models of peer grading. A part of their work assigns weights to graders by past performance, exploiting the assumption that more adequate grading is correlated with higher grades. While the assumption is still up for debate [7], Piech et al. claim that peer grading accuracy can be improved simply by measuring the bias and reliability of graders. However, there remains an unaddressed issue: graders in practice do not spend enough time on peer grading, possibly due to a lack of motivation. Thus, Piech et al. called for game-theoretic research on mechanisms that incentivize learners to put more effort into grading.

Following the above work, there have been several attempts to incentivize the learners, through both rewards and punishments. On the rewards side, de Alfaro and Shavlovsky [6] implemented a system named Crowdgrader that lets learners collaboratively review and grade assignments. In this system, the overall grade of a learner depends both on the aggregate grade received and on his “precision” in reviewing his peers' work, which is determined by the average error with respect to the consensus grade. While this ever-evolving platform is primarily used in college classes, the designers reported that “the number of students who complained about mis-gradings was about the same as the one we typically experience using TAs” [6].

On the punishments side, Carbonara et al. [3] tackled the problem using an audit game approach, in which learners are penalized if they are caught misgrading others' work. They studied the problem of allocating limited auditing resources, such as TA hours, to heterogeneous learners. Under the assumption that every learner spends a fixed total amount of time on doing the assignment and peer grading, they gave an algorithm that obtains an approximately optimal allocation. However, this assumption is also the biggest caveat of their model, as it is preferable to motivate learners to spend more time on learning.

Finally, Lu et al. [16] designed a large experiment that motivates peer graders by having their grading performance examined as well, but without any rewards or punishments as a consequence. While this alone seemed not to do the trick, they found that learners improved their own grading accuracy merely by evaluating others' grading performance. They proposed the possibility of motivating graders with incentives other than grades.

Motivated by all the above work, and by the fact that most existing work on peer grading is empirical, we aim to investigate the behavior of peer graders under the rewards approach from a game-theoretic perspective. By building a model of the peer grading mechanism, we ask the following main questions:

• What will be the rational behaviors of graders?

• Under what conditions will such mechanisms work in line with our expectations?

• What can the course provider do to affect the graders’ behaviors?

The overall structure of our model is similar to the design in [6], in which a consensus grade of an assignment comes directly from aggregating the graders' opinions. The accuracy of a grader is then measured by how far his opinions are from the related consensus grades. The final utility of a learner is a linear combination of his assignment score and his grading accuracy. Instead of strategically deciding how many points to give a particular assignment, we assume that each grader only decides the amount of time he puts into every grading task, which is not observed by the course provider. Unlike [3], we do not assume a tradeoff between time spent doing the assignment and time spent peer grading, as these two phases often occupy different time periods in common MOOC settings.

Though we mainly think of the reward for precise grading as grade points, our model can capture other types of rewards such as fame, self-fulfillment, or even monetary rewards and career opportunities. While [6] measures a grader's overall accuracy, we account for every grading attempt separately, broadening the possible strategies of each grader. In general, we assume the peer graders are heterogeneous both in their grading ability and in their valuation of time, which captures a key feature of the MOOC environment.

Similar to [18], we assume that the grade given by a peer grader to an assignment is distributed around the true value of the assignment. Naturally, the degree of dispersion depends on the effort spent in grading. We do not assume a peer grader is capable of perfectly precise grading even with unlimited effort. We further assume that the graders grade without bias, meaning they overgrade and undergrade equally often. While peer graders tend to be biased toward high grades in reality [11, 15], this effect can be neutralized by pretreatment after measuring the bias.

The main result of this paper is a set of sufficient conditions, called the encouraging conditions, which guarantees the existence of pure Nash equilibria. Once the conditions are met, the course designer can encourage or discourage the graders from spending more time on grading by tuning the mechanism parameters. We also prove that the encouraging conditions are satisfied in a concrete family of settings, where the grade a peer grader gives to an assignment follows a normal distribution whose mean is the intrinsic value of the assignment and whose variance is a non-increasing, convex function of the time spent.

Later, we focus on a special environment where all learners are homogeneous. This setting can be related to SPOCs, or Small Private Online Courses, where “MOOCs are used as a supplement to classroom teaching” [10]. We find that under this assumption, every pure Nash equilibrium has the property that each assignment is graded with the same level of effort by all of its graders. We also point out that this pattern of equilibrium behavior remains valid when some of the graders are assumed to be irrational players with predetermined strategies.

The general model we use is introduced in Chapter 2, where we promptly simplify it into more compact models. Our main results lie in Chapter 3, including the encouraging conditions, the consequent existence of pure equilibria, and how the mechanism designer encourages peer grading by tuning the parameters. We also propose a series of practical settings that satisfy the encouraging conditions. In Chapter 4, we present the further restricted homogeneous model and obtain stronger results. In Chapter 5, we extend our analysis to include irrational graders. Finally, discussions of biased grading and practical implications are given in Chapter 6.


Chapter 2 Model

In this chapter, we propose the model that will be used throughout the paper. We characterize assignments that can be graded without subjectivity by assuming an intrinsic value for each submitted work. Intuitively, the more time a learner spends on grading a submission, the closer his grade will be to its intrinsic value. Our model is mainly built on this assumption, while the overall structure is similar to the common mechanism used on MOOC platforms.

2.1 Players and Actions

We assume N learners, {a_1, a_2, ..., a_N}, working on some assignment in a MOOC, have already finished their submissions and are entering the peer grading phase. The submitted work of a_i has intrinsic value v_i ∈ [0, M], where M is the maximum score of this assignment. The grading task is fully described by G, an N × N matrix with boolean elements: G(i,j) equals 1 if a_j is asked to peer grade the submitted work of a_i, and 0 otherwise. Each learner is asked to grade exactly k submissions, and each submission is graded by exactly k learners; hence ∑_i G(i,j) = ∑_i G(j,i) = k for every j. Furthermore, no learner grades his own submitted work; hence G(i,i) = 0. We assume that all feasible grading relations are chosen with equal probability, unknown to the learners. Thus the learners cannot infer any information about who is grading their submissions, or vice versa.
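As an illustration of such a grading assignment, the following Python sketch builds a boolean matrix G with the required row and column sums. It uses a circulant construction plus a random relabeling of the learners; this is just one simple way to realize the constraints, not the uniform sampling over all feasible relations that the model assumes.

```python
import random

def grading_matrix(n, k, rng=random.Random(0)):
    """Build an n-by-n boolean matrix G with G[i][j] = 1 iff learner j
    grades the submission of learner i: every row and every column sums
    to k, and the diagonal is zero (no self-grading)."""
    assert 0 < k < n
    # Circulant construction: learner (i + d) mod n grades submission i.
    G = [[0] * n for _ in range(n)]
    for i in range(n):
        for d in range(1, k + 1):
            G[i][(i + d) % n] = 1
    # Randomly relabel the learners so the assignment is not predictable
    # from the indices (a cheap stand-in for sampling all feasible
    # relations uniformly, which the model assumes).
    perm = list(range(n))
    rng.shuffle(perm)
    return [[G[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
```

Relabeling rows and columns by the same permutation preserves the row sums, column sums, and the zero diagonal.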


The vector T_j = [t_1j, t_2j, ..., t_Nj] is the strategy of learner a_j, where t_ij ∈ [0, U] is the amount of time he puts into grading the submission of a_i, and U is an assumed upper bound. This upper bound is natural most of the time: no player will spend more time if the corresponding cost exceeds the maximum possible reward. While no such hard limit exists in practice, peer grading cannot take forever, and there is a stop-loss point beyond which spending more time does no good. In reality, a grader only decides how much time to put into each submission he receives; therefore, t_ij = 0 if G(i,j) = 0. The course provider cannot observe any learner's strategy.

2.2 Grading Mechanism

In this model, we assume that a peer grader does not strategically choose the grades he gives. Instead, his grades are random variables, following distributions determined by the amount of time (effort) he puts into each submitted work and by its intrinsic value. To characterize this, we let f_j(·) be the grading possibility function of grader j, where f_j(x, v, t) is the probability density that grader j gives x points to an assignment with intrinsic value v after spending t units of time grading it. Thus, for fixed v and t, f_j(x, v, t) is a probability density function with corresponding cumulative distribution function F_j(x, v, t). Clearly, f_j(x, v, t) = 0 for all j and all x ∉ [0, M]. Let S_ji be the score a_j gives to the submitted work of a_i; if G(i,j) = 1, then S_ji ∼ F_j(·, v_i, t_ij). Furthermore, we assume unbiased grading: for any v and t, f_j(v − x, v, t) = f_j(v + x, v, t) for all x ∈ R. To simplify notation, we use f_g to represent the collection of functions f_1(·,·,·), f_2(·,·,·), ..., f_N(·,·,·).

Denote S_i = [S_ji | G(i,j) = 1] as the vector of grades given to the submission of a_i. The aggregate score, or consensus grade, of a_i's submission is then Ŝ_i = f_agg(S_i). Naturally, f_agg(S′_i) = f_agg(S_i) whenever S′_i is a permutation of S_i; that is, the consensus grade should not depend on the order of the peer grades. Furthermore, f_agg(·) should be non-decreasing in every element of S_i. Various methods exist for aggregating peer grades; for example, the Crowdgrader platform in [6] uses an Olympian average, where the highest and lowest grades are dropped before averaging, while Coursera uses a median function [5]. We therefore do not specify an exact method here. Since we require f_j(·, v_i, t) to be symmetric with respect to v_i for any t, under any strategy profile the derived probability distribution of Ŝ_i will also be symmetric with respect to v_i.

Next, the accuracy level of a_j in grading the submission of a_i is α_ij = f_accu(|S_ji − Ŝ_i|) ∈ [0, 1], where f_accu(·) must be non-increasing. As above, we only require that the function depend solely on the difference between the single grade and the aggregated grade. Finally, the average peer grading accuracy of a_j is α̂_j = (1/k) ∑_i α_ij. Clearly, since α_ij ∈ [0, 1], α̂_j ∈ [0, 1].
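To make the aggregation and accuracy functions concrete, here is a small Python sketch of the two aggregators mentioned above, the Olympian average of Crowdgrader and a Coursera-style median, together with an illustrative f_accu. The linear-decay f_accu is a hypothetical choice used only for demonstration; the model allows any non-increasing function.

```python
import statistics

def olympian_average(scores):
    """Crowdgrader-style aggregate: drop one highest and one lowest
    grade, then average the rest (needs at least three grades)."""
    s = sorted(scores)
    return sum(s[1:-1]) / (len(s) - 2)

def accuracy(given, consensus, f_accu):
    """alpha_ij = f_accu(|S_ji - S_hat_i|), with f_accu non-increasing."""
    return f_accu(abs(given - consensus))

scores = [70, 74, 75, 76, 95]                       # grades for one submission
s_hat = olympian_average(scores)                    # Olympian average: 75.0
s_med = statistics.median(scores)                   # median aggregate: 75
linear_decay = lambda d: max(0.0, 1.0 - d / 10.0)   # illustrative f_accu
alphas = [accuracy(s, s_hat, linear_decay) for s in scores]
```

Note how the outlying grade 95 barely moves either aggregate, and how its giver receives accuracy 0 under the illustrative f_accu.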

2.3 Utility

We assume that grader a_i has time-to-grade ratio r_i ≥ 0, meaning one unit of his time is worth r_i points of grade to him. All values of r_i are public information. We then define the time-to-grade ratio vector r = [r_i]. Also, we define λ to be the portion that grading performance accounts for in one's final utility. This means a learner can earn up to λM points by peer grading, counting only rewards in grades. The final utility of learner a_i is defined as π_i = λM α̂_i − r_i ∑_j t_ji. All learners are risk-neutral.

Note that a_i should also get (1 − λ) Ŝ_i points from his own work. However, this part is independent of his strategic decisions, so we can drop this term from the utility in our model. A specific game model is defined by specifying all the above parameters, functions, and probability distributions.

Definition 2.1. An unbiased peer grading game, or UPG game, is a tuple G = (N, k, M, r, U, λ, f_g, f_agg(·), f_accu(·)).

Note that the f_j's need not all be the same; hence the graders' behaviors, and consequently the perceived score distributions, are generally heterogeneous. Nowadays, it is common for MOOCs to utilize analytic rubrics in peer-graded assignments, leading to more objective and systematic grading results; however, complete objectivity is impossible to achieve. Also, how learners value their time is generally out of the instructor's control.

The parameters and functions M, k, λ, f_agg(·), and f_accu(·) are chosen by the mechanism designer. This information, together with N, r, f_g, and the whole structure of the mechanism, is public to all learners, while the intrinsic value vector V = {v_i} and the relation matrix G are unobserved. Since the learners are risk-neutral, learner a_i rationally chooses T_i to maximize E[π_i]. Note that T_i is not observed by the designer and is not directly used to determine the reward; the designer can only set the parameters and functions above, seeking to induce better overall effort and/or grading accuracy.

2.4 Decomposition of Model

First and foremost, a simplification can be made directly from our model. We can observe that a UPG game is in fact a series of independent smaller subgames, each containing only one submitted work. This is described in the theorem below.

Definition 2.2. Given the other players' strategy profile T_−j = {T_1, ..., T_{j−1}, T_{j+1}, ..., T_N}, we define learner a_j's best response T*_j to be the choice of T_j that maximizes E[π_j], conditioned on T_−j.

Theorem 2.1. Given two strategy profiles of the other players, T_−j and T′_−j, the corresponding best responses T*_j and T′*_j satisfy t*_ij = t′*_ij whenever t_ij′ = t′_ij′ for all j′ ≠ j.

Proof. Consider player a_j. We first observe that his total expected utility decomposes as

E[π_j] = ∑_i ( (λM/k) E[α_ij] − r_j t_ij ).

Therefore, maximizing the expected utility is equivalent to maximizing the sum of the k nonzero terms, each corresponding to the expected utility from grading one specific submitted work. Given the other players' strategies and fixing i, the term (λM/k) E[α_ij] − r_j t_ij depends only on [t_ij | G(i,j) = 1], the times spent on submission i. Thus, maximizing the sum of all k terms is equivalent to maximizing the k terms independently and simultaneously.

If t_ij′ = t′_ij′ holds for all j′ ≠ j, then the optimal choice of t_ij, which maximizes (λM/k) E[α_ij] − r_j t_ij, also maximizes the corresponding term under T′_−j.

Equivalently, the general UPG game can be decomposed into N subgames, each containing one submitted work and the k relevant peer graders. The equilibrium of a general UPG game, described by every learner's effort on all submitted works he grades, is then composed of their respective efforts in all subgames. Here we emphasize that there is no total time limit for a learner; equivalently, the maximum time available, which corresponds to the length of the peer grading phase, is much longer than kU, the maximum total time spent on grading. Thus, putting more time into grading one submission does not affect the grading of other submissions. With the help of this property, we can separate the subgames and analyze one at a time. For the rest of the paper, we define such a subgame as follows to simplify notation.

Definition 2.3. In a UPG subgame, only k learners are considered, each of whom grades the same submission with value v. t_j denotes the amount of effort learner a_j puts into grading the submitted work, dropping the superscript from t_ij in the general game. S_j, Ŝ, and α_j are defined analogously, inheriting the meanings of their superscripted counterparts. π_j = (λM/k) E[α_j] − r_j t_j is now the utility learner a_j obtains directly from his peer grading effort in this subgame.

Note that N becomes irrelevant once we separate a UPG game into subgames, and we can fully describe such a subgame by specifying (M, k, r, U, λ, f_g, f_agg(·), f_accu(·)). We call this tuple the setting of a UPG subgame. Note that here f_g represents only k functions.
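A subgame of this kind can be simulated directly. The Python sketch below gives a Monte Carlo estimate of one grader's expected accuracy and utility under an illustrative instantiation: normally distributed grades with effort-dependent spread, median aggregation, and threshold accuracy. Every concrete choice here (the noise function g, the median, the threshold, all parameter values) is an assumption for demonstration, not part of the model.

```python
import random
import statistics

def g(t):
    """Illustrative grading-noise std: non-increasing and convex in t."""
    return 3.0 / (1.0 + t)

def estimate_utility(j, times, v, M, k, lam, r_j, h,
                     n_trials=20000, rng=random.Random(1)):
    """Monte Carlo estimate of (E[alpha_j], pi_j) in one UPG subgame,
    under an assumed model: grades ~ Normal(v, g(t)^2) clipped to
    [0, M], median aggregation, and h-threshold accuracy."""
    hits = 0
    for _ in range(n_trials):
        S = [min(M, max(0.0, rng.gauss(v, g(t)))) for t in times]
        if abs(S[j] - statistics.median(S)) <= h:
            hits += 1
    e_alpha = hits / n_trials
    # pi_j = (lam * M / k) * E[alpha_j] - r_j * t_j
    return e_alpha, lam * M / k * e_alpha - r_j * times[j]
```

For instance, `estimate_utility(0, [2.0, 2.0, 2.0], v=80.0, M=100.0, k=3, lam=0.15, r_j=0.5, h=2.0)` estimates grader 0's accuracy and utility when all three graders spend two units of time.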


Chapter 3 Analysis

3.1 Encouraging Conditions and Existence of Equilibria

In this section, we define the following encouraging conditions and prove that they lead to the existence of pure Nash equilibria in a UPG subgame.

Definition 3.1. A setting of a UPG subgame (M, k, r, U, λ, f_g, f_agg(·), f_accu(·)) satisfies the Encouraging Conditions if:

• (EC-1) Given the other players' strategies T_−j = [t_1, t_2, ..., t_{j−1}, t_{j+1}, ..., t_k], E[α_j](T_−j, t_j) is non-decreasing and strictly concave in t_j ∈ [0, U].

• (EC-2) Let T_−j and T′_−j be two strategy profiles such that t_p < t′_p for some p and t_q = t′_q for all q ≠ p. Then ∂/∂t_j E[α_j](T_−j, t_j) < ∂/∂t_j E[α_j](T′_−j, t_j) for all t_j.

Intuitively, EC-1 says that if a grader puts more effort into grading, his accuracy improves, with a diminishing marginal effect. The following lemma says that, combined with a linear cost of time, EC-1 makes the grader's decision problem straightforward.


Lemma 3.1. If a setting of a UPG subgame satisfies EC-1, then for any T_−j, π_j(T_−j, t_j) is strictly concave in t_j ∈ [0, U], and the best response t*_j is unique. Moreover, either

∂/∂t_j E[α_j](T_−j, t_j) |_{t_j = t*_j} = k r_j / (λM),

or t*_j ∈ {0, U}.

Proof. Observe that

∂²π_j/∂t_j² = ∂²/∂t_j² [ (λM/k) E[α_j](T_−j, t_j) ] − ∂²(r_j t_j)/∂t_j²
            = (λM/k) ∂²/∂t_j² E[α_j](T_−j, t_j)
            < 0,

which gives the first statement. The last inequality follows from the strict concavity of E[α_j] required in EC-1. Since the partial utility function is concave, its global maximum either lies on the boundary or satisfies the first-order condition

∂π_j/∂t_j |_{t_j = t*_j} = ∂/∂t_j [ (λM/k) E[α_j](T_−j, t_j) − r_j t_j ] |_{t_j = t*_j}
                        = (λM/k) ∂/∂t_j E[α_j](T_−j, t_j) |_{t_j = t*_j} − r_j
                        = 0,

which gives the second statement.

Note that in the boundary cases,

∂/∂t_j E[α_j](T_−j, t_j) > k r_j/(λM) for all t_j ∈ [0, U], if t*_j = U;
∂/∂t_j E[α_j](T_−j, t_j) < k r_j/(λM) for all t_j ∈ [0, U], if t*_j = 0.

On the other hand, EC-2 states that a grader unilaterally increasing his own grading effort raises the marginal utility of every other grader. Intuitively, although we make no specific assumption, we generally expect the grading possibility functions to concentrate closer to the intrinsic value as effort increases. Therefore, a unilateral increase in effort also makes the consensus grade concentrate closer to the intrinsic value. This effect is generally good for the graders, since they can only invest effort to stand closer to the intrinsic value, not to the consensus itself. If this effect encourages the other graders to invest more effort, we can expect a positive reinforcement process. The following theorem shows that pure Nash equilibria always exist when both conditions are satisfied; the positive reinforcement process can be seen in the proof.

Theorem 3.2. In any UPG subgame that satisfies both encouraging conditions, there exists at least one pure Nash equilibrium.

Proof. We prove the existence of pure Nash equilibria by describing a virtual algorithm that “computes” one.

Given the setting of the UPG subgame, we initialize t_i = 0 for all i; equivalently, no grader puts in any effort. We then modify the effort levels one by one, cycling through the graders in order of i, as long as an equilibrium has not been reached. When modifying t_i, we fix all the other effort levels T_−i and move t_i to the best response of grader i with respect to T_−i.

Trivially, since all effort levels are initialized to zero, each t_i either stays at zero or is raised upward when it is modified for the first time. Also, by EC-2, whenever one effort level is raised, the marginal utilities of all other graders increase. Suppose the effort level of grader j, currently set to t_j, is being modified for the second or later time, and no effort level was decreased in the previous k − 1 modifications. By the previous lemma, if the marginal utility ∂/∂t_j E[α_j](T_−j, t_j) − k r_j/(λM) is negative on the whole interval [0, U], then the best response of grader j remains zero regardless of the others' strategies. Otherwise, in the previous round t_j was set to make the marginal utility zero, and by EC-2 the marginal utility at t_j is now non-negative after a whole round of modifications, i.e., ∂/∂t_j E[α_j](T′_−j, t_j) ≥ k r_j/(λM), where T′_−j is the other graders' current profile. By the concavity in EC-1, the new best response t′_j, at which this derivative equals k r_j/(λM) (or which lies at the boundary U), satisfies t′_j ≥ t_j. Combined with the fact that all effort levels can only increase in the first round, induction shows that no effort level ever decreases.

Since the effort levels cannot exceed U, the algorithm eventually converges to a stable state in which k consecutive effort levels are left unmodified in their turns, which gives a pure Nash equilibrium. However, the algorithm does not converge within any guaranteed time bound, so it is of little use for computing equilibria in practice.
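The virtual algorithm in the proof can be run concretely once E[α_j] is given in closed form. The sketch below uses a hypothetical accuracy function E[α](T_−j, t) = 1 − B(mean(T_−j)) · e^(−t), with B increasing, chosen so that EC-1 and EC-2 hold, and iterates best responses from zero effort. The functional form and all parameter values are illustrative assumptions, not part of the thesis's model.

```python
import math

def accuracy_slope_base(s):
    """B(s) in the toy accuracy model: increases with the peers' mean
    effort s, so raising any other grader's effort raises grader j's
    marginal accuracy reward (EC-2)."""
    return 0.5 + 0.4 * s / (1.0 + s)

def best_response(others, M, k, lam, r, U):
    """Unique maximizer of (lam*M/k)*E[alpha] - r*t for the toy model
    E[alpha](T_-j, t) = 1 - B(mean(T_-j)) * exp(-t), which is
    non-decreasing and strictly concave in t (EC-1)."""
    B = accuracy_slope_base(sum(others) / len(others))
    t_star = math.log(B * lam * M / (k * r))  # interior first-order condition
    return min(U, max(0.0, t_star))

def find_equilibrium(k, M, lam, rs, U, tol=1e-10, max_sweeps=10000):
    """The 'virtual algorithm' of Theorem 3.2: start from zero effort
    and repeatedly move each grader to his best response; under
    EC-1/EC-2 the efforts never decrease and converge to a pure
    Nash equilibrium."""
    T = [0.0] * k
    for _ in range(max_sweeps):
        stable = True
        for j in range(k):
            t_new = best_response(T[:j] + T[j + 1:], M, k, lam, rs[j], U)
            if abs(t_new - T[j]) > tol:
                stable = False
            T[j] = t_new
        if stable:
            return T
    return T
```

Running `find_equilibrium(k=4, M=100.0, lam=0.2, rs=[1.0] * 4, U=5.0)` shows the monotone upward convergence described in the proof: starting from zero, each sweep only raises the efforts until a fixed point of the best-response map is reached.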

3.2 Encouraging Peer Grading

In the previous section we encountered the ratio k r_i/(λM), a grader's marginal accuracy reward in equilibrium. Any marginal accuracy reward at least this large justifies putting in more time; hence it acts as the effective time-to-grade ratio. While r_i is an exogenous parameter, k, M, and λ are specified by the mechanism designer. Thus the designer can raise or lower this ratio to any extent. Below we describe its effect on the possible equilibria.

Definition 3.2. The effective time-to-grade ratio for grader i in a UPG subgame is r̄_i = k r_i/(λM) = r̄ r_i, where r̄ = k/(λM).

Theorem 3.3. Assume k is fixed. Suppose both EC-1 and EC-2 are satisfied in a UPG subgame G1 with r̄ = r̄1, and let G2, G3 be UPG subgames that differ from G1 only in r̄, with r̄2 > r̄1 > r̄3. If T1 = [t1_i] is an equilibrium in G1, then:

• There exists an equilibrium T2 = [t2_i] in G2 with t2_i ≤ t1_i for all i; if the i-th inequality holds with equality, then t1_i = 0.

• There exists an equilibrium T3 = [t3_i] in G3 with t3_i ≥ t1_i for all i; if the i-th inequality holds with equality, then t1_i = U.

Proof. We first prove the first statement, assuming t1_i > 0 for all i.

For convenience, denote by

δ_i(X_−i, x_i) = ∂/∂t_i E[α_i](T_−i, t_i) |_{T_−i = X_−i, t_i = x_i}

the slope of the expected reward of player i when player i spends x_i and the strategy profile of all other players is X_−i. We know that δ_i is non-increasing in x_i (from EC-1) and increasing in every element of X_−i (from EC-2), and that δ_i((T1)_−i, t1_i) = r̄1 r_i for all i.

Suppose δ_i(0, 0) ≤ r̄2 r_i for all i. Then δ_i(0, y) ≤ δ_i(0, 0) ≤ r̄2 r_i for all i, which means T = 0 is an equilibrium in G2. Otherwise, there exists j such that δ_j(0, 0) > r̄2 r_j, which means there exists z > 0 such that δ_j(0, z) = r̄2 r_j. Since r̄2 > r̄1, there exists y < t1_j such that δ_j((T1)_−j, y) = r̄2 r_j. Consequently, δ_j(0, y) < δ_j((T1)_−j, y) = r̄2 r_j, so z < y. Let Br be the best response function of grader j in G2; Br is continuous, and we have 0 < z = Br(0) < y = Br((T1)_−j) < t1_j.

Though the best response function takes k − 1 arguments, we now restrict its degree of freedom to 1 by requiring the input vector to be parallel to (T1)_−j. For w ∈ [0, t1_j], let (w/t1_j · T1)_−j be the strategy profile of all graders except j whose entries are (w/t1_j) t1_i, and denote B̃r(w) = Br((w/t1_j · T1)_−j). Then 0 < z = B̃r(0) < y = B̃r(t1_j) < t1_j ≤ U. Since B̃r is continuous, by the fixed-point theorem there exists at least one w ∈ (z, y) such that B̃r(w) = w. This means there exists an equilibrium in G2 where t_j = w < t1_j and t_i = (w/t1_j) t1_i < t1_i for all i ≠ j.

The equality cases can be proved by the same method, and the second statement can be proved analogously by comparing δ_i(U, U) to r̄3 r_i.

Equivalently, increasing the effective time-to-grade ratio distorts the equilibria downward, and vice versa; extreme equilibria are preserved when the environment is made even more extreme. If all functions are smooth, the equilibria move continuously with the fluctuation of r̄. In fact, r̄ is what the mechanism designer can really control. Decreasing λ or M increases r̄, which “discourages” grading behavior and moves all equilibria downward, while increasing either λ or M “encourages” grading behavior and moves all equilibria upward. Of course, in practice λ and M cannot be raised without limit, which is discussed in Chapter 6.

The above implications may seem trivial at first glance: learners invest effort and should be rewarded for it. However, we should point out again that our mechanism works under a subtle limitation: the instructors cannot perceive or evaluate the learners' effort, only the grading outcomes. Therefore, the whole mechanism employs a “reward comes from outcomes” approach. We have shown that this approach is as effective as an ideal “reward comes from effort” scenario, and that the idea of encouraging learners only by rewarding their closeness to the consensus is, game-theoretically, indeed effective.
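These comparative statics can be checked numerically on a toy subgame. The sketch below assumes a hypothetical accuracy function E[α](s, t) = 1 − (0.5 + 0.4·s/(1+s)) · e^(−t) satisfying EC-1 and EC-2, where s is the peers' common effort, solves the symmetric interior first-order condition by fixed-point iteration, and compares the equilibrium effort at two values of λ. The functional form and all parameter values are illustrative assumptions.

```python
import math

def symmetric_equilibrium(k, M, lam, r, U, sweeps=200):
    """Symmetric pure equilibrium of a toy UPG subgame with
    E[alpha](s, t) = 1 - (0.5 + 0.4 * s / (1 + s)) * exp(-t).
    The interior first-order condition gives
    t = ln(B(s) * lam * M / (k * r)); iterating the best response
    from zero effort converges upward to the equilibrium."""
    t = 0.0
    for _ in range(sweeps):
        B = 0.5 + 0.4 * t / (1.0 + t)
        t = min(U, max(0.0, math.log(B * lam * M / (k * r))))
    return t

low = symmetric_equilibrium(k=4, M=100.0, lam=0.10, r=1.0, U=5.0)
high = symmetric_equilibrium(k=4, M=100.0, lam=0.30, r=1.0, U=5.0)
# Raising lam lowers r_bar = k / (lam * M) and moves the equilibrium up.
```

In this toy instance, tripling λ roughly quadruples the equilibrium grading effort, illustrating the "encouraging" direction of Theorem 3.3.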

Lastly, we argue that all the results in Sections 3.1 and 3.2 still hold even if the expected accuracy level in EC-1 is only weakly concave. While this means that graders generally have multiple best responses to choose from, choosing the highest effort level among them does not violate rationality. Therefore the positive reinforcement still exists, and all the other results follow; the only difference is that there will be far more possible equilibria.

3.3 Settings That Meet the Encouraging Conditions

So far we still do not know under what settings of a UPG subgame that the encouraging conditions will hold. In this section, we describe a series of practical settings of a UPG subgame that satisfy both encouraging conditions. While the encouraging conditions are rather strong, we show that they can be satisfied with some rational assumptions and a simple mechanism.

The first question is how to characterize the peer-assessed grades. We assume each grade to follow a normal distribution, which is a common way to characterize an observation. The mean is simply the intrinsic value, since we assume unbiased grading. The variance depends on the amount of time spent: intuitively, the more effort the grader puts into grading, the closer the grade will be to the intrinsic value. Furthermore, the marginal effect should be diminishing. Thus we assume the variance to be non-increasing and convex in the time spent on grading. Notice this does not mean a grader can become arbitrarily precise by spending a very large amount of time on grading, since the above function need not be strictly decreasing. For the mechanism, we use an averaging method to aggregate the grades. We prove that, combined with any non-increasing piecewise continuous $f_{accu}(\cdot)$, this setting satisfies both encouraging conditions.
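Before the formal statement, the qualitative content of this setting can be checked numerically. The Monte Carlo sketch below uses an illustrative variance function $g(t) = 1/(1+t)$ and hypothetical values of $k$, $h$, $v$ and the effort levels (none prescribed by the thesis) to estimate a grader's expected accuracy under averaging aggregation with a threshold awarding rule, and checks that it grows in the grader's own time with diminishing gains.

```python
import random

# Monte Carlo sketch of the setting above: k unbiased graders draw normally
# distributed grades around the intrinsic value v; the aggregate is their
# average; a grader is "accurate" when he lands within h of the aggregate
# (an h-threshold awarding function). The variance function g(t) = 1/(1+t)
# is an illustrative non-increasing convex choice; k, h, v and the effort
# levels are hypothetical values, not taken from the thesis.

def g(t):
    return 1.0 / (1.0 + t)

def expected_accuracy(t_j, t_others, k=5, h=0.5, v=70.0, trials=100_000):
    rng = random.Random(0)
    hits = 0
    for _ in range(trials):
        grades = [rng.gauss(v, g(t_j))]                   # grader j's grade
        grades += [rng.gauss(v, g(t)) for t in t_others]  # the k-1 peers
        agg = sum(grades) / k                             # averaging function
        hits += abs(grades[0] - agg) <= h
    return hits / trials

others = [1.0] * 4
a0, a1, a4 = (expected_accuracy(t, others) for t in (0.0, 1.0, 4.0))

# Expected accuracy is increasing in the grader's own time ...
assert a0 < a1 < a4
# ... with diminishing marginal gains (a loose check of the EC-1 flavor).
assert (a1 - a0) > (a4 - a1) / 3
```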

Definition 3.3. We define the averaging function to be $f_{avg}(S) = \frac{1}{k}\sum_{j} S_j$.

Proposition 3.4. Suppose that for every $j$, $F_j(\cdot, v, t_j) \sim N(v, g_j^2)$, where $g_j = g_j(t_j)$ is a non-increasing convex function. If $f_{agg}(\cdot) = f_{avg}(\cdot)$ and $f_{accu}(\cdot)$ is non-increasing piecewise continuous, then for any values of $(M, k, r, U, \lambda)$, the subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies both EC-1 and EC-2.


To prove this proposition, we start by proving a weaker proposition in which $f_{accu}(\cdot)$ is restricted to a threshold function, i.e., full marks are given whenever a learner's grade is not too far from the consensus. Then we extend the proposition to the general case.

Definition 3.4. We define the $h$-threshold awarding function to be $f_h(x) = 1$ if $x \le h$, for some threshold value $0 \le h < \infty$, and $f_h(x) = 0$ otherwise.

Proposition 3.5. Suppose that for every $j$, $F_j(\cdot, v, t_j) \sim N(v, g_j^2)$, where $g_j = g_j(t_j)$ is a non-increasing convex function. If $f_{agg}(\cdot) = f_{avg}(\cdot)$ and $f_{accu}(\cdot) = f_h(\cdot)$ for some $0 \le h < \infty$, then for any values of $(M, k, r, U, \lambda)$, the subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies both EC-1 and EC-2.

Proof. To simplify notation, let $g(\cdot) = g_j(\cdot)$. Since each $S_j$ follows a normal distribution, $\hat{S} = \frac{1}{k}\sum_j S_j$ also follows a normal distribution. Consider player $j$. Let the distribution of the other graders' average score be $F_o = \frac{1}{k-1}\sum_{i \neq j} S_i \sim N(v, x^2)$. Then the aggregated score is given by

$$\hat{S} = f_{avg}(S) = \frac{1}{k}\sum_i S_i = \frac{1}{k}F_j + \frac{k-1}{k}F_o,$$

the distance to the aggregated score is distributed as

$$S_j - \hat{S} = F_j - \hat{S} = \frac{k-1}{k}(F_j - F_o) \sim N\!\left(0,\ \frac{k-1}{k}(x^2 + g^2)\right),$$

and the expected accuracy is given by

$$E[\alpha_j](T_{-j}, t) = E[f_h(|S_j - \hat{S}|)] = \Pr[|S_j - \hat{S}| \le h] = \operatorname{erf}\!\left(\frac{h}{\sqrt{\frac{(k-1)(x^2+g^2)}{k}}}\right) = \operatorname{erf}\!\left(\frac{h\sqrt{k}}{\sqrt{(k-1)(x^2+g^2)}}\right).$$

By definition, $g'(t_j) \le 0$ and $g''(t_j) \ge 0$.


Let the effective error threshold be $p = \frac{h\sqrt{k}}{\sqrt{(k-1)(x^2+g^2)}}$. Then

$$\frac{\partial p}{\partial t} = h\sqrt{\frac{k}{k-1}} \cdot \frac{g(-g')}{(x^2+g^2)^{3/2}} \ge 0,$$

$$\frac{\partial}{\partial t}E[\alpha_j](T_{-j}, t) = \frac{\partial}{\partial t}\operatorname{erf}(p) = \frac{2}{\sqrt{\pi}}e^{-p^2}\,\frac{\partial p}{\partial t} \ge 0,$$

$$\frac{\partial^2}{\partial t^2}E[\alpha_j](T_{-j}, t) = \frac{\partial}{\partial t}\!\left(\frac{2}{\sqrt{\pi}}e^{-p^2}\,\frac{\partial p}{\partial t}\right) = \frac{2}{\sqrt{\pi}}e^{-p^2}\!\left((-2p)\!\left(\frac{\partial p}{\partial t}\right)^{\!2} + h\sqrt{\frac{k}{k-1}}\left(\frac{-\left((g')^2 + g g''\right)}{(x^2+g^2)^{3/2}} + \frac{3g^2(g')^2}{(x^2+g^2)^{5/2}}\right)\right) \le 0.$$

The last inequality follows from the fact that $g'(t_j)$ is non-positive and $g''(t_j)$ is non-negative. Let $T_{-j}$ and $T'_{-j}$ be two strategy profiles satisfying $t_p < t'_p$ for some $p$, and $t_q = t'_q$ for all $q \neq p$. Let $F_{o1} = \frac{1}{k-1}\sum_{i \neq j} S_i \sim N(v, x_1^2)$ and $F_{o2} = \frac{1}{k-1}\sum_{i \neq j} S'_i \sim N(v, x_2^2)$ be respectively the other graders' average score distributions in the two cases. Also let the corresponding effective error thresholds be $p_1 = \frac{h\sqrt{k}}{\sqrt{(k-1)(x_1^2+g^2)}}$ and $p_2 = \frac{h\sqrt{k}}{\sqrt{(k-1)(x_2^2+g^2)}}$. Clearly, $x_1 > x_2$, which implies $p_1 < p_2$. Then we have

$$\frac{\partial}{\partial t}\left(E[\alpha_j](T_{-j}, t) - E[\alpha_j](T'_{-j}, t)\right) = \frac{\partial}{\partial t}\left(\operatorname{erf}(p_1) - \operatorname{erf}(p_2)\right) = \frac{2}{\sqrt{\pi}}e^{-p_1^2}\frac{\partial p_1}{\partial t} - \frac{2}{\sqrt{\pi}}e^{-p_2^2}\frac{\partial p_2}{\partial t} = \left(\frac{2}{\sqrt{\pi}} \cdot \frac{(k-1)g(-g')}{h^2 k}\right)\left(\frac{e^{-p_1^2}}{p_1^3} - \frac{e^{-p_2^2}}{p_2^3}\right) \le 0.$$

The first term, $\frac{2}{\sqrt{\pi}} \cdot \frac{(k-1)g(-g')}{h^2 k}$, is non-negative since $g$ is a non-increasing non-negative function. Since $\frac{\partial}{\partial p}\left(\frac{e^{-p^2}}{p^3}\right) = -\frac{e^{-p^2}(2p^2+3)}{p^4} \le 0$, the second term is a definite integral of a non-positive function, so it is non-positive. Hence the product of the two terms is non-positive.

Any non-increasing piecewise continuous function can be decomposed into a combination of integrals of threshold functions. For each threshold function, the portion of expected utility it contributes satisfies EC-1. The expected utility is now a combination of integrals of non-decreasing concave functions, and is therefore non-decreasing and concave itself [2]. Similarly, for the condition in EC-2, the combination of integrals of error functions from the latter strategy profile always increases more rapidly than that from the former profile.
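The decomposition step can be illustrated numerically. Assuming $f_{accu}$ is differentiable, non-increasing, and vanishes at infinity, it equals a mixture of $h$-threshold functions with weight $-f'(h)$; the sketch below checks this for the hypothetical choice $f(x) = e^{-x}$ (not a function prescribed by the thesis).

```python
import math

# Numeric sketch of the decomposition used above. Assuming f_accu is
# differentiable, non-increasing, and vanishes at infinity, it can be
# written as a mixture of h-threshold functions with weight -f'(h):
#     f(x) = integral over h >= 0 of 1[x <= h] * (-f'(h)) dh.
# We check this for the hypothetical choice f(x) = exp(-x), where
# -f'(h) = exp(-h).

def f(x):
    return math.exp(-x)

def threshold(x, h):
    # h-threshold awarding function: full award iff the distance x <= h
    return 1.0 if x <= h else 0.0

def mixture_of_thresholds(x, dh=1e-3, h_max=40.0):
    # Riemann-sum approximation of the integral above
    total, h = 0.0, 0.0
    while h < h_max:
        total += threshold(x, h) * math.exp(-h) * dh
        h += dh
    return total

for x in (0.0, 0.5, 1.0, 2.0):
    assert abs(mixture_of_thresholds(x) - f(x)) < 1e-2
```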

Again, the grading distributions need not all be identical. We showed that as long as every grader's grade follows a normal distribution whose variance is non-increasing in his effort, both encouraging conditions are satisfied.


Chapter 4

Homogeneous Grading

Along with the development of MOOCs, another form of online learning environment, known as SPOCs, has also emerged [10]. A Small Private Online Course has at most several hundred qualified learners. With an eye on better learning experiences, a SPOC may require learners to pass tests or complete prerequisites, or even require student status at the related institution when used as material in a blended-learning course. The diversity of learners is drastically decreased in this type of learning environment. While grading several hundred copies of assignments may not be impossible, it is still tedious and requires massive effort. Given all its merits, peer grading can still be used in SPOCs.

Corresponding to this type of environment, where the learners are more homogeneous than before, in this chapter we focus on a further restricted case of our model, in which the peer graders not only share identical grading abilities but also value their time identically. We show that all possible pure Nash equilibria in this scenario share a common property: every submission is graded with the same level of effort by all of its graders. While this is still an idealized scenario, it reveals what homogeneity brings to our model.

Definition 4.1. A homogeneous unbiased peer grading subgame, or HUPG subgame, is a UPG subgame satisfying the following properties:

• f1 = f2 = ... = fk = f , where f is the shared grading probability distribution.

• r1 = r2 = ... = rk = r, where r is the shared time-to-grade ratio.


When both encouraging conditions are met in an HUPG subgame, we find that in equilibrium it is impossible for two learners to exert different levels of effort. Informally, in such an equilibrium the low-effort learner's peers put in more effort in total than the high-effort learner's peers. This means the low-effort learner would want to put in more effort than the high-effort learner as well, which leads to a contradiction. Consequently, every submission is graded with the same level of effort by all of its graders, as described in the following theorem.

Theorem 4.1. Suppose that EC-1 and EC-2 are both satisfied in an HUPG subgame. If T ={t1, t2, ..., tk} is a pure Nash equilibrium, then t1 = t2 = ... = tk = t.

Proof. Assume the contrary. Then there must exist $t_i < t_j$ for some $i \neq j$. Considering the strategy profiles $T_{-i}$ and $T_{-j}$, we have

$$\left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_i} \ \ge\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_j} \qquad \text{(EC-1)}$$

$$\ge\ \frac{kr}{\lambda M}\ \ge\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-i}, t)\right|_{t_i} \qquad \text{(Lemma 3.1)}$$

$$>\ \left.\frac{\partial}{\partial t}E[\alpha](T_{-j}, t)\right|_{t_i}, \qquad \text{(EC-2)}$$

which is a contradiction.
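To make the symmetric equilibrium concrete, the following sketch computes one numerically in the setting of Proposition 3.5, using the first-order condition suggested by Lemma 3.1 (marginal expected accuracy equals $kr/(\lambda M)$ at an interior best response). The variance function $g(t) = 1/(1+t)$ and all parameter values are illustrative assumptions, not taken from the thesis.

```python
import math

# Numeric sketch of a symmetric equilibrium in an HUPG subgame, in the
# normal-grades + averaging + h-threshold setting of Proposition 3.5. We
# use the first-order condition suggested by Lemma 3.1: at an interior
# best response, the marginal expected accuracy equals kr/(lambda*M).
# The variance function g(t) = 1/(1+t) and all parameter values are
# illustrative assumptions, not taken from the thesis.

K, H, R, M = 5, 0.5, 1.0, 100.0
C = H * math.sqrt(K / (K - 1))

def g(t):
    # non-increasing convex standard deviation of a grade
    return 1.0 / (1.0 + t)

def accuracy(s, t):
    # expected accuracy of a grader spending s while every peer spends t
    # (the erf expression from the proof of Proposition 3.5)
    x2 = g(t) ** 2 / (K - 1)        # variance of the peers' average
    return math.erf(C / math.sqrt(x2 + g(s) ** 2))

def marginal(s, t, eps=1e-6):
    return (accuracy(s + eps, t) - accuracy(s - eps, t)) / (2 * eps)

def best_response(t, target, lo=0.0, hi=10.0):
    # marginal(., t) is decreasing in own time (EC-1), so bisect
    if marginal(lo, t) <= target:
        return lo
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if marginal(mid, t) > target else (lo, mid)
    return (lo + hi) / 2

def symmetric_equilibrium(lam):
    target = K * R / (lam * M)      # the kr/(lambda*M) threshold
    t = 0.0
    for _ in range(500):            # positive-reinforcement iteration
        t_new = best_response(t, target)
        if abs(t_new - t) < 1e-9:
            break
        t = t_new
    return t

t_low, t_high = symmetric_equilibrium(0.5), symmetric_equilibrium(1.0)
# Raising lambda lowers kr/(lambda*M) and moves the equilibrium upwards.
assert 0.0 < t_low < t_high
```

Consistent with the discussion in Section 3.2, doubling $\lambda$ lowers the threshold $kr/(\lambda M)$ and raises the common equilibrium effort level.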

Combining with the previous result that a pure NE always exists, we immediately obtain the following corollary. Nevertheless, we also give a direct proof here.

Corollary 4.1.1. In any HUPG subgame that satisfies both encouraging conditions, there exists at least one equilibrium where ti = t,∀i.

Proof. Let $T_{-i}(p)$ be a strategy profile with $t_j = p$ for all $j \neq i$. By applying property EC-2, we have $\frac{\partial}{\partial t}E[\alpha_i](T_{-i}(p), t) < \frac{\partial}{\partial t}E[\alpha_i](T_{-i}(q), t)$ iff $p < q$. Denote $t_i(p)$ to be player $i$'s
