
Decomposition of Model

First and foremost, our model admits a direct simplification. A UPG game is in fact a series of independent smaller subgames, each containing only one submitted work. This is stated in the theorem below.

Definition 2.2. Given the other players' strategy profile $T_{-j} = \{T_1, \ldots, T_{j-1}, T_{j+1}, \ldots, T_N\}$, we define learner $a_j$'s best response $T_j^*$ to be the choice of $T_j$ that maximizes $E[\pi_j]$, conditioned on $T_{-j}$.

Theorem 2.1. Given two strategy profiles of the other players, $T_{-j}$ and $T'_{-j}$, the corresponding best responses $T_j^*$ and $T_j'^*$ satisfy the following: $t^{*i}_j = t'^{*i}_j$ if $t^i_{j'} = t'^i_{j'}$ for all $j' \neq j$.

Proof. Consider player $j$. We first observe that his total expected utility can be decomposed into

$$E[\pi_j] = \sum_i \left( \frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j \right).$$

Therefore, maximizing the expected utility is equivalent to maximizing the total of the $k$ nonzero terms, each corresponding to the expected utility from grading one specific submitted work. Given the other players' strategies and a fixed $i$, the term $\frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j$ depends only on $T^i = [t^i_j \mid G(i, j) = 1]$. Thus, maximizing the total of all $k$ terms is equivalent to maximizing each of the $k$ terms independently.

If $t^i_{j'} = t'^i_{j'}$ holds for all $j' \neq j$, then the choice of $t^i_j$ that maximizes $\frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j$ also maximizes $\frac{\lambda M}{k} E[\alpha'^i_j] - r_j t'^i_j$.
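The separability argument in the proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical per-submission term with an illustrative concave accuracy curve; the function `term`, the grid, and the constants are assumptions for illustration and do not come from the model. It compares a joint search over all efforts with an independent search per submission.

```python
import itertools

def term(i, t, reward_weight=10.0, cost_rate=1.0):
    """Hypothetical utility term for submission i at effort t.

    reward_weight stands in for lambda * M / k and cost_rate for r_j.
    The accuracy curve is an illustrative assumption, not from the source.
    """
    accuracy = 1.0 - 1.0 / (1.0 + (i + 1) * t)
    return reward_weight * accuracy - cost_rate * t

grid = [x / 10 for x in range(0, 31)]  # candidate effort levels in [0, 3]
k = 3                                  # submissions graded by player j

# Joint search: maximize the sum of all k terms at once.
joint_best = max(
    itertools.product(grid, repeat=k),
    key=lambda ts: sum(term(i, t) for i, t in enumerate(ts)),
)

# Independent search: maximize each term on its own.
separate_best = tuple(
    max(grid, key=lambda t: term(i, t)) for i in range(k)
)

print(joint_best, separate_best)
```

Because each term depends on a different effort variable, both searches select the same effort vector, while the independent search evaluates far fewer candidates.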

Equivalently, the general UPG game can be decomposed into N subgames, each containing one submitted work and its k peer graders. The equilibrium of a general UPG game, described by every learner's effort on every submitted work he grades, is then composed of the learners' respective efforts in all subgames. We emphasize that a learner faces no total time limit; equivalently, the maximum time available, which corresponds to the length of the peer grading phase, is much longer than $kU$, the maximum total time spent on grading. Thus, spending more time grading one submission does not affect the grading of other submissions. With this property, we can separate the subgames and analyze one at a time. For the rest of the paper, we define such a subgame as follows to simplify the notation.

Definition 2.3. In a UPG subgame, only k learners are considered; each of them grades the same submission with value $v$. $t_j$ represents the amount of effort learner $j$ puts into grading the submitted work, dropping the superscript from $t^i_j$ in the general game. $S_j$, $\hat{S}$ and $\alpha_j$ are defined analogously, inheriting the meaning of their superscripted counterparts. $\pi_j = E\left[\frac{\lambda M}{k} \alpha_j - r_j t_j\right]$ is now the utility learner $a_j$ gets directly from his peer grading effort in this subgame.

Note that N becomes irrelevant once we separate a UPG game into subgames, and we can fully describe such a subgame by specifying $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$. We call this tuple a setting of a UPG subgame. Note that here $f_g$ represents only k functions.

Chapter 3 Analysis

3.1 Encouraging Conditions and Existence of Equilibria

In this section, we define the following encouraging conditions and prove that these conditions lead to the existence of pure Nash equilibria in a UPG subgame.

Definition 3.1. A setting of a UPG subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies the Encouraging Conditions if:

• (EC-1)

Given the other players' strategies $T_{-j} = [t_1, t_2, \ldots, t_{j-1}, t_{j+1}, \ldots, t_k]$, $E[\alpha_j](T_{-j}, t_j)$ is non-decreasing and strictly concave in $t_j \in [0, U]$.

• (EC-2)

Let $T_{-j}$ and $T'_{-j}$ be two strategy profiles satisfying the following properties:

– $t_p < t'_p$ for some $p$,
– $t_q = t'_q$ for all $q \neq p$.

Then $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) < \frac{\partial}{\partial t_j} E[\alpha_j](T'_{-j}, t_j)$ for all $t_j$.
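As an illustration only, the following hypothetical accuracy function satisfies both conditions when $k \geq 2$. The functional form and the weight $w$ are assumptions made for exposition; the source does not specify any particular form.

```latex
\[
E[\alpha_j](T_{-j}, t_j) = w(T_{-j})\,\bigl(1 - e^{-t_j}\bigr),
\qquad
w(T_{-j}) = \frac{1}{2} + \frac{1}{2}\cdot\frac{\sum_{q \neq j} t_q}{(k-1)\,U}.
\]
% EC-1: positive first derivative, negative second derivative in t_j.
\[
\frac{\partial}{\partial t_j} E[\alpha_j] = w(T_{-j})\, e^{-t_j} > 0,
\qquad
\frac{\partial^2}{\partial t_j^2} E[\alpha_j] = -\,w(T_{-j})\, e^{-t_j} < 0.
\]
% EC-2: raising a single t_p strictly raises w, hence the derivative in t_j.
\[
t_p < t'_p,\ t_q = t'_q\ \forall q \neq p
\;\Longrightarrow\;
w(T_{-j}) < w(T'_{-j})
\;\Longrightarrow\;
\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j)
< \frac{\partial}{\partial t_j} E[\alpha_j](T'_{-j}, t_j).
\]
```

Here $w$ lies in $[1/2, 1]$, so the expected accuracy stays within $[0, 1)$.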

Intuitively, EC-1 says that if a grader puts more effort into grading, his accuracy improves, with a diminishing marginal effect. The following lemma says that, combined with a linear cost on time, EC-1 makes the grader's decision problem straightforward.

Lemma 3.1. If a setting of a UPG subgame satisfies EC-1, then given any $T_{-j}$, $\pi_j(T_{-j}, t_j)$ is concave in $t_j$ on $[0, U]$.

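The last inequality follows from the concavity of $E[\alpha_j]$ required in EC-1. Since the partial utility function is concave, its global maximum either lies on the boundary or satisfies the first-order condition

$$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) = \frac{k r_j}{\lambda M}.$$

Note that in the boundary cases,

$$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) > \frac{k r_j}{\lambda M}, \ \forall t_j \in [0, U], \text{ if } t_j^* = U,$$

$$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) < \frac{k r_j}{\lambda M}, \ \forall t_j \in [0, U], \text{ if } t_j^* = 0.$$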
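The case analysis above translates into a simple procedure for the best response. The sketch below assumes the utility $\pi_j = \frac{\lambda M}{k} E[\alpha_j] - r_j t_j$, so that the marginal accuracy is compared against the threshold $k r_j / (\lambda M)$, and it assumes the marginal accuracy is strictly decreasing in $t_j$ as EC-1 requires. The function name `best_response`, its parameters, and the example accuracy curve are hypothetical.

```python
import math

def best_response(marginal_accuracy, threshold, upper, tol=1e-9):
    """Effort in [0, upper] that maximizes a concave utility.

    marginal_accuracy(t): derivative of E[alpha_j] with respect to t_j,
        assumed strictly decreasing in t (EC-1).
    threshold: k * r_j / (lambda * M), from the first-order condition.
    """
    if marginal_accuracy(0.0) <= threshold:
        return 0.0      # marginal utility is never positive: stay at 0
    if marginal_accuracy(upper) >= threshold:
        return upper    # marginal utility is never negative: go to U
    lo, hi = 0.0, upper
    # Bisection on the first-order condition marginal_accuracy(t) = threshold.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal_accuracy(mid) > threshold:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: E[alpha_j](t) = 1 - exp(-t), so the derivative is exp(-t).
t_star = best_response(lambda t: math.exp(-t), threshold=0.25, upper=5.0)
print(t_star)
```

Bisection is used here only because it needs nothing beyond monotonicity of the derivative; any one-dimensional root finder would serve equally well.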

On the other hand, EC-2 states that an arbitrary grader unilaterally increasing his own grading effort increases the marginal utilities of all other graders. Intuitively, although we leave them unspecified, we generally expect the grading possibility functions to be distributed closer to the intrinsic value as effort increases. Therefore, unilaterally increasing effort also moves the distribution of the consensus grade closer to the intrinsic value. This effect is generally good for graders, since effort can only bring a grader closer to the intrinsic value, not to the consensus. If this effect encourages graders to invest more effort, then we can expect a positive reinforcement process. The following theorem shows that pure Nash equilibria always exist if both conditions are satisfied. The positive reinforcement process can be seen in the proof.

Theorem 3.2. In any UPG subgame that satisfies both encouraging conditions, there exists at least one pure Nash equilibrium.

Proof. We prove the existence of pure Nash equilibria by describing a virtual algorithm that “calculates” one.

Given the setting of the UPG subgame, initially we let $T_i = 0$ for all $i$; equivalently, no grader puts in any effort. We then modify the effort levels one by one in the order of $i$ as long as an equilibrium has not been reached. When modifying $T_i$, we fix all other effort levels $T_{-i}$ and move $T_i$ to the best response of grader $i$ with respect to $T_{-i}$.

Trivially, since all effort levels are initialized to zero, $T_i$ either stays at zero or is raised when it is modified for the first time. Moreover, by EC-2, whenever one of the effort levels is raised, the marginal utilities of all other graders increase.

Suppose the effort level of grader $j$, currently set to $T_j$, is being modified for the second or a later time, and no effort level was decreased in the previous $k - 1$ modifications.

By the previous lemma, if the marginal utility, $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) - \frac{k r_j}{\lambda M}$, is negative on the whole interval $[0, U]$, then the best response of grader $j$ remains at zero regardless of the others' strategies. Otherwise, in the previous round $T_j$ was set so that the marginal utility is zero. By EC-2, the marginal utility is non-negative after a whole round of modifications. Let $T_{-j}$ denote the other graders' strategies in the previous round and $T'_{-j}$ their strategies in the current round, and suppose $T_j$ is about to be modified to $T'_j$. Since

$$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, T_j) = \frac{k r_j}{\lambda M} \leq \frac{\partial}{\partial t_j} E[\alpha_j](T'_{-j}, T_j),$$

and, by EC-1, the derivative is decreasing in $t_j$, we have $T'_j \geq T_j$. Combined with the fact that all effort levels can only increase in the first round, it follows by induction that no effort level is ever decreased.

Since the effort levels cannot exceed $U$, our algorithm eventually converges to a stable state in which $k$ consecutive effort levels are not modified in their turns, which gives a pure Nash equilibrium. However, the algorithm is not guaranteed to converge within any time limit, so it is of little use for computing equilibria in practice.
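The procedure in the proof can be sketched as a round-robin best-response iteration. The sketch below assumes a hypothetical accuracy model, $E[\alpha_j] = w\,(1 - e^{-t_j})$ with $w = \frac{1}{2} + \frac{1}{2}\sum_{q \neq j} t_q / ((k-1)U)$, chosen only because it satisfies EC-1 and EC-2 and has a closed-form best response. The function name, the parameter values, and the round cap `max_rounds` are likewise assumptions; the cap is a practical safeguard, since the proof gives no bound on the number of rounds.

```python
import math

def best_response_dynamics(k, U, lam, M, r, max_rounds=1000, tol=1e-10):
    """Round-robin best responses starting from zero effort.

    Assumed (hypothetical) accuracy model satisfying EC-1 and EC-2:
        E[alpha_j] = w * (1 - exp(-t_j)),
        w = 0.5 + 0.5 * sum(other efforts) / ((k - 1) * U).
    """
    t = [0.0] * k
    for _ in range(max_rounds):
        changed = False
        for j in range(k):
            others = sum(t) - t[j]
            w = 0.5 + 0.5 * others / ((k - 1) * U)
            threshold = k * r[j] / (lam * M)
            # First-order condition w * exp(-t_j) = threshold, clipped to [0, U].
            new_tj = min(U, max(0.0, math.log(w / threshold)))
            if abs(new_tj - t[j]) > tol:
                changed = True
            t[j] = new_tj
        if not changed:
            return t    # a full round without changes: stable state
    return t            # round cap reached; result may be approximate

efforts = best_response_dynamics(k=3, U=2.0, lam=0.2, M=100.0, r=[1.0, 1.5, 2.0])
print(efforts)
```

Starting from zero, each effort level in this example only moves upward from round to round, which mirrors the monotonicity argument in the proof.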
