Decomposition of Model - Promoting Peer Assessment in Massive Open Online Courses

First and foremost, our model admits a direct simplification. A UPG game is in fact a collection of independent smaller subgames, each containing only one submitted assignment. This is formalized in the theorem below.

Definition 2.2. Given the other players' strategy profile $T_{-j} = \{T_1, \ldots, T_{j-1}, T_{j+1}, \ldots, T_N\}$, we define learner $a_j$'s best response $T_j^*$ to be the choice of $T_j$ that maximizes $E[\pi_j]$, conditioned on $T_{-j}$.

Theorem 2.1. Given two strategy profiles of the other players, $T_{-j}$ and $T'_{-j}$, the corresponding best responses $T_j^*$ and $T_j'^*$ satisfy $t_j^{*i} = t_j'^{*i}$ for any submission $i$ such that $t^i_{j'} = t'^i_{j'}$ for all $j' \neq j$.

Proof. Consider player $j$. We first observe that his total expected utility can be decomposed into

$$E[\pi_j] = \sum_{i} \left( \frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j \right).$$

Therefore, maximizing the expected utility is equivalent to maximizing the total of the $k$ nonzero terms, each corresponding to the expected utility from grading one specific submitted work. Given the other players' strategies and fixing $i$, the term $\frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j$ depends only on $T^i = [t^i_{j'} \mid G(i, j') = 1]$, the efforts of the graders of work $i$. Thus, maximizing the total of all $k$ terms is equivalent to maximizing each of the $k$ terms independently.

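If $t^i_{j'} = t'^i_{j'}$ for all $j' \neq j$, then the objective $\frac{\lambda M}{k} E[\alpha^i_j] - r_j t^i_j$ is the same function of $t^i_j$ under $T_{-j}$ as under $T'_{-j}$, so the choice of $t^i_j$ that maximizes it under one profile also maximizes it under the other.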
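To make the decomposition concrete, here is a minimal Python sketch. It assumes a hypothetical toy accuracy model (not from the paper) in which learner $j$'s expected accuracy on work $i$ depends only on the efforts spent on work $i$. The function names `expected_accuracy`, `work_term`, `total_expected_utility`, and `best_response_per_work`, and the constants `a` and `b`, are illustrative. Each term of $E[\pi_j]$ is maximized separately by a grid search over $[0, U]$. As a result, changing the other graders' efforts on one work leaves learner $j$'s best effort on every other work unchanged, as Theorem 2.1 states.

```python
import math

# Hypothetical toy model (not from the paper): learner j's expected accuracy
# on work i depends only on the efforts spent on that same work i.
def expected_accuracy(own_effort, others_on_work, a=1.0, b=0.5):
    others = sum(1 - math.exp(-a * t) for t in others_on_work)
    boost = 1 + b * others / max(len(others_on_work), 1)
    return (1 - math.exp(-a * own_effort)) * boost

def work_term(t, others_on_work, lam, M, k, r):
    # One summand of E[pi_j]: (lambda*M/k) * E[alpha^i_j] - r_j * t^i_j
    return lam * M / k * expected_accuracy(t, others_on_work) - r * t

def total_expected_utility(own_efforts, others_by_work, lam, M, k, r):
    # E[pi_j] is the sum of the per-work terms over the works j grades.
    return sum(work_term(own_efforts[i], others, lam, M, k, r)
               for i, others in others_by_work.items())

def best_response_per_work(others_by_work, lam, M, k, r, U, steps=10_000):
    # Maximize each term separately by grid search over [0, U]; the choice of
    # t^i_j only ever looks at the other graders' efforts on work i.
    grid = [U * s / steps for s in range(steps + 1)]
    return {i: max(grid, key=lambda t: work_term(t, others, lam, M, k, r))
            for i, others in others_by_work.items()}
```

For example, with `others_by_work = {1: [0.5, 1.0], 2: [0.2, 0.3], 3: [1.0, 1.0]}`, raising only the efforts on work 2 changes the returned effort for work 2. The efforts returned for works 1 and 3 stay the same.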

Equivalently, the general UPG game can be decomposed into $N$ subgames, each containing one submitted assignment and its $k$ peer graders. An equilibrium of the general UPG game, described by every learner's effort on every submitted work they grade, is then composed of the learners' respective efforts in all subgames. We emphasize that there is no total time limit for a learner. Equivalently, the maximum time a learner can put in, which corresponds to the length of the peer grading phase, is much longer than $kU$, the maximum total time spent on grading. Thus, putting more time into grading one submission does not affect the grading of other submissions. This property lets us separate the subgames and analyze one at a time. For the rest of the paper, we define such a subgame as follows to simplify the notation.

Definition 2.3. In a UPG subgame, only $k$ learners are considered, each of whom grades the same submission with value $v$. $t_j$ denotes the amount of effort learner $j$ puts into grading the submitted work, dropping the superscript from $t^i_j$ in the general game. $S_j$, $\hat{S}$, and $\alpha_j$ are defined analogously, inheriting the meaning of their superscripted counterparts. $\pi_j = E\left[\frac{\lambda M}{k} \alpha_j - r_j t_j\right]$ is now the utility learner $a_j$ gets directly from his peer grading effort in this subgame.

Note that $N$ becomes irrelevant once we separate a UPG game into subgames, and we can fully describe such a subgame by specifying $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$. We call this tuple a setting of a UPG subgame. Note that here $f_g$ consists of only $k$ functions.
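The setting tuple can be mirrored by a small immutable record. The sketch below is a hypothetical Python representation, not code from the paper. The class name `UPGSubgameSetting` and the field types are assumptions, as is the reading of `f_agg` as an aggregation rule and `f_accu` as an accuracy measure. $N$ is deliberately absent. The constructor checks that exactly $k$ grading functions are supplied and, by assumption, $k$ cost rates as well.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass(frozen=True)
class UPGSubgameSetting:
    """Hypothetical container for (M, k, r, U, lambda, f_g, f_agg, f_accu).

    Field names and types are illustrative assumptions, not the paper's code.
    """
    M: float                           # M, as defined earlier in the paper
    k: int                             # number of peer graders of the submission
    r: Sequence[float]                 # cost rates r_j (assumed: one per grader)
    U: float                           # maximum effort on one submission, t_j in [0, U]
    lam: float                         # lambda, as defined earlier in the paper
    f_g: Sequence[Callable[..., Any]]  # grading functions: only k in a subgame
    f_agg: Callable[..., Any]          # f_agg(.), assumed to be the aggregation rule
    f_accu: Callable[..., Any]         # f_accu(.), assumed to be the accuracy measure

    def __post_init__(self) -> None:
        # A subgame involves exactly k graders, so f_g holds only k functions.
        if len(self.f_g) != self.k:
            raise ValueError(f"expected {self.k} grading functions, got {len(self.f_g)}")
        # Assumption: r lists one cost rate per grader of this submission.
        if len(self.r) != self.k:
            raise ValueError(f"expected {self.k} cost rates, got {len(self.r)}")
```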

Chapter 3 Analysis

3.1 Encouraging Conditions and Existence of Equilibria

In this section, we define the following encouraging conditions and prove that they guarantee the existence of pure Nash equilibria in a UPG subgame.

Definition 3.1. A setting of a UPG subgame $(M, k, r, U, \lambda, f_g, f_{agg}(\cdot), f_{accu}(\cdot))$ satisfies the Encouraging Conditions if:

• (EC-1) Given the other players' strategies $T_{-j} = [t_1, t_2, \ldots, t_{j-1}, t_{j+1}, \ldots, t_k]$, $E[\alpha_j](T_{-j}, t_j)$ is non-decreasing and strictly concave in $t_j$ on $[0, U]$.

• (EC-2) Let $T_{-j}$ and $T'_{-j}$ be two strategy profiles with the following properties:

  – $t_p < t'_p$ for some $p$;
  – $t_q = t'_q$ for all $q \neq p$.

  Then
  $$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) < \frac{\partial}{\partial t_j} E[\alpha_j](T'_{-j}, t_j), \quad \forall t_j.$$
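Both conditions can be spot-checked numerically for a candidate accuracy function. The Python sketch below uses a hypothetical toy model, $E[\alpha_j] = (1 - e^{-a t_j})\,(1 + b \cdot \mathrm{mean}_{q \neq j}(1 - e^{-a t_q}))$, with assumed constants $a = 1$ and $b = 0.5$. This model is not normalized to $[0, 1]$ and is not taken from the paper. The sketch tests EC-1 and EC-2 on a grid of interior points using central finite differences. A passing check is evidence on the sampled grid, not a proof.

```python
import math

def toy_accuracy(t_j, others, a=1.0, b=0.5):
    # Hypothetical E[alpha_j](T_-j, t_j), not from the paper; assumes k >= 2,
    # so `others` is non-empty.
    boost = 1 + b * sum(1 - math.exp(-a * t) for t in others) / len(others)
    return (1 - math.exp(-a * t_j)) * boost

def slope(f, t_j, others, h=1e-5):
    # Central finite difference for d/dt_j f(t_j, others).
    return (f(t_j + h, others) - f(t_j - h, others)) / (2 * h)

def interior_points(U, n):
    # n evenly spaced points strictly inside (0, U).
    return [U * (s + 1) / (n + 2) for s in range(n)]

def satisfies_ec1(f, others, U, n=200):
    # EC-1 on a grid: slope >= 0 (non-decreasing) and slope strictly falling
    # (strict concavity). A numerical spot check, not a proof.
    slopes = [slope(f, t, others) for t in interior_points(U, n)]
    return all(s >= 0 for s in slopes) and all(
        later < earlier for earlier, later in zip(slopes, slopes[1:]))

def satisfies_ec2(f, others, p, delta, U, n=200):
    # EC-2 on a grid: raise only t_p by delta > 0 and require the slope in
    # t_j to become strictly larger at every sampled point.
    raised = list(others)
    raised[p] += delta
    return all(slope(f, t, others) < slope(f, t, raised)
               for t in interior_points(U, n))
```

A function whose slope ignores the other graders' efforts, such as $1 - e^{-t_j}$, fails the EC-2 check, because raising $t_p$ leaves every slope unchanged.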

Intuitively, EC-1 says that if a grader puts more effort into grading, his accuracy improves, with diminishing marginal returns. The following lemma shows that, combined with a linear cost of time, EC-1 makes the grader's decision problem straightforward.

Lemma 3.1. If a setting of a UPG subgame satisfies EC-1, then given any $T_{-j}$, $\pi_j(T_{-j}, t_j)$

The last inequality follows from the concavity of $E[\alpha_j]$ required in EC-1. Since the partial utility function is concave, its global maximum either lies on the boundary or satisfies the first-order condition

$$\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) = \frac{k r_j}{\lambda M}.$$

Note that in the boundary cases,

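• $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) > \frac{k r_j}{\lambda M}$ for all $t_j \in [0, U]$ if $t_j^* = U$.

• $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) < \frac{k r_j}{\lambda M}$ for all $t_j \in [0, U]$ if $t_j^* = 0$.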
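The case analysis above translates directly into a best-response routine. This is a minimal Python sketch that assumes EC-1 holds, so the marginal accuracy $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j)$ is strictly decreasing in $t_j$. The function name `best_response_from_marginal` and the bisection tolerance are illustrative choices. The routine returns $0$ or $U$ in the boundary cases. Otherwise it solves the first-order condition $\frac{\partial}{\partial t_j} E[\alpha_j] = \frac{k r_j}{\lambda M}$ by bisection.

```python
import math

def best_response_from_marginal(marginal, threshold, U, tol=1e-10):
    """Best response t_j* in [0, U], assuming EC-1 (strictly concave E[alpha_j]).

    marginal(t): d/dt_j E[alpha_j](T_-j, t), assumed strictly decreasing in t
    threshold:   k * r_j / (lambda * M)
    """
    if marginal(0.0) <= threshold:   # marginal gain never beats the cost: t_j* = 0
        return 0.0
    if marginal(U) >= threshold:     # still worth it at the cap: t_j* = U
        return U
    lo, hi = 0.0, U                  # otherwise solve the first-order condition
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if marginal(mid) > threshold:
            lo = mid                 # still profitable: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical usage: lambda = 1, M = 10, k = 3, r_j = 1 gives threshold 0.3.
t_star = best_response_from_marginal(lambda t: 2 * math.exp(-t), 3 * 1.0 / (1.0 * 10), U=5.0)
```

Bisection is a natural fit here. A strictly decreasing marginal crosses the threshold at most once, so no derivative of the marginal itself is needed.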

On the other hand, EC-2 states that when an arbitrary grader unilaterally increases his own grading effort, the marginal utilities of all other graders increase. Intuitively, although the model does not specify this, we generally expect the grading probability functions to concentrate closer to the intrinsic value as effort increases. Therefore, a unilateral increase in effort also concentrates the consensus grade closer to the intrinsic value. This effect generally benefits graders, since they can only invest effort to move closer to the intrinsic value, not to the consensus. If this effect encourages graders to invest more effort, then we can expect a positive reinforcement process. The following theorem shows that pure Nash equilibria always exist if both conditions are satisfied. The positive reinforcement process can be seen in its proof.

Theorem 3.2. In any UPG subgame that satisfies both encouraging conditions, there exists at least one pure Nash equilibrium.

Proof. We prove the existence of a pure Nash equilibrium by describing a virtual algorithm that “calculates” one.

Given the setting of the UPG subgame, we initially let $T_i = 0$ for all $i$; that is, no grader puts in any effort. We then update the effort levels one at a time in order of $i$ until an equilibrium is reached. When updating $T_i$, we fix all other effort levels $T_{-i}$ and move $T_i$ to grader $i$'s best response with respect to $T_{-i}$.

Trivially, since all effort levels are initialized to zero, $T_i$ either stays at zero or is raised when it is updated for the first time. Also, by EC-2, whenever one of the effort levels is raised, the marginal utilities of all other graders increase.

Suppose the effort level of grader $j$, currently set to $T_j$, is being updated for the second or a later time, and no effort level was decreased in the previous $k - 1$ updates.

By the previous lemma, if the marginal utility $\frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j) - \frac{k r_j}{\lambda M}$ is negative on the whole interval $[0, U]$, then the best response of grader $j$ remains zero regardless of the other graders' strategies. Otherwise, in the previous round $T_j$ was set to make the marginal utility zero. By EC-2, the marginal utility is now non-negative after a whole round of updates. Suppose that in the current round the other graders' strategies are $T_{-j}$, and that $T_j$ is about to be updated to $T_j'$. Let $T'_{-j}$ denote the other graders' strategies when $T_j$ was last set. Since

$$\frac{\partial}{\partial t_j} E[\alpha_j](T'_{-j}, t_j)\Big|_{t_j = T_j} = \frac{k r_j}{\lambda M} \le \frac{\partial}{\partial t_j} E[\alpha_j](T_{-j}, t_j)\Big|_{t_j = T_j},$$

EC-1 gives $T_j' \ge T_j$. Combined with the fact that effort levels can only increase in the first round, induction shows that no effort level is ever decreased.

Since the effort levels cannot exceed $U$, our algorithm eventually converges to a stable state in which $k$ consecutive effort levels are not modified in their turns, which gives a pure Nash equilibrium. However, the algorithm is not guaranteed to converge within any time limit, so it is of little practical use for computing equilibria.
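The virtual algorithm can also be simulated. The Python sketch below runs round-robin best responses from zero effort on a hypothetical toy model. In this model the marginal accuracy is strictly decreasing in a grader's own effort (EC-1) and strictly increasing in every other grader's effort (EC-2). The names `toy_marginal`, `toy_best_response`, and `virtual_algorithm`, the tolerance `eps`, and the cap `max_updates` are assumptions made for illustration. Each entry of `thresholds` stands for $k r_i / (\lambda M)$. The proof gives no convergence time, so the sketch stops at an approximate equilibrium: $k$ consecutive updates that each move an effort by at most `eps`. It raises an error if the cap is reached first. The recorded `history` makes it possible to check that effort levels never decrease, as the proof argues.

```python
import math

def toy_marginal(t_j, others, a=1.0, b=0.5):
    # Hypothetical d/dt_j E[alpha_j](T_-j, t_j): strictly decreasing in t_j (EC-1)
    # and strictly increasing in every other grader's effort (EC-2).
    boost = 1 + b * sum(1 - math.exp(-a * t) for t in others) / len(others)
    return a * math.exp(-a * t_j) * boost

def toy_best_response(others, threshold, U, tol=1e-12):
    # Boundary cases first, then bisection on the first-order condition.
    def f(t):
        return toy_marginal(t, others)
    if f(0.0) <= threshold:
        return 0.0
    if f(U) >= threshold:
        return U
    lo, hi = 0.0, U
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) > threshold else (lo, mid)
    return (lo + hi) / 2

def virtual_algorithm(k, thresholds, U, eps=1e-9, max_updates=100_000):
    """Round-robin best responses starting from zero effort.

    Stops once k consecutive updates move an effort by at most eps; the cap
    max_updates is needed because no convergence time is guaranteed.
    """
    efforts = [0.0] * k                   # every grader starts with zero effort
    history = [list(efforts)]
    unchanged, i = 0, 0                   # consecutive near-zero moves, current grader
    for _ in range(max_updates):
        others = efforts[:i] + efforts[i + 1:]
        new = toy_best_response(others, thresholds[i], U)
        unchanged = unchanged + 1 if abs(new - efforts[i]) <= eps else 0
        efforts[i] = new
        history.append(list(efforts))
        if unchanged >= k:                # k graders in a row kept their effort
            return efforts, history
        i = (i + 1) % k
    raise RuntimeError("no approximate equilibrium within max_updates")
```

For example, `virtual_algorithm(3, [0.3, 0.3, 0.3], U=5.0)` settles near the symmetric effort $t \approx 1.535$, where $e^{-t}(1.5 - 0.5e^{-t}) = 0.3$ in this toy model. A grader whose threshold exceeds the largest possible marginal (1.5 here) stays at zero throughout.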

From the document Promoting Peer Assessment in Massive Open Online Courses (pp. 22-28)