Convergence of Perceptron Learning Algorithm


Machine Learning Foundations (NTU, Fall 2020) instructor: Hsuan-Tien Lin


Hsuan-Tien Lin, September 26, 2020

Page 14 of lecture 2 proves that

\[
w_f^T w_{t+1} \ge w_f^T w_t + \underbrace{\min_n \, y_n w_f^T x_n}_{\|w_f\| \cdot \rho},
\]

where we defined the last term to be a constant $\|w_f\| \cdot \rho$ on page 16 of lecture 2. Assume that $w_0 = 0$, as set on page 15 of lecture 2. Then, we have

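\begin{align*}
w_f^T w_0 &= 0 \\
w_f^T w_1 &\ge w_f^T w_0 + \|w_f\| \cdot \rho \\
w_f^T w_2 &\ge w_f^T w_1 + \|w_f\| \cdot \rho \\
w_f^T w_3 &\ge w_f^T w_2 + \|w_f\| \cdot \rho \\
&\;\;\vdots \\
w_f^T w_T &\ge w_f^T w_{T-1} + \|w_f\| \cdot \rho
\end{align*}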

Summing all inequalities above, we get

\[
w_f^T w_T \ge T \cdot \|w_f\| \cdot \rho \tag{1}
\]

This says the inner product grows by at least $T \cdot \|w_f\| \cdot \rho$ after $T$ updates. Note that for linearly separable data with both classes of examples, a separating $w_f$ means $\|w_f\| > 0$ and $\rho > 0$.
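Inequality (1) can be checked numerically on a small example. The following is a minimal sketch, not the lecture's code: the toy data `X`, `Y`, the separating vector `W_F`, the helper names `dot` and `pla_updates`, the choice to always correct the first mistake found, and the `max_updates` cap are all assumptions made for illustration. Each $x_n$ carries a leading bias coordinate of 1, and all arithmetic is on integers, so the comparison is exact.

```python
# Hypothetical toy data: each x_n = (1, x1, x2) with a leading bias coordinate.
X = [(1, 1, 1), (1, 2, 1), (1, 1, 2), (1, 2, 2),
     (1, 3, 1), (1, 1, 3), (1, 0, 3), (1, 3, 3)]
Y = [-1, -1, -1, +1, +1, +1, -1, +1]
W_F = (-7, 2, 2)  # assumed separating w_f: y_n * w_f^T x_n >= 1 for every n


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def pla_updates(X, Y, max_updates=10_000):
    """Run PLA from w_0 = 0 and return the list [w_0, w_1, ..., w_T]."""
    w = tuple(0 for _ in X[0])
    history = [w]
    while len(history) <= max_updates:
        # A point is a mistake when y_n * w^T x_n <= 0 (sign(0) counts as wrong).
        mistakes = [(x, y) for x, y in zip(X, Y) if y * dot(w, x) <= 0]
        if not mistakes:
            break
        x, y = mistakes[0]
        w = tuple(wi + y * xi for wi, xi in zip(w, x))  # w_{t+1} = w_t + y_n x_n
        history.append(w)
    return history


# min_n y_n w_f^T x_n, which the handout writes as ||w_f|| * rho.
margin = min(y * dot(W_F, x) for x, y in zip(X, Y))
history = pla_updates(X, Y)

for t, w in enumerate(history):
    # Inequality (1) with T replaced by t: w_f^T w_t >= t * ||w_f|| * rho.
    assert dot(W_F, w) >= t * margin
    print(f"t={t}: w_f^T w_t = {dot(W_F, w)} >= {t * margin}")
```

The check uses the product $\|w_f\| \cdot \rho$ directly as `margin`, so no square root is needed at this stage.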

Now let's look at the results on page 15 of lecture 2. It proves that

\[
\|w_{t+1}\|^2 \le \|w_t\|^2 + \underbrace{\max_n \|x_n\|^2}_{R^2},
\]

where we defined the last term to be a constant $R^2$ on page 16 of lecture 2. Assume that $w_0 = 0$, as set on page 15 of lecture 2. Then, we have

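\begin{align*}
\|w_0\|^2 &= 0 \\
\|w_1\|^2 &\le \|w_0\|^2 + R^2 \\
\|w_2\|^2 &\le \|w_1\|^2 + R^2 \\
\|w_3\|^2 &\le \|w_2\|^2 + R^2 \\
&\;\;\vdots \\
\|w_T\|^2 &\le \|w_{T-1}\|^2 + R^2
\end{align*}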

Summing all inequalities above, we get

\[
\|w_T\|^2 \le T \cdot R^2 \tag{2}
\]

This says the squared length grows by at most $T \cdot R^2$ after $T$ updates.
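The growth bound (2) can be traced in the same spirit. This is again only a sketch under assumptions: the toy data, the helper `dot`, the first-mistake selection rule, and the cap of 10,000 iterations are illustrative choices, not part of the lecture. The comment in the loop records the standard expansion of $\|w_t + y_n x_n\|^2$ that explains why each update adds at most $R^2$.

```python
# Hypothetical toy data: each x_n = (1, x1, x2) with a leading bias coordinate.
X = [(1, 1, 1), (1, 2, 1), (1, 1, 2), (1, 2, 2),
     (1, 3, 1), (1, 1, 3), (1, 0, 3), (1, 3, 3)]
Y = [-1, -1, -1, +1, +1, +1, -1, +1]


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


R2 = max(dot(x, x) for x in X)  # R^2 = max_n ||x_n||^2

w = (0, 0, 0)                   # w_0 = 0
sq_norms = [dot(w, w)]          # sq_norms[t] holds ||w_t||^2
for _ in range(10_000):         # safety cap, an assumption of this sketch
    mistake = next(((x, y) for x, y in zip(X, Y) if y * dot(w, x) <= 0), None)
    if mistake is None:
        break                   # no mistakes left: PLA has halted
    x, y = mistake
    # ||w + y x||^2 = ||w||^2 + 2 y w^T x + ||x||^2, and y w^T x <= 0 on a
    # mistake, so the squared length grows by at most ||x||^2 <= R^2.
    w = tuple(wi + y * xi for wi, xi in zip(w, x))
    sq_norms.append(dot(w, w))

for t, s in enumerate(sq_norms):
    assert s <= t * R2          # inequality (2) with T replaced by t

T = len(sq_norms) - 1
print(f"T = {T}, ||w_T||^2 = {sq_norms[-1]}, T * R^2 = {T * R2}")
```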

If $\|w_T\|^2 = 0$, this means $w_f^T w_T = 0$ as well, which contradicts (1) for $T \ge 1$ because $\|w_f\| > 0$, $\rho > 0$, and $w_f^T w_0 = 0$. So $\|w_T\|^2$ must be strictly positive for $T \ge 1$. Then, we can “divide” (1) by the “square root” of (2) to get

\[
\frac{w_f^T w_T}{\|w_T\|} \ge \sqrt{T} \cdot \frac{\rho \, \|w_f\|}{R}. \tag{3}
\]
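The “division” can be spelled out as two ordinary steps. The worked derivation below is a sketch that uses only (1), (2), and the strict positivity of $\|w_T\|$ for $T \ge 1$; it introduces no new symbols.

```latex
\begin{align*}
\|w_T\|^2 \le T \cdot R^2
  &\;\Longrightarrow\; \|w_T\| \le \sqrt{T} \cdot R
  \;\Longrightarrow\; \frac{1}{\|w_T\|} \ge \frac{1}{\sqrt{T} \cdot R}, \\
\frac{w_f^T w_T}{\|w_T\|}
  &\ge \frac{T \cdot \|w_f\| \cdot \rho}{\|w_T\|}
  \ge \frac{T \cdot \|w_f\| \cdot \rho}{\sqrt{T} \cdot R}
  = \sqrt{T} \cdot \frac{\rho \, \|w_f\|}{R}.
\end{align*}
```

The first inequality in the second line is (1) divided by the positive number $\|w_T\|$. The second uses the bound on $1 / \|w_T\|$ from the first line, which preserves the direction because $T \cdot \|w_f\| \cdot \rho > 0$.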


That is, for linearly separable data with both classes of examples, and for $T \ge 1$, letting $\theta_T$ denote the angle between $w_f$ and $w_T$, we have

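\[
1 \ge \cos \theta_T = \frac{w_f^T w_T}{\|w_f\| \cdot \|w_T\|} \ge \sqrt{T} \cdot \frac{\rho}{R}.
\]

Squaring both ends of this chain gives $T \cdot \rho^2 / R^2 \le 1$, which proves that $T$, the number of updates of PLA, cannot be more than

\[
\frac{R^2}{\rho^2},
\]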

which is the conclusion on page 16 of lecture 2. That is, PLA will converge with no more than $\frac{R^2}{\rho^2}$ updates.
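To see the final bound at work, the sketch below runs PLA to convergence on a small separable data set and compares the observed number of updates $T$ with $R^2 / \rho^2$. The data, the separating vector `W_F`, the helper `dot`, the first-mistake selection rule, and the iteration cap are assumptions for illustration only. Here $\rho$ is computed from the handout's definition $\min_n y_n w_f^T x_n = \|w_f\| \cdot \rho$.

```python
import math

# Hypothetical toy data: each x_n = (1, x1, x2) with a leading bias coordinate.
X = [(1, 1, 1), (1, 2, 1), (1, 1, 2), (1, 2, 2),
     (1, 3, 1), (1, 1, 3), (1, 0, 3), (1, 3, 3)]
Y = [-1, -1, -1, +1, +1, +1, -1, +1]
W_F = (-7, 2, 2)  # assumed separating w_f: y_n * w_f^T x_n >= 1 for every n


def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


norm_wf = math.sqrt(dot(W_F, W_F))
rho = min(y * dot(W_F, x) for x, y in zip(X, Y)) / norm_wf
R = math.sqrt(max(dot(x, x) for x in X))
bound = R ** 2 / rho ** 2       # the claimed upper bound on the update count

w, T = (0, 0, 0), 0             # w_0 = 0, no updates yet
for _ in range(10_000):         # safety cap, an assumption of this sketch
    mistake = next(((x, y) for x, y in zip(X, Y) if y * dot(w, x) <= 0), None)
    if mistake is None:
        break                   # every point is classified correctly
    x, y = mistake
    w = tuple(wi + y * xi for wi, xi in zip(w, x))
    T += 1

# Cosine of the angle between w_f and w_T (T >= 1 here, so ||w_T|| > 0).
cos_theta = dot(W_F, w) / (norm_wf * math.sqrt(dot(w, w)))

assert math.sqrt(T) * rho / R <= cos_theta <= 1 + 1e-12
assert T <= bound
print(f"T = {T}, R^2/rho^2 = {bound:.1f}, cos(theta_T) = {cos_theta:.4f}")
```

On data like this the bound is typically far from tight: it guarantees termination, but the actual number of updates depends on which mistakes PLA happens to correct.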
