Convergence of Perceptron Learning Algorithm

Academic year: 2022

Machine Learning Foundations (NTU, Fall 2020) instructor: Hsuan-Tien Lin

Hsuan-Tien Lin, September 26, 2020

Page 14 of lecture 2 proves that

$$
w_f^T w_{t+1} \ge w_f^T w_t + \underbrace{\min_n \, y_n w_f^T x_n}_{\|w_f\| \cdot \rho},
$$

where we defined the last term to be a constant $\|w_f\| \cdot \rho$ on page 16 of lecture 2. Assume that $w_0 = 0$, as set on page 15 of lecture 2. Then, we have

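$$
\begin{aligned}
w_f^T w_0 &= 0 \\
w_f^T w_1 &\ge w_f^T w_0 + \|w_f\| \cdot \rho \\
w_f^T w_2 &\ge w_f^T w_1 + \|w_f\| \cdot \rho \\
w_f^T w_3 &\ge w_f^T w_2 + \|w_f\| \cdot \rho \\
&\;\;\vdots \\
w_f^T w_T &\ge w_f^T w_{T-1} + \|w_f\| \cdot \rho
\end{aligned}
$$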

Summing all inequalities above, we get

$$
w_f^T w_T \ge T \cdot \|w_f\| \cdot \rho \tag{1}
$$

This says the inner product grows by at least $T \cdot \|w_f\| \cdot \rho$ after $T$ updates. Note that for linearly separable data with both classes of examples, a separating $w_f$ means $\|w_f\| > 0$ and $\rho > 0$.
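Inequality (1) can be checked numerically on a small example. The following is a minimal sketch, not part of the lecture: the toy data `X`, `y`, the separator `w_f`, and the helper names `dot` and `pla_iterates` are all hypothetical, and the update rule assumed is the usual PLA correction w ← w + y_n x_n on the first misclassified example.

```python
# Hypothetical toy data: each x_n carries a leading constant coordinate 1.
X = [(1, 3, 2), (1, 2, 3), (1, 0, 1), (1, 1, 0)]
y = [+1, -1, -1, +1]
w_f = (0, 1, -1)  # assumed separator: y_n * w_f^T x_n > 0 for every n


def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def pla_iterates(X, y):
    """Yield w_1, w_2, ...: the weights after each PLA update, from w_0 = 0."""
    w = tuple(0 for _ in X[0])
    while True:
        # First misclassified example; a score of exactly 0 counts as a mistake.
        mistake = next(((x, s) for x, s in zip(X, y) if s * dot(w, x) <= 0), None)
        if mistake is None:
            return
        x, s = mistake
        w = tuple(wi + s * xi for wi, xi in zip(w, x))
        yield w


# ||w_f|| * rho is, by definition, the smallest y_n * w_f^T x_n over the data.
margin = min(s * dot(w_f, x) for x, s in zip(X, y))

# Record (t, w_f^T w_t) after every update and compare against inequality (1).
growth = [(t, dot(w_f, w)) for t, w in enumerate(pla_iterates(X, y), start=1)]
for t, inner in growth:
    assert inner >= t * margin
```

The check uses the product $\|w_f\| \cdot \rho$ directly, so no square root is needed and the comparison stays in exact integer arithmetic for this data.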

Now let's look at the results on page 15 of lecture 2. It proves that

$$
\|w_{t+1}\|^2 \le \|w_t\|^2 + \underbrace{\max_n \, \|x_n\|^2}_{R^2},
$$

where we defined the last term to be a constant $R^2$ on page 16 of lecture 2. Assume that $w_0 = 0$, as set on page 15 of lecture 2. Then, we have

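$$
\begin{aligned}
\|w_0\|^2 &= 0 \\
\|w_1\|^2 &\le \|w_0\|^2 + R^2 \\
\|w_2\|^2 &\le \|w_1\|^2 + R^2 \\
\|w_3\|^2 &\le \|w_2\|^2 + R^2 \\
&\;\;\vdots \\
\|w_T\|^2 &\le \|w_{T-1}\|^2 + R^2
\end{aligned}
$$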

Summing all inequalities above, we get

$$
\|w_T\|^2 \le T \cdot R^2 \tag{2}
$$

This says the squared length grows by at most $T \cdot R^2$ after $T$ updates.
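The bound (2) can be checked the same way. This is again only a sketch on hypothetical toy data (`X`, `y`), assuming the usual PLA update w ← w + y_n x_n starting from w_0 = 0; the names `dot`, `R2`, and `squared_norms` are illustrative.

```python
# Hypothetical toy data: each x_n carries a leading constant coordinate 1.
X = [(1, 3, 2), (1, 2, 3), (1, 0, 1), (1, 1, 0)]
y = [+1, -1, -1, +1]


def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


# R^2 is the largest squared length among the examples.
R2 = max(dot(x, x) for x in X)

w = (0, 0, 0)        # w_0 = 0
squared_norms = []   # ||w_1||^2, ||w_2||^2, ...
while True:
    # Examples the current w gets wrong; a score of exactly 0 counts as wrong.
    wrong = [(x, s) for x, s in zip(X, y) if s * dot(w, x) <= 0]
    if not wrong:
        break
    x, s = wrong[0]
    w = tuple(wi + s * xi for wi, xi in zip(w, x))
    squared_norms.append(dot(w, w))

# Inequality (2): ||w_t||^2 <= t * R^2 after every update t.
for t, sq in enumerate(squared_norms, start=1):
    assert sq <= t * R2
```

Note that the squared length need not grow at every step; (2) only caps it from above.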

If $\|w_T\|^2 = 0$, this means $w_f^T w_T = 0$ as well, which contradicts (1) for $T \ge 1$ because $\|w_f\| > 0$, $\rho > 0$, and $w_f^T w_0 = 0$. So $\|w_T\|^2$ must be strictly positive for $T \ge 1$. Then, we can "divide" (1) by the "square root" of (2) to get

$$
\frac{w_f^T w_T}{\|w_T\|} \ge \sqrt{T} \cdot \frac{\rho \, \|w_f\|}{R}. \tag{3}
$$
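One way to spell out the "division" step is sketched below. It uses only (1), (2), and the facts $\|w_T\| > 0$ and $w_f^T w_T > 0$ for $T \ge 1$ established above; the intermediate lines are a reconstruction, not taken from the lecture slides.

```latex
\begin{align*}
\|w_T\| &\le \sqrt{T} \cdot R
  && \text{(square root of (2))} \\
\frac{1}{\|w_T\|} &\ge \frac{1}{\sqrt{T} \cdot R}
  && \text{(since } \|w_T\| > 0 \text{)} \\
\frac{w_f^T w_T}{\|w_T\|}
  &\ge \frac{T \cdot \|w_f\| \cdot \rho}{\sqrt{T} \cdot R}
   = \sqrt{T} \cdot \frac{\rho \, \|w_f\|}{R}
  && \text{(multiply by (1); both sides are positive)}
\end{align*}
```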


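That is, for linearly separable data with both classes of examples, and for $T \ge 1$, let $\theta_T$ denote the angle between $w_f$ and $w_T$. Dividing (3) by $\|w_f\|$, we have

$$
1 \ge \cos\theta_T = \frac{w_f^T w_T}{\|w_f\| \cdot \|w_T\|} \ge \sqrt{T} \cdot \frac{\rho}{R}.
$$

Squaring $\sqrt{T} \cdot \rho / R \le 1$ proves that $T$, the number of updates of PLA, cannot be more than

$$
\frac{R^2}{\rho^2},
$$

which is the conclusion on page 16 of lecture 2. That is, PLA will converge with no more than $R^2/\rho^2$ updates.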
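As a sanity check on the final bound, the sketch below counts PLA updates on a small hypothetical dataset and compares the count with $R^2/\rho^2$. The data, the separator `w_f`, and the function name `run_pla` are assumptions made for illustration; $\rho$ is computed from `w_f` via $\|w_f\| \cdot \rho = \min_n y_n w_f^T x_n$, so a different separating `w_f` would give a different (still valid) bound.

```python
# Hypothetical toy data: each x_n carries a leading constant coordinate 1.
X = [(1, 3, 2), (1, 2, 3), (1, 0, 1), (1, 1, 0)]
y = [+1, -1, -1, +1]
w_f = (0, 1, -1)  # assumed separator: y_n * w_f^T x_n > 0 for every n


def dot(a, b):
    """Inner product of two equal-length tuples."""
    return sum(ai * bi for ai, bi in zip(a, b))


def run_pla(X, y):
    """Run PLA from w_0 = 0; return the final weights and the update count."""
    w = tuple(0 for _ in X[0])
    updates = 0
    while True:
        # First misclassified example; a score of exactly 0 counts as a mistake.
        mistake = next(((x, s) for x, s in zip(X, y) if s * dot(w, x) <= 0), None)
        if mistake is None:
            return w, updates
        x, s = mistake
        w = tuple(wi + s * xi for wi, xi in zip(w, x))
        updates += 1


R2 = max(dot(x, x) for x in X)                       # R^2
margin = min(s * dot(w_f, x) for x, s in zip(X, y))  # ||w_f|| * rho
rho2 = margin ** 2 / dot(w_f, w_f)                   # rho^2
bound = R2 / rho2                                    # R^2 / rho^2

w, updates = run_pla(X, y)
assert updates <= bound
print(f"PLA made {updates} updates; the bound R^2 / rho^2 is {bound}")
```

The bound is a worst-case guarantee, so the observed number of updates can be far smaller than $R^2/\rho^2$.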
