Logistic Regression

Academic year: 2022


Logistic Regression

For a label-feature pair $(y, x)$, assume the probability model

$$p(y \mid x) = \frac{1}{1 + e^{-y w^T x}}.$$

$w$ is the parameter to be decided

Assume $(y_i, x_i)$, $i = 1, \ldots, l$, are training instances
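A minimal Python sketch of the probability model, assuming labels take values in {-1, +1} (the slides do not state this explicitly, but the symmetric form of the model suggests it). The function name `prob` and the toy numbers are made up for illustration.

```python
import math

def prob(y, w, x):
    """p(y | x) = 1 / (1 + exp(-y * w^T x)); y is assumed to be -1 or +1."""
    z = y * sum(wj * xj for wj, xj in zip(w, x))
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy parameter and feature vector (illustrative only); here w^T x = 1.
w = [1.0, -1.0]
x = [2.0, 1.0]
p_pos = prob(+1, w, x)
p_neg = prob(-1, w, x)
```

Under this assumption the two label probabilities sum to one, which is a quick sanity check for any implementation.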


Logistic Regression (Cont’d)

Logistic regression finds $w$ by maximizing the following likelihood

$$\max_w \prod_{i=1}^{l} p(y_i \mid x_i). \tag{1}$$

Regularized logistic regression

$$\min_w \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^T x_i}\right). \tag{2}$$
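Taking the negative logarithm of the likelihood (1) turns the product into the sum of log terms in (2). A small Python sketch of the regularized objective follows; the names `log1pexp` and `objective` and the toy data are assumptions for illustration, and dense Python lists are used only to keep the example self-contained.

```python
import math

def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def objective(w, X, y, C):
    """f(w) = 0.5 * w^T w + C * sum_i log(1 + exp(-y_i w^T x_i))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        z = yi * sum(wj * xj for wj, xj in zip(w, xi))
        loss += log1pexp(-z)
    return reg + C * loss

# Toy data (made up): three instances, two features.
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, 1, -1]
f0 = objective([0.0, 0.0], X, y, C=0.1)  # at w = 0 every loss term is log 2
```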


Gradient-descent Methods

Given initial $w^0$ and a constant $\eta \in (0, 1)$.

For $k = 0, 1, \ldots$

Calculate the direction $s^k = -\nabla f(w^k)$

Find $\alpha_k$ satisfying

$$f(w^k + \alpha_k s^k) \le f(w^k) + \eta \alpha_k \nabla f(w^k)^T s^k$$

Update $w^{k+1} = w^k + \alpha_k s^k$.


Gradient

We note that the gradient takes the following form

$$\nabla f(w) = w + C \sum_{i=1}^{l} \left(\frac{e^{-y_i w^T x_i}}{1 + e^{-y_i w^T x_i}}\right)(-y_i x_i)$$

$$= w + C \sum_{i=1}^{l} \left(\frac{1}{1 + e^{-y_i w^T x_i}} - 1\right) y_i x_i$$

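The gradient formula for the regularized objective can be checked numerically. The sketch below, with made-up toy data and hypothetical function names, evaluates the second form of the gradient and compares it against central finite differences of the objective.

```python
import math

def sigmoid(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def objective(w, X, y, C):
    """f(w) = 0.5 * w^T w + C * sum_i log(1 + exp(-y_i w^T x_i))."""
    f = 0.5 * sum(wj * wj for wj in w)
    for xi, yi in zip(X, y):
        z = yi * sum(wj * xj for wj, xj in zip(w, xi))
        f += -C * math.log(sigmoid(z))  # log(1 + e^{-z}); fine for moderate z
    return f

def gradient(w, X, y, C):
    """grad f(w) = w + C * sum_i (sigmoid(y_i w^T x_i) - 1) * y_i * x_i."""
    g = list(w)
    for xi, yi in zip(X, y):
        z = yi * sum(wj * xj for wj, xj in zip(w, xi))
        coef = C * (sigmoid(z) - 1.0) * yi
        for j, xj in enumerate(xi):
            g[j] += coef * xj
    return g

# Toy data (made up for illustration).
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, -1, -1]
w = [0.3, -0.2]
C = 0.1

g = gradient(w, X, y, C)

# Central finite differences as an independent check of the formula.
h = 1e-6
g_fd = []
for j in range(len(w)):
    wp = list(w)
    wp[j] += h
    wm = list(w)
    wm[j] -= h
    g_fd.append((objective(wp, X, y, C) - objective(wm, X, y, C)) / (2 * h))
```

A finite-difference check like this is cheap on a small data set and catches sign errors before the optimizer is written.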

Backtracking Line Search

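To find $\alpha_k$ satisfying

$$f(w^k + \alpha_k s^k) \le f(w^k) + \eta \alpha_k \nabla f(w^k)^T s^k$$

Sequentially check $\alpha_k = 1, 1/2, 1/4, 1/8, \ldots$

Recall the function is

$$\frac{1}{2} w^T w + C \sum_{i=1}^{l} \log\left(1 + e^{-y_i w^T x_i}\right).$$

You save time by the property

$$(w + \alpha d)^T x = w^T x + \alpha d^T x$$

so $w^T x_i$ and $d^T x_i$ are computed once and reused for every trial $\alpha$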
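A sketch of backtracking line search that exploits this property: the inner products with the current point and with the direction are computed once, and each trial step size then costs only a pass over the cached values. The function name `backtracking`, the cap `max_halvings`, and the toy data are assumptions for illustration.

```python
import math

def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def backtracking(w, s, g, X, y, C, eta=0.01, max_halvings=50):
    """Return (alpha, f_new) with f(w + alpha*s) <= f(w) + eta*alpha*g^T s."""
    wx = [dot(w, xi) for xi in X]   # w^T x_i, computed once
    sx = [dot(s, xi) for xi in X]   # s^T x_i, computed once
    ww, ws, ss = dot(w, w), dot(w, s), dot(s, s)
    gs = dot(g, s)

    def f_at(alpha):
        # (w + alpha*s)^T x_i = w^T x_i + alpha * s^T x_i
        reg = 0.5 * (ww + 2 * alpha * ws + alpha * alpha * ss)
        loss = sum(log1pexp(-yi * (a + alpha * b)) for yi, a, b in zip(y, wx, sx))
        return reg + C * loss

    f0 = f_at(0.0)
    alpha = 1.0
    for _ in range(max_halvings):   # alpha = 1, 1/2, 1/4, ...
        f_new = f_at(alpha)
        if f_new <= f0 + eta * alpha * gs:
            return alpha, f_new
        alpha *= 0.5
    raise RuntimeError("line search failed; is s a descent direction?")

# Toy data (made up). At w = 0 the gradient is -0.5 * C * sum_i y_i x_i.
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, -1, -1]
C = 0.1
w = [0.0, 0.0]
g = [-0.5 * C * sum(yi * xi[j] for xi, yi in zip(X, y)) for j in range(2)]
s = [-gj for gj in g]
alpha, f_new = backtracking(w, s, g, X, y, C)
```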


Backtracking Line Search (Cont’d)

You can keep

$$(w^{k+1})^T x = (w^k + \alpha_k s^k)^T x$$

for the next iteration

But error propagation is a concern: rounding errors in the stored values can accumulate over iterations


Stopping Condition

You can use

$$\|\nabla f(w^k)\| \le \epsilon \|\nabla f(w^0)\|$$

This is a relative condition

You may choose

$$\epsilon = 0.01$$

Note that a smaller $\epsilon$ will cause more iterations

You may need to set a maximal number of iterations as well
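Putting the pieces together, a gradient-descent loop with backtracking line search, the relative stopping condition, and an iteration cap might look like the sketch below. This is one possible arrangement under stated assumptions, not a reference solution: the function names, the dense-list data layout, and the toy data are made up, and the line search here recomputes the objective directly rather than using cached inner products.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log1pexp(t):
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def objective(w, X, y, C):
    return 0.5 * dot(w, w) + C * sum(log1pexp(-yi * dot(w, xi)) for xi, yi in zip(X, y))

def gradient(w, X, y, C):
    g = list(w)
    for xi, yi in zip(X, y):
        coef = C * (sigmoid(yi * dot(w, xi)) - 1.0) * yi
        for j, xj in enumerate(xi):
            g[j] += coef * xj
    return g

def gradient_descent(X, y, C, eta=0.01, eps=0.01, max_iter=1000):
    w = [0.0] * len(X[0])                       # start from w^0 = 0
    g = gradient(w, X, y, C)
    g0_norm = math.sqrt(dot(g, g))
    for k in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps * g0_norm:   # relative stopping condition
            return w, k
        s = [-gj for gj in g]                   # steepest-descent direction
        f_w, gs, alpha = objective(w, X, y, C), dot(g, s), 1.0

        def f_at(a):
            return objective([wj + a * sj for wj, sj in zip(w, s)], X, y, C)

        # Backtracking: alpha = 1, 1/2, 1/4, ... until sufficient decrease.
        while alpha > 1e-12 and f_at(alpha) > f_w + eta * alpha * gs:
            alpha *= 0.5
        w = [wj + alpha * sj for wj, sj in zip(w, s)]
        g = gradient(w, X, y, C)
    return w, max_iter                          # iteration cap reached

# Toy data (made up for illustration).
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, -1, -1]
w_gd, iters = gradient_descent(X, y, C=0.1)
```

The iteration cap matters in practice because a very small tolerance may not be reachable in reasonable time with plain gradient descent.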


Newton Methods

Newton direction

$$\min_s \; \nabla f(w^k)^T s + \frac{1}{2} s^T \nabla^2 f(w^k) s$$

$w^k$: current iterate

This is the same as solving the Newton linear system

$$\nabla^2 f(w^k) s = -\nabla f(w^k)$$

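The equivalence between the quadratic subproblem and the Newton linear system follows from setting the gradient of the quadratic model to zero. A short derivation, assuming the Hessian is positive definite so that the stationary point is the unique minimizer (the symbol $q$ is introduced here only for the derivation):

```latex
\begin{aligned}
q(s) &= \nabla f(w^k)^T s + \tfrac{1}{2}\, s^T \nabla^2 f(w^k)\, s, \\
\nabla_s q(s) &= \nabla f(w^k) + \nabla^2 f(w^k)\, s = 0
\;\Longleftrightarrow\;
\nabla^2 f(w^k)\, s = -\nabla f(w^k).
\end{aligned}
```

For regularized logistic regression the assumption holds, since the $\frac{1}{2} w^T w$ term contributes the identity to the Hessian and the loss term is convex.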

Newton Methods (Cont’d)

Given initial $w^0$ and a constant $\eta \in (0, 1)$.

For $k = 0, 1, \ldots$

Solve the Newton linear system to obtain direction $s^k$

Find $\alpha_k$ satisfying

$$f(w^k + \alpha_k s^k) \le f(w^k) + \eta \alpha_k \nabla f(w^k)^T s^k$$

Update $w^{k+1} = w^k + \alpha_k s^k$.


Newton Linear System

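The Hessian $\nabla^2 f(w^k)$ is too large to be stored

$\nabla^2 f(w^k)$: $n \times n$, $n$: number of features

But the Hessian has a special form

$$\nabla^2 f(w) = I + C \sum_{i=1}^{l} y_i x_i \left(\frac{e^{-y_i w^T x_i}}{(1 + e^{-y_i w^T x_i})^2}\right) y_i x_i^T = I + C X^T D X$$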


Newton Linear System (Cont’d)

$I$: identity.

$$X = \begin{bmatrix} x_1^T \\ \vdots \\ x_l^T \end{bmatrix}$$

is the data matrix.

$D$ is diagonal with

$$D_{ii} = \frac{e^{-y_i w^T x_i}}{(1 + e^{-y_i w^T x_i})^2}$$


Newton Linear System (Cont’d)

Use the conjugate gradient (CG) method to solve the linear system

$$\nabla^2 f(w^k) s = -\nabla f(w^k)$$

Only a sequence of Hessian-vector products is needed

$$\nabla^2 f(w) s = s + C \cdot X^T (D (X s))$$

Therefore, we have a Hessian-free approach

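The Hessian-vector product can be sketched in Python without ever forming the n-by-n matrix. The helper names and the toy data below are hypothetical; the explicit Hessian is built only as a cross-check, which is feasible here because n = 2.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def hessian_diag(w, X, y):
    """D_ii = e^{-z_i} / (1 + e^{-z_i})^2 with z_i = y_i w^T x_i."""
    D = []
    for xi, yi in zip(X, y):
        # The expression equals sig(z) * (1 - sig(z)) and is symmetric in z,
        # so using |z| keeps the exp() argument non-positive.
        sig = 1.0 / (1.0 + math.exp(-abs(yi * dot(w, xi))))
        D.append(sig * (1.0 - sig))
    return D

def hessian_vector(s, X, D, C):
    """Return (I + C X^T D X) s using only matrix-vector style passes."""
    Xs = [dot(xi, s) for xi in X]            # X s, length l
    DXs = [d * v for d, v in zip(D, Xs)]     # D (X s)
    out = list(s)                            # I s
    for xi, v in zip(X, DXs):                # + C * X^T (D (X s))
        for j, xj in enumerate(xi):
            out[j] += C * xj * v
    return out

# Toy data (made up for illustration).
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, -1, -1]
w = [0.3, -0.2]
C = 0.1

D = hessian_diag(w, X, y)
s = [1.0, -2.0]
Hs = hessian_vector(s, X, D, C)

# Explicit Hessian for comparison only.
n = len(w)
H = [[(1.0 if a == b else 0.0) + C * sum(d * xi[a] * xi[b] for d, xi in zip(D, X))
      for b in range(n)] for a in range(n)]
Hs_explicit = [dot(row, s) for row in H]
```

The product costs two passes over the data, the same order of work as one gradient evaluation.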

Conjugate Gradient

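Given $\xi_k < 1$. Let $\bar{s}^0 = 0$, $r^0 = -\nabla f(w^k)$, and $d^0 = r^0$.

For $i = 0, 1, \ldots$ (inner iterations)

If $\|r^i\| \le \xi_k \|\nabla f(w^k)\|$, then output $s^k = \bar{s}^i$ and stop.

$\alpha_i = \|r^i\|^2 / ((d^i)^T \nabla^2 f(w^k) d^i)$.

$\bar{s}^{i+1} = \bar{s}^i + \alpha_i d^i$.

$r^{i+1} = r^i - \alpha_i \nabla^2 f(w^k) d^i$.

$\beta_i = \|r^{i+1}\|^2 / \|r^i\|^2$.

$d^{i+1} = r^{i+1} + \beta_i d^i$.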
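The inner iterations translate almost line by line into code. The sketch below takes the Hessian-vector product as a callback, so it never needs the matrix itself; the function name, the `max_inner` cap, and the 2-by-2 test system are assumptions for illustration.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(hess_vec, g, xi_k=0.1, max_inner=100):
    """Approximately solve H s = -g, where hess_vec(d) returns H d."""
    s = [0.0] * len(g)              # s-bar^0 = 0
    r = [-gj for gj in g]           # r^0 = -grad f(w^k)
    d = list(r)                     # d^0 = r^0
    g_norm = math.sqrt(dot(g, g))
    rr = dot(r, r)
    for _ in range(max_inner):
        if math.sqrt(rr) <= xi_k * g_norm:   # relative stopping condition
            break
        Hd = hess_vec(d)
        alpha = rr / dot(d, Hd)
        s = [sj + alpha * dj for sj, dj in zip(s, d)]
        r = [rj - alpha * hj for rj, hj in zip(r, Hd)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        d = [rj + beta * dj for rj, dj in zip(r, d)]
        rr = rr_new
    return s

# Small symmetric positive definite system (made up) to exercise the routine:
# solve H s = -g with H = [[4, 1], [1, 3]] and -g = [1, 2].
H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
s = conjugate_gradient(lambda d: [dot(row, d) for row in H], g, xi_k=1e-10)
```

With a tight tolerance the routine approaches the exact solution of the small system; with a loose one such as 0.1 it stops early and returns only an approximate direction.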


Conjugate Gradient (Cont’d)

The CG stopping condition

$$\|r^i\| \le \xi_k \|\nabla f(w^k)\|$$

is important

It’s a relative stopping condition. It becomes strict in the end because of small $\|\nabla f(w^k)\|$

Therefore, we only approximately obtain the Newton direction


Conjugate Gradient (Cont’d)

In addition to line search, trust region is another method to ensure sufficient decrease; see the implementation in LIBLINEAR (Lin et al., 2007) http://www.csie.ntu.edu.tw/~cjlin/liblinear

Note that $\alpha_i$ in CG is different from $\alpha_k$ in the line search procedure

Check Golub and Van Loan (1996) for details of conjugate gradient methods
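For orientation, the Newton method with CG inner iterations and backtracking line search described in the preceding slides can be sketched end to end as follows. This is an illustrative arrangement on made-up toy data, not LIBLINEAR's implementation (which uses a trust region instead of line search); all function names and the dense-list data layout are assumptions.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log1pexp(t):
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def objective(w, X, y, C):
    return 0.5 * dot(w, w) + C * sum(log1pexp(-yi * dot(w, xi)) for xi, yi in zip(X, y))

def gradient(w, X, y, C):
    g = list(w)
    for xi, yi in zip(X, y):
        coef = C * (sigmoid(yi * dot(w, xi)) - 1.0) * yi
        for j, xj in enumerate(xi):
            g[j] += coef * xj
    return g

def hess_vec(d, w, X, y, C):
    """(I + C X^T D X) d with D_ii = sigmoid(z_i) * (1 - sigmoid(z_i))."""
    out = list(d)
    for xi, yi in zip(X, y):
        sig = sigmoid(yi * dot(w, xi))
        v = C * sig * (1.0 - sig) * dot(xi, d)
        for j, xj in enumerate(xi):
            out[j] += v * xj
    return out

def cg(w, g, X, y, C, xi_k, max_inner=100):
    """Inner CG iterations for the Newton linear system at the point w."""
    s, r = [0.0] * len(g), [-gj for gj in g]
    d, rr, g_norm = list(r), dot(g, g), math.sqrt(dot(g, g))
    for _ in range(max_inner):
        if math.sqrt(rr) <= xi_k * g_norm:
            break
        Hd = hess_vec(d, w, X, y, C)
        a = rr / dot(d, Hd)
        s = [sj + a * dj for sj, dj in zip(s, d)]
        r = [rj - a * hj for rj, hj in zip(r, Hd)]
        rr, rr_old = dot(r, r), rr
        d = [rj + (rr / rr_old) * dj for rj, dj in zip(r, d)]
    return s

def newton(X, y, C, eta=0.01, xi_k=0.1, eps=0.01, max_iter=100):
    w = [0.0] * len(X[0])
    g = gradient(w, X, y, C)
    g0_norm = math.sqrt(dot(g, g))
    alphas = []                                  # step sizes, kept for inspection
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) <= eps * g0_norm:
            break
        s = cg(w, g, X, y, C, xi_k)              # approximate Newton direction
        f_w, gs, alpha = objective(w, X, y, C), dot(g, s), 1.0

        def f_at(a):
            return objective([wj + a * sj for wj, sj in zip(w, s)], X, y, C)

        while alpha > 1e-12 and f_at(alpha) > f_w + eta * alpha * gs:
            alpha *= 0.5
        w = [wj + alpha * sj for wj, sj in zip(w, s)]
        g = gradient(w, X, y, C)
        alphas.append(alpha)
    return w, alphas

# Toy data (made up for illustration).
X = [[1.0, 2.0], [2.0, -1.0], [-1.0, -1.5]]
y = [1, -1, -1]
w_newton, alphas = newton(X, y, C=0.1)
```

Returning the list of step sizes makes it easy to inspect how the line search behaves across outer iterations.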


Homework

Implement

Gradient-descent method with line search

Newton method with line search and CG

in MATLAB, Octave, Python, or R

MATLAB and Octave may be more suitable because of their good support for matrix operations

Train on the data set “kdd2010 (bridge to algebra)” from LIBSVM Data Sets http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets


Homework (Cont’d)

To read data into MATLAB/Octave, check libsvmread.c in the matlab directory of LIBLINEAR

To be more precise, you must build the mex file by

>> mex libsvmread.c

See liblinear/matlab/README for more details
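For those working in Python instead, the LIBSVM text format (one instance per line: a label followed by `index:value` pairs with 1-based indices) is simple enough to parse by hand. The helper below is a hypothetical sketch, not part of LIBLINEAR, and it stores each instance as a sparse dictionary because data sets of this kind have very many features.

```python
def parse_libsvm(lines):
    """Parse lines of the form '<label> <index>:<value> ...' (1-based indices)."""
    labels, rows = [], []
    for line in lines:
        parts = line.split()
        if not parts:                       # skip blank lines
            continue
        labels.append(float(parts[0]))
        row = {}
        for item in parts[1:]:
            idx, val = item.split(":")
            row[int(idx) - 1] = float(val)  # store a 0-based column index
        rows.append(row)
    return labels, rows

# Two made-up lines in the format, for illustration.
sample = ["+1 1:0.5 3:1.2", "-1 2:2.0"]
labels, rows = parse_libsvm(sample)
```

Because the function accepts any iterable of lines, an open file object can be passed directly so the whole file is never held as one string.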


Homework (Cont’d)

Let’s use

$$\eta = 0.01, \quad \xi_k = 0.1, \quad w^0 = 0$$

For the regularization parameter, set $C = 0.1$

It is known that a larger $C$ causes more iterations

You can check the correctness by comparing with the objective function value of LIBLINEAR (option -s 0 for logistic regression)

You may start with a smaller data set


Homework (Cont’d)

For the Newton method, you should observe that in the final iterations, the step size $\alpha$ becomes 1.

If you don’t see that, you can try to use a smaller $C = 0.01$ and reduce $\epsilon$ to 0.001 or even smaller.

You want to compare gradient-descent and Newton methods


Homework (Cont’d)

You may also compare yours with LIBLINEAR.

They differ in

MATLAB versus C

line search versus trust region to adjust the Newton direction

We require you to submit

A report of ≤ 4 pages (not including code)

Your code
