
We will check techniques to address the difficulty of storing or inverting the Hessian. But before that, let us derive its mathematical form.


Hessian Matrix I

For CNN, the gradient of f(θ) is

$$\nabla f(\theta) = \frac{1}{C}\theta + \frac{1}{l}\sum_{i=1}^{l}\left(J^{i}\right)^{T}\nabla_{z^{L+1,i}}\,\xi\left(z^{L+1,i}; y^{i}, Z^{1,i}\right), \tag{1}$$

where

$$J^{i} = \begin{bmatrix} \dfrac{\partial z^{L+1,i}_{1}}{\partial\theta_{1}} & \cdots & \dfrac{\partial z^{L+1,i}_{1}}{\partial\theta_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial\theta_{1}} & \cdots & \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial\theta_{n}} \end{bmatrix}_{n_{L+1}\times n}, \quad i = 1, \ldots, l. \tag{2}$$
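As a concrete reading of (1), here is a minimal NumPy sketch (all names and sizes are made-up stand-ins, not the lecture's code) that assembles the gradient from the per-instance products $(J^{i})^{T}\nabla_{z^{L+1,i}}\xi$:

    import numpy as np

    rng = np.random.default_rng(0)
    l, n_out, n = 50, 4, 300                   # toy sizes: l instances, n_{L+1} outputs, n variables
    C = 1.0
    theta = rng.standard_normal(n)
    J = rng.standard_normal((l, n_out, n))     # stand-ins for the Jacobians J^i in (2)
    xi_grad = rng.standard_normal((l, n_out))  # stand-ins for grad_{z^{L+1,i}} xi in (1)

    # Equation (1): grad f = (1/C) theta + (1/l) sum_i (J^i)^T grad xi^i
    grad_f = theta / C + np.einsum('ion,io->n', J, xi_grad) / l
    print(grad_f.shape)                        # (n,)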


Hessian Matrix II

Here $J^{i}$ is the Jacobian of $z^{L+1,i}(\theta)$.

The Hessian matrix of f(θ) is

$$\nabla^{2} f(\theta) = \frac{1}{C}I + \frac{1}{l}\sum_{i=1}^{l}\left(J^{i}\right)^{T}B^{i}J^{i} + \frac{1}{l}\sum_{i=1}^{l}\sum_{j=1}^{n_{L+1}} \frac{\partial\xi\left(z^{L+1,i}; y^{i}, Z^{1,i}\right)}{\partial z^{L+1,i}_{j}} \begin{bmatrix} \dfrac{\partial^{2} z^{L+1,i}_{j}}{\partial\theta_{1}\partial\theta_{1}} & \cdots & \dfrac{\partial^{2} z^{L+1,i}_{j}}{\partial\theta_{1}\partial\theta_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^{2} z^{L+1,i}_{j}}{\partial\theta_{n}\partial\theta_{1}} & \cdots & \dfrac{\partial^{2} z^{L+1,i}_{j}}{\partial\theta_{n}\partial\theta_{n}} \end{bmatrix},$$


Hessian Matrix III

where I is the identity matrix and $B^{i}$ is the Hessian of ξ(·) with respect to $z^{L+1,i}$:

$$B^{i} = \nabla^{2}_{z^{L+1,i}\,z^{L+1,i}}\,\xi\left(z^{L+1,i}; y^{i}, Z^{1,i}\right).$$

More precisely,

$$B^{i}_{ts} = \frac{\partial^{2}\xi\left(z^{L+1,i}; y^{i}, Z^{1,i}\right)}{\partial z^{L+1,i}_{t}\,\partial z^{L+1,i}_{s}}, \quad \forall\, t, s = 1, \ldots, n_{L+1}. \tag{3}$$

Usually $B^{i}$ is very simple.


Hessian Matrix IV

For example, if the squared loss

$$\xi\left(z^{L+1,i}; y^{i}\right) = \left\lVert z^{L+1,i} - y^{i}\right\rVert^{2}$$

is used, then

$$B^{i} = \begin{bmatrix} 2 & & \\ & \ddots & \\ & & 2 \end{bmatrix}.$$

Usually we consider a loss function $\xi\left(z^{L+1,i}; y^{i}\right)$ that is convex with respect to $z^{L+1,i}$.
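To verify the diagonal form above, differentiate the squared loss twice with respect to the output:

$$\frac{\partial\xi}{\partial z^{L+1,i}_{t}} = 2\left(z^{L+1,i}_{t} - y^{i}_{t}\right) \quad\Longrightarrow\quad B^{i}_{ts} = \begin{cases} 2 & \text{if } t = s, \\ 0 & \text{otherwise}, \end{cases}$$

so $B^{i} = 2I$, which is positive definite.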


Hessian Matrix V

Thus $B^{i}$ is positive semi-definite.

The last term of $\nabla^{2} f(\theta)$ may not be positive semi-definite.

Note that for a twice differentiable function f(θ), f(θ) is convex if and only if $\nabla^{2} f(\theta)$ is positive semi-definite.


Jacobian Matrix

The Jacobian matrix of $z^{L+1,i}(\theta) \in \mathbb{R}^{n_{L+1}}$ is

$$J^{i} = \begin{bmatrix} \dfrac{\partial z^{L+1,i}_{1}}{\partial\theta_{1}} & \cdots & \dfrac{\partial z^{L+1,i}_{1}}{\partial\theta_{n}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial\theta_{1}} & \cdots & \dfrac{\partial z^{L+1,i}_{n_{L+1}}}{\partial\theta_{n}} \end{bmatrix} \in \mathbb{R}^{n_{L+1}\times n}, \quad i = 1, \ldots, l.$$

$n_{L+1}$: number of neurons in the output layer
$n$: number of total variables

$n_{L+1} \times n$ can be large.
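To make the $n_{L+1} \times n$ size concrete, here is a minimal finite-difference sketch (a made-up toy map `z` stands in for $z^{L+1,i}(\theta)$; this is for illustration, not how $J^{i}$ would be computed in practice):

    import numpy as np

    n, n_out = 6, 3                          # toy sizes: n variables, n_{L+1} outputs
    A = np.linspace(-1.0, 1.0, n_out * n).reshape(n_out, n)

    def z(theta):
        # Hypothetical stand-in for z^{L+1,i}(theta): any smooth R^n -> R^{n_out} map.
        return np.tanh(A @ theta)

    theta0 = np.ones(n)
    eps = 1e-6
    J = np.empty((n_out, n))
    for k in range(n):                       # one column of J^i per variable theta_k
        e = np.zeros(n)
        e[k] = eps
        J[:, k] = (z(theta0 + e) - z(theta0 - e)) / (2 * eps)

    print(J.shape)                           # (n_out, n), i.e. n_{L+1} x n

Each column already costs two evaluations of the network here, and with millions of variables the full $n_{L+1} \times n$ matrix quickly becomes too large to form or store, which is the difficulty stated at the beginning.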


Gauss-Newton Matrix I

The Hessian matrix $\nabla^{2} f(\theta)$ is therefore not guaranteed to be positive definite.

We may need a positive definite approximation. Many existing Newton methods for NN have considered the Gauss-Newton matrix (Schraudolph, 2002)

$$G = \frac{1}{C}I + \frac{1}{l}\sum_{i=1}^{l}\left(J^{i}\right)^{T}B^{i}J^{i},$$

obtained by removing the last term in $\nabla^{2} f(\theta)$.
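For intuition, a toy sketch (made-up stand-ins for $J^{i}$ and $B^{i}$, with sizes small enough that G can be formed explicitly; real problems would not materialize G):

    import numpy as np

    rng = np.random.default_rng(0)
    l, n_out, n = 50, 4, 30                  # toy sizes
    C = 1.0
    J = rng.standard_normal((l, n_out, n))   # stand-ins for the Jacobians J^i
    B = 2.0 * np.eye(n_out)                  # B^i = 2I, as for the squared loss

    # G = (1/C) I + (1/l) sum_i (J^i)^T B^i J^i
    G = np.eye(n) / C
    for Ji in J:
        G += Ji.T @ B @ Ji / l

    print(np.linalg.eigvalsh(G).min())       # > 0: this G is positive definite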


Gauss-Newton Matrix II

The Gauss-Newton matrix is positive definite if $B^{i}$ is positive semi-definite.

This can be achieved if we use a loss function that is convex in terms of $z^{L+1,i}(\theta)$.

We then solve

$$G\,d = -\nabla f(\theta).$$
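Because storing or factorizing the n × n matrix G is exactly the difficulty raised at the start, one standard technique (the setting of Schraudolph's fast curvature matrix-vector products) is to solve Gd = −∇f(θ) with the conjugate gradient method, which only needs products Gv. A minimal matrix-free sketch with made-up stand-ins:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    rng = np.random.default_rng(0)
    l, n_out, n = 50, 4, 300                 # toy sizes
    C = 1.0
    J = rng.standard_normal((l, n_out, n))   # stand-ins for the Jacobians J^i
    grad_f = rng.standard_normal(n)          # stand-in for the gradient in (1)

    def G_matvec(v):
        # G v = (1/C) v + (1/l) sum_i (J^i)^T B^i (J^i v), with B^i = 2I here.
        # G itself is never formed or stored.
        Jv = J @ v                           # all products J^i v, shape (l, n_out)
        return v / C + 2.0 * np.einsum('ion,io->n', J, Jv) / l

    d, info = cg(LinearOperator((n, n), matvec=G_matvec), -grad_f)
    print(info)                              # 0 means CG converged; d solves G d = -grad f

Only the matrix-vector product has to be implemented, so the cost per CG step is a handful of Jacobian-vector products rather than an n × n factorization.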


References I

N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738, 2002.
