Project: Robustness of Newton Methods and Running Time Analysis

(1)

and Running Time Analysis

Last updated: May 25, 2020

(2)

1 Robustness of Newton methods

2 Running time of two implementations of Gauss-Newton matrix-matrix products

(3)

Outline

(4)

Goal I

From past projects we have known that stochastic gradient is sensitive to the learning rate

We would like to check the situation of Newton methods

(5)

Settings I

All settings are the same as before

However, to avoid lengthy training time, let’s consider a 5000-instance subset at this directory We still use the full test data

(6)

Parameters Considered (Newton Method) I

The Newton implementation is available in simpleNN

You can use either Python or MATLAB implementations

Check README for their use We check the following parameters

Percentage of data for subsampled Hessian:

5%, 10%

With/without

Levenberg-Marquardt method

(7)

Parameters Considered (Stochastic Gradient) I

Everything is the same as before:

simple stochastic gradient + momentum We check the following initial learning rates

10⁻⁴, 10⁻³, 10⁻², 10⁻¹

(8)

Checking the Convergence I

Try to see the relation between training time and test accuracy

You now have

4 settings for Newton and

4 settings for stochastic gradient

You need to design maybe figures for the comparison The comparison can be, for example,

training time versus test accuracy

(9)

Checking the Convergence II

Visualization is always a concern. For example, you may not want to draw eight curves in the same figure

(10)

Outline

(11)

Gauss-Newton Matrix-vector Products I

We talked about two ways

One is solely by back propagation. The complexity is the lowest, but the memory consumption is huge The other is

forward + backward

We want to analyze their running time by using the MATLAB code.

(12)

Gauss-Newton Matrix-vector Products II

For other parameters let’s use the default ones (e.g., 5% subsampled Gauss Newton)

For this part we should use full data. Otherwise each iteration takes too little time and the results may not be accurate

No need to run many iterations.

By profiling we may check if for the matrix-matrix products, the result is consistent with our

complexity analysis

(13)

Gauss-Newton Matrix-vector Products III

How about the situation of the total time? Any operations with lower complexity but take more time? Why did that happen?

(14)

Presentations and Reports I

Presentations for projects 5 and 6 proj ID

5 ntust_f10802006 6 b05201015

5 b05201024 6 b05201037 5 t08303135 5 b06502060 5 r08521508 6 d08525008 6 b05701231

(15)

Presentations and Reports II

6 b06901143 5 t08902130 5 b06902124 6 b05902035 5 b05902050 5 b05902105 5 d08921024 6 a08922103 5 a08922119

(16)

Presentations and Reports III

5 d08922034 5 p08922005 6 r08922019 6 r08922082 5 r08922163 5 r07922100 6 r07922154 6 r08922a07 5 d04941016 6 r08942062 6 a08946101

(17)

Presentations and Reports IV

please do a 10-minute presentation (9-minute the contents and 1-minute Q&A)