Project: Robustness of Newton Methods and Running Time Analysis

Academic year: 2022


Last updated: May 25, 2020


Outline

1 Robustness of Newton methods

2 Running time of two implementations of Gauss-Newton matrix-matrix products

Goal I

From past projects we have learned that stochastic gradient is sensitive to the learning rate

We would like to check whether Newton methods are similarly sensitive

Settings I

All settings are the same as before

However, to avoid lengthy training time, let's consider a 5000-instance subset at this directory

We still use the full test data

Parameters Considered (Newton Method) I

The Newton implementation is available in simpleNN

You can use either the Python or MATLAB implementation

Check the README for their use

We check the following parameters:

Percentage of data for the subsampled Hessian: 5%, 10%

With/without the Levenberg-Marquardt method
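As a reminder of what the Levenberg-Marquardt option changes, here is a minimal sketch of a damped Gauss-Newton step on a toy 1-D least-squares problem. This is illustrative only, not the simpleNN implementation; the damping constant, its doubling/halving update rule, and the toy model are all assumptions.

```python
import numpy as np

def lm_gauss_newton(x, y, w0=0.0, lam=1.0, iters=50):
    """Gauss-Newton with Levenberg-Marquardt damping for the toy
    1-D model y ~ exp(w * x), minimizing 0.5 * ||r(w)||^2."""
    w = w0
    for _ in range(iters):
        r = np.exp(w * x) - y      # residual vector
        J = x * np.exp(w * x)      # Jacobian of r w.r.t. w (one column)
        g = J @ r                  # gradient of the loss
        H = J @ J                  # Gauss-Newton curvature (a scalar here)
        w_new = w - g / (H + lam)  # damped Newton step
        # LM-style update: accept the step and relax the damping if the
        # loss decreased; otherwise increase the damping and retry.
        if np.sum((np.exp(w_new * x) - y) ** 2) < np.sum(r ** 2):
            w, lam = w_new, lam * 0.5
        else:
            lam *= 2.0
    return w

# Data generated from the true parameter w = 0.3
x = np.linspace(0.0, 2.0, 20)
y = np.exp(0.3 * x)
w_est = lm_gauss_newton(x, y)
```

With the damping term, a step that would increase the loss is rejected rather than taken, which is the robustness property this project asks you to examine.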

Parameters Considered (Stochastic Gradient) I

Everything is the same as before: simple stochastic gradient + momentum

We check the following initial learning rates:

10^-4, 10^-3, 10^-2, 10^-1
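To see why the initial learning rate matters, here is a minimal sketch of momentum SGD on a toy deterministic quadratic. The curvature, momentum constant, and step count are assumptions for illustration, and no stochastic sampling is simulated.

```python
def sgd_momentum(lr, steps=200, beta=0.9):
    """Momentum SGD on the toy quadratic f(w) = 5 * w^2 (curvature 10).

    With this curvature and momentum, lr = 0.5 would diverge, while a
    very small lr barely moves away from the starting point w = 1."""
    w, v = 1.0, 0.0
    for _ in range(steps):
        grad = 10.0 * w           # f'(w)
        v = beta * v - lr * grad  # momentum update
        w = w + v
    return w

for lr in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"lr={lr:g}: final w = {sgd_momentum(lr):.6f}")
```

Running this shows the sensitivity directly: lr = 1e-4 leaves w far from the optimum after 200 steps, while the larger rates drive it close to zero.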

Checking the Convergence I

Try to see the relation between training time and test accuracy

You now have

4 settings for Newton and

4 settings for stochastic gradient

You need to design figures for the comparison

The comparison can be, for example, training time versus test accuracy

Checking the Convergence II

Visualization is always a concern. For example, you may not want to draw eight curves in the same figure
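One possible layout, sketched below with made-up numbers (all times and accuracies are hypothetical placeholders), is one panel per method, so at most four curves share a set of axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical results: (cumulative training time, test accuracy) per setting.
rng = np.random.default_rng(0)
newton_runs = {f"Newton {s}": (np.cumsum(rng.uniform(5, 10, 20)),
                               80 + 15 * (1 - np.exp(-np.arange(20) / 5)))
               for s in ("5%", "5%+LM", "10%", "10%+LM")}
sgd_runs = {f"SGD lr={lr}": (np.cumsum(rng.uniform(1, 2, 20)),
                             70 + 20 * (1 - np.exp(-np.arange(20) / 8)))
            for lr in ("1e-4", "1e-3", "1e-2", "1e-1")}

# Two panels instead of eight curves in one figure.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, runs, title in ((ax1, newton_runs, "Newton"), (ax2, sgd_runs, "SGD")):
    for name, (t, acc) in runs.items():
        ax.plot(t, acc, label=name)
    ax.set_xlabel("training time (s)")
    ax.set_title(title)
    ax.legend()
ax1.set_ylabel("test accuracy (%)")
fig.savefig("comparison.png")
```

Sharing the y-axis between panels keeps the two methods directly comparable while avoiding a cluttered single figure.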


Outline

1 Robustness of Newton methods

2 Running time of two implementations of Gauss-Newton matrix-matrix products

Gauss-Newton Matrix-vector Products I

We talked about two ways

One is solely by back propagation. Its complexity is the lowest, but the memory consumption is huge

The other is forward + backward

We want to analyze their running time by using the MATLAB code
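The trade-off can be sketched with a loose NumPy analogy (this is not the simpleNN code; an explicit matrix J stands in for the Jacobian, which in the lecture's setting is never formed): one way precomputes and stores a large matrix so each later product is cheap, while the matrix-free way does a forward product followed by a backward product every time.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 2000, 500                  # number of instances and parameters
J = rng.standard_normal((l, n))   # stand-in for the Jacobian
v = rng.standard_normal(n)

# Way 1: precompute G = J^T J once (n x n memory, the "huge memory"
# side of the trade-off); each later product is one matrix-vector multiply.
G = J.T @ J
p1 = G @ v

# Way 2: matrix-free forward + backward: first J v, then J^T (J v);
# no n x n matrix is ever stored, at the cost of two multiplies per product.
p2 = J.T @ (J @ v)
```

Both ways compute the same Gauss-Newton product; they differ only in how memory and computation are traded against each other.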

Gauss-Newton Matrix-vector Products II

For other parameters let's use the default ones (e.g., 5% subsampled Gauss-Newton)

For this part we should use the full data. Otherwise each iteration takes too little time and the results may not be accurate

No need to run many iterations

By profiling we can check whether, for the matrix-matrix products, the results are consistent with our complexity analysis
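The kind of check profiling enables can be sketched as follows (the sizes are assumptions; with the real code you would read the times from the profiler instead of measuring by hand): if a product costs O(l n) operations, doubling the number of instances l should roughly double the measured time.

```python
import time
import numpy as np

def median_product_time(l, n, reps=5):
    """Median wall-clock time of one J^T (J v) product, which is O(l * n)."""
    rng = np.random.default_rng(0)
    J = rng.standard_normal((l, n))
    v = rng.standard_normal(n)
    times = []
    for _ in range(reps):
        t0 = time.perf_counter()
        J.T @ (J @ v)
        times.append(time.perf_counter() - t0)
    return sorted(times)[len(times) // 2]  # median is robust to outliers

# If the cost is O(l * n), doubling l should roughly double the time.
t_small = median_product_time(2000, 500)
t_large = median_product_time(4000, 500)
print(f"l=2000: {t_small:.2e}s   l=4000: {t_large:.2e}s")
```

Note that constant factors, the BLAS implementation, and cache behavior can make an operation with lower asymptotic complexity slower in practice, which matters when interpreting the profile.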

Gauss-Newton Matrix-vector Products III

What about the total time? Are there any operations with lower complexity that take more time? Why did that happen?

Presentations and Reports I

Presentations for projects 5 and 6:

proj  ID
5     ntust_f10802006
6     b05201015
5     b05201024
6     b05201037
5     t08303135
5     b06502060
5     r08521508
6     d08525008
6     b05701231

Presentations and Reports II

proj  ID
6     b06901143
5     t08902130
5     b06902124
6     b05902035
5     b05902050
5     b05902105
5     d08921024
6     a08922103
5     a08922119

Presentations and Reports III

proj  ID
5     d08922034
5     p08922005
6     r08922019
6     r08922082
5     r08922163
5     r07922100
6     r07922154
6     r08922a07
5     d04941016
6     r08942062
6     a08946101

Presentations and Reports IV

Please give a 10-minute presentation (9 minutes for the contents and 1 minute for Q&A)
