
Discussion on the Project of Comparing Various Stochastic Gradient Methods


(1)

Discussion on the Project of Comparing Various Stochastic Gradient Methods

(2)

For this part you should get exactly the same result. With the same initial weights and the same operations, you should get the same weights in the first several iterations.

For example, here are the weights from running MNIST with the following parameters (the scripts and results were generated by our TAs):

python3 script.py --optim SGD --bsize 256 --C 0.01 --seed 42 --net CNN_4layers --train_set /tmp3/data/mnist.mat --val_set /tmp3/data/mnist.t.mat --dim 28 28 1
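Here --seed fixes the random seed. As a minimal sketch of the kind of seeding that makes a run reproducible (illustrative only; the actual script.py may seed differently through its --seed flag):

import numpy as np
import tensorflow as tf

seed = 42                  # same value as --seed above
np.random.seed(seed)       # fixes NumPy-side initialization and shuffling
tf.random.set_seed(seed)   # fixes TensorFlow initial weights and op-level randomness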

(3)

For simpleNN, the first-layer weights after running 11 batches are:

batch 1: 0.14049198 -0.03910705 0.18319398 ...

...

batch 11: 1.36893839e-01 -2.44279262e-02 1.47583246e-01 ...

Results of using TensorFlow:

batch 1: 0.14049198 -0.03910705 0.18319398 ...

...

batch 11: 1.36893839e-01 -2.44279262e-02 1.47583246e-01 ...

If you did not get the same results, you probably did not check the TensorFlow manual in detail.
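One way to inspect the TensorFlow-side weights for such a comparison is a small callback. This is only a hedged sketch assuming a tf.keras model; the class name is illustrative, not from the course script:

import tensorflow as tf

class PrintFirstLayer(tf.keras.callbacks.Callback):
    # Print a few first-layer weights after each training batch.
    def on_train_batch_end(self, batch, logs=None):
        w = self.model.layers[0].get_weights()[0]   # kernel of the first layer
        print("batch", batch + 1, ":", w.flatten()[:3], "...")

# Passing callbacks=[PrintFirstLayer()] to model.fit then prints values that
# can be compared against simpleNN's output above.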

(4)

You must think about how to organize and present your results clearly.

For example, a table may be better than the following description:

learning rate ?? gives final accuracy ??, best accuracy ??; learning rate ?? gives final accuracy ??, best accuracy ??; learning rate ?? gives final accuracy ??, best accuracy ??; ...

You can see that "learning rate," "final accuracy," etc. appear many times.
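For instance, a simple table (?? again stands in for the actual numbers) states each label only once:

learning rate | final accuracy | best accuracy
--------------+----------------+--------------
      ??      |       ??       |      ??
      ??      |       ??       |      ??
      ??      |       ??       |      ??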

(5)

If the method fails to converge and gets bad accuracy, then from our discussion you may decrease the learning rate.

For example, some tried Adam with learning rates 0.01, 0.1, and 0.5 on cifar10, and all of them failed.

In this situation you could try, for example, 0.005 or 0.001.
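A rough sketch of such a sweep is below. This is not the course's script.py; the small CNN is only illustrative and may differ from CNN_4layers:

import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.cifar10.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def build_model():
    # A deliberately small CNN; the course's CNN_4layers may differ.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])

for lr in [0.01, 0.005, 0.001]:  # shrink the rate until training converges
    model = build_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    hist = model.fit(x_train, y_train, batch_size=256, epochs=1,
                     validation_data=(x_val, y_val), verbose=0)
    print(f"lr={lr}: val accuracy {hist.history['val_accuracy'][-1]:.4f}")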

(6)

Please respect the page limit. We would like to see how you can summarize things in two pages.
