
Discussion on the Project of Comparing Various Stochastic Gradient Methods


(1)

Discussion on the Project of Comparing Various Stochastic Gradient Methods

(2)

For this part you should get exactly the same result. With the same initial weights and the same operations, you should get the same weights in the first several iterations.

For example, here are the weights from running MNIST with the following parameters (the scripts and results were generated by our TAs):

python3 script.py --optim SGD --bsize 256 --C 0.01 --seed 42 --net CNN_4layers --train_set /tmp3/data/mnist.mat --val_set /tmp3/data/mnist.t.mat --dim 28 28 1
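Here --seed fixes the random seed. As a minimal sketch of the kind of seeding that makes a run reproducible (illustrative only; the actual script.py may seed differently through its --seed flag):

import numpy as np
import tensorflow as tf

seed = 42                  # same value as --seed above
np.random.seed(seed)       # fixes NumPy-side initialization and shuffling
tf.random.set_seed(seed)   # fixes TensorFlow initial weights and op-level randomness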

(3)

For simpleNN, the first-layer weights after running 11 batches are:

batch 1: 0.14049198 -0.03910705 0.18319398 ...

...

batch 11: 1.36893839e-01 -2.44279262e-02 1.47583246e-01 ...

Results of using TensorFlow:

batch 1: 0.14049198 -0.03910705 0.18319398 ...

...

batch 11: 1.36893839e-01 -2.44279262e-02 1.47583246e-01 ...

If you did not get the same results, you probably did not check the TensorFlow manual in detail.
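One way to inspect the TensorFlow-side weights for such a comparison is a small callback. This is only a hedged sketch assuming a tf.keras model; the class name is illustrative, not from the course script:

import tensorflow as tf

class PrintFirstLayer(tf.keras.callbacks.Callback):
    # Print a few first-layer weights after each training batch.
    def on_train_batch_end(self, batch, logs=None):
        w = self.model.layers[0].get_weights()[0]   # kernel of the first layer
        print("batch", batch + 1, ":", w.flatten()[:3], "...")

# Passing callbacks=[PrintFirstLayer()] to model.fit then prints values that
# can be compared against simpleNN's output above.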

(4)

You must think about how to organize and present your results clearly.

For example, a table may be better than the following description:

learning rate ?? gives final accuracy ??, best accuracy ??; learning rate ?? gives final accuracy ??, best accuracy ??; learning rate ?? gives final accuracy ??, best accuracy ??; ...

You can see that "learning rate," "final accuracy," etc. appear many times.
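For instance, a simple table (?? again stands in for the actual numbers) states each label only once:

learning rate | final accuracy | best accuracy
--------------+----------------+--------------
      ??      |       ??       |      ??
      ??      |       ??       |      ??
      ??      |       ??       |      ??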

(5)

If the method fails to converge and gets bad accuracy, then from our discussion you may decrease the learning rate.

For example, some tried Adam with learning rates 0.01, 0.1, and 0.5 on cifar10, and all of them failed.

In this situation you could try, for example, 0.005 or 0.001.
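A rough sketch of such a sweep is below. This is not the course's script.py; the small CNN is only illustrative and may differ from CNN_4layers:

import tensorflow as tf

(x_train, y_train), (x_val, y_val) = tf.keras.datasets.cifar10.load_data()
x_train, x_val = x_train / 255.0, x_val / 255.0

def build_model():
    # A deliberately small CNN; the course's CNN_4layers may differ.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10),
    ])

for lr in [0.01, 0.005, 0.001]:  # shrink the rate until training converges
    model = build_model()
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    hist = model.fit(x_train, y_train, batch_size=256, epochs=1,
                     validation_data=(x_val, y_val), verbose=0)
    print(f"lr={lr}: val accuracy {hist.history['val_accuracy'][-1]:.4f}")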

(6)

Please respect the page limit. We would like to see how you can summarize things in two pages.
