(1)

Project: More Experiments on Stochastic Gradient Methods

Last updated: March 20, 2021

(2)

Goal

We want to know more about the internal details of simpleNN

We want to roughly compare the two stochastic gradient approaches: SG with momentum and Adam

(3)

Project Contents: First Part I

In our code, stochastic gradient is implemented in the subroutine gradient_trainer in train.py. You can see a for loop there:

for epoch in range(0, args.epoch):
    ...
    for i in range(num_iters):
        ...
        step, _, batch_loss = sess.run(
            [global_step, optimizer, loss_with_reg],
            feed_dict={x: batch_input, y: batch_labels, learning_rate: lr})

(4)

Project Contents: First Part II

The optimizer was specified earlier:

optimizer = tf.compat.v1.train.MomentumOptimizer(
    learning_rate=learning_rate,
    momentum=config.momentum).minimize(
        loss_with_reg,
        global_step=global_step)

It happens that we run the SG steps ourselves, but in Tensorflow there must be a way so that stochastic gradient methods can be directly called in one statement

(5)

Project Contents: First Part III

That is, a typical user of Tensorflow would call train.MomentumOptimizer once, without the for loop

We would like to check if, under the same initial model, the two settings give the same results

To check “the same results” you can, for example, compare their models at each iteration or compare their objective values (a small comparison helper is sketched at the end of this slide)

Therefore, for this part of the project you only need to run very few iterations (e.g., 5)
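For the comparison, here is a minimal standalone sketch; the helper same_model and the names weights_a/weights_b are hypothetical, and the two weight lists could be obtained, e.g., from sess.run(param) in simpleNN and from model.get_weights() on the Keras side.

# Hypothetical helper: check whether two lists of weight arrays agree
import numpy as np

def same_model(weights_a, weights_b, tol=1e-6):
    return (len(weights_a) == len(weights_b) and
            all(np.allclose(wa, wb, atol=tol)
                for wa, wb in zip(weights_a, weights_b)))

# toy usage with two identical "models"
w = [np.ones((2, 3)), np.zeros(3)]
print(same_model(w, [a.copy() for a in w]))   # True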

(6)

Project Contents: First Part IV

Further, we should use the simplest setting: SG without momentum

You can print out weight values for the comparison

If you face difficulties, consider simplifying your settings for debugging:

Use a small set of data (e.g., data/mnist-demo.mat) or even a subset of just 100 instances

Enlarge --bsize to be the same as the number of data. Then you essentially do gradient descent (see the sample command after this list)
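For reference, such a debugging run might look like the command below; the remaining options (data set, regularization constant, etc.) are omitted as “...”, so please check script.py for the exact option names.

python3 script.py --optim SGD --bsize 100 ...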

(7)

Project Contents: First Part V

We will separately discuss the modification of simpleNN and the direct use of Tensorflow in subsequent slides

The regularization term may be a concern: we need to make sure that the two settings minimize the same objective function

For this project, you definitely need to trace the subroutine gradient_trainer in train.py.

(8)

Modification of simpleNN I

One issue is that at the beginning of each update, we randomly select instances as the current batch:

idx = np.random.choice(
    np.arange(0, num_data),
    size=config.bsize, replace=False)

Tensorflow doesn't do that, so you can replace the code with

idx = np.arange(i*config.bsize,
                min((i+1)*config.bsize, num_data))

The min operation handles the situation where the number of data is not a multiple of the batch size; a small check follows.
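As a small self-contained check of the replacement indexing (the numbers below are toy values, not simpleNN defaults):

import numpy as np

num_data, bsize = 10, 4
for i in range(int(np.ceil(num_data / bsize))):
    idx = np.arange(i * bsize, min((i + 1) * bsize, num_data))
    print(i, idx)
# 0 [0 1 2 3]
# 1 [4 5 6 7]
# 2 [8 9]   <- the last batch contains only the remaining two instances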

(9)

Direct Use of Tensorflow MomentumOptimizer I

The workflow should be like this:

Specify the network

model = ...

Specify the optimizer

model.compile(optimizer=...)

Do the training

model.fit(...)

To specify the network, CNN cannot be directly used (a standalone sketch of the whole workflow follows)
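Below is a standalone sketch of this compile/fit workflow. It uses a placeholder dense network, a squared loss, and the Keras SGD optimizer (which implements momentum); it is not simpleNN's CNN, and the next slides describe how to reuse CNN_model inside gradient_trainer instead. All sizes and parameter values are placeholders.

import numpy as np
import tensorflow as tf

# Specify the network (a tiny dense net as a placeholder architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(10),
])

# Specify the optimizer (Keras SGD, which also provides momentum)
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss=tf.keras.losses.MeanSquaredError())

# Do the training; random placeholder data, without shuffling so that the
# batch order matches the sequential indexing discussed later
x_train = np.random.rand(100, 28, 28, 1).astype(np.float32)
y_train = np.random.rand(100, 10).astype(np.float32)
model.fit(x_train, y_train, batch_size=100, epochs=5, shuffle=False)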

(10)

Direct Use of Tensorflow MomentumOptimizer II

Instead you can directly do it in the subroutine gradient_trainer

Here we provide the code:

model = CNN_model(config.net, config.dim, config.num_cls)

You need to change the line

param = tf.compat.v1.trainable_variables()

to

param = model.trainable_weights
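As a quick standalone check (again with a placeholder dense model, not CNN_model), model.trainable_weights is simply the list of the model's parameters, playing the role previously filled by tf.compat.v1.trainable_variables():

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
param = model.trainable_weights
for w in param:
    print(w.name, w.shape)   # a (784, 10) kernel and a (10,) bias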

(11)

Direct Use of Tensorflow MomentumOptimizer III

CNN and CNN_model both use global variables, so we specify which one to use to avoid variable conflicts.

Note that there are two such places in gradient_trainer() and you need to change both

For calculating the objective value, you need to replace

loss_with_reg = reg_const*reg + loss/batch_size

with

(12)

Direct Use of Tensorflow MomentumOptimizer IV

loss_with_reg = lambda y_true, y_pred: \
    reg_const*reg + tf.reduce_mean(tf.reduce_sum(
        tf.square(y_true - y_pred), axis=1))

We no longer have the outputs of the model, so the loss can’t be calculated directly

Instead we use some Tensorflow functions to calculate the objective value (a standalone sketch is given below)

For the use of MomentumOptimizer, you should check the Tensorflow manual in detail
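Below is a standalone sketch of this idea, not a drop-in change to simpleNN: the network, the regularization constant, and the way reg is built from the trainable parameters are placeholder assumptions, and Keras SGD (with momentum) stands in for MomentumOptimizer; model.evaluate then reports the objective value under the custom loss.

import numpy as np
import tensorflow as tf

reg_const = 0.01                      # placeholder regularization constant
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])

# assumed form of the regularizer: squared l2 norm of all trainable weights
reg = lambda: tf.add_n([tf.reduce_sum(tf.square(w))
                        for w in model.trainable_weights])

loss_with_reg = lambda y_true, y_pred: \
    reg_const*reg() + tf.reduce_mean(tf.reduce_sum(
        tf.square(y_true - y_pred), axis=1))

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01,
                                                momentum=0.9),
              loss=loss_with_reg)

x = np.random.rand(100, 784).astype(np.float32)
y = np.random.rand(100, 10).astype(np.float32)
print(model.evaluate(x, y, batch_size=100, verbose=0))  # objective value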

(13)

Direct Use of Tensorflow MomentumOptimizer V

This is what we want you to learn

There are no restrictions on the data set to be used in this part; even mnist-demo is fine. You can use any data you want.

We have modified net.py to make it easier for everyone to do this project. We will also keep improving simpleNN, so please regularly git pull the latest version.

(14)

Project Contents: Second Part I

We want to check the test accuracy of two stochastic gradient methods: SG with momentum and Adam

Note that what we used in the first project was the simplest SG without momentum

We also hope to roughly check the parameter sensitivity

Under each parameter setting, we run a large number (e.g., 500) of iterations and use the model at the last iteration

(15)

Project Contents: Second Part II

We do not use a model before the last iteration because a validation process was not conducted

Please work on the same MNIST and CIFAR10 data sets used in the previous project

In your report, give your results, observations and thoughts

In the previous project, we used only default parameters

You can slightly vary parameters (e.g., learning rate in SGD and Adam) and check the test accuracy
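For example, assuming the learning-rate option of script.py is named --lr and Adam is selected through --optim (both are assumptions; please check script.py for the actual option names and values), the comparison could be based on runs such as:

python3 script.py --optim SGD --bsize 256 --lr 0.01 ...
python3 script.py --optim SGD --bsize 256 --lr 0.1 ...
python3 script.py --optim Adam --bsize 256 --lr 0.001 ...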

(16)

Project Contents: Second Part III

Due to the lengthy running time, no need to try many parameter settings

Remember we don’t judge you solely by your accuracy

(17)

Presentation

Students selected for presentation, please give a 10-minute talk (9 minutes for the contents and 1 minute for Q&A)
