Discussion on the Project of Analyzing Running Time of Stochastic Gradient Methods
Some General Comments I
You should analyze your results
Some reports just present tables or figures, but we want more insight
The Analysis I
You should identify the major operations
Then check the percentage of time spent in these operations
From the results you draw some conclusions
Let's consider a good example (from P2 of Davis Cho's report in the UCLA 2019 course)
I will show good examples this year later
The Analysis II
The Analysis III
A typical analysis is as follows. In the forward procedure,
$W_m\,\mathrm{mat}(P^{\phi}_m P^{\mathrm{pad}}_m \mathrm{vec}(Z_{m,i})) = W_m \phi(\mathrm{pad}(Z_{m,i}))$
The costs of the main operations are
$\phi(\mathrm{pad}(Z_{m,i}))$: $O(l \times h^m h^m d_m a^{\mathrm{conv}}_m b^{\mathrm{conv}}_m)$
$W_m \phi(\cdot)$: $O(l \times h^m h^m d_m d_{m+1} a^{\mathrm{conv}}_m b^{\mathrm{conv}}_m)$
$Z_{m+1,i} = \mathrm{mat}(P^{\mathrm{pool}}_{m,i} \mathrm{vec}(\sigma(S_{m,i})))$: $O(l \times h^m h^m d_{m+1} a^{\mathrm{conv}}_m b^{\mathrm{conv}}_m)$
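To see how these terms compare, here is a minimal sketch that plugs layer dimensions into the complexity formulas above. All dimension values are hypothetical, chosen only to illustrate the relative costs:

```python
# Rough operation counts for the three steps at layer m, using the
# complexity formulas above. All dimension values here are made up
# for illustration only.
l = 128                      # batch size
h = 3                        # filter height/width h^m
d_m, d_m1 = 32, 64           # input/output channels d_m, d_{m+1}
a_conv, b_conv = 14, 14      # output spatial size a^conv_m, b^conv_m

phi_ops  = l * h * h * d_m * a_conv * b_conv          # phi(pad(Z_{m,i}))
gemm_ops = l * h * h * d_m * d_m1 * a_conv * b_conv   # W_m * phi(.)
pool_ops = l * h * h * d_m1 * a_conv * b_conv         # pooling step

total = phi_ops + gemm_ops + pool_ops
for name, ops in [("phi", phi_ops), ("gemm", gemm_ops), ("pool", pool_ops)]:
    print(f"{name}: {ops:>12,d} ({100.0 * ops / total:.1f}%)")
```

Dividing the second count by the first shows the matrix product is exactly $d_{m+1}$ times the cost of $\phi(\mathrm{pad}(Z_{m,i}))$.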
The Analysis IV
From this complexity analysis, matrix-matrix products are the bottleneck
Roughly, they cost $d_m$ times more than the other operations at layer $m$
However, we see that matrix-matrix products take less than half of the running time (in the feedforward part)
Thus optimized BLAS is very effective
It is easier to optimize the computationally heavy part
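As a rough illustration of why optimized BLAS matters, the sketch below compares a naive triple-loop matrix product with NumPy's BLAS-backed matmul (NumPy here is only a stand-in for optimized BLAS; the size is arbitrary):

```python
import time
import numpy as np

# Compare a naive triple-loop matrix product with NumPy's BLAS-backed
# matmul. The point: the heavy, regular computation (GEMM) is exactly
# what optimized BLAS accelerates.
n = 150
rng = np.random.default_rng(0)
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

def naive_matmul(A, B):
    # textbook O(n^3) loops in pure Python
    n, m, k = A.shape[0], B.shape[1], A.shape[1]
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]
    return C

t0 = time.perf_counter(); C_naive = naive_matmul(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C_blas = A @ B; t_blas = time.perf_counter() - t0
print(f"naive: {t_naive:.3f}s  BLAS: {t_blas:.5f}s")
```

The absolute timings are machine-dependent, but the gap is typically several orders of magnitude.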
The Analysis V
But this also means that there is probably still room to improve other operations, such as
$\phi(\mathrm{pad}(Z_{m,i}))$
Why Checking Main Steps is Important I
We see that some reports simply list the percentage of each function:
feedforward: ?? %
max pooling: ?? %
But the problem is that max pooling is part of feedforward
Thus you cannot say much from such a list of results
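One way to make such a list meaningful is to convert inclusive times to exclusive ones by subtracting each nested operation from its parent. A sketch with hypothetical timings (the numbers are made up for illustration):

```python
# Inclusive profiler percentages double-count nested calls: if max pooling
# runs inside feedforward, its time is already part of the feedforward
# total. To get meaningful numbers, subtract each child's time from its
# parent. The times below are hypothetical.
inclusive = {"feedforward": 6.0, "max_pooling": 1.5, "backward": 4.0}  # seconds
children = {"feedforward": ["max_pooling"]}

exclusive = dict(inclusive)
for parent, kids in children.items():
    for kid in kids:
        exclusive[parent] -= inclusive[kid]

total = 6.0 + 4.0  # max_pooling is inside feedforward, so don't add it twice
for name in ("feedforward", "max_pooling", "backward"):
    print(f"{name}: {100.0 * exclusive[name] / total:.0f}%")
```

Now the percentages sum to 100 and each operation is counted once.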
Single Versus Double I
By default Tensorflow uses single precision
Thus we should use single precision in MATLAB too
But some may forget to use the option -ftype to specify the use of single precision
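A small NumPy sketch of why the precision setting matters for a fair comparison: double precision (float64) carries twice the memory traffic of single precision (float32) and is often slower for the same operation.

```python
import numpy as np

# TensorFlow defaults to single precision (float32). Timing a
# double-precision (float64) implementation against it is unfair by
# construction. Sketch: the same matrix product in both precisions.
rng = np.random.default_rng(0)
A64 = rng.standard_normal((500, 500))   # float64 by default
A32 = A64.astype(np.float32)            # single-precision copy

C64 = A64 @ A64
C32 = A32 @ A32

print(C64.dtype, C32.dtype)        # float64 float32
print(A32.nbytes / A64.nbytes)     # 0.5: half the memory
```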
Why is Tensorflow Faster? I
Before giving a rough answer to this issue, let's discuss some differences between Tensorflow and our MATLAB-based implementation
Automatic differentiation
Computational graphs
Automatic differentiation I
We will discuss this in regular lectures
Computational Graphs I
Let’s borrow a description from
https://deepnotes.io/tensorflow
“Tensorflow approaches series of computations as a flow of data through a graph with nodes being computation units and edges being flow of Tensors (multidimensional arrays).”
This means that we must build a graph first before the execution
This is a bit unnatural
For example, to do a matrix product
Computational Graphs II
C = A*B;
we cannot just write the above statement as in MATLAB.
We need two steps: first construct the graph, then execute it
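The two-step idea can be sketched with a toy graph class. This is not TensorFlow's actual API, only an illustration of build-then-run:

```python
# A toy sketch of the build-then-run idea (not TensorFlow's actual API):
# step 1 builds a graph of nodes; step 2 executes it. Writing C = A * B
# only records the multiplication; nothing is computed until run().

class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __mul__(self, other):
        return Node("matmul", (self, other))   # step 1: record, don't compute

def run(node):
    # step 2: execute the recorded graph
    if node.op == "const":
        return node.value
    if node.op == "matmul":
        a, b = (run(n) for n in node.inputs)
        return [[sum(x * y for x, y in zip(row, col))
                 for col in zip(*b)] for row in a]

A = Node("const", value=[[1, 2], [3, 4]])
B = Node("const", value=[[5, 6], [7, 8]])
C = A * B          # builds the graph only
print(run(C))      # [[19, 22], [43, 50]]
```

Because the whole graph is known before execution, a scheduler can inspect it and decide what to run, where, and in what order.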
But using a graph does have some advantages
One is effective parallel computation
Operations that are independent to each other (e.g., they need different input data) can be conducted in parallel
Therefore, operations can be scheduled in a more efficient manner.
Computational Graphs III
In contrast, our MATLAB code follows a procedural setting
For example, in the feed forward process for function evaluation we have
$S_{m,i} = W_m \underbrace{\phi(\mathrm{pad}(Z_{m,i}))}_{h^m h^m d_m \times a^{\mathrm{conv}}_m b^{\mathrm{conv}}_m} + b_m \mathbf{1}^T_{a^{\mathrm{conv}}_m b^{\mathrm{conv}}_m},\quad i = 1, \ldots, l \quad (1)$
and
$Z_{m+1,i} = \underbrace{\mathrm{mat}(P^{\mathrm{pool}}_{m,i} \mathrm{vec}(\sigma(S_{m,i})))}_{d_{m+1} \times a_{m+1} b_{m+1}},\quad i = 1, \ldots, l \quad (2)$
Remember that this is over all data (in a batch)
Computational Graphs IV
Thus if a graph has been constructed, (1)-(2) can easily be done in parallel
Why is Tensorflow Faster? I
Now let’s go back to this issue
We think there are some possible reasons
Some MATLAB operations are not efficiently implemented
You have seen that index manipulation is time consuming
We will try to make improvements in the next project
Tensorflow's computational-graph setting may lead to better overall optimization?
Why is Tensorflow Faster? II
Tensorflow may have used some optimized packages dedicated to neural networks
For example, they may use
Intel MKL-DNN:
https://github.com/intel/mkl-dnn
Other Comments I
Please have both your ID and name on the first page
You should use the LaTeX template
Otherwise I have no choice but to give you 0 points, as we have said many times