Discussion on the Project of Analyzing Running Time of Stochastic Gradient Methods

Academic year: 2022

(1)

Discussion on the Project of Analyzing Running Time of Stochastic Gradient Methods

(2)

Some General Comments I

You should analyze your results

Some just present tables or figures, but we want more insight

(3)

The Analysis I

You should identify the major operations

Then check the percentage of running time taken by these operations

From the results you draw some conclusions

Let’s consider a good example (from P2 of Davis Cho’s report in the UCLA 2019 course)

I will show good examples from this year later

(4)

The Analysis II

(5)

The Analysis III

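A typical analysis is as follows. In the forward procedure,

$W^m \,\mathrm{mat}(P^m_{\phi} P^m_{\mathrm{pad}} \mathrm{vec}(Z^{m,i})) = W^m \phi(\mathrm{pad}(Z^{m,i}))$

$\phi(\mathrm{pad}(Z^{m,i}))$ : $O(l \times h^m h^m d^m a^m_{\mathrm{conv}} b^m_{\mathrm{conv}})$

$W^m \phi(\cdot)$ : $O(l \times h^m h^m d^m d^{m+1} a^m_{\mathrm{conv}} b^m_{\mathrm{conv}})$

$Z^{m+1,i} = \mathrm{mat}(P^{m,i}_{\mathrm{pool}} \mathrm{vec}(\sigma(S^{m,i})))$ : $O(l \times h^m h^m d^{m+1} a^m_{\mathrm{conv}} b^m_{\mathrm{conv}})$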
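To see how the three complexity terms compare, one can plug in numbers. The following is a minimal sketch, not part of the project code; the function name and the layer sizes are hypothetical, and the counts are leading-order terms only.

```python
def forward_costs(l, h, d_in, d_out, a_conv, b_conv):
    """Leading-order operation counts for one layer's forward pass.

    l: batch size, h: filter height/width, d_in: d^m, d_out: d^{m+1},
    a_conv, b_conv: output height/width of the convolution.
    """
    phi_pad = l * h * h * d_in * a_conv * b_conv          # forming phi(pad(Z))
    matmul = l * h * h * d_in * d_out * a_conv * b_conv   # W * phi(pad(Z))
    pooling = l * h * h * d_out * a_conv * b_conv         # sigma and pooling term
    return {"phi_pad": phi_pad, "matmul": matmul, "pooling": pooling}


# Hypothetical layer: 128 instances, 3x3 filters, 32 -> 64 channels, 28x28 output
costs = forward_costs(l=128, h=3, d_in=32, d_out=64, a_conv=28, b_conv=28)
ratio_to_phi = costs["matmul"] // costs["phi_pad"]    # equals d_out
ratio_to_pool = costs["matmul"] // costs["pooling"]   # equals d_in
```

With these assumed sizes the matrix-matrix product needs tens of times more operations than either of the other two steps, which is what the measured percentages should be compared against.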

(6)

The Analysis IV

From this complexity analysis, matrix-matrix products are the bottleneck

Roughly, $d^m$ times more than others at layer $m$

However, we see that matrix-matrix products take less than half of the running time (feedforward part)

Thus optimized BLAS is very effective

It is easier to optimize the computationally heavy part

(7)

The Analysis V

But this also means that probably there is still room to improve other operations such as $\phi(\mathrm{pad}(Z^{m,i}))$

(8)

Why Checking Main Steps is Important I

We see that some simply list the percentage of each function

feedforward ?? %
max pooling ?? %

But the problem is that max pooling is part of feedforward

Thus you cannot say much from such a list of results
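One way to make such a list meaningful is to separate the time a step spends in its own code from the time spent in steps nested inside it. The sketch below assumes a profiler that reports inclusive times; the function name and all timings are made up for illustration.

```python
def exclusive_percentages(inclusive, parent):
    """Convert inclusive times to exclusive percentages of the total time.

    inclusive: dict name -> time including nested steps
    parent: dict name -> enclosing step name (None for top-level steps)
    """
    exclusive = dict(inclusive)
    for name, par in parent.items():
        if par is not None:
            exclusive[par] -= inclusive[name]  # remove the child's time from its parent
    total = sum(t for name, t in inclusive.items() if parent[name] is None)
    return {name: 100.0 * t / total for name, t in exclusive.items()}


# Hypothetical timings in seconds; max pooling runs inside feedforward
inclusive = {"feedforward": 6.0, "max_pooling": 1.5, "backward": 4.0}
parent = {"feedforward": None, "max_pooling": "feedforward", "backward": None}
shares = exclusive_percentages(inclusive, parent)
```

After this adjustment the percentages add up to 100, so each entry can be read as the share of a distinct operation.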

(9)

Single Versus Double I

By default Tensorflow uses single precision

Thus we should use single precision in MATLAB too

But some may forget to use the option -ftype to specify the use of single precision
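The difference between the two settings can be illustrated outside MATLAB and Tensorflow. This is a small sketch using only the Python standard library; the array length is arbitrary, and it assumes the usual 4-byte and 8-byte floating-point types.

```python
import struct
from array import array

n = 1000
single = array("f", [0.1] * n)  # 32-bit floats ("single")
double = array("d", [0.1] * n)  # 64-bit floats ("double")

single_bytes = single.itemsize * n
double_bytes = double.itemsize * n

# Round-tripping 0.1 through single precision loses digits that double keeps
as_single = struct.unpack("f", struct.pack("f", 0.1))[0]
rounding_error = abs(as_single - 0.1)
```

Double precision moves twice as many bytes for the same data, so a timing comparison is only fair when both implementations use the same type.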

(10)

Why is Tensorflow Faster? I

Before giving a rough answer to this question, let’s discuss some differences between Tensorflow and our MATLAB-based implementation

Automatic differentiation

Computational graphs

(11)

Automatic differentiation I

We will discuss this in regular lectures

(12)

Computational Graphs I

Let’s borrow a description from

https://deepnotes.io/tensorflow

“Tensorflow approaches series of computations as a flow of data through a graph with nodes being computation units and edges being flow of Tensors (multidimensional arrays).”

This means that we must build a graph first before the execution

This is a bit unnatural

For example, to do a matrix product

(13)

Computational Graphs II

C = A*B;

we cannot just write the above statement like in MATLAB.

We need two steps: first build the graph, then execute it
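The build-then-run idea can be sketched with a toy graph in plain Python. This is not Tensorflow's API; the classes `Placeholder` and `MatMul` and the function `run` are hypothetical names used only to show that creating a node performs no arithmetic.

```python
class Placeholder:
    """Graph node standing for an input matrix supplied at run time."""

    def __init__(self, name):
        self.name = name


class MatMul:
    """Graph node recording a matrix product; nothing is computed on creation."""

    def __init__(self, left, right):
        self.left, self.right = left, right


def run(node, feed):
    """Execute the graph rooted at node, using the matrices given in feed."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    a, b = run(node.left, feed), run(node.right, feed)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Step 1: build the graph (no arithmetic happens here)
A, B = Placeholder("A"), Placeholder("B")
C = MatMul(A, B)

# Step 2: run it with actual data
result = run(C, {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]})
```

Because the whole computation is described before anything runs, a framework has the chance to inspect and reorganize it first.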

But using a graph does have some advantages

One is effective parallel computation

Operations that are independent of each other (e.g., they need different input data) can be conducted in parallel

Therefore, operations can be scheduled in a more efficient manner.
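How a graph exposes independent operations can be sketched as follows. This is an illustrative toy, not how Tensorflow schedules work internally; the function name and the operation names are hypothetical.

```python
def parallel_levels(deps):
    """Group operations into levels; ops within one level do not depend on each other.

    deps: dict op -> list of ops whose outputs it needs
    """
    level = {}

    def depth(op):
        # Longest chain of dependencies below op (0 if it needs no other op)
        if op not in level:
            level[op] = 1 + max((depth(d) for d in deps[op]), default=-1)
        return level[op]

    for op in deps:
        depth(op)
    groups = {}
    for op, lv in level.items():
        groups.setdefault(lv, []).append(op)
    return [sorted(groups[lv]) for lv in sorted(groups)]


# Hypothetical graph: two independent products feed one sum
deps = {"A*B": [], "C*D": [], "sum": ["A*B", "C*D"]}
levels = parallel_levels(deps)
```

Here the two products land in the same level and could run at the same time, while the sum must wait for both.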

(14)

Computational Graphs III

In contrast, our MATLAB code follows a procedural setting

For example, in the feedforward process for function evaluation we have

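$S^{m,i} = W^m \phi(\mathrm{pad}(Z^{m,i}))_{h^m h^m d^m \times a^m_{\mathrm{conv}} b^m_{\mathrm{conv}}} + \boldsymbol{b}^m \mathbb{1}^T_{a^m_{\mathrm{conv}} b^m_{\mathrm{conv}}}, \quad i = 1, \ldots, l \quad (1)$

and

$Z^{m+1,i} = \mathrm{mat}(P^{m,i}_{\mathrm{pool}} \mathrm{vec}(\sigma(S^{m,i})))_{d^{m+1} \times a^{m+1} b^{m+1}}, \quad i = 1, \ldots, l \quad (2)$

Remember that this is over all data (in a batch)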

(15)

Computational Graphs IV

Thus if a graph has been constructed, (1)-(2) can easily be done in parallel
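Since (1)-(2) are applied to each instance $i$ separately, the work over a batch is independent across instances. The sketch below illustrates only this independence; `forward_one` is a hypothetical one-dimensional stand-in (ReLU followed by max pooling over pairs), not the actual layer, and Python threads are used for illustration rather than speed.

```python
from concurrent.futures import ThreadPoolExecutor


def forward_one(z):
    """Toy stand-in for one instance: activation, then max pooling over pairs."""
    s = [max(v, 0.0) for v in z]                                   # sigma(S)
    return [max(s[j], s[j + 1]) for j in range(0, len(s) - 1, 2)]  # pooling


batch = [[-1.0, 2.0, 3.0, -4.0], [0.5, -0.5, -2.0, 1.0]]  # l = 2 instances

with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(forward_one, batch))  # results keep the batch order

sequential = [forward_one(z) for z in batch]
```

Both versions give the same output, because no instance needs the result of another.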

(16)

Why is Tensorflow Faster? I

Now let’s go back to this issue

We think there are some possible reasons

Some MATLAB operations are not efficiently implemented

You have seen that index manipulation is time-consuming

We will try to make improvements in the next project

Tensorflow’s setting by computational graph leads to better overall optimization?

(17)

Why is Tensorflow Faster? II

Tensorflow may have used some optimized packages dedicated to neural networks

For example, they may use

Intel MKL-DNN:

https://github.com/intel/mkl-dnn

(18)

Other Comments I

Please have both ID and name on the first page

You should use the LaTeX template

Otherwise I have no choice but to give you 0 points, as we have said this many times
