
Code of the Models



model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(loss='mse', optimizer=SGD(lr=0.05), metrics=['accuracy'])
model.summary()

model.fit(x_train, y_train, batch_size=100, epochs=12)
model.evaluate(x_test, y_test)

A.5 The mnCNN Model

The code of the modified normalized convolutional neural network (mnCNN) that we construct in Chapter 7 is shown below.

%env KERAS_BACKEND=tensorflow
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

# Load MNIST and reshape the images for the convolutional layers.
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)
x_train = x_train/255
x_test = x_test/255

# One-hot encode the labels.
from keras.utils import np_utils
y_train = np_utils.to_categorical(y_train, 10)
y_test = np_utils.to_categorical(y_test, 10)

from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.optimizers import SGD
from keras import backend as K


from keras.models import Model
from keras.layers import Input

# Layers of the main convolutional branch.
f_1 = Conv2D(32, (3,3), padding='same')
f_2 = BatchNormalization()
f_3 = Activation('relu')
f_4 = MaxPooling2D(pool_size=(2,2))
f_5 = Conv2D(64, (3,3), padding='same')
f_6 = BatchNormalization()
f_7 = Activation('relu')
f_8 = MaxPooling2D(pool_size=(2,2))
f_9 = Conv2D(128, (3,3), padding='same')
f_10 = BatchNormalization()
f_11 = Activation('relu')
f_12 = MaxPooling2D(pool_size=(2,2))
f_13 = Flatten()
f_14 = Dense(200, activation='relu')
f_15 = Dense(10, activation='softmax')

# Layers of the added branch (flatten, dense, batch normalization, exponential).
f_16 = Flatten()
f_17 = Dense(10)
f_18 = BatchNormalization()
f_19 = Activation('exponential')

from keras.engine.topology import Layer
from keras.engine.base_layer import InputSpec

# Custom layer that multiplies its input element-wise by a trainable weight
# of the same shape.
class MyLayer(Layer):

    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        self.input_spec = InputSpec(min_ndim=2)
        super(MyLayer, self).__init__(**kwargs)


    def build(self, input_shape):
        if input_shape[-1] is None:
            raise ValueError('Error.')
        self.input_spec = InputSpec(ndim=len(input_shape),
                                    axes=dict(list(enumerate(input_shape[1:], start=1))))
        # One trainable weight per input entry, initialized uniformly.
        self.kernel = self.add_weight(name='kernel',
                                      shape=input_shape[1:],
                                      initializer='uniform',
                                      trainable=True)
        super(MyLayer, self).build(input_shape)

    def call(self, x):
        # Element-wise product of the input and the trainable kernel.
        return x * self.kernel

    def compute_output_shape(self, input_shape):
        return input_shape

f_20 = MyLayer(10)

x = Input(shape=(28,28,1))
h_1 = f_1(x)
h_2 = f_2(h_1)
h_3 = f_3(h_2)
h_4 = f_4(h_3)
h_5 = f_5(h_4)
h_6 = f_6(h_5)
h_7 = f_7(h_6)
h_8 = f_8(h_7)
h_9 = f_9(h_8)
h_10 = f_10(h_9)
h_11 = f_11(h_10)


h_12 = f_12(h_11)
h_13 = f_13(h_12)
h_14 = f_14(h_13)
h_15 = f_15(h_14)

z_1 = f_16(x)
z_2 = f_17(z_1)
z_3 = f_18(z_2)
z_4 = f_19(z_3)
z_5 = f_20(z_4)

from keras.layers import concatenate, add
import keras

# Merge the two branches by addition and build the model.
y = keras.layers.Add()([h_15, z_5])
model = Model(x, y)
model.summary()

# Weight at index 28 of the model's weight list; it enters the
# regularization term of the custom loss below.
weight = model.get_weights()[28]

import tensorflow as tf

def MyLoss(y_true, y_pred):
    # Mean squared error plus an L2 penalty on the selected weight.
    return K.mean(K.cast(K.square(y_pred - y_true), 'float32'), axis=-1) + 0.01*tf.nn.l2_loss(weight)

model.compile(loss=MyLoss, optimizer=SGD(lr=0.1), metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=12)
model.evaluate(x_test, y_test)
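
For reference, the sketch below is a minimal, hypothetical way to read back the trained scaling factors of the custom layer after fitting; it assumes the model object from the listing above and that the kernel of MyLayer still sits at index 28 of the weight list, as in the definition of weight.

# Hypothetical inspection step: retrieve the trained kernel of MyLayer.
# The index 28 mirrors the one used for the regularization term above.
trained_kernel = model.get_weights()[28]
print(trained_kernel.shape)   # expected: (10,), one scaling factor per class
print(trained_kernel)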


