In Chapter 6, “Backpropagation Training,” we used the MNIST handwritten digits as an example of backpropagation. In Chapter 10, we present an example that improves our recognition of the MNIST digits by using a deep convolutional neural network.
The convolutional network, being a deep neural network, will have more layers than the feedforward neural network seen in Chapter 6. The hyper-parameters for this network are as follows:
Input: Accepts box of [1,96,96]
Convolutional Layer: filters=32, filter_size=[3,3]
Max-pool Layer: [2,2]
Convolutional Layer: filters=64, filter_size=[2,2]
Max-pool Layer: [2,2]
Convolutional Layer: filters=128, filter_size=[2,2]
Max-pool Layer: [2,2]
Dense Layer: 500 neurons
Output Layer: 30 neurons
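To see how these hyper-parameters shape the data as it moves through the network, the following sketch traces the output size of each layer. The listing above does not state stride or padding, so the sketch assumes stride-1 convolutions with no padding and non-overlapping max-pooling; the function names are illustrative only and do not come from any framework.

```python
def conv_output(shape, filters, filter_size):
    """Shape after a stride-1 convolution with no padding (an assumption)."""
    _, height, width = shape
    fh, fw = filter_size
    return (filters, height - fh + 1, width - fw + 1)


def pool_output(shape, pool_size):
    """Shape after non-overlapping max-pooling; partial windows are dropped."""
    channels, height, width = shape
    ph, pw = pool_size
    return (channels, height // ph, width // pw)


def trace_shapes(input_shape=(1, 96, 96)):
    """Return (layer name, output shape) pairs for the layers listed above."""
    conv_layers = [(32, (3, 3)), (64, (2, 2)), (128, (2, 2))]
    shapes = [("input", input_shape)]
    shape = input_shape
    for index, (filters, size) in enumerate(conv_layers, start=1):
        shape = conv_output(shape, filters, size)
        shapes.append((f"conv{index}", shape))
        shape = pool_output(shape, (2, 2))
        shapes.append((f"pool{index}", shape))
    # The dense layer sees the last feature maps flattened into one vector.
    shapes.append(("flatten", (shape[0] * shape[1] * shape[2],)))
    shapes.append(("dense", (500,)))
    shapes.append(("output", (30,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions, the third max-pool layer produces 128 feature maps of 11 by 11 values, so the dense layer receives 15,488 inputs. A framework that pads its convolutions would give different sizes.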
This network uses the very common pattern of following each convolutional layer with a max-pool layer. Additionally, the number of filters increases from the input to the output layer, thereby allowing a smaller number of basic features, such as edges, lines, and small shapes, to be detected near the input field. Successive convolutional layers roll up these basic features into larger and more complex features. Ultimately, the dense layer can map these higher-level features into each x-coordinate and y-coordinate of the actual 15 digit features.
Training the convolutional neural network takes considerable time, especially if you are not using GPU processing. As of July 2015, not all frameworks have equal support for GPU processing. At this time, using Python with a Theano-based neural network framework, such as Lasagne, provides the best results. Many of the same researchers who are improving deep convolutional networks are also working with Theano. Thus, they promote it ahead of frameworks for other languages.
For this example, we used Theano with Lasagne. The book’s example download may have other languages available for this example as well, depending on the frameworks available for those languages. Training a convolutional neural network for digit feature recognition on Theano took less time with a GPU than with a CPU, as a GPU helps considerably for convolutional neural networks. The exact performance gain will vary according to hardware and platform. The accuracy comparison between the convolutional neural network and the regular ReLU network is shown here:
Relu:
Best valid loss was 0.068229 at epoch 17.
Incorrect 170/10000 (1.7000000000000002%)
ReLU+Conv:
Best valid loss was 0.065753 at epoch 3.
Incorrect 150/10000 (1.5%)
If you compare the results from the convolutional neural network to the standard feedforward neural network from Chapter 6, you will see the convolutional neural network performed better. The convolutional neural network is capable of recognizing sub-features in the digits to boost its performance over the standard feedforward neural network. Of course, these results will vary, depending on the platform used.
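The size of the improvement can be worked out directly from the error counts reported above. The short sketch below does only that arithmetic; the function name is illustrative, and the figures come from a single run.

```python
def error_rate(incorrect, total):
    """Percentage of validation examples that were classified incorrectly."""
    return 100.0 * incorrect / total


relu_rate = error_rate(170, 10000)   # regular ReLU network
conv_rate = error_rate(150, 10000)   # ReLU + convolutional network

# Share of the ReLU network's errors that the convolutional network avoided.
relative_reduction = 100.0 * (relu_rate - conv_rate) / relu_rate

print(f"ReLU: {relu_rate:.2f}%")
print(f"ReLU+Conv: {conv_rate:.2f}%")
print(f"Relative error reduction: {relative_reduction:.1f}%")
```

The convolutional network made 20 fewer errors out of 10,000, which is roughly a 12 percent reduction relative to the ReLU network's 170 errors.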
Chapter Summary
Convolutional neural networks are a very active area in the field of computer vision. They allow the neural network to detect hierarchies of features, such as lines and small shapes. These simple features can form hierarchies to teach the neural network to recognize complex patterns composed of the simpler features. Deep convolutional networks can take considerable processing power. Some frameworks allow the use of GPU processing to enhance performance.
Yann LeCun introduced the LeNet-5, the most common type of convolutional network. This neural network type is composed of dense layers, convolutional layers and max-pool layers. The dense layers work exactly the same way as traditional feedforward networks. Max-pool layers can downsample the image and remove detail. Convolutional layers detect features in any part of the image field.
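The two layer types summarized above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used by any framework: it assumes stride 1 and no padding for the convolution, applies the filter without flipping it (as most neural network frameworks do), and uses a hand-written edge filter where a real network would learn its filters during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding), summing products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(image, size=2):
    """Keep the largest value in each non-overlapping size-by-size window."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


# A 5x5 image with a vertical edge between the second and third columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-written filter that responds to dark-to-light vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = convolve2d(image, kernel)  # 4x4 map, strongest where the edge is
pooled = max_pool(feature_map)           # 2x2 map, downsampled by max-pooling

print(feature_map)
print(pooled)
```

The same filter is applied at every position, which is why the edge is detected wherever it appears in the image. The max-pool step then halves each dimension while keeping the strongest response in each window.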
There are many different approaches to determine the best architecture for a neural network. Chapter 8, “NEAT, CPPN and HyperNEAT,” introduced a neural network algorithm that could automatically determine the best architecture. If you are using a feedforward neural network, you will most likely arrive at a structure through pruning and model selection, which we discuss in the next chapter.
Chapter 11: Pruning and Model Selection
Pruning a Neural Network
Model Selection
Random vs. Grid Search
In previous chapters, we learned that you could better fit the weights of a neural network with various training algorithms. In effect, these algorithms adjust the weights of the neural network in order to lower the error of the neural network. We often refer to the weights of a neural network as the parameters of the neural network model. Some machine learning models might have parameters other than weights. For example, logistic regression (which we discussed in Artificial Intelligence for Humans, Volume 1) has coefficients as parameters.
When we train any machine learning model, its parameters change. However, these models also have hyper-parameters that do not change during training. For neural networks, the hyper-parameters specify the architecture of the neural network. Examples of hyper-parameters for neural networks include the number of hidden layers and hidden neurons.
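The distinction can be made concrete with a small sketch. Here the hyper-parameters are a fixed description of the architecture, while the parameters are the weights created from that description and later adjusted by training. The dictionary keys, function names, and layer sizes are illustrative assumptions, not values from this book.

```python
import random


def build_weights(layer_sizes, seed=0):
    """Create random weights for each pair of adjacent layers.

    Each neuron gets one weight per input plus one bias weight.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def count_parameters(weights):
    """Total number of trainable weights, including bias weights."""
    return sum(len(neuron) for layer in weights for neuron in layer)


# Hyper-parameters: chosen before training and left unchanged by it.
hyper_parameters = {"inputs": 4, "hidden_layers": [6, 3], "outputs": 2}

layer_sizes = [
    hyper_parameters["inputs"],
    *hyper_parameters["hidden_layers"],
    hyper_parameters["outputs"],
]

# Parameters: the weights that a training algorithm would adjust.
weights = build_weights(layer_sizes)
print(count_parameters(weights))
```

Changing a hyper-parameter, such as adding a hidden layer, changes how many parameters exist. Training only changes the values those parameters hold.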
In this chapter, we will examine two algorithms that can actually modify or suggest a structure for the neural network. Pruning works by analyzing how much each neuron contributes to the output of the neural network. If a particular neuron’s connection to another neuron does not significantly affect the output of the neural network, the connection will be pruned. Through this process, connections and neurons that have only a marginal impact on the output are removed.
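As a rough illustration of the idea, the sketch below prunes connections in a tiny network with one hidden layer. It is a simplified greedy approach under stated assumptions, not necessarily the algorithm developed later in this chapter: the weights are made up to stand in for a trained network, only input-to-hidden connections are considered, and a connection is removed when zeroing it changes the output by less than a chosen threshold on a few sample inputs.

```python
import math


def forward(inputs, hidden_weights, output_weights):
    """Output of a network with one tanh hidden layer and a linear output neuron."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def prune_connections(hidden_weights, output_weights, samples, threshold):
    """Zero each input-to-hidden connection whose removal barely changes the output."""
    pruned = [row[:] for row in hidden_weights]  # leave the original untouched
    removed = []
    for j, row in enumerate(pruned):
        for i in range(len(row)):
            original = row[i]
            baseline = [forward(s, pruned, output_weights) for s in samples]
            row[i] = 0.0  # temporarily remove the connection
            trial = [forward(s, pruned, output_weights) for s in samples]
            change = sum(abs(a - b) for a, b in zip(baseline, trial)) / len(samples)
            if change < threshold:
                removed.append((j, i))  # marginal impact: keep it removed
            else:
                row[i] = original       # significant impact: restore it
    return pruned, removed


# Made-up weights standing in for a trained network (2 hidden neurons, 3 inputs).
hidden_weights = [[0.9, -0.8, 0.001],
                  [0.002, 0.7, -0.6]]
output_weights = [1.0, -1.0]
samples = [[0.5, -0.2, 0.1], [1.0, 0.3, -0.7], [-0.4, 0.8, 0.2]]

pruned_weights, removed = prune_connections(
    hidden_weights, output_weights, samples, threshold=0.01
)
print(removed)
```

With these values, only the two near-zero connections are removed. A neuron whose connections have all been removed no longer contributes to the output and could be dropped as well.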
The other algorithm that we introduce in this chapter is model selection. While pruning starts with an already trained neural network, model selection creates and trains many neural networks with different hyper-parameters. The program then selects the hyper-parameters producing the neural network that achieves the best validation score.
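The selection loop itself can be sketched briefly. In the code below, candidate hyper-parameters come either from a full grid or from random sampling of that grid, and the candidate with the lowest validation error wins. The function `toy_train_and_validate` is a made-up stand-in for creating and training a real network; all names and values are illustrative assumptions.

```python
import itertools
import random


def grid_candidates(grid):
    """Yield every combination of the listed hyper-parameter values."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def random_candidates(grid, count, seed=0):
    """Yield `count` randomly sampled combinations from the same value lists."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {name: rng.choice(values) for name, values in grid.items()}


def select_model(candidates, train_and_validate):
    """Return (best hyper-parameters, validation error); lower error is better."""
    best, best_error = None, float("inf")
    for hyper_parameters in candidates:
        error = train_and_validate(hyper_parameters)
        if error < best_error:
            best, best_error = hyper_parameters, error
    return best, best_error


def toy_train_and_validate(hp):
    """Stand-in for real training: a made-up error that is lowest
    at 2 hidden layers of 50 neurons."""
    return (abs(hp["hidden_layers"] - 2) * 0.05
            + abs(hp["hidden_neurons"] - 50) * 0.001)


grid = {"hidden_layers": [1, 2, 3], "hidden_neurons": [10, 50, 100]}

grid_best, grid_error = select_model(grid_candidates(grid), toy_train_and_validate)
random_best, random_error = select_model(
    random_candidates(grid, count=4), toy_train_and_validate
)
print(grid_best, grid_error)
print(random_best, random_error)
```

A grid search evaluates all nine combinations here, while the random search evaluates only four and may miss the best one. In practice, `toy_train_and_validate` would be replaced by code that builds a network from the hyper-parameters, trains it, and returns its validation score.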