We will introduce our method in the following three parts:
• overview of the model
• network architecture
• main steps of the algorithms
3.1 Overview of the model
We introduce our work in two parts, the generator part and the descriptor part, and we call our model “CycleCoopNet” in what follows. Here we briefly outline the steps and their outputs. First, we start with the label picture as input. Second, we use a generator to produce an “initial generated picture”. Then we produce a “revised generated picture” by using Langevin revision dynamics. Finally, we use our descriptor to produce a “described revised picture”.
Figure 2 shows the flow of all the output pictures we generate.
3.2 Generator part of CycleCoopNet
First, we introduce the generator part. We start from the label picture, which we call the “input label”. We then use our generator to produce an initial example of B from this real A image, which we call the “initial generated picture”.
Figure 3 shows the design of our generator. The layer architecture follows pix2pix [8]. We have 8 layers in the encoder part and 8 layers in the decoder part.
Each layer follows the convolution-BatchNorm-ReLU pattern: in every layer, we first apply a convolution, then apply batch normalization to the resulting values, and finally apply ReLU or Leaky ReLU as the activation function.
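To make the 8-layer encoder and 8-layer decoder concrete, the following minimal sketch traces the spatial size of the feature maps through both halves. The input size (256), kernel size (4), stride (2), and padding (1) are assumptions borrowed from the pix2pix reference design; they are not stated in the text above.

```python
# Trace feature-map sizes through an 8-layer encoder and an 8-layer decoder.
# ASSUMPTIONS (not given in the text): 256x256 input, 4x4 kernels,
# stride 2, padding 1, as in the pix2pix reference design.

def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one transposed convolution."""
    return (size - 1) * stride - 2 * padding + kernel

def trace(size, step, layers=8):
    """Apply `step` repeatedly and record the size after every layer."""
    sizes = [size]
    for _ in range(layers):
        sizes.append(step(sizes[-1]))
    return sizes

encoder_sizes = trace(256, conv_out)
decoder_sizes = trace(encoder_sizes[-1], deconv_out)
print(encoder_sizes)
print(decoder_sizes)
```

Under these assumptions the encoder halves the size at every layer until a single spatial position remains, and the decoder doubles it back to the input size.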
3.2.1 Batch-Normalization
In our model, we use batch normalization [18] to keep the values of our weight variables from diverging too much, since our model has many variables in every layer. By using batch normalization, we prevent some large weight values from affecting the results more than others. Batch normalization also allows us to use higher learning rates, so training takes less time. Additionally, we can be less careful about the initial values of our model. In some cases, batch normalization reduces the need for dropout.
Figure 2: The overview flow of our model
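As an illustration of the normalization step described in this section, the sketch below normalizes one batch of values to zero mean and unit variance. The scale `gamma`, shift `beta`, and stabilizer `eps` follow the usual formulation of batch normalization; their values here are illustrative assumptions, and the sample batch is hypothetical.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Hypothetical batch with values on very different scales.
batch = [0.5, 120.0, -3.0, 42.0]
normalized = batch_norm(batch)
print(normalized)
```

After normalization, no single large value dominates the batch, which is the effect the text relies on.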
3.2.2 Activation function
We separate our generator into two parts, the encoder and the decoder. Our target is to translate the label picture into a real scene picture. In the encoder part, we try to compress the information into a vector. We hope to retain the features of the picture in this vector, and then use the vector to reconstruct a real picture in the decoder part.
We therefore consider ReLU and Leaky ReLU as our activation functions.
In our generator model, we use ReLU and Leaky ReLU as our activation functions. An activation function makes the computation nonlinear. With convolution alone, the model would be a linear function, which means the machine could not learn anything beyond a simple one-to-one matching. We use Leaky ReLU in the encoder layers of the generator. Leaky ReLU has all the advantages of ReLU, and it keeps the values less than zero by multiplying them by a parameter. In the encoder part, we want to retain as many features of the original picture as possible, so we use Leaky ReLU, which lets the values less than zero pass on to the next steps. Unlike Leaky ReLU, ReLU sets all values less than zero to zero. We use ReLU in the decoder layers of the generator, hoping that it reduces the redundant values in the model so that the regenerated picture is clearer.
Figure 3: Generator layers
In the last layer of the generator, we use hyperbolic tangent as our activation function.
This keeps all values and maps them into the range between minus one and one.
Finally, we get our output picture. We use the above concepts to build our generator structure.
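The three activation functions discussed in this section can be compared side by side with a minimal sketch. The Leaky ReLU slope of 0.2 is an assumption (the text only says negative values are multiplied by a parameter), and the sample values are hypothetical.

```python
import math

def leaky_relu(x, slope=0.2):
    """Keep positive values; scale negative values by `slope` (assumed 0.2)."""
    return x if x > 0 else slope * x

def relu(x):
    """Keep positive values; set negative values to zero."""
    return max(0.0, x)

values = [-2.0, -0.5, 0.0, 1.5]
encoder_out = [leaky_relu(v) for v in values]  # negative values survive, scaled down
decoder_out = [relu(v) for v in values]        # negative values become zero
final_out = [math.tanh(v) for v in values]     # every value lies strictly between -1 and 1
print(encoder_out, decoder_out, final_out)
```

The encoder output still carries the sign and relative size of the negative inputs, while the decoder output discards them.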
3.2.3 Dropout
Our generator layers have many variables. In the decoder part, we add dropout to every layer with a dropout rate of fifty percent. We apply dropout only in the decoder part because we want to retain as many distinctive features as possible in the encoder part. In the decoder part, first, we only need to regenerate pictures from a large number of weight variables, so to reduce the amount of computation we can randomly drop some weight values and make the computation faster. Second, we perform concatenation in a later step. This means that even if we drop important feature values, we can find them again in the layer concatenation. In other words, this also gives our important feature values a higher weight.
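A minimal sketch of dropout with a fifty-percent rate is shown below. It uses the common "inverted dropout" convention, in which the surviving values are rescaled by 1 / (1 - rate); that rescaling and the fixed random seed are assumptions made for illustration, not details taken from the text.

```python
import random

def dropout(values, rate=0.5, rng=None):
    """Zero each value with probability `rate`; rescale the survivors.

    The rescaling by 1 / (1 - rate) is the usual inverted-dropout
    convention and is an assumption here.
    """
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

# Hypothetical feature values; the seed only makes the run repeatable.
features = [1.0] * 1000
dropped = dropout(features, rate=0.5, rng=random.Random(0))
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
print(kept_fraction)
```

Roughly half of the values are set to zero on each pass, and a different random half is chosen every time.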
3.2.4 Skip connection and layers concatenation
In our generator model, we add concatenation in the decoder part of the generator. The main task of the decoder part is to reconstruct the picture, and we want it to give us a picture that looks like the real picture. Skip connections are used to avoid an information bottleneck by concatenating with the results of previous layers. We can also recover features that we might have lost in the previous layers. Our concept follows the “U-Net” in [19]. With skip connections, our generator layers are more tightly connected. Figure 4 shows the concept of layer concatenation.
Figure 4: Layers concatenation concept
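The layer concatenation can be sketched with plain lists, where each feature map is a list of channels and each channel is a small grid. The helper name `concat_channels` and the toy 2x2 feature maps are hypothetical and serve only as an illustration.

```python
def concat_channels(decoder_maps, encoder_maps):
    """Stack two feature maps along the channel axis (a list of channels here)."""
    # Both inputs must share the same spatial size to be concatenated.
    assert len(decoder_maps[0]) == len(encoder_maps[0])
    return decoder_maps + encoder_maps

# Hypothetical toy feature maps: two channels each, every channel a 2x2 grid.
encoder_maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # saved from the encoder
decoder_maps = [[[0, 0], [9, 0]], [[0, 1], [0, 0]]]  # some values already dropped
merged = concat_channels(decoder_maps, encoder_maps)
print(len(merged))
```

The merged map has twice as many channels, and the encoder channels arrive unchanged, so features lost on the decoder path remain available to the next layer.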
3.2.5 Details of generator
Figure 5 shows the detailed design of our generator, which combines all of the concepts mentioned above. In the encoder part, add batch normalization after the convolution layers, then apply Leaky ReLU as the activation function to produce the final result of each layer.
In the decoder part, add Batch-Normalization after transposed-convolution layers.
Next, drop some of the results according to the dropout rate. Then, concatenate with the results of the encoder layers to avoid an information bottleneck or missing features. Finally, apply ReLU as the activation function to produce the final result of each layer.
Figure 5: Details of Generator layers
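The order of operations in one decoder layer can be sketched on one-dimensional lists. This is only an illustration of the ordering: the `upsample` function is a hypothetical stand-in for the transposed convolution (it merely repeats each value), the inverted-dropout rescaling is an assumption, and the input values are made up.

```python
import math
import random

def upsample(values):
    """Stand-in for a transposed convolution: repeat every value twice."""
    return [v for v in values for _ in range(2)]

def batch_norm(values, eps=1e-5):
    """Normalize to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale survivors (assumed)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def decoder_block(x, skip, rate=0.5, rng=None):
    """Transposed convolution, batch norm, dropout, concatenation, ReLU."""
    rng = rng or random.Random()
    h = upsample(x)              # 1. transposed convolution (stand-in)
    h = batch_norm(h)            # 2. batch normalization
    h = dropout(h, rate, rng)    # 3. dropout
    h = h + skip                 # 4. concatenate with the encoder result
    return [max(0.0, v) for v in h]  # 5. ReLU

out = decoder_block([0.3, -1.2, 2.0],
                    skip=[-0.5, 0.1, 0.7, -0.2, 0.4, 0.9],
                    rng=random.Random(1))
print(out)
```

Because dropout runs before the concatenation, the encoder values in `skip` are never dropped; they are only passed through the final ReLU.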