When we adjust the parameters of the generator, we follow equation 7 to make sure the update moves in the right direction.
KL(p(t+1)|pθ(t)) ≤ KL(qα(t)|pθ(t)) (7)
4.5 Calculate similarity
We show that our work generates pictures of higher similarity by calculating histogram similarity [27, 28, 29], the average hash algorithm (aHash) [30], the perceptual hash algorithm (pHash) [30, 31, 32], and the difference hash algorithm (dHash) [30].
Histogram similarity An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image. It plots the number of pixels for each tonal value. By looking at the histogram for a specific image, viewers will be able to judge the entire tonal distribution at a glance.
To calculate the histogram of an image, we use OpenCV, an open-source computer vision and machine learning software library. First, we separate the source image into its three R, G and B planes with the OpenCV split function. Then we calculate the histograms with the OpenCV calcHist function. Finally, we normalize each histogram so that its values fall in the range given by the parameters we chose.
We compare the histograms of images to analyze how the images differ. We train generators separately with our model and with the cycleGAN model, and we collect 300 generated pictures per epoch from each model. We compare the last picture with all the other pictures and calculate the average value and the variance.
We consider that a higher value means the model can generate more diverse pictures.
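The histogram comparison described above can be sketched without any dependency as follows. This is only an illustration under stated assumptions: the text uses the OpenCV split and calcHist functions, whereas this sketch counts the bins by hand; the comparison measure (histogram intersection) is an assumption, since the text does not name one; and the function names are hypothetical. An image is represented as a flat list of (R, G, B) tuples with 8-bit values.

```python
from statistics import mean, pvariance


def channel_histogram(values, bins=256):
    """Count 8-bit values per bin and normalize so that the bins sum to 1."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // 256] += 1
    total = float(len(values))
    return [c / total for c in counts]


def histogram_similarity(img_a, img_b, bins=256):
    """Mean histogram intersection over the R, G and B planes.

    Assumption: intersection is used as the measure; 1.0 means the
    per-channel histograms are identical.
    """
    scores = []
    for ch in range(3):  # one plane at a time, like the split step
        hist_a = channel_histogram([px[ch] for px in img_a], bins)
        hist_b = channel_histogram([px[ch] for px in img_b], bins)
        scores.append(sum(min(a, b) for a, b in zip(hist_a, hist_b)))
    return sum(scores) / 3.0


def compare_last_to_others(images):
    """Compare the last picture with all others; return (average, variance)."""
    last = images[-1]
    scores = [histogram_similarity(last, img) for img in images[:-1]]
    return mean(scores), pvariance(scores)
```

The score here is a similarity; if a difference is wanted instead, one minus the score can be used before taking the average and the variance.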
Average hash algorithm (aHash) Average hash is the simplest algorithm and uses only a few transformations: scale the image, convert it to grayscale, calculate the mean, and binarize the gray values based on the mean. The resulting bits are the hash of the picture. In our case, we want to generate a 64-bit hash for each picture.
The average hash algorithm first scales the input image down to 8×8 pixels and then converts it to grayscale. Next, we calculate the average of all gray values of the image and examine the pixels one by one from left to right. If the gray value is larger than the average, a 1 is added to the hash, otherwise a 0.
After we get the 64-bit hash of a picture, we call it "the footprint of the image".
We use this hash value to measure how much two pictures differ. We compare the last picture with all the other pictures and calculate the average value and the variance. We consider that a higher value means the model can generate more diverse pictures.
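The average hash steps above can be sketched as follows. This is a minimal illustration, assuming the picture has already been scaled to 8×8 and converted to grayscale, so the input is 8 rows of 8 gray values. The function names are hypothetical, and comparing two hashes by Hamming distance is an assumption, since the text does not name the distance it uses.

```python
def average_hash(gray8x8):
    """Build a 64-bit hash from an 8x8 grid of gray values.

    Pixels are read row by row, left to right; a pixel larger than the
    mean contributes a 1 bit, otherwise a 0 bit.
    """
    pixels = [p for row in gray8x8 for p in row]
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")
```

A distance of 0 means the two footprints are identical, and 64 means every bit differs.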
Perceptual hash algorithm (pHash) Perceptual hash uses a similar approach to aHash, but instead of averaging the pixel values directly it first applies a discrete cosine transform (DCT, equation 8), a popular transformation in signal processing, and works in the frequency domain.
The perceptual hash algorithm first scales the input image down to (8·4)×(8·4), that is, a 32×32 image, and then converts it to grayscale. To this image, we apply a discrete cosine transform (equation 8), first per row and afterward per column. The low-frequency components are now located in the upper left corner, which is why we crop the result to the upper left 8×8 block. Next, we calculate the median of the values in this block and generate a hash value from it in the same way as in the average hash algorithm, using the median as the threshold.
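X_k = \sum_{n=0}^{N-1} x_n \cos\left[ \frac{\pi}{N} \left( n + \frac{1}{2} \right) k \right], \quad k = 0, \ldots, N-1 \qquad (8)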
After we get the 64-bit hash of a picture, we call it "the footprint of the image".
We use this hash value to measure how much two pictures differ. We compare the last picture with all the other pictures and calculate the average value and the variance. We consider that a higher value means the model can generate more diverse pictures.
The average hash is simple, but it is strongly affected by the mean. For example, when we apply gamma correction or histogram equalization to an image, the mean changes and so does the final hash value. The pHash algorithm is less affected by gamma correction or histogram equalization than aHash because it uses a discrete cosine transform (DCT) [33] to obtain the low-frequency components of the picture.
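The perceptual hash steps can be sketched in the same dependency-free way. This is a minimal illustration, assuming the picture has already been scaled to 32×32 and converted to grayscale. It uses the unnormalized type-II DCT, which is an assumption about the exact DCT variant, and the function names are hypothetical.

```python
import math
from statistics import median


def dct_1d(x):
    """Unnormalized type-II DCT of a sequence."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
        for k in range(n)
    ]


def dct_2d(matrix):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major


def perceptual_hash(gray32x32):
    """Build a 64-bit hash from the low-frequency 8x8 block of the DCT."""
    coeffs = dct_2d(gray32x32)
    block = [coeffs[r][c] for r in range(8) for c in range(8)]
    threshold = median(block)
    bits = 0
    for value in block:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits
```

Because only the upper left 8×8 block of coefficients is kept, fine detail in the picture does not enter the hash, and two hashes can be compared bit by bit in the same way as for aHash.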
Difference hash algorithm (dHash) Difference hash uses the same approach as aHash, but instead of using information about average values, it uses gradients (the differences between adjacent pixels).
Similar to the average hash algorithm, the difference hash algorithm first generates a grayscale image from the input image, which in our case is then scaled down to 9×8 pixels. In each row, the first 8 pixels are examined one by one from left to right and compared to their neighbor on the right, which, analogous to the average hash algorithm, results in a 64-bit hash.
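The difference hash can be sketched in a few lines. This is a minimal illustration, assuming the picture has already been converted to grayscale and scaled to 9×8, so the input is 8 rows of 9 gray values. The bit convention (1 when the left pixel is brighter than its right neighbor) is an assumption, and the function name is hypothetical.

```python
def difference_hash(gray_rows):
    """Build a 64-bit hash from 8 rows of 9 gray values.

    Each of the first 8 pixels in a row is compared with its right
    neighbor, giving 8 bits per row and 64 bits in total.
    """
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            # Assumed convention: 1 if brightness falls from left to right.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits
```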
5 Experiments
In this section, we describe the experiment process and the results.
We implement the cooperative learning model with TensorFlow and use different datasets to show the performance of our model.
Our experiment environment is Windows 10 with an NVIDIA GTX 1080 GPU and an Intel i5-7400 CPU. We build an Anaconda environment and install all Python packages in it, including TensorFlow-GPU 1.13.1, numpy 1.15.4, scipy 1.1.0, pillow 5.3.0 and pandas 0.23.4.
All experiment code and details can be found in our GitHub repository [34].
5.1 Experiment 1: Generating bag texture patterns
In experiment 1, we want our generator model to learn bag texture patterns. We use edges2handbags [8] as our training dataset. The goal of this experiment is to change a bag sketch so that it looks like a real bag.
We train for 300 epochs with 400 iterations per epoch. Each iteration takes one sketch picture and one real bag picture as input. CycleCoopNet and cycleGAN do not need the sketch and the real bag to be paired, but pix2pix needs paired data, so we give it each bag together with its sketch.
We set the generator learning rate to 0.0001, the descriptor learning rate to 0.01, and the cycle consistency learning rate to 0.0001. We set the number of Langevin revision steps to 30 and the Langevin step size to 0.002 in this experiment. The descriptor output dimension is set to 100 for this experiment.
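For reference, the settings listed above can be collected in one place. The sketch below is only an illustration: the class and field names are hypothetical and are not taken from the experiment code, while the values are the ones stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment1Config:
    """Hyperparameters of experiment 1 (field names are hypothetical)."""

    epochs: int = 300
    iterations_per_epoch: int = 400
    generator_lr: float = 1e-4
    descriptor_lr: float = 1e-2
    cycle_consistency_lr: float = 1e-4
    langevin_steps: int = 30
    langevin_step_size: float = 0.002
    descriptor_output_dim: int = 100

    @property
    def total_iterations(self):
        """Total number of training iterations over all epochs."""
        return self.epochs * self.iterations_per_epoch
```

With these values, training runs for 300 × 400 = 120000 iterations in total.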
Comparison of loss curve First, we compare the convergence speed of CycleGAN and our work. Table 2 shows the loss comparison between CycleGAN and our work.
We use the cycle consistency loss as our benchmark because it reflects the learning progress of the two generators. Our model converges in about 5 epochs, whereas CycleGAN needs about 10 epochs. This shows that our model converges faster by using a supervised learning strategy.
There are four output pictures in experiment 1, and we use the picture below to explain what these outputs are.
Figure 13 shows the process and how we name the results of experiment 1. "sk" means sketch pictures and "R" means real pictures. The single prime notation ’ means a picture generated by the generator. The double prime notation ” means a picture recovered from a generated picture.
Figure 13 explains the setup of our experiment. We first show our results R’, the pictures we generated, and then our results sk’, the pictures we recovered from the generated pictures. Then we turn to the other side of CycleCoopNet: we show our results sk”, the pictures we generated, and then our results R”, the pictures we recovered from the generated pictures.
National Chengchi University (國立政治大學)
Table 2: The loss comparison table of the CycleGAN model and the CycleCoopNet model (rows: generator loss from sketch to bag, generator loss from bag to sketch; columns: CycleGAN, CycleCoopNet)
Comparison of R’ Here we compare how different models generate bag pictures from bag sketches. We compare CycleCoopNet, cycleGAN, and pix2pix. We use 10 datasets for each test, and we choose 100 output results and compare the difference between the real bag and the generated bag. Figure 24 shows some results generated by our CycleCoopNet, Figure 25 shows some results generated by cycleGAN, and Figure 26 shows some results generated by pix2pix. Sample results can be found in the last section.
We compare these results with the original pictures using two benchmarks, histogram similarity and aHash, which were introduced in the previous section. Figure 14 shows the scatter plot of the results of the three models. The points of pix2pix gather in the upper right corner, which means its generated results look like the original pictures. The reason is that pix2pix uses paired data in training: whenever we update the parameters, we use the correct real picture to adjust the generator. After many epochs, pix2pix should generate pictures that are very close to the original pictures. CycleGAN and CycleCoopNet, on the other hand, use unpaired data in training, which means they never see the paired real picture during the whole training process.
Next, we compare CycleGAN and CycleCoopNet. Our model gets a similar score in the histogram benchmark, but in the aHash benchmark it gets a higher score than CycleGAN. This shows that our model generates pictures more accurately.
Figure 13: The process and how we named the results in experiment 1
Figure 14: The scatter plot of the three models generating bags from sketches
Comparison of sk’ Here we compare the results recovered from the generated bags by CycleCoopNet and cycleGAN. We do not include pix2pix because it is trained on paired data and does not recover generated pictures to the original picture. We want to make sure that our results can be returned to the original picture. This also prevents the generator from producing unreasonable pictures that cannot be returned to the original pictures. We use 10 datasets for each test, and we choose 100 output results and compare the difference between the real bag and the generated bag.
Figure 15 shows the result: the recoverability of our model is a little worse than that of cycleGAN, but our recovered results still get a score higher than 0.6 in the benchmark. We think this result is acceptable because our model generates more diverse pictures, which makes recovery more difficult.
Comparison of sk” Here we compare how different models generate sketches from real bags. We compare CycleCoopNet, cycleGAN, and pix2pix. We use 10 datasets for each test, and we choose 100 output results and compare the difference between the real bag and the generated bag.
Figure 16 shows the result: our model and pix2pix get similar scores in the two benchmarks. Pix2pix is trained on paired data, so it is expected to be more stable because it learns directly from the observed data. In our model, only the descriptor learns from the observed data, and the generator learns from the descriptor. Nevertheless, we obtain results similar to pix2pix, which shows that MCMC teaching makes our model more stable.
Figure 15: The scatter plot of the two models recovering the generated bags to the original sketches
Figure 16: The scatter plot of the three models generating sketches from real bags
Figure 17: The scatter plot of the two models recovering the generated sketches to the original bags
Comparison of R” Here we compare the results recovered from the generated sketches by CycleCoopNet and cycleGAN. We do not include pix2pix because it is trained on paired data and does not recover generated pictures to the original picture. We want to make sure that our results can be returned to the original picture. This also prevents the generator from producing unreasonable pictures that cannot be returned to the original pictures. We use 10 datasets for each test, and we choose 100 output results and compare the difference between the real bag and the generated bag.
Figure 17 shows the result: our model gets a higher score, and its points concentrate in a small range. This also shows that our model is more stable.
Figure 18: Statistics plot of generating bags experiment
Figure 19: Statistics plot of generating sketches experiment
model          generated bags x recovered sketches
cyclecoopnet   1100/1100
cyclegan       1092/1100

model          generated sketches x recovered bags
cyclecoopnet   1100/1100
cyclegan       871/1100

Table 3: Result of edges2handbags experiment
We compare the accuracy of these three models by training a separate CNN model to tell whether a picture belongs to the bags class or the edges class. We use 11 test cases of 100 pictures each, generated after the model has been trained. None of the test cases overlaps with our training data.
Table 3 shows the scores over the 1100 results graded by the trained CNN model. Our model gets a score similar to pix2pix and better than cycleGAN, even though pix2pix uses paired data for training while our model uses unpaired data. Figure 18 and Figure 19 summarize the results from all cases.
Figure 18 is the result of generating bags; cycleGAN and our work get similar scores.
Figure 19 is the result of generating sketches; cycleGAN fails in some cases, which shows that our model is more stable.
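As a small worked example, the counts in Table 3 can be turned into accuracy fractions. The dictionary layout and the function name below are hypothetical; the counts and the row labels are the ones printed in Table 3.

```python
# Counts from Table 3 as (correctly classified, total) pairs.
TABLE_3 = {
    "generated bags x recovered sketches": {
        "cyclecoopnet": (1100, 1100),
        "cyclegan": (1092, 1100),
    },
    "generated sketches x recovered bags": {
        "cyclecoopnet": (1100, 1100),
        "cyclegan": (871, 1100),
    },
}


def accuracy_table(counts):
    """Convert (correct, total) pairs into fractions between 0 and 1."""
    return {
        task: {model: correct / total for model, (correct, total) in rows.items()}
        for task, rows in counts.items()
    }


if __name__ == "__main__":
    for task, rows in accuracy_table(TABLE_3).items():
        for model, acc in rows.items():
            print(f"{task:38s} {model:13s} {acc:.4f}")
```

For cycleGAN this gives about 0.9927 in the first block and about 0.7918 in the second, while CycleCoopNet reaches 1.0 in both.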