A Watercolor Color-Mixing Model Based on Convolutional Neural Networks and Color Perception


4.1 Training Details

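The model and loss function used in this work are implemented in Python and rely heavily on the TensorFlow and Keras libraries.

TensorFlow [1] is an open-source software library developed by the Google Brain team for implementing machine learning projects. Keras is an open-source neural network library written in Python. It runs on top of TensorFlow and provides a friendly, fast, and extensible environment for experimenting with deep neural network models.

Model weights are optimized with the Adam optimizer [14], a variant of stochastic gradient descent (SGD), using Adam's default parameters. The loss function is the RGB Perceptual Loss proposed in the previous chapter. During training, the learning rate is halved if the loss has not decreased for 500 epochs, and training stops early (early stopping) if the loss has not decreased for 5000 consecutive epochs. These settings are based on an observation from our experiments: training for too long does not help the model improve further.

The maximum number of training epochs is 100,000, and the batch size is 200. Model selection is based on the training loss: whenever the training loss of the current epoch is lower than that of the saved model, the weights are saved. This ensures that the model with the lowest loss seen during training is the one kept.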
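The schedule described above (halve the learning rate after 500 epochs without improvement, stop after 5000, and keep the weights with the lowest training loss) can be sketched in plain Python. This is a minimal illustration, not the authors' training code: the class `TrainingMonitor` and the function `simulate` are hypothetical names, and the starting learning rate of 0.001 is assumed from Adam's published default. Keras ships callbacks (`ReduceLROnPlateau`, `EarlyStopping`, `ModelCheckpoint`) that cover the same ground, but the text does not say whether they were used.

```python
class TrainingMonitor:
    """Plateau-based learning-rate halving, early stopping, best-loss tracking."""

    def __init__(self, learning_rate=0.001, lr_patience=500, stop_patience=5000):
        self.learning_rate = learning_rate  # assumed start: Adam's default step size
        self.lr_patience = lr_patience      # epochs without improvement before halving
        self.stop_patience = stop_patience  # epochs without improvement before stopping
        self.best_loss = float("inf")
        self.best_epoch = None
        self.since_best = 0                 # epochs since the best loss so far
        self.since_lr_change = 0            # epochs since the best loss or the last halving

    def update(self, epoch, loss):
        """Record one epoch and return (save_weights, stop_training)."""
        if loss < self.best_loss:
            # New best training loss: this is where the weights would be saved.
            self.best_loss, self.best_epoch = loss, epoch
            self.since_best = 0
            self.since_lr_change = 0
            return True, False
        self.since_best += 1
        self.since_lr_change += 1
        if self.since_lr_change >= self.lr_patience:
            self.learning_rate *= 0.5
            self.since_lr_change = 0
        return False, self.since_best >= self.stop_patience


def simulate(losses, max_epochs=100_000):
    """Feed a sequence of per-epoch training losses through the monitor."""
    monitor = TrainingMonitor()
    last_epoch = -1
    for epoch, loss in enumerate(losses):
        if epoch >= max_epochs:
            break
        last_epoch = epoch
        _, stop = monitor.update(epoch, loss)
        if stop:
            break
    return monitor, last_epoch
```

In a real run, `update` would be called once per epoch with the training loss, and a `True` first return value would trigger saving the weights.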

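Training was run on an Nvidia GTX 1080 Ti graphics card with 11 GB of video memory and 3584 CUDA cores (1480 MHz). The machine has an 8-core, 16-thread CPU (AMD Ryzen 1700, 3 GHz) and 32 GB of memory. Training usually stops at around 10k epochs and takes about 3.5 hours.

The two figures below show the convergence curves of the RGB loss and Delta E during training. Two points are worth noting:

1. The RGB loss and Delta E converge in very similar ways. This suggests that quantities derived from the same XYZ tristimulus color space also behave similarly during convergence.

2. Longer training has limited effect on the result. More epochs do give more chances to pick the best model, but the ResNet18 model used here settles to a stable value late in training, so additional epochs do little to reduce the loss.

Figure 4.1: Convergence curve of the RGB loss. The loss curve has saturated after about 8k training epochs.

Figure 4.2: Convergence curve of Delta E. It resembles the curve of the RGB loss, which is also based on the XYZ space.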

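4.2 Results

Statistical Results

On the test set, the mixed colors predicted by our model reach a mean ∆E_Lab of 2.29, with a maximum error of 10.79. On average, the results are close to the level at which a color difference is hard to notice.

In addition, 88.7% of the predictions on the test set reach ∆E_Lab < 5, and 79.7% reach ∆E_Lab < 3.5.

Figure 4.3: ∆E_Lab statistics of the trained model on the test set.

Figure 4.4: Cumulative distribution of ∆E_Lab for the trained model on the test set.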
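The statistics reported here (mean, maximum, share of predictions under a threshold, and the cumulative distribution) are simple to compute from a list of per-sample ∆E_Lab values. The sketch below shows one way to do so. The function names are hypothetical, and the input is only the 14 example values quoted in the figure captions of the color comparison, not the full test set, so its output will not reproduce the figures of 2.29, 88.7%, or 79.7%.

```python
def summarize(delta_es, thresholds=(3.5, 5.0)):
    """Mean, maximum, and share of samples strictly below each threshold."""
    n = len(delta_es)
    return {
        "mean": sum(delta_es) / n,
        "max": max(delta_es),
        "fraction_below": {t: sum(d < t for d in delta_es) / n for t in thresholds},
    }


def cumulative_distribution(delta_es):
    """Sorted (delta_e, cumulative fraction) pairs, as plotted in a CDF."""
    ordered = sorted(delta_es)
    n = len(ordered)
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]


# Example values taken from the captions of the color comparison figures.
# They are hand-picked samples, NOT the full test set.
SAMPLE_DELTA_ES = [0.58, 1.0, 1.93, 2.38, 3.02, 3.37, 3.59,
                   4.33, 4.72, 5.49, 5.98, 6.64, 9.53, 10.79]

stats = summarize(SAMPLE_DELTA_ES)
cdf = cumulative_distribution(SAMPLE_DELTA_ES)
```

Applied to the complete list of test-set errors, the same two functions would yield the numbers behind Figures 4.3 and 4.4.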

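Color Comparison

The following figures show a selection of results from the test set, grouped by ∆E_Lab range. As described in the previous chapter, ∆E_Lab < 3.5 indicates a perceptible color difference between the prediction and the real mixture, and ∆E_Lab < 5 indicates an obvious one.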
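The ∆E_Lab ranges used to group the figures that follow can be written as a small lookup. This is only an illustrative sketch: `delta_e_band` and `group_by_band` are hypothetical helpers, and the boundaries 2, 3.5, and 5 are the ones used in the figure groupings of this section.

```python
def delta_e_band(delta_e):
    """Return the label of the Delta E range a value falls into."""
    if delta_e < 2:
        return "< 2"
    if delta_e < 3.5:
        return "< 3.5"
    if delta_e < 5:
        return "< 5"
    return ">= 5"


def group_by_band(delta_es):
    """Group values by band, keeping the input order inside each band."""
    groups = {}
    for d in delta_es:
        groups.setdefault(delta_e_band(d), []).append(d)
    return groups
```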

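Delta E < 2: from left to right, the Delta E values in the figure below are 0.58, 1.0, and 1.93.

Figure 4.5: Selected results with Delta E < 2

Delta E < 3.5: from left to right, the Delta E values in the figure below are 2.38, 3.02, and 3.37.

Figure 4.6: Selected results with Delta E < 3.5

Delta E < 5: from left to right, the Delta E values in the figure below are 3.59, 4.33, and 4.72.

Figure 4.7: Selected results with Delta E < 5

Delta E > 5: from left to right, the Delta E values in the figure below are 5.49, 5.98, and 6.64.

Figure 4.8: Selected results with Delta E > 5

From left to right, the Delta E values in the figure below are 9.53 and 10.79 (the largest error in the test set).

Figure 4.9: Worst cases in the test set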

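The results as RGB colors are shown in Figure 4.10 below. In each small color patch, the left half is the ground truth and the right half is the predicted mixture. The patches are ordered by increasing ∆E_Lab, from left to right and top to bottom. Results with ∆E_Lab > 5 begin in the third row from the bottom, and results with ∆E_Lab > 3.5 begin in the fifth row from the bottom.

Most results in the upper half are very close to the real mixtures, in some cases to the point where the two colors cannot be told apart. This indicates that the proposed mixing model is indeed visually close to real pigment mixing.

Figure 4.10: Results of the trained model on the test set, shown as RGB colors. In each small color patch, the left half is the ground truth and the right half is the predicted mixture. The patches are ordered by increasing ∆E_Lab, from left to right and top to bottom.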

Chapter 5 Discussion

Limitation in Dark Colors

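Although the predicted colors are close to the real mixtures in most cases, the proposed method predicts the reflectance spectrum of the mixture, so mixtures with low reflectance are easily affected by fluctuations (noise), which produce some amount of color difference. Because the values of a low-reflectance spectrum are small, even a slight shift makes the color appear noticeably brighter, as shown in Figure 5.1 below.

Figure 5.1: Dark mixtures are prone to color differences caused by noise and drift in the reflectance spectrum. Left panel: the real mixture on the left and the prediction on the right. Right panel: the blue line is the spectrum of the real mixture and the orange line is the prediction.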
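A short numerical example may help show why the same spectral offset matters more for dark colors. The sketch below uses the standard CIE 1976 lightness formula and treats the relative luminance Y as a stand-in for a uniformly low or high reflectance. The values 0.02, 0.5, and 0.01 are illustrative assumptions, not measurements from this work, and the function names are hypothetical.

```python
def lightness(y):
    """CIE 1976 lightness L* for a relative luminance y in [0, 1] (white Yn = 1)."""
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)
    else:
        f = y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16


def lightness_shift(y, offset):
    """Change in L* when the luminance y is raised by a fixed offset."""
    return lightness(y + offset) - lightness(y)


# The same absolute offset applied to a dark and to a mid-tone color.
dark_shift = lightness_shift(0.02, 0.01)
bright_shift = lightness_shift(0.50, 0.01)
```

Under these assumptions, `dark_shift` comes out several times larger than `bright_shift`, which is consistent with the brighter-looking predictions for dark mixtures described above.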

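New Discovery on the Predicted Reflectance Spectrum

Another finding of this study is that the predicted reflectance spectra exhibit fluctuations.

As shown in Figure 5.2 below, the predicted spectrum jitters. Yet even jitter of this magnitude does not affect accuracy in color space. The predicted spectra show that our method concentrates on the region below 700 nm and matches the real mixture's spectrum in terms of local averages. One possible explanation is that the model mixes colors on the spectrum by a form of color matching; in the case shown below, this is a blue band plus a small amount of a green band.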

Figure 5.2 panels: Reflectance Spectrum; Color. Ground truth (55, 157, 206); Our Result (65, 159, 203).

Figure 5.2: An illustration of the output reflectance of our model. Within the visible wavelengths (380 to 750 nm), the curve mainly follows the short-term average of the real spectrum, while the curve beyond 750 nm can be ignored entirely (it is invisible to the human eye).
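To relate the two RGB triplets of Figure 5.2 to the ∆E_Lab scale used in the results, one can convert both to CIE L*a*b* and take the Euclidean distance. The sketch below does this under explicit assumptions that the text does not confirm: the triplets are treated as 8-bit sRGB, the reference white is D65, and the color difference is the CIE76 formula. The function names are hypothetical. Since the values in this work are derived from spectra, the outcome is only indicative.

```python
import math

# Linear sRGB to XYZ matrix and D65 reference white (both assumed here).
SRGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
WHITE_D65 = (0.95047, 1.00000, 1.08883)


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triplet to CIE L*a*b*."""
    linear = []
    for c in rgb:
        c = c / 255.0
        # Undo the sRGB transfer curve.
        linear.append(c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4)
    xyz = [sum(m * v for m, v in zip(row, linear)) for row in SRGB_TO_XYZ]

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / w) for v, w in zip(xyz, WHITE_D65))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e_cie76(rgb1, rgb2):
    """Euclidean distance between two colors in L*a*b* (CIE76)."""
    lab1, lab2 = srgb_to_lab(rgb1), srgb_to_lab(rgb2)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


# The two triplets shown in Figure 5.2.
figure_5_2_delta_e = delta_e_cie76((55, 157, 206), (65, 159, 203))
```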

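Figure 5.3 below gives an example of a predicted spectrum for white. It shows where the proposed model places its emphasis on the spectrum: mainly in the 550-650 nm and 400-500 nm ranges. Again, even though the two curves differ considerably on the spectrum, accuracy in color space is not affected.

Figure 5.3: A white mixture is concentrated in three regions of the spectrum. Left panel: the real mixture on the left and the prediction on the right. Right panel: the blue line is the spectrum of the real mixture and the orange line is the prediction.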
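The observation that two very different spectra can yield nearly the same color follows from how color is computed: the spectrum is integrated against broad, smooth sensitivity curves, so fast oscillations around the same local average largely cancel. The sketch below illustrates this with a single bell-shaped weighting function. That function is a hypothetical stand-in, not one of the CIE color matching functions, and both spectra are synthetic.

```python
import math

WAVELENGTHS = list(range(380, 751, 5))  # nm, visible range sampled every 5 nm


def sensitivity(wavelength, center=550.0, width=40.0):
    """Hypothetical bell-shaped response (a stand-in, not a CIE function)."""
    return math.exp(-0.5 * ((wavelength - center) / width) ** 2)


def response(reflectance):
    """Sensitivity-weighted sum of a reflectance spectrum."""
    return sum(r * sensitivity(w) for r, w in zip(reflectance, WAVELENGTHS))


# A smooth synthetic spectrum, and the same spectrum with a +/-0.1 ripple that
# alternates from sample to sample, so its local average matches the smooth curve.
smooth = [0.3 + 0.2 * (w - 380) / 370 for w in WAVELENGTHS]
jittery = [r + (0.1 if i % 2 == 0 else -0.1) for i, r in enumerate(smooth)]

# Root-mean-square difference between the two spectra.
spectral_rms = math.sqrt(
    sum((p - q) ** 2 for p, q in zip(smooth, jittery)) / len(smooth)
)
# Relative change of the weighted response caused by the ripple.
relative_change = abs(response(jittery) - response(smooth)) / response(smooth)
```

Here the two spectra differ by 0.1 at every sample, yet the weighted response barely moves. A full colorimetric computation would repeat this with three color matching functions and an illuminant.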

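Future Work

Zeiler and Fergus [27] used feature visualization to help people understand AlexNet. Visualizing the parameters of our model in a similar way might help explain why the phenomena above occur.

Improving the mixing model is one direction: with a deeper understanding of the model, we may be able to build a better one.

On the application side, the proposed watercolor mixing model could be used in future watercolor painting simulation systems to help artists create digital watercolor paintings. Another application is to use mixing prediction to build a color matching system that, given an RGB color, computes possible watercolor pigment recipes.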

Chapter 6 Conclusion

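The convolutional neural network (CNN) model proposed in this thesis, applied to watercolor mixing prediction and trained with a loss function based on perceptual color error, does on average produce results that are quite close to real mixtures. The spectra produced by the model after training with this loss show moderate fluctuations and do not follow the real spectra closely, yet the resulting colors still stay close to the real mixtures. Dark colors may yield larger color differences, which is a point that future research can improve.

We expect the new method proposed in this study, and the results it brings, to contribute to future research on mixing various kinds of pigments and to digital painting simulation.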

Bibliography

[1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.

[2] O. Abdel-Hamid, A.-r. Mohamed, H. Jiang, L. Deng, G. Penn, and D. Yu. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on audio, speech, and language processing, 22(10):1533–1545, 2014.

[3] W. Baxter, J. Wendt, and M. C. Lin. Impasto: a realistic, interactive model for paint. In Proc. of the 3rd international symposium on Non-photorealistic animation and rendering, pages 45–148. ACM, 2004.

[4] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks. In Advances in neural information processing systems, pages 153–160, 2007.

[5] C. M. Bishop and C. Roach. Fast curve fitting using neural networks. Review of scientific instruments, 63(10):4450–4456, 1992.

[6] K. Chellapilla, S. Puri, and P. Simard. High performance convolutional neural networks for document processing. In Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft, 2006.

[7] M.-Y. Chen, C.-S. Yang, and M. Ouhyoung. A smart palette for helping novice painters to mix physical watercolor pigments. In E. Jain and J. Kosinka, editors, Proc. of EuroGraphics 2018, Posters, April 2018. The Eurographics Association, 2018.

[8] C. J. Curtis, S. E. Anderson, J. E. Seims, K. W. Fleischer, and D. H. Salesin. Computer-generated watercolor. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pages 421–430. ACM Press/Addison-Wesley Publishing Co., 1997.

[9] P. Edström. Examination of the revised kubelka-munk theory: considerations of modeling strategies. JOSA A, 24(2):548–556, 2007.

[10] J. Guild et al. The colorimetric properties of the spectrum. Phil. Trans. R. Soc. Lond. A, 230(681-693):149–187, 1931.

[11] C. S. Haase and G. W. Meyer. Modeling pigmented materials for realistic image synthesis. ACM Transactions on Graphics (TOG), 11(4):305–335, 1992.

[12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proc. of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[13] G. E. Hinton, S. Osindero, and Y.-W. Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006.

[14] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[17] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural computation, 1(4):541–551, 1989.

[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[19] W. Mokrzycki and M. Tatol. Colour difference ∆E - a survey. MGV, 20(4):383–411, 2011.

[20] C. Poultney, S. Chopra, Y. L. Cun, et al. Efficient learning of sparse representations with an energy-based model. In Advances in neural information processing systems, pages 1137–1144, 2007.

[21] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[22] D. Steinkraus, I. Buck, and P. Simard. Using gpus for machine learning algorithms. In Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on, pages 1115–1120. IEEE, 2005.

[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, et al. Going deeper with convolutions. Cvpr, 2015.

[24] S. Westland, L. Iovine, and J. M. Bishop. Kubelka-munk or neural networks for computer colorant formulation? In 9th Congress of the International Colour Association, volume 4421, pages 745–749. International Society for Optics and Photonics, 2002.

[25] W. D. Wright. A re-determination of the trichromatic coefficients of the spectral colours. Transactions of the Optical Society, 30(4):141, 1929.

[26] S. Xu, H. Tan, X. Jiao, F. Lau, and Y. Pan. A generic pigment model for digital painting. In Computer Graphics Forum, volume 26, pages 609–618. Wiley Online Library, 2007.

[27] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In European conference on computer vision, pages 818–833. Springer, 2014.
