Chapter 4 Experimental Results
4.5 Total Runtime
Table III compares the total runtime. Table III (a) uses four synthetic benchmarks generated from a uniform random distribution, each with a different number of dimensions: 32, 256, 1024 and 2048, representing low, medium, high and very high dimensionality respectively. For 32 and 1024 dimensions, our implementation reduces the total runtime by up to 19% compared with the GPU-FNN implementation. The main reason is that GPU-FNN cannot achieve good hardware utilization of the GPGPU in these two cases, whereas the ATM in our approach keeps the hardware utilization at a very high level. For 256 dimensions, however, the improvement is only 6%: the hardware utilization of GPU-FNN is already high, so there is little room to gain from better utilization. The 2048-dimension benchmark is a special case. GPU-FNN cannot handle such a high dimensionality, while our approach still works normally and achieves a 443x speedup over the CPU implementation.
Table III (b) shows the real benchmarks, where the improvement over GPU-FNN reaches 11.96% on the isolet5 benchmark. The improvements in total runtime are smaller than the improvements in kernel time reported in the earlier experiments. The main reason is that the number of rules grows from 0, and with a small number of rules there is not enough parallelism to yield a large improvement. As a result, the overall performance improvement is on average about 50% of the kernel time improvement.
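The results above mix two ways of reporting a gain: a percentage reduction in runtime (against GPU-FNN) and a speedup factor (against the CPU implementation). The following minimal Python sketch shows how the two relate, assuming the usual definitions, reduction = 1 - t_new / t_old and speedup = t_old / t_new. The function names are illustrative and do not come from the thesis.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional runtime reduction (0.19 for 19%) to a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup: float) -> float:
    """Convert a speedup factor (e.g. 443.0) to a fractional runtime reduction."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Figures quoted in the text; only the conversion itself is shown here.
    print(f"19% reduction -> {speedup_from_reduction(0.19):.3f}x speedup")
    print(f"443x speedup  -> {reduction_from_speedup(443.0):.2%} reduction")
```

Putting both figures on the same scale makes it easier to see that the gains over GPU-FNN are incremental, while the gain over the CPU implementation is several orders of magnitude.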
Table III. Total runtime: (a) synthetic benchmarks; (b) real benchmarks.
Chapter 5 Conclusions & Future Works
In this thesis, we present a design flow for parallel FNNs on GPGPUs. Within this flow, we propose the architecture-aware thread mapping (ATM) methodology to optimize each CUDA kernel. The task decomposition and coarsening scheme scans the design space of a parallel FNN. By considering the characteristics of the FNN and of the training samples, the scheme finds an appropriate degree of parallelism that fully exploits the computing capability of the GPGPU. The task-to-thread binding then maps the high-level tasks to concurrent threads. This binding takes into account not only the architectural features of GPGPUs but also characteristics of the FNN such as its dimension. As a result, it provides performance scalability as the number of GPGPU cores increases and as the dimension and the number of rules of the FNN change.
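To make the idea of scanning a coarsening design space concrete, the following Python sketch enumerates candidate task-to-thread mappings for a single thread block. It is an illustration under stated assumptions, not the ATM algorithm itself: the power-of-two coarsening factors, the `Mapping` record and the selection rule (fewest idle lanes, then least coarsening) are all hypothetical. The warp size of 32 and the limit of 1024 threads per block are the hardware limits of Fermi-class devices.

```python
from __future__ import annotations

from dataclasses import dataclass

WARP_SIZE = 32                # threads per warp on NVIDIA GPUs
MAX_THREADS_PER_BLOCK = 1024  # per-block limit on Fermi-class devices


@dataclass(frozen=True)
class Mapping:
    coarsening: int  # tasks assigned to each working thread
    threads: int     # threads per block, rounded up to whole warps
    idle_lanes: int  # padded threads that receive no task


def candidate_mappings(num_tasks: int) -> list[Mapping]:
    """Enumerate power-of-two coarsening factors that fit in one block."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    candidates = []
    coarsening = 1
    while coarsening <= num_tasks:
        needed = -(-num_tasks // coarsening)           # ceiling division
        threads = -(-needed // WARP_SIZE) * WARP_SIZE  # round up to full warps
        if threads <= MAX_THREADS_PER_BLOCK:
            candidates.append(Mapping(coarsening, threads, threads - needed))
        coarsening *= 2
    return candidates


def pick_mapping(num_tasks: int) -> Mapping:
    """Hypothetical rule: fewest idle lanes, then the least coarsening."""
    return min(candidate_mappings(num_tasks),
               key=lambda m: (m.idle_lanes, m.coarsening))


if __name__ == "__main__":
    # Task counts equal to the dimensions used in the synthetic benchmarks.
    for tasks in (32, 256, 1024, 2048):
        print(tasks, pick_mapping(tasks))
```

With 2048 tasks, one task per thread would exceed the per-block limit, so this rule falls back to two tasks per thread. A real mapping would also weigh register and shared-memory usage, which this sketch ignores.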
Experimental results show that the kernel time is reduced by 20% to 40%, and that the total runtime is reduced by up to 20% compared with GPU-FNN. Compared with the CPU implementation, the total runtime speedup is up to 460x. The proposed ATM methodology therefore makes it more practical to apply an FNN to a variety of problems, and in some cases it further improves performance over GPU-FNN.