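This research addresses the drawbacks of running a human detection system on a PC-based platform, namely large size, high power consumption, heat dissipation problems, and high cost, so that the system can be built into small cameras or surveillance monitors. The main goal is therefore to move the human detection system off the PC-based computing environment and onto an ARM embedded platform. Because pure software computation on the ARM embedded platform is too slow, we designed an FPGA-based HOG hardware accelerator (HOG Accelerator) to speed it up. Besides implementing the critical HOG computation in FPGA hardware, we also propose an improved human detection algorithm, the HOG-AdaBoost-LFV human classifier, which reduces the system's computational load without affecting the detection rate. In addition, we introduce heat maps for analyzing crowd dwell and pedestrian flow. This kind of analysis greatly reduces labor cost: the video only needs to be processed by the system to obtain the result, with no need for people to watch it for long periods.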
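In the static database experiment (CBCL), the detection accuracy of our hardware/software co-designed human detection system, that is, the HOG Accelerator combined with the HOG-AdaBoost-LFV human classifier, differs from that of the original Dalal human classifier by only ±0.2% to ±0.6%. This shows that our acceleration design changes the detection rate very little. In terms of speed, our human detection system processes about 1051 detecting windows per second, whereas Dalal's pure software method processes 80 detecting windows per second, so our integrated hardware/software method is about 13 times faster. With background segmentation added, the complete embedded real-time human detection system with hardware/software co-design reaches about 10 to 15 fps.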
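As for future improvements, besides the idea of having the FPGA compute multiple HOGs at once, the main issue is data transfer speed. On the embedded platform currently used, the ARM and the FPGA are separate chips connected through a DMA interface with a transfer rate of about 33 MB/s. The largest bottleneck of our system is the data transfer between the ARM and the FPGA: FPGA computation accounts for only about 10% of the total time of the HOG hardware module, while data movement and transfer take nearly 90%. Integrating the two into a single SoC (System on Chip) should greatly reduce the data movement and transfer time. The FPGA vendor Xilinx has recently released the Zynq-7000 series, whose chips contain both an ARM core and FPGA circuitry to make integrated ARM and FPGA development easier, which supports the direction of our hardware/software co-design research. In these chips the ARM and the FPGA are connected through an AXI interface at 600 to 1200 MB/s, roughly 20 to 40 times faster than our current embedded platform. We estimate that, with a development board based on this series and under the architecture of our research, throughput could rise from the current 1051 detecting windows per second to 6146 to 7044.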
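The transfer bottleneck described above can be illustrated with a rough Amdahl-style calculation. The sketch below uses only the figures quoted in the text (33 MB/s DMA, 600 to 1200 MB/s AXI, about 90% of HOG module time spent on transfer) and assumes that transfer time scales inversely with link bandwidth. The function name `module_speedup` is hypothetical. It models only the HOG hardware module, not the whole per-window pipeline, so it is not expected to reproduce the 6146 to 7044 estimate, whose derivation is not given in this section.

```python
def module_speedup(transfer_fraction, transfer_speedup):
    """Amdahl-style speedup when only the transfer part of a module gets faster."""
    compute_fraction = 1.0 - transfer_fraction
    return 1.0 / (compute_fraction + transfer_fraction / transfer_speedup)


# Figures quoted in the text
DMA_MB_S = 33.0              # current ARM-to-FPGA DMA link
AXI_MB_S = (600.0, 1200.0)   # Zynq-7000 AXI link, low and high figures
TRANSFER_FRACTION = 0.9      # share of HOG module time spent moving data

for axi in AXI_MB_S:
    link_ratio = axi / DMA_MB_S  # assumption: transfer time scales with 1/bandwidth
    gain = module_speedup(TRANSFER_FRACTION, link_ratio)
    print(f"AXI {axi:.0f} MB/s: link {link_ratio:.1f}x faster, "
          f"HOG module {gain:.2f}x faster")
```

Under this model the module-level gain can never exceed 10x, because the 10% of time spent on FPGA computation is unaffected by a faster link.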
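Appendix 1: Glossary of Key Terms
1-1 Machine Learning
Machine learning falls into three main categories: supervised learning, unsupervised learning, and reinforcement learning. Reinforcement learning is unrelated to this thesis and is not discussed here. The method in this thesis uses supervised learning. The difference between supervised and unsupervised learning is that supervised learning requires positive samples and negative samples in the training stage, that is, human-annotated results (ground truth). From these the system learns a function, and when new data arrive the result can be predicted with that function.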
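As a minimal sketch of the supervised learning idea described above, the following example learns a function from labeled positive and negative samples and then predicts labels for new data. It is not the classifier used in this thesis: the one-dimensional feature values are made up, and `train_stump` is a hypothetical helper that fits a simple threshold classifier (a decision stump).

```python
def train_stump(samples, labels):
    """Pick the threshold and polarity with the fewest training errors."""
    best = None
    for threshold in sorted(set(samples)):
        for polarity in (1, -1):
            errors = sum(
                1 for x, y in zip(samples, labels)
                if (polarity if x >= threshold else -polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    # The learned function maps a new feature value to +1 or -1.
    return lambda x: polarity if x >= threshold else -polarity


# Hypothetical feature values with human-annotated labels (ground truth)
positives = [0.8, 0.9, 0.7, 0.95]   # labeled +1
negatives = [0.1, 0.3, 0.2, 0.4]    # labeled -1
samples = positives + negatives
labels = [1] * len(positives) + [-1] * len(negatives)

predict = train_stump(samples, labels)   # training stage
print(predict(0.85), predict(0.15))      # prediction on new data
```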
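2-1 Human Detection
Human detection is a kind of object detection. As the name suggests, its purpose is to detect human figures, and it is commonly used in surveillance systems and computer vision. In this thesis it is carried out by digital image processing; other approaches to human detection include ultrasound and infrared thermal sensing. Although detecting humans, or any other object, is simple and natural for the human eye, it is quite difficult for a computer or a robot. However powerful today's computers are, they still need programs in order to operate, so designing the algorithms and writing the programs that let a computer detect humans is a challenge.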
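2-2 Human Detector
Performing human detection on a single frame requires, besides the detection stage of the human classifier, a number of other important algorithm modules. These are: detecting window scanning, which reads a previously built parameter file and scans the coordinates of all candidate detecting windows; human detector preprocessing, which comprises grayscale conversion and background segmentation; detecting window cropping, which crops out the candidate detecting windows that need to be classified, according to the result of background segmentation; detecting window scaling, which scales the cropped candidates to 64x128 and feeds them to the human classifier; and human detector postprocessing, which uses non-maximum suppression (NMS) and mean shift (MS), depending on the method. The principles of these modules are described in detail in Chapter 4 of this thesis, and the system formed by these modules is called the human detector. The basic unit on which the human detector operates is a single frame. Since a video consists of consecutive frames, applying the detector to a video means running it on every frame in sequence.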
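The postprocessing step named above can be illustrated with a generic greedy non-maximum suppression. This is a sketch under stated assumptions, not the implementation described in Chapter 4: the box format `(x1, y1, x2, y2)`, the IoU threshold of 0.5, the example detections, and the function names `iou` and `nms` are all chosen here for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining
                     if iou(best[0], d[0]) <= iou_threshold]
    return kept


# Hypothetical 64x128 detecting windows with classifier scores
detections = [
    ((10, 10, 74, 138), 0.9),
    ((14, 12, 78, 140), 0.8),    # overlaps the first window heavily
    ((200, 20, 264, 148), 0.7),  # separate person
]
kept = nms(detections)
print([score for _, score in kept])
```

Overlapping windows on the same person collapse to the highest-scoring one, while windows that do not overlap are kept as separate detections.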
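2-3 Human Detection System
The human detection system comprises all the software and hardware needed to run the human detector. The software is the human detector written in C and integrated with the FPGA hardware accelerator. The hardware includes not only the FPGA on the embedded platform but also the complete MaCube embedded platform, the USB camera, the display, and the keyboard and mouse needed for operation. The whole set of equipment that can run the human detector is called the human detection system. By contrast, the PC version of the human detection system includes the C-language implementation of the human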