Chapter 3 Hardware Design of Edge-Adaptive Real-Time Image Scaling
3.6 Output and Timing Generator Circuit
After an image has been interpolated and stored in the output frame buffer, the output signal and timing generation circuit reads the output frame buffer and generates the timing reference signals required by the LCD monitor. Fig. 3-22 shows the architecture of the output signal and timing generation circuit.
Fig. 3-23. Architecture of output signal and timing generation circuit.
Chapter 4
Experimental Results
4.1 Simulation of HVS-Based Edge-Adaptive Image Scaling
This section presents the results of HVS-based edge-adaptive image scaling. All input images have a resolution of 160 x 120 and are scaled by a factor of 2, so the output images have a resolution of 320 x 240. Each of Fig. 4-1 to Fig. 4-4 contains five images, (a) to (e): (a) is the original image, (b) is the image interpolated by the nearest-neighbor method, (c) is the image interpolated by the bicubic method, (d) is the image interpolated by HVS-based edge-adaptive interpolation in MATLAB, and (e) is the image produced by post-simulation of the synthesized Verilog code.
Since PSNR does not fully represent the quality of an image interpolation algorithm, we cannot decide which interpolation method is best from the PSNR comparison table alone; the result images must also be inspected by eye. As shown in Fig. 4-1 to Fig. 4-4, HVS-based edge-adaptive interpolation outperforms the traditional interpolations in edge regions: (d) and (e) show less jaggedness and blur than (b) and (c). Table 4-1 gives the corresponding PSNR comparison.
Fig. 4-1. Portions of (a) original house image, (b) scaled image by nearest-neighbor interpolation, (c) scaled image by bilinear interpolation, (d) scaled image by HVS-based edge-adaptive interpolation (MATLAB), (e) scaled image by edge-adaptive interpolation (FPGA post-simulation).
Fig. 4-2. Portions of (a) original house image, (b) scaled image by nearest-neighbor interpolation, (c) scaled image by bicubic interpolation, (d) scaled image by HVS-based edge-adaptive interpolation (MATLAB), (e) scaled image by edge-adaptive interpolation (FPGA post-simulation).
Fig. 4-3. Portions of (a) original fighter image, (b) scaled image by nearest-neighbor interpolation, (c) scaled image by bicubic interpolation, (d) scaled image by HVS-based edge-adaptive interpolation (MATLAB), (e) scaled image by edge-adaptive interpolation (FPGA post-simulation).
Fig. 4-4. Portions of (a) original BW image, (b) scaled image by nearest-neighbor interpolation, (c) scaled image by bicubic interpolation, (d) scaled image by HVS-based edge-adaptive interpolation (MATLAB), (e) scaled image by edge-adaptive interpolation (FPGA post-simulation).
Table 4-1 shows the PSNR comparison of the image interpolation methods. HVS-based edge-adaptive interpolation achieves a higher PSNR than the traditional interpolation techniques on all four test images, and the approximated algorithm used in hardware is only slightly below the software version.
Table 4-1. PSNR comparison between each image interpolation method.
Image      NN      Bilinear  Bicubic  Proposed (software)  Proposed (hardware)
House      21.963  23.891    24.023   24.287               24.114
Car Light  26.031  28.234    28.502   28.804               28.648
Fighter    22.493  26.691    26.844   27.742               27.552
BW         15.908  18.472    19.1     20.324               20.192
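For readers who want to reproduce figures like those in Table 4-1, the following is a minimal sketch of the standard PSNR definition for 8-bit grayscale images. It assumes the usual formula PSNR = 10 log10(255^2 / MSE); the thesis does not state here how the reference image for each scaled result was obtained, so the function name `psnr` and the list-of-rows image representation are illustrative assumptions only.

```python
import math


def psnr(reference, test, peak=255):
    """Return the PSNR in dB between two equally sized grayscale images.

    Both images are given as lists of rows of integer pixel values.
    """
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value means the test image is numerically closer to the reference, which is why the text above still recommends visual inspection alongside the table.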
4.2 FPGA Implementation
Our real-time HVS-based edge-adaptive video scaling hardware is built on the DE2-70 FPGA development board. The image device is connected through the composite connector on the board, and the ADV7180 video decoder decodes the NTSC signal into an ITU-R.656 digital video data stream, which is sent to the FPGA chip for computation. The FPGA chip on the development board is a Cyclone II EP2C70 with 68416 logic elements and 1.1 Mbits of embedded memory. We synthesize our Verilog HDL code with Quartus II and program the FPGA chip. Finally, an LCD monitor is connected to the FPGA development board through the GPIO interface, so the result can be observed directly on the monitor. Fig. 4-5 is a photo of our hardware demonstration.
Fig. 4-5. Demonstration platform.
4.2.1 ITU-R.656 Signal Acquisition Unit
After synthesis with Quartus II, the ITU-R.656 signal acquisition unit uses 317 logic elements and runs at a clock frequency of up to 95.73 MHz. Fig. 4-6 shows the synthesis summary and Fig. 4-7 shows the timing report.
Fig. 4-6. Synthesis summary of ITU-R.656 signal acquisition unit.
Fig. 4-7. Timing report of ITU-R.656 signal acquisition unit.
4.2.2 Data Flow Control Unit
After synthesis with Quartus II, the data flow control unit uses 2305 logic elements and runs at a clock frequency of up to 146.2 MHz. Fig. 4-8 shows the synthesis summary and Fig. 4-9 shows the timing report.
Fig. 4-8. Synthesis summary of data flow control unit.
4.2.3 Image Scaling Unit
After synthesis with Quartus II, the image scaling unit uses 13949 logic elements; Fig. 4-10 shows the synthesis summary. The circuit takes 69.92 ns to produce a correct result. Fig. 4-11 shows the post-simulation waveform, in which the three outputs are 71, 53 and 88. With three outputs per computation, the average computation time per interpolated pixel is 23.3 ns.
Fig. 4-10. Synthesis summary of image interpolation unit.
Fig. 4-11. Post-simulation waveform of image interpolation unit.
4.2.4 Output Timing and Data Address Generator Unit
After synthesis with Quartus II, the output timing and data address generation unit uses 727 logic elements and runs at a clock frequency of up to 153 MHz. Fig. 4-12 shows the synthesis summary and Fig. 4-13 shows the timing report.
Fig. 4-12. Synthesis summary of output timing and data address generation unit.
Fig. 4-13. Timing report of output timing and data address generation unit.
4.2.5 Integration and Synthesis
After all units are integrated, we use Quartus II to synthesize the whole design. It uses 19913 logic elements, which is 29% of the logic elements in the FPGA. In addition, it uses 775760 bits of embedded memory, of which 155848 bits are used by the input frame buffer and 618888 bits by the output frame buffer. Fig. 4-14 shows the full compilation summary of the whole system.
Fig. 4-14. Full compilation summary of full system.
4.2.6 Performance Estimation
We estimate our hardware performance under the following conditions: the input image size is 160 x 120, the scaling factor is 2, the output image size is 320 x 240, and the system clock is 95.73 MHz. Table 4-2 shows the system performance, and the FPS is calculated by (4.1): our circuit can process 299 frames per second. When the output image has the NTSC standard resolution of 720 x 480, the hardware performance is above 66 FPS.
Table 4-2. System performance while clock frequency is 95.73 MHz.
State           Cycles  Times  All cycles  Total time
wait_for_start  1       1      1           10.446 ns
load_mem_36     37      115    4255        44448 ns
load_mem_6      7       17825  124775      1303405 ns
data_out        3       17980  53940       563460 ns
check_finish    1       17980  17980       187820 ns
compute         -       17980  -           1245834 ns
Total                                      3344977 ns
$$\mathrm{FPS} = \frac{1}{3344977\,\mathrm{ns} \times 10^{-9}\,\mathrm{s/ns}} \approx 299 \qquad (4.1)$$
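The frame-rate estimate can be checked with a short calculation. The sketch below takes the per-state cycle counts and visit counts from Table 4-2 and, as an assumption drawn from the fact that the compute row has the same total time in both performance tables, treats the compute time as independent of the clock. The names `CLOCKED_STATES`, `frame_time_ns` and `frames_per_second` are illustrative and do not come from the thesis.

```python
# (cycles per visit, number of visits) for each clocked state, from Table 4-2
CLOCKED_STATES = {
    "wait_for_start": (1, 1),
    "load_mem_36": (37, 115),
    "load_mem_6": (7, 17825),
    "data_out": (3, 17980),
    "check_finish": (1, 17980),
}

# Total time of the compute state in ns; assumed not to scale with the clock.
COMPUTE_TIME_NS = 1245834


def frame_time_ns(clock_hz):
    """Time in ns to process one frame at the given clock frequency."""
    total_cycles = sum(cycles * visits for cycles, visits in CLOCKED_STATES.values())
    return total_cycles * 1e9 / clock_hz + COMPUTE_TIME_NS


def frames_per_second(clock_hz):
    """Frames per second implied by the per-frame processing time."""
    return 1e9 / frame_time_ns(clock_hz)
```

Calling `frames_per_second(95.73e6)` reproduces the total of Table 4-2 to within rounding, and passing a different clock frequency shows how the clocked states scale while the compute time stays fixed.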
If the clock frequency is lowered to 27 MHz, the performance is as shown in Table 4-3 and (4.2); the FPS is above 115.
Table 4-3. System performance while clock frequency is 27 MHz.
State           Cycles  Times  All cycles  Total time
wait_for_start  1       1      1           37.037 ns
load_mem_36     37      115    4255        157593 ns
load_mem_6      7       17825  124775      4621296 ns
data_out        3       17980  53940       1997778 ns
check_finish    1       17980  17980       665926 ns
compute         -       17980  -           1245834 ns
Total                                      8688464 ns
$$\mathrm{FPS} = \frac{1}{8688464\,\mathrm{ns} \times 10^{-9}\,\mathrm{s/ns}} = 115.095 \qquad (4.2)$$
4.2.7 Power Consumption
The power consumption of our hardware is 658.78 mW at a clock frequency of 27 MHz, of which the embedded memory accounts for 112.58 mW.
Chapter 5
Conclusions and Future Works
In this thesis, edge-adaptive interpolation for digital image resizing has been implemented on an FPGA. The proposed design uses a fuzzy decision module to identify the characteristics of the input image automatically and to decide which interpolation module to use. If the input image region is not sensitive to the human eye, the bilinear interpolation module is chosen in order to reduce computation. Otherwise, the edge-adaptive image interpolation module is selected to reduce blur and jagged defects in edge regions. A CORDIC circuit replaces the arctangent function that is used to calculate the orientations in the input image. This saves hardware resources compared with a large look-up table.
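To illustrate why CORDIC is cheaper than a large look-up table, the following is a minimal floating-point sketch of CORDIC in vectoring mode. It is not the circuit of this thesis: the function name `cordic_atan`, the iteration count and the use of floating point are assumptions made for readability. In hardware, the multiplications by 2^-i become bit shifts and the `math.atan` values become a small table with one constant per iteration.

```python
import math


def cordic_atan(y, x, iterations=16):
    """Approximate atan(y / x) in radians for x > 0 using CORDIC vectoring mode.

    Each step rotates the vector (x, y) toward the x-axis by atan(2**-i)
    and accumulates the applied rotation angle.
    """
    if x <= 0:
        raise ValueError("this sketch only handles x > 0")
    angle = 0.0
    for i in range(iterations):
        shift = 2.0 ** -i            # a right shift by i bits in hardware
        step = math.atan(shift)      # one small constant per iteration
        if y > 0:
            x, y = x + y * shift, y - x * shift   # rotate clockwise
            angle += step
        else:
            x, y = x - y * shift, y + x * shift   # rotate counterclockwise
            angle -= step
    return angle
```

With 16 iterations the result is accurate to roughly atan(2^-15) radians, so the table of constants stays tiny compared with a direct arctangent look-up table of similar precision.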
The proposed FPGA design achieves real-time video scaling at an output resolution of 320×240 (QVGA). With a system frequency of 95 MHz, the performance is approximately 300 frames per second. Compared with traditional image interpolation methods, the proposed FPGA design has better visual quality because it reduces blur and jagged defects in edge regions.
The image scaling method in this thesis addresses only one of many image scaling applications. Other applications include enhancing low-resolution signal sources to fit LCD panels of different resolutions and digital zooming in digital video capture devices. The proposed image scaling circuit can be used in many of these applications. Possible future works are listed as follows.
1. Different scaling factor
The image scaling hardware could support adjustable and larger scaling factors to output images of different sizes. Repeated scaling passes may be one way to achieve different scaling factors.
2. Tunable scaling ratio between length and width
A tunable ratio between the length and width of the output image could be designed to fit different LCD displays. It may be achievable by skipping some input pixels.
3. I2C interface
An I2C interface could be used to change the parameters of our hardware and so make its functions adaptive.
4. Pipeline design
A pipelined design could accelerate the processing speed. When designing a pipeline, it is better for all stages to have the same computation time. We would first measure the computation time of each part of the scaling circuit and then work out how to balance the stages.
5. Color display
Since the human eye is not sensitive to Cb and Cr, bilinear interpolation can be used to scale Cb and Cr, and a YCbCr to RGB converter can then be used to achieve color display.
References
[1] Chin-Teng Lin; Kang-Wei Fan; Her-Chang Pu; Shih-Mao Lu; Sheng-Fu Liang,
"An HVS-Directed Neural-Network-Based Image Resolution Enhancement Scheme for Image Resizing," Fuzzy Systems, IEEE Transactions on , vol.15, no.4, pp.605-615, Aug. 2007.
[2] Xue K.; Winans, A.; Walowit, E. “An edge-restricted spatial interpolation algorithm,” Image Processing, 1992. ICIP 92. Proceedings. 1992 International
Conference on , vol., no., pp.153-161 vol.2, 3-7 Oct 1992.
[3] Unser, M.; Aldroubi, A.; Eden, M., "Enlargement or reduction of digital images with minimum loss of information," Image Processing, IEEE Transactions on , vol.4, no.3, pp.247-258, Mar 1995.
[4] Jensen, K.; Anastassiou, D., "Subpixel edge localization and the interpolation of still images ," Image Processing, IEEE Transactions on , vol.4, no.3, pp.285-295, Mar 1995.
[5] H. C. Ting; H. M. Hang, “Spatially adaptive interpolation of digital images using fuzzy inference,” Image Processing, 1996. Proceedings., International
Conference on , vol.2, no., pp.1206-1217 vol.2, 15-18 Sep 1996.
[6] Lee S. W.; Paik J. K., “Image interpolation using adaptive fast b-spline filtering,”
Acoust., Speech, Signal Process, 1993. Proceedings., International Conference on 1993, vol. 5, pp. 177–180 vol.5, 8-13 Mar 1993.
[7] Adams, J. E. “Interactions between color plane interpolation and other image processing functions in electronic photography,” SPIE 1995. Proceedings., International Conference on 1995, vol. 2416, pp. 144–151 vol.2416, 75-80 Oct 1995.
[8] Morse, B.S.; Schwartzwald, D., "Isophote-based interpolation," Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on , vol., no., pp.227-231 vol.3, 4-7 Oct 1998.
[9] Ratakonda, K.; Ahuja, N., "POCS based adaptive image magnification," Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on , vol., no., pp.203-207 vol.3, 4-7 Oct 1998.
[10] Call D.; Montantanvert A., “Superresolution inducing of an image,” Image Processing, 1998. Proceedings., International Conference on , vol.5, no., pp.232-235 vol.2, 15-18 Mar 1998.
[11] Allebach, J.; Ping Wah Wong, "Edge-directed interpolation," Image Processing,
[12] Carrato, S.; Ramponi, G.; Marsi, S., "A simple edge-sensitive image interpolation filter," Image Processing, 1996. Proceedings., International Conference on , vol.3, no., pp.711-714 vol.3, 16-19 Sep 1996.
[13] Xin Li; Orchard, M.T., "New edge-directed interpolation," Image Processing,
IEEE Transactions on , vol.10, no.10, pp.1521-1527, Oct 2001.
[14] Muresan, D.D.; Parks, T.W., "Adaptive, optimal-recovery image interpolation,"
Acoustics, Speech, and Signal Processing, 2001. Proceedings. (ICASSP '01).
2001 IEEE International Conference on , vol.3, no., pp.1949-1952 vol.3, 2001.
[15] Muresan, D.D.; Parks, T.W., "Adaptively quadratic (AQua) image interpolation,"
Image Processing, IEEE Transactions on , vol.13, no.5, pp. 690-698, May 2004.
[16] Candocia, F.M.; Principe, J.C., "Super-resolution of images based on local correlations," Neural Networks, IEEE Transactions on , vol.10, no.2, pp.372-380, Mar 1999.
[17] Chin-Teng Lin; Lee C. S. G., Neural Fuzzy Systems: A Neuro-Fuzzy Synergism to Intelligent Systems. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[18] Chin-Teng Lin; Yin-Cheung Lee; Her-Chang Pu, "Satellite sensor image classification using cascaded architecture of neural fuzzy network," Geoscience and Remote Sensing, IEEE Transactions on , vol.38, no.2, pp.1033-1043, Mar 2000.
[19] Chun-Hsien Chou; Yun-Chin Li, "A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile," Circuits and Systems for Video Technology, IEEE Transactions on , vol.5, no.6, pp.467-476, Dec 1995.
[20] Raghupathy, A.; Chandrachoodan, N.; Liu, K.J.R., "Algorithm and VLSI architecture for high performance adaptive video scaling," Multimedia, IEEE
Transactions on , vol.5, no.4, pp. 489-502, Dec. 2003.
[21] Jianping Xiao; Xuecheng Zou; Zhenglin Liu; Xu Guo, "Adaptive Interpolation Algorithm for Real-time Image Resizing," Innovative Computing, Information
and Control, 2006. ICICIC '06. First International Conference on , vol.2, no., pp.
221-224, 30-01 Aug. 2006.
[22] Ramachandran, S.; Srinivasan, S., "Design and FPGA implementation of a video scalar with on-chip reduced memory utilization," Digital System Design, 2003.
Proceedings. Euromicro Symposium on , vol., no., pp. 206-213, 1-6 Sept. 2003.
[23] Uyar, Baris; Sayinta, Murat; Akgun, Toygar; Orencik, Bulent; Altunbasak, Yucel,
"Spatial Feature Based Video Scaling Scheme and its FPGA Implementation for Video Standards Conversion," Signal Processing Systems, 2007 IEEE Workshop
on , vol., no., pp.267-272, 17-19 Oct. 2007.
[24] Hudson, R.D.; Lehn, D.I.; Athanas, P.M., "A run-time reconfigurable engine for image interpolation," FPGAs for Custom Computing Machines, 1998.
Proceedings. IEEE Symposium on , vol., no., pp.88-95, 15-17 Apr 1998.
[25] Nuno-Maganda, M.A.; Arias-Estrada, M.O., "Real-time FPGA-based architecture for bicubic interpolation: an application for digital image scaling,"
Reconfigurable Computing and FPGAs, 2005. ReConFig 2005. International Conference on , vol., no., 8 pp.-, 28-30 Sept. 2005.
[26] Amanatiadis, A.A. A.; Andreadis, I.I.; Konstantinidis, K.K., "Design and Implementation of a Fuzzy Area-Based Image-Scaling Technique," Instrumentation and Measurement, IEEE Transactions on , vol., no., pp1-1, 2003.
[27] Analog devices, ADV7180 SDTV video decoder, datasheet, 2007. Website is available at www.analog.com/static/imported-files/data_sheets/ADV7180.pdf.
[28] Toppoly, RGB Driver/Timing controller IC For LTPS TFT LCD, datasheet.
Appendix
A.1 Image Signal Formats
A.1.1 Input Signal Format
The input signal of our hardware is a standard ITU-R.656 digital video data stream generated by the ADV7180 video decoder. In the ITU-R.656 standard, images are displayed in interlaced format. Fig. A-1 shows an example of an interlaced image: a complete image is the combination of an odd field and an even field, where the odd field contains the odd lines of the image and the even field contains the even lines. The odd field is displayed on the monitor first and the even field immediately after it, hence the name “interlaced display”.
Fig. A-1. Example of an interlaced image.
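The relationship between the two fields and the complete image can be sketched in a few lines. The example below simply interleaves the rows of an odd field and an even field ("weave"); it is an illustration of the field structure described above, not necessarily the de-interlacing method used in the hardware of this thesis, and the name `weave_fields` is hypothetical.

```python
def weave_fields(odd_field, even_field):
    """Interleave two fields (lists of rows) into one complete frame.

    Lines are counted from 1, so the first row of the odd field becomes
    line 1 of the frame and the first row of the even field becomes line 2.
    """
    if len(odd_field) != len(even_field):
        raise ValueError("both fields must have the same number of rows")
    frame = []
    for odd_row, even_row in zip(odd_field, even_field):
        frame.append(odd_row)
        frame.append(even_row)
    return frame
```

For two fields of N rows each, the result has 2N rows in display order.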
Fig. A-2 shows the horizontal scan line waveform of the ITU-R.656 digital video data stream, Table A-1 lists the timing parameters of the horizontal scan lines in Fig. A-2, and Fig. A-3 shows the waveform of vertical frames.
Fig. A-2. Horizontal scan line waveform of ITU-R.656 digital video data stream.
Table A-1. Timing parameters of horizontal scan lines in Fig. A-2.
Fig. A-3. Vertical frame waveform of ITU-R.656 digital video data stream.
A horizontal scan line has four parts: SAV (Start of Active Video), EAV (End of Active Video), H BLANK and active video. Active video contains three types of valid video data, Y (luminance), Cb (blue chroma) and Cr (red chroma), which appear in the active video data in the order given by (A.1).
(A.1)
where Y is the luminance, which is the main component of image contours, and Cb and Cr are the components that determine color.
SAV and EAV are essential for locating the image within the video stream. They are embedded in the ITU-R.656 data stream at the beginning and end of active video, each with a length of 32 bits, and can be expressed as FF 00 00 XY in hexadecimal. XY is called the timing reference of ITU-R.656; it changes with the position of the scan line. X consists of four bits and can be expressed as X = 1, F, V, H. The frame partition of the ITU-R.656 signal is shown in Fig. A-4, and Table A-2 lists the timing references corresponding to Fig. A-4.
Fig. A-4. Frame partition of ITU-R.656 digital signal.
Table A-2. Timing references corresponding to Fig. A-4.
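The FF 00 00 XY structure described above can be decoded in software as follows. This is a minimal sketch, not the acquisition circuit of this thesis: the function names are hypothetical, and the bit positions follow the X = 1, F, V, H layout given in the text, with H = 0 marking SAV and H = 1 marking EAV. The lower four bits (Y) are ignored here.

```python
def decode_timing_reference(xy):
    """Split the XY byte of an SAV/EAV code into its F, V and H flags."""
    if not (xy >> 7) & 1:
        raise ValueError("the most significant bit of XY must be 1")
    return {
        "F": (xy >> 6) & 1,  # field flag
        "V": (xy >> 5) & 1,  # 1 during vertical blanking
        "H": (xy >> 4) & 1,  # 0 = SAV, 1 = EAV
    }


def find_timing_codes(stream):
    """Return (index, flags) for every FF 00 00 XY sequence in a byte stream."""
    found = []
    for i in range(len(stream) - 3):
        if stream[i] == 0xFF and stream[i + 1] == 0x00 and stream[i + 2] == 0x00:
            found.append((i, decode_timing_reference(stream[i + 3])))
    return found
```

For example, an XY value of 0x80 decodes to F = 0, V = 0, H = 0, that is, the SAV of an active line in the first field.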
A.1.2 Output Signal Format
The output of our video scaling hardware is transferred through the GPIO interface to the TPG015 TFT LCD driver IC on the LCD module. The waveform of a horizontal scan line is shown in Fig. A-5, and its timing parameters are listed in Table A-3.
Fig. A-5. Waveform of output horizontal scan line. [28]
Table A-3. Timing parameters of Fig. A-5. [28]
Because the LCD monitor uses progressive display instead of interlaced display, there is no distinction between odd and even fields; every field is a complete frame. Fig. A-6 shows the vertical frame waveform of the LCD output signal, and its timing parameters are given in Table A-4.
Fig. A-6. Vertical frame waveform of LCD output signal. [28]
Table A-4. Timing parameters of Fig. A-6. [28]
A.1.3 YCbCr to RGB Converter
YCbCr can be transformed to RGB format by the following equations.
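As an illustration of such a conversion, the sketch below uses the commonly quoted ITU-R BT.601 studio-range coefficients (Y in 16 to 235, Cb and Cr centered on 128). These coefficients are an assumption: the conversion equations of the thesis are not reproduced in this text and may differ. The function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and limit the result to 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one 8-bit YCbCr sample to 8-bit RGB (BT.601 studio range assumed)."""
    c = y - 16      # remove the luminance offset
    d = cb - 128    # center blue chroma on zero
    e = cr - 128    # center red chroma on zero
    r = 1.164 * c + 1.596 * e
    g = 1.164 * c - 0.392 * d - 0.813 * e
    b = 1.164 * c + 2.017 * d
    return clamp8(r), clamp8(g), clamp8(b)
```

In a hardware converter the coefficients would typically be replaced by fixed-point constants, in the same spirit as the weight approximation used in the scaling circuit.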
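Comments of the Oral Defense Committee
A: This is an inherent limitation of the algorithm. Because the algorithm requires overlapping blocks, a block-based method may not be implementable with it.
A: Because the signal source of this circuit is interlaced, de-interlacing is required, so a frame buffer is the more suitable choice. If the signal source were progressive, a line buffer could be used.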
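Prof. 柯立偉 1. What is the basis for choosing the weight approximation?
A: As shown on page 47 of the thesis. In the software simulation, the weights are numbers with a precision of 0.000000001, but using weights of the same precision in the circuit would make the computation time too long and the arithmetic circuit too large, so the weights must be approximated. Figures (a) to (d) below show the experimental results. Figure (a) is the original image used for comparison. Figure (b) is the difference between the image interpolated with a weight precision of 1/128 and the image interpolated without weight approximation. Figures (c) and (d) use weight precisions of 1/256 and 1/512, respectively. With a weight precision of 1/128, the maximum single-pixel difference from the image interpolated with the original weights is 12, and most differences lie between about 3 and 6. With a precision of 1/256, the maximum single-pixel difference is 5, and most differences lie between about 1 and 2.5. With a precision of 1/512, the maximum single-pixel difference is 4, and most differences lie between about 1 and 2. Raising the weight precision from 1/128 to 1/256 reduces the difference considerably, with the maximum difference dropping from 12 to 5. Raising it further from 1/256 to 1/512 brings only a limited improvement, with the maximum difference dropping only from 5 to 4. We therefore set the weight precision to 1/256. In this case the ratio of the maximum difference to the full gray-level range is 5/255, less than two percent, which is not noticeable to the human eye.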
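The weight approximation discussed in this answer can be sketched as rounding each weight to a multiple of 1/128, 1/256 or 1/512. The code below is a minimal illustration only: the pixel values and weights are hypothetical, and the function names `quantize_weight` and `weighted_sum` do not come from the thesis.

```python
def quantize_weight(weight, fractional_bits=8):
    """Round a weight to the nearest multiple of 1 / 2**fractional_bits."""
    scale = 1 << fractional_bits
    return round(weight * scale) / scale


def weighted_sum(pixels, weights):
    """Interpolated value as a weighted sum of neighbouring pixels."""
    return sum(p * w for p, w in zip(pixels, weights))


if __name__ == "__main__":
    pixels = [71, 53, 88, 120]                      # hypothetical neighbours
    weights = [0.123456789, 0.376543211, 0.3, 0.2]  # hypothetical weights
    exact = weighted_sum(pixels, weights)
    for bits in (7, 8, 9):                          # 1/128, 1/256, 1/512
        approx = weighted_sum(pixels, [quantize_weight(w, bits) for w in weights])
        print(f"precision 1/{1 << bits}: difference = {abs(exact - approx):.3f}")
```

Each extra fractional bit halves the worst-case rounding error of a single weight, which is consistent with the diminishing improvement reported between 1/256 and 1/512.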
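2. How do the hardware and software processing speeds compare?
A: With MATLAB on a PC, processing one image takes 6.3 seconds, whereas this hardware at 93 MHz processes the same image in 1/300 second. Compared on this basis, the hardware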
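Prof. 鍾仁峰 1. The clock of a typical decoder is 27 MHz, the same as the frequency used by this hardware. Is that fast enough to process the image data?
A: According to our calculation, it is fast enough with the current image processing scheme. For a large output size, two input buffer memories would be needed. This depends on the hardware performance: if the hardware performance is 60 FPS and the captured image height exceeds 240 pixels, two input buffer memories must be used and switched between.
2. If the computation is changed from grayscale to color, does the amount of computation increase greatly?
A: Because the human eye is not very sensitive to the U and V values, bilinear interpolation alone can be used for them, so the amount of computation does not increase much, although the memory may need twice the capacity.
3. An I2C interface could be considered in the future for modifying hardware parameters.
A: This will be taken into consideration in future designs. Thesis page 72.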
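Prof. 陳慶瀚 1. With only 2x magnification, some places where the effect would appear may not be visible; a larger magnification factor could be considered in the future.
A: This is one of our main future works. Thesis page 71.
2. If the output resolution is raised to 720*480, the FPS may fall below 10. Why is the performance of this hardware not high?
A: With the design in this thesis, the output image resolution is 320x240, and at a clock of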