Chapter 1 Introduction
1.2 Thesis Organization
This thesis is organized as follows. Chapter 2 introduces the principle of image scaling, the defects of traditional image interpolation schemes, and the HVS-based edge-adaptive image scaling scheme used in this thesis. Chapter 3 introduces our hardware for real-time video scaling and the video specifications. Chapter 4 presents the experimental results of our HVS-based edge-adaptive image scaling and the practical hardware.
Chapter 2
Principle of Image Scaling and Related Research
2.1 Basic Principle of Image Scaling
Image scaling, also called image resizing, can be divided into scaling up and scaling down. In this thesis, we discuss only scaling up; scaling down is not considered. The principle of image scaling follows from the sampling theorem of digital signal processing, which states that if we have the samples of a signal and the sampling period, we can recover the original signal. However, aliasing is often generated while sampling the signal, and it distorts the reconstructed signal. To avoid this, the sampling rate must be more than twice the highest frequency of the signal; this is the Nyquist sampling theorem. Fig. 2-1 is the block diagram of signal reconstruction and re-sampling, which explains the process of image resizing.
[Block diagram: f(n) → ideal reconstruction filter H(jω) → f_r(t) → sampling (M) → f_m(n)]
Fig. 2-1. Block diagram of signal reconstruction.
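In Fig. 2-1, $f(n)$ is a discrete sample of the image signal. After $f(n)$ passes through the ideal low-pass filter, the reconstructed output signal of the filter is given by (2.1),
$f_r(t) = \sum_{n=-\infty}^{\infty} f(n)\, h_r(t - nT)$, (2.1)
where $h_r(t)$ is the ideal reconstruction filter with cutoff frequency $\pi/T$, and it can be expressed as (2.2).
$h_r(t) = \frac{\sin(\pi t / T)}{\pi t / T}$. (2.2)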
Figure 2-2 shows the ideal reconstruction filter $h_r(t)$.
Fig. 2-2. Ideal reconstruction filter.
So the reconstructed continuous signal can be expressed as (2.3).
$f_r(t) = \sum_{n=-\infty}^{\infty} f(n)\, \frac{\sin[\pi (t - nT)/T]}{\pi (t - nT)/T}$. (2.3)
The reconstruction is the convolution of two functions, and the quality of the reconstructed signal depends on which reconstruction function is used. The ideal reconstruction filter is the sinc function, which remains nonzero as time approaches infinity and is therefore not realizable in real applications, so the sinc function has to be replaced by another reconstruction function. The quality of the reconstructed signal differs according to the function that replaces the sinc function. To approximate the perfect low-pass filter with the least distortion, many interpolation schemes replace the sinc function with a piecewise function, which uses different functions in different temporal sections. The sinc function is symmetric about time zero, so the functions used to replace it usually have the same property.
The goal of re-sampling in Fig. 2-1 is to convert the continuous signal $f_r(t)$ back to a discrete digital signal, which can be expressed as (2.4),
$f_m(n) = f_r\!\left(\frac{nT}{M}\right)$, (2.4)
where $M$ is the sampling coefficient. If M > 1, more data are sampled and the output image is enlarged; if M = 1, the size of the output image remains unchanged; if 0 < M < 1, the output image becomes smaller.
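The reconstruction and re-sampling chain of Fig. 2-1 can be sketched for a one-dimensional row of pixels as follows. This is only an illustrative sketch, not the implementation used in this thesis: the function names `sinc_kernel` and `resample` are hypothetical, the infinite sum is truncated to the available samples, and the output length is assumed to be the input length multiplied by M.

```python
import math

def sinc_kernel(t, T=1.0):
    """Ideal reconstruction filter h_r(t) = sin(pi*t/T) / (pi*t/T)."""
    if t == 0.0:
        return 1.0  # limit of sin(x)/x as x -> 0
    x = math.pi * t / T
    return math.sin(x) / x

def resample(samples, M, T=1.0):
    """Evaluate f_m(n) = f_r(n*T/M), with the sum over n truncated
    to the finite list of samples (an assumption of this sketch)."""
    out_len = int(round(len(samples) * M))
    out = []
    for n in range(out_len):
        t = n * T / M  # position of the new sample on the original time axis
        value = sum(s * sinc_kernel(t - k * T, T) for k, s in enumerate(samples))
        out.append(value)
    return out

row = [10.0, 20.0, 30.0, 40.0]
doubled = resample(row, M=2)  # 8 samples; every second one coincides with an original sample
```

Because the sum is truncated, the values between the original samples are only approximations of the ideal reconstruction, which is one reason why practical schemes replace the sinc function with short piecewise kernels.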
Fig. 2-3 shows an example of scaling up an image. The original image is on the left and the output image is on the right.
Fig. 2-3. Example of image scaling.
We will introduce several traditional interpolation functions and discuss their advantages and disadvantages in the next section.
2.2 Several Traditional Interpolation Schemes
2.2.1 Nearest Neighborhood
The nearest neighborhood interpolation scheme is the simplest and most efficient interpolation function. It has very low computational complexity but poor image quality: when it scales an image, it usually generates jagged edges and blocking artifacts. Its mathematical function can be expressed as (2.5), and its time-domain waveform is shown in Fig. 2-4.
$h(t) = \begin{cases} 1, & -\frac{1}{2} < t < \frac{1}{2} \\ 0, & \text{otherwise} \end{cases}$ (2.5)
Fig. 2-4. Time domain waveform of nearest neighborhood interpolation function.
The algorithm identifies which original pixel is closest to the new pixel and uses its value for the new pixel, so the interpolated image remains very sharp. Fig. 2-5 shows the algorithm of the nearest neighborhood interpolation scheme.
Fig. 2-5. Algorithm of nearest neighborhood interpolation scheme.
In Fig. 2-5, P is the new pixel to be interpolated, and S1, S2, S3 and S4 are original pixels. If S4 is closest to P, then S4 is used as P in the interpolated image.
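The selection of the closest original pixel can be sketched as below. This is a minimal illustration under stated assumptions: the image is a list of rows of gray values, pixel positions are taken at pixel centers, and the function name `nearest_neighbor_scale` is hypothetical.

```python
def nearest_neighbor_scale(image, factor):
    """Enlarge a 2-D list of pixels by copying, for every new pixel,
    the original pixel whose center is closest to it."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = int(src_h * factor), int(src_w * factor)
    out = []
    for y in range(dst_h):
        # Map the center of the new pixel back to the source grid (assumed convention).
        sy = min(int((y + 0.5) / factor), src_h - 1)
        row = []
        for x in range(dst_w):
            sx = min(int((x + 0.5) / factor), src_w - 1)
            row.append(image[sy][sx])  # no arithmetic on pixel values, only a copy
        out.append(row)
    return out

small = [[1, 2],
         [3, 4]]
large = nearest_neighbor_scale(small, 2)
```

Each original pixel is simply repeated, which explains both the low computational cost and the blocky appearance of the result.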
2.2.2 Bilinear
Bilinear interpolation is popular in many applications. Although it is more complex than nearest neighborhood (NN) interpolation, the quality of its result is much better. Its computation time is longer than that of NN interpolation because each interpolated pixel has to be calculated. The main disadvantage of bilinear interpolation is that it usually generates blur and jag in edge sections. The function of bilinear interpolation is given in (2.6), and its time-domain waveform is shown in Fig. 2-6.
$h(t) = \begin{cases} 1 - |t|, & |t| < 1 \\ 0, & \text{otherwise} \end{cases}$ (2.6)
Fig. 2-6. Time domain waveform of bilinear interpolation function.
Bilinear interpolation works in two-dimensional space, and one interpolated pixel is determined by four sampled pixels. The algorithm of the bilinear interpolation scheme is shown in Fig. 2-7.
Fig. 2-7. Algorithm of bilinear interpolation scheme.
As shown in Fig. 2-7, P is the new pixel to be interpolated, and S1, S2, S3 and S4 are original pixels. The closer an original pixel is to P, the higher its weight; the farther it is, the lower its weight. In Fig. 2-7, dx and dy are the distances from P to S1, S2, S3 and S4, and x and y are the distances between original pixels. The bilinear interpolation scheme can be expressed as (2.7).
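The distance-based weighting can be sketched as follows. The pixel layout is an assumption of this sketch, since it depends on the figure: S1 is taken as the top-left, S2 the top-right, S3 the bottom-left and S4 the bottom-right neighbor, and dx and dy are measured from S1. The function names `bilinear_pixel` and `bilinear_scale` are hypothetical, and the source image is assumed to be at least 2 x 2.

```python
def bilinear_pixel(s1, s2, s3, s4, dx, dy, x=1.0, y=1.0):
    """Interpolate P from its four neighbors; the closer a neighbor, the larger its weight."""
    wx, wy = dx / x, dy / y                # normalized distances from S1
    top = s1 * (1.0 - wx) + s2 * wx        # interpolate along the upper row
    bottom = s3 * (1.0 - wx) + s4 * wx     # interpolate along the lower row
    return top * (1.0 - wy) + bottom * wy  # interpolate between the two rows

def bilinear_scale(image, dst_h, dst_w):
    """Resize a 2-D list of pixels; corner pixels of source and output coincide (assumed)."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for r in range(dst_h):
        fy = r * (src_h - 1) / (dst_h - 1) if dst_h > 1 else 0.0
        y0 = min(int(fy), src_h - 2)       # row index of S1 and S2
        dy = fy - y0
        row = []
        for c in range(dst_w):
            fx = c * (src_w - 1) / (dst_w - 1) if dst_w > 1 else 0.0
            x0 = min(int(fx), src_w - 2)   # column index of S1 and S3
            dx = fx - x0
            row.append(bilinear_pixel(image[y0][x0], image[y0][x0 + 1],
                                      image[y0 + 1][x0], image[y0 + 1][x0 + 1],
                                      dx, dy))
        out.append(row)
    return out

enlarged = bilinear_scale([[0.0, 10.0], [20.0, 30.0]], 3, 3)
```

Every output pixel requires several multiplications and additions, in contrast to the plain copy used by nearest neighborhood interpolation.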
2.2.3 Bicubic
Bicubic interpolation is somewhat like bilinear interpolation, but its interpolation kernel is more similar to the sinc function. Bicubic interpolation uses the 16 nearby pixels to compute each interpolated pixel, so it is not as efficient as bilinear interpolation. Its result is better than those of the previous two interpolation schemes, but its disadvantages are that blur remains in edge sections and that it requires a large amount of computation. The mathematical function of bicubic interpolation is given in (2.8), and its time-domain waveform is shown in Fig. 2-8.
Fig. 2-8. Time domain waveform of bicubic interpolation function.
Fig. 2-9 shows the algorithm of the bicubic interpolation scheme, where P is the new pixel to be interpolated, $S(i,j)$ are the original pixels S1 ~ S16, and $W(i,j)$ are the weights of S1 ~ S16. P is calculated by summing the products of S1 ~ S16 with their weights, and it can be expressed as (2.9).
$P = \sum_{i}\sum_{j} S(i,j)\, W(i,j)$, (2.9)
where the sums run over the 16 original pixels.
2.3 Defects of Traditional Interpolation Schemes
Image quality changes with the interpolation scheme used, so in this section we discuss the defects of the nearest neighborhood and bilinear interpolation schemes. The common defects in interpolated images are as follows.
(1) Blur
If blur occurs in an image, it becomes hard to focus on or identify the objects in it, and the image quality is reduced. Blur occurs because the reconstruction filter is usually a low-pass filter, which removes the high-frequency parts of the signal. In an image signal, the high-frequency parts are usually located in edge sections. Therefore, if the high-frequency parts of the image signal are filtered out, the information in edge sections is lost. Taking bilinear interpolation as an illustration, Fig. 2-10 (b) is an image enlarged by bilinear interpolation, in which blurry defects can be seen.
(2) Block & Jag
In real interpolation applications, only a finite number of samples is available, and each sampled pixel is affected by nearby pixels, which causes jag to appear in the edge sections of the image. The block and jag effects become more obvious as the time-domain waveform of the interpolation function becomes more oblique. Taking nearest neighborhood interpolation as an illustration, Fig. 2-10 (a) is an image enlarged by nearest neighborhood interpolation, and it contains obvious block and jag in edge sections.
Fig. 2-10. (a) Image enlarged by nearest neighborhood interpolation, (b) Image enlarged by bilinear interpolation, (c) original image.
2.4 HVS-Based Edge-Adaptive Image Scaling
2.4.1 Introduction
It is known that conventional interpolation techniques such as bilinear and bicubic interpolation do not satisfy the requirements of high-quality images, since these methods tend to produce obvious defects such as blur and jag. Recently, several adaptive nonlinear methods have been proposed to tackle these problems. In these methods, the image is analyzed first in order to achieve better interpolation quality [2]-[5].
Since human eyes are more sensitive to edge areas than to smooth areas within an image, many algorithms [6]-[16] have been proposed to improve the subjective visual quality of edge regions in interpolated images. However, how to design the optimal adaptive filter and how to judge the quality of interpolated images remain challenges for these methods.
To avoid the disadvantages of conventional interpolation, this thesis uses the HVS-based edge-adaptive image scaling scheme [1]. This image interpolation method combines bilinear interpolation and edge-adaptive image interpolation. A fuzzy decision system [17], [18] inspired by the human visual system classifies the input image into perceptually non-sensitive regions and perceptually sensitive regions, and this classification determines whether the bilinear interpolation module or the edge-adaptive interpolation module operates on each region.
2.4.2 Architecture
The schematic block diagram of the HVS-based edge-adaptive image scaling scheme is shown in Fig. 2-11. The scheme consists of a fuzzy decision module, an angle evaluation module, an edge-adaptive interpolation module and a bilinear interpolation module. The fuzzy decision module assigns each sliding block it receives (shown in Fig. 2-12) to one of several predefined classifications. Based on this classification, either the bilinear or the edge-adaptive interpolation module is selected to generate the supplementary image pixels needed for resolution enhancement. When edge-adaptive interpolation is selected, the angle evaluation module computes the dominant orientation of the sliding block in the original image and passes it as one of the inputs to the edge-adaptive interpolation module.
Fig. 2-11. Schematic block diagram of HVS-Based Edge-Adapted Image Scaling system.
(a) (b)
Fig. 2-12. (a) A 4 x 4 sliding (overlapping) block in original image, (b) The block after two times interpolation. [1]
When an original image enters our system, it is first divided into 4 x 4 sliding (overlapping) blocks. In other words, an image is composed of many overlapping sliding blocks. The block is shown in Fig. 2-12 as an illustration, where $O(i,j)$ are the pixels of the original image and P(1,0), P(0,-1), P(1,-1) are the pixels to be interpolated in the current block for two-times interpolation. Weighted interpolation is used and can be expressed as (2.10),
$P(m,n) = \sum_{i}\sum_{j} W_{\theta,m,n}(i,j)\, O(i,j)$, (2.10)
where the sums run over the 16 pixels of the sliding block and $W_{\theta,m,n}(i,j)$ are the weights, which will be introduced in a later section.
2.4.3 HVS-Directed Image Analysis
It is well known that classical linear interpolation techniques often blur edges or introduce artifacts around them. An ideal interpolation scheme should always go along the edge orientation, because this does not blur the edge and preserves the smoothness along the edge orientation.
In order to achieve optimal edge-directed interpolation, we use the properties of the human visual system as the foundation for extracting the features of images. The properties of the HVS also tell us which regions are particularly worth processing, since human eyes are usually more sensitive to these regions.
1. Fuzzy Decision
Research has been conducted on the characteristics of the human visual system (HVS). It was found that the perception of the HVS is more sensitive to luminance contrast than to uniform brightness. The ability of human eyes to tell the magnitude difference between an object and its background depends on the average background luminance. As shown in Fig. 2-13, the visibility threshold is lower when the background luminance is within the interval from 70 to 150, and it increases as the background luminance becomes brighter or darker than this interval. In addition, a high visibility threshold occurs when the background luminance is in the very dark region [19].
Fig. 2-13. Visibility thresholds corresponding to different background luminance. [1]
In addition to the magnitude difference between an object and its background, different image structures also cause different perceptions for the HVS. Human eyes are more sensitive to high-contrast regions, such as texture or edge regions, than to smooth regions. Since we have to balance image quality, processing speed and power consumption, our interpolation method uses a novel fuzzy decision system inspired by the HVS to classify the input image into perceptually non-sensitive regions and perceptually sensitive regions. For non-sensitive regions, the bilinear interpolation module is used to reduce power consumption. For sensitive regions, the edge-adaptive interpolation module is used to achieve better visual quality.
There are three input variables in our fuzzy decision system: visibility degree (VD), structure degree (SD) and complexity degree (CD). VD is used to check whether the object in the sliding block can easily be seen by human eyes; SD and CD are used to check whether the image in the sliding block has the characteristics of an edge structure. The output of the fuzzy decision system indicates which interpolation module should be used. We introduce the fuzzy decision system in detail in the following five subsections.
(1) Visibility degree
In order to obtain the input variables corresponding to each sliding block, two index parameters are calculated first. The background luminance (BL) [19] is the average luminance of the sliding block and can be calculated by (2.12).
$BL = \frac{1}{16}\sum_{i}\sum_{j} O(i,j)$, (2.12)
where the sums run over the 16 pixels of the 4 x 4 sliding block. Parameter D is the difference between the maximum and the minimum pixel values in the sliding block and can be calculated by (2.13).
$D = \max(O(i,j)) - \min(O(i,j))$. (2.13)
A nonlinear function V(BL) is used to approximate the relation between the visibility threshold and the background luminance shown in Fig. 2-13, and it is given by (2.14).
$V(BL) = 20.66\, e^{-0.03\,BL} + e^{0.008\,BL}$. (2.14)
After BL, D and V(BL) are obtained, we can calculate the input variables (VD, SD and CD) of the fuzzy decision system. Parameter VD is defined as the difference between D and V(BL) and can be represented as (2.15).
$VD = D - V(BL)$. (2.15)
If VD > 0, the magnitude difference between the object and its background exceeds the visibility threshold and the object is perceptible. Otherwise, the object is not perceptible.
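The computation of BL, D, V(BL) and VD for one sliding block can be sketched as follows. The sketch assumes that a block is given as a 4 x 4 list of luminance values in the range 0 to 255; the function names are hypothetical and chosen only for illustration.

```python
import math

def background_luminance(block):
    """BL: average luminance of the sliding block, as in (2.12)."""
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)

def visibility_threshold(bl):
    """V(BL): approximation of the visibility threshold, as in (2.14)."""
    return 20.66 * math.exp(-0.03 * bl) + math.exp(0.008 * bl)

def visibility_degree(block):
    """VD = D - V(BL), as in (2.15); a positive value means the object is perceptible."""
    pixels = [p for row in block for p in row]
    d = max(pixels) - min(pixels)  # D, as in (2.13)
    return d - visibility_threshold(background_luminance(block))

flat_block = [[100] * 4 for _ in range(4)]           # uniform region
edge_block = [[50, 50, 200, 200] for _ in range(4)]  # vertical edge
vd_flat = visibility_degree(flat_block)  # negative: nothing to see in a uniform block
vd_edge = visibility_degree(edge_block)  # positive: the edge exceeds the threshold
```

A uniform block has D = 0, so its VD is always negative, which is the case handled by the bilinear interpolation module.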
(2) Structure degree
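SD indicates whether the sliding block is a high-contrast region whose pixels can be clearly separated into two clusters. It is calculated by (2.16).
$SD = \frac{\left|[\max(O(i,j)) - \mathrm{mean}(O(i,j))] - [\mathrm{mean}(O(i,j)) - \min(O(i,j))]\right|}{\max(O(i,j)) - \min(O(i,j))}$. (2.16)
An illustration of (2.16) is shown in Fig. 2-14. According to Fig. 2-14, if we let
$\sigma_1 = \max(O(i,j)) - \mathrm{mean}(O(i,j))$, $\sigma_2 = \mathrm{mean}(O(i,j)) - \min(O(i,j))$, (2.17)
then (2.16) can be expressed by (2.18).
$SD = \frac{|\sigma_1 - \sigma_2|}{\sigma_1 + \sigma_2}$. (2.18)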
Fig. 2-14. An illustration of the relation between SD parameter and the distribution of pixels in a sliding block.
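The relation between SD and the two spreads σ1 and σ2 can be sketched as below. Two points are assumptions of this sketch rather than statements of the original method: the absolute value of σ1 − σ2 is taken so that the result stays within [0, 1], and a completely flat block (σ1 + σ2 = 0) is assigned SD = 0. The function name `structure_degree` is hypothetical.

```python
def structure_degree(block):
    """SD computed from sigma1 = max - mean and sigma2 = mean - min of a block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    sigma1 = max(pixels) - mean
    sigma2 = mean - min(pixels)
    if sigma1 + sigma2 == 0:
        return 0.0  # flat block: no spread at all (assumed convention)
    return abs(sigma1 - sigma2) / (sigma1 + sigma2)

# Two even clusters (edge-like block): sigma1 equals sigma2, so SD is 0.
edge_block = [[50, 50, 200, 200] for _ in range(4)]
# One outlier among 15 similar pixels (noise-like block): sigma1 >> sigma2.
noise_block = [[100] * 4 for _ in range(4)]
noise_block[1][2] = 200

sd_edge = structure_degree(edge_block)
sd_noise = structure_degree(noise_block)
```

The edge-like block gives an SD close to zero, while the single outlier pushes the mean toward the larger cluster and drives SD toward one.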
So SD is normalized to [0, 1]. If SD is small (close to zero), $\sigma_1$ and $\sigma_2$ are close [as in Fig. 2-14(a)], which means that the pixels in the block can be separated into two even clusters, and the block may contain an edge or texture structure. On the contrary, if SD is large ($\sigma_1 - \sigma_2 \gg 0$) [as in Fig. 2-14(b)], the numbers of pixels in the two clusters are uneven; thus, the block may contain noise.
(3) Complexity degree
When the SD of the current sliding block is small, the sliding block may contain an edge or a texture structure, and CD is used to distinguish the two. CD is calculated by (2.19) from the local gradients of the image intensity. Each pixel in the 4 x 4 sliding block undergoes the four-directional local gradient operation, and CD is the sum of the 16 local gradient values. If CD is large, the block may contain a texture structure. On the contrary, if CD is small, the block may contain a delineated edge structure.
(4) Fuzzy output
Variable VD has two fuzzy sets, N (negative) and P (positive); variable SD has three fuzzy sets, S (small), M (medium) and B (big); and variable CD also has three fuzzy sets, S (small), M (medium) and B (big). The membership functions corresponding to VD, SD and CD are shown in Fig. 2-15 (a)–(c), respectively. Seven fuzzy decision rules are used in the fuzzy decision system and are represented as follows:
1. If VD is N then Mo is BL.
When the output of the fuzzy decision system is AA, the system chooses the edge-adaptive interpolation module; otherwise, the bilinear interpolation module is used.
Fig. 2-15. Membership functions of fuzzy sets on input variables VD, SD, and CD.
(5) Experiment of Fuzzy Input Variables
Fig. 2-16 shows four different image structures that illustrate the operation of the fuzzy decision system. Fig. 2-16 (a)–(d) represent smooth, texture, edge and noise regions, respectively. The VD, SD and CD values of these regions are shown in Table 2-1.
According to the VD values in Table 2-1, only Fig. 2-16 (a) (smooth region) is negative, which activates fuzzy rule 1 and follows the assumption that "if VD > 0, the object is perceptible; otherwise, it is not." The SD value of Fig. 2-16 (d) (noise region) is big (B), which activates fuzzy rule 2 and follows the assumption that "if SD is a large value, the block may contain noise." The SD values of Fig. 2-16 (b) (texture region) and Fig. 2-16 (c) (edge region) are small (S), which follows our assumption that "if SD is small, the block may contain edge or texture structure." The CD value of Fig. 2-16 (b) is medium (M), which activates fuzzy rule 5 and follows the assumption that "if CD is a large value, the block may contain texture structure." The CD value of Fig. 2-16 (c) is small (S), which activates fuzzy rule 4 and follows the assumption that "if CD is a small value, the block may contain edge structure."
Fig. 2-16. Portions of (a) smooth region, (b) texture region, (c) edge region, and (d) noise region.
Table 2-1. Processing results of the fuzzy decision system corresponding to three different structures shown in Fig. 2-16.
Smooth
2. Angle Evaluation
According to Fig. 2-11, when the fuzzy decision module selects the edge-adaptive interpolation module, angle evaluation is performed to determine the dominant orientation of the sliding block. The flow diagram of angle evaluation is shown in Fig. 2-17. During angle evaluation, the orientation angle of each neighboring original image pixel is computed. According to Fig. 2-12, when the orientation angle of $O(i,j)$, denoted as $A(i,j)$, is computed, the luminance values of the original pixels near $O(i,j)$ are used for the computations in (2.20)-(2.21), which are illustrated in Fig. 2-18.
Fig. 2-17. Flow diagram of angle evaluation.
Fig. 2-18. Illustration of Dx and Dy in sliding block.
The orientation angle obtained for each neighboring original image pixel is quantized into eight sectors, θ = 22.5 × k degrees, where k = 0, 1, …, 7. The system finds the most frequently occurring quantized orientation and sends it to the edge-adaptive interpolation module.
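The quantization and voting step can be sketched as below. The sketch assumes that the orientation angles are already available in degrees, that orientations are equivalent modulo 180 degrees, and that each angle is rounded to the nearest sector; these conventions and the function names are assumptions made for illustration only.

```python
from collections import Counter

SECTOR_STEP = 22.5  # degrees per quantization sector
NUM_SECTORS = 8     # k = 0, 1, ..., 7

def quantize_angle(angle_deg):
    """Return the sector index k whose angle 22.5 * k is nearest to the orientation."""
    folded = angle_deg % 180.0  # an orientation of 180 degrees equals 0 degrees
    return int(round(folded / SECTOR_STEP)) % NUM_SECTORS

def dominant_orientation(angles_deg):
    """Return the most frequently occurring quantized orientation, in degrees."""
    counts = Counter(quantize_angle(a) for a in angles_deg)
    k, _ = counts.most_common(1)[0]
    return SECTOR_STEP * k

# Hypothetical orientation angles of the pixels in one sliding block.
angles = [44.0, 46.0, 50.0, 90.0, 0.0, 45.0]
theta = dominant_orientation(angles)
```

Restricting the result to eight sectors keeps the number of weight sets that must be stored for the edge-adaptive interpolation module small.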
2.4.4 Image Interpolation Computation
If the output of the fuzzy decision module is BL, the input image does not have an edge structure, so we simply use bilinear interpolation to interpolate it.
On the contrary, when the fuzzy decision module outputs AA, the current input image has an edge structure and needs to be interpolated by the edge-adaptive interpolation module. The angle evaluation module calculates the dominant orientation of the input image and sends the orientation information to the edge-adaptive module for weight generation. In (2.10), $W_{\theta,m,n}(i,j)$ are the weights corresponding to the orientation of the input image. The weights differ with the orientation and with the location to be interpolated. The weighting matrix $W_{\theta,m,n}(i,j)$ can be represented as in (2.23).
Three interpolated pixels are generated from one sliding block. Generating one interpolated pixel needs 16 weights, so a sliding block needs 48 weights to complete the interpolation of three pixels. To reduce system complexity, all weights are pre-trained by a back-propagation neural network and saved in a table.
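The table-based weighted interpolation can be sketched as follows. The trained weights are not available here, so the table is filled with uniform placeholder weights (1/16 each); the table layout, keyed by the orientation sector index and the interpolated position, and all names are hypothetical.

```python
POSITIONS = [(1, 0), (0, -1), (1, -1)]  # the three pixels interpolated per block

# Placeholder weights: NOT the trained values, only a stand-in with the same shape.
UNIFORM = [[1.0 / 16.0] * 4 for _ in range(4)]

# Hypothetical lookup table: one 4 x 4 weight matrix per (sector index, position).
weight_table = {(k, pos): UNIFORM for k in range(8) for pos in POSITIONS}

def weighted_interpolate(block, weights):
    """Sum of the products of the 16 block pixels with their 16 weights."""
    return sum(w * o
               for w_row, o_row in zip(weights, block)
               for w, o in zip(w_row, o_row))

def interpolate_block(block, sector_k):
    """Compute the three interpolated pixels of one sliding block by table lookup."""
    return {pos: weighted_interpolate(block, weight_table[(sector_k, pos)])
            for pos in POSITIONS}

block = [[80.0] * 4 for _ in range(4)]
new_pixels = interpolate_block(block, sector_k=2)  # 3 lookups, 48 weights in total
```

Because the weights are read from a table, the run-time cost per interpolated pixel is 16 multiplications and additions, independent of how the weights were trained.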
Chapter 3
Hardware Design of Edge-Adaptive Real-Time Image Scaling
3.1 Introduction
With the development of DTV broadcasting and LCD HDTV, high-resolution images can provide much more detailed information to satisfy the needs of users. Since traditional video does not have the same resolution as HDTV, the images need to be scaled. For instance, the source signal may be in the traditional standard resolution of 720 x 480, whereas an HDTV LCD panel usually has a higher resolution such as 1920 x 1080. If SDTV-resolution video is to be played on an HDTV, the resolution of the source signal must be enhanced in advance to fit the panel resolution.
Because the data stream of a video signal is quite large, a lot of computation is needed if we want to enhance the resolution of the video signal by using advanced