4. Proposed Rate Control Framework
4.2. Analysis of Visual Complexity
4.2.2. Spatial Contrast Sensitivity
The human visual system (HVS) has different sensitivities to different spatial frequencies and behaves like a lowpass or slightly bandpass filter. The contrast sensitivity function (CSF) describes the sensitivity of the HVS to different frequencies. Based on numerous experiments [34], the peak frequency of a CSF generally lies between 3 cycles/degree and 8 cycles/degree, and the sensitivity decreases rapidly at high frequencies.
When a CSF filter is applied to an image, the portion of high-frequency information that is not detectable by the human visual system is filtered out. The filtered-out information can be used to estimate the visual complexity of the video data.
The process of applying a CSF to an image is presented in [1]. The first step is to normalize all luminance values in the image by the mean luminance. Because the visual perception of lightness is a nonlinear function of luminance, the cube root of the normalized luminance is then taken over the entire image. Since a CSF is easier to describe in the frequency domain, the Fourier transform F(u, v) of the image f is computed first. The signal in the frequency domain is then filtered by the CSF as follows:
$$G(u, v) = F(u, v) \cdot \mathrm{CSF}(r(u, v)). \tag{47}$$
The inverse Fourier transform of G(u, v) is the filtered image.
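The filtering steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: r(u, v) is interpreted as the radial spatial frequency in cycles/degree, the viewing condition `pixels_per_degree=32.0` is a hypothetical value, the function names are illustrative, and the commonly quoted Mannos-Sakrison formula stands in for the CSF.

```python
import numpy as np

def mannos_sakrison_csf(freq_cpd):
    """Commonly quoted Mannos-Sakrison form; a stand-in for CSF(r) (assumption)."""
    f = np.asarray(freq_cpd, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

def csf_filter(luminance, pixels_per_degree=32.0, csf=mannos_sakrison_csf):
    """Return (f, g): the preprocessed image and its CSF-filtered version."""
    lum = np.asarray(luminance, dtype=float)
    # Step 1: normalize by the mean luminance (assumes a non-zero mean).
    normalized = lum / lum.mean()
    # Step 2: cube root models the nonlinear perception of lightness.
    f = np.cbrt(normalized)
    # Step 3: Fourier transform F(u, v).
    F = np.fft.fft2(f)
    # Radial frequency r(u, v) in cycles/degree (assumed mapping from
    # cycles/pixel via the hypothetical pixels_per_degree).
    fy = np.fft.fftfreq(f.shape[0]) * pixels_per_degree
    fx = np.fft.fftfreq(f.shape[1]) * pixels_per_degree
    r = np.hypot(fy[:, None], fx[None, :])
    # Step 4: Eq. (47), G = F * CSF(r).
    G = F * csf(r)
    # Step 5: inverse transform. The CSF is real and symmetric in (u, v),
    # so the imaginary part is only rounding noise and is discarded.
    g = np.fft.ifft2(G).real
    return f, g
```

Any other CSF model can be passed through the `csf` argument as long as it accepts an array of frequencies in cycles/degree.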
In most applications, a CSF is used for quality assessment by computing the difference between two CSF-filtered input images. For video coding purposes, we are more concerned with the amount of data that is filtered out by the CSF. An area with a large amount of filtered-out data contains more visually undetectable information.
Therefore, the amount of filtered-out information is estimated by computing the absolute difference between the images before and after the filtering process. This estimate of visual complexity will be incorporated into the rate control of video compression.
Since the smallest coding unit for rate control is a macroblock, the filtered-out information is computed on a macroblock basis. The mean absolute difference (MAD) is computed to evaluate the average amount of dropped data; an area in which more data is dropped is less important. However, the result is highly correlated with the original luminance and is therefore not accurate. Figure 10 illustrates this phenomenon. Figure 10(a) is the original frame from the Stefan sequence. Figure 10(b) shows the MAD, which summarizes the filtered-out information of each macroblock. The result is clearly similar to the mean luminance of Figure 10(a). Figure 10(c) compares the MAD with the mean luminance of the original frame; both data sets are scaled to the range [0, 1].
Figure 10. CSF filtering of images: (a) original frame; (b) MAD of each macroblock; (c) comparison of the MAD and the mean luminance
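The per-macroblock MAD and the [0, 1] scaling used for the comparison can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, `f` and `g` are assumed to be 2-D arrays holding the image before and after filtering, and frame dimensions that are not multiples of the block size are simply cropped.

```python
import numpy as np

def block_mad(f, g, block=16):
    """Mean absolute difference between f and g for each block x block macroblock."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    rows, cols = f.shape[0] // block, f.shape[1] // block
    # Crop to whole macroblocks (a CIF frame, 352x288, needs no cropping).
    diff = np.abs(f - g)[: rows * block, : cols * block]
    # Give each macroblock its own pair of axes, then average over its pixels.
    return diff.reshape(rows, block, cols, block).mean(axis=(1, 3))

def rescale01(values):
    """Scale values linearly to [0, 1]; a constant input maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.zeros_like(values)
    return (values - values.min()) / span
```

For a CIF frame, `block_mad` returns an array of shape (18, 22), one value per macroblock, which can be compared against the block-wise mean luminance after both are passed through `rescale01`.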
Possible reasons for this phenomenon are as follows. A CSF is a bandpass filter that suppresses certain low- and high-frequency components in the transform domain. The MAD is dominated by the attenuation of the low-frequency components, and the suppression of the high-frequency components is comparatively negligible. For example, the Mannos-Sakrison CSF and the Ahumada CSF are set to 0.3 at zero frequency, while the sensitivity ranges from 0 to 1. These CSFs therefore filter out most of the mean luminance of the entire image. Since the low-frequency components in the Fourier transform domain are responsible for the general gray-level appearance over smooth areas, the attenuation of low frequencies by the CSF removes too much luminance information. Therefore, the difference between the images before and after CSF filtering depends strongly on the luminance of the image before filtering.
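The dependence on the mean level can be made explicit with a short derivation. It is a sketch under the assumptions that f denotes the image entering the filter, g is its filtered version as in Eq. (47), r(0, 0) = 0, and the image has N pixels with mean value \bar{f}; the value CSF(0) = 0.3 is the one quoted in the text.

```latex
\begin{align*}
F(u, v) - G(u, v) &= \bigl(1 - \mathrm{CSF}(r(u, v))\bigr)\, F(u, v), \\
F(0, 0) - G(0, 0) &= \bigl(1 - \mathrm{CSF}(0)\bigr)\, N \bar{f}
                   = 0.7\, N \bar{f}, \\
\frac{1}{N} \sum_{x, y} \bigl( f(x, y) - g(x, y) \bigr) &= 0.7\, \bar{f}.
\end{align*}
```

Under these assumptions, the zero-frequency term alone contributes a constant offset of 0.7 \bar{f} to every pixel of f - g, before any high-frequency detail is taken into account.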
Additionally, the decrease in sensitivity at low frequencies is slow because the imagery of the test stimuli is not stabilized on the retina [26]. Most experiments consider only spatial frequencies higher than 1 cycle/degree. A CSF is usually designed as an easily constructed model, typically a continuous function on [0, ∞). However, the mathematical model is not applicable at zero frequency. This can also be seen from a different angle: the lowest frequency in an entire image is one cycle per image. For example, if an image subtends a visual angle of V degrees, the lowest spatial frequency is 1/V cycles/degree. This also explains why the behavior of the CSF near zero frequency is not well determined.
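As a small worked example of the 1/V relation, the lowest spatial frequency along one image axis can be computed from the image size and the viewing condition. The function name and the value of 32 pixels per degree are hypothetical and serve only as illustration.

```python
def lowest_spatial_frequency(size_px, pixels_per_degree):
    """Lowest nonzero spatial frequency (cycles/degree) along one image axis."""
    visual_angle_deg = size_px / pixels_per_degree  # V in the text
    return 1.0 / visual_angle_deg                   # one cycle per image

# A 352-pixel-wide CIF frame viewed at an assumed 32 pixels/degree subtends
# 11 degrees, so its lowest horizontal frequency is 1/11 cycles/degree.
lowest_cif = lowest_spatial_frequency(352, 32.0)
```

Under this assumed viewing condition, no component of the frame lies between 0 and 1/11 cycles/degree, so the shape of the CSF model in that interval never acts on real image content other than the mean.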
In previous research, most vision-based video applications that apply a contrast sensitivity model are quality assessment systems [10][15]. Both the reference signal and the distorted signal pass through the CSF filtering stage and other error sensitivity models, and the difference between the two processed signals is then measured. Under this condition, the low-frequency effect of the CSF described above influences both signals, so the side effect of the CSF filter can be counteracted when the difference between the two signals is computed. In our application, however, the filtered signal is compared with the original signal, so the inaccuracy of the mathematical contrast sensitivity model at low frequencies has a significant influence.
Based on the above analysis, a modified CSF filter is more suitable for visual complexity analysis in video coding. The modified CSF is designed in the form of a difference of Gaussians, which is similar to the Ahumada CSF. Filters based on Gaussian functions are particularly important because both the forward and inverse Fourier transforms of a Gaussian function are real Gaussian functions, so no imaginary component needs to be considered. The visual characteristic that contrast sensitivity rolls off rapidly at high frequencies is verified by nearly every experiment and proposed model. The modified CSF therefore mainly adjusts the low-frequency part. The modified CSF is
Figure 11 plots the modified CSF against other contrast sensitivity functions. At frequencies above 2 cycles/degree, the function is similar to other CSFs. At low frequencies near 0 cycles/degree, the function is adjusted for the aforementioned reasons in our application.
Figure 11. Comparisons of the proposed CSF with other CSFs (curves: Daly CSF (limit I), Daly CSF (limit II), Modified CSF)
So far, the proper CSF filter for video coding purposes has been established. The next step is to compute the merged block information using the modified CSF. We propose a new parameter called distortion tolerance for the rate control model. A large value of this parameter means that the corresponding area can tolerate large distortion.
The amount of undetectable information is estimated by computing the absolute difference between the images before and after filtering:

$$DT^{*}(k) = \frac{1}{P} \sum_{(x, y) \in \mathrm{MB}_k} \left| f(x, y) - g(x, y) \right|,$$

where the sum runs over the pixels (x, y) of macroblock k, P is the size of the macroblock, normally set to 16*16, and f(x, y) and g(x, y) are the images before and after filtering in the spatial domain, respectively. In order to enhance the difference between important and unimportant areas, the distortion tolerance is kept only in the MBs whose parameter value is higher than the average, and the parameter of the other MBs is set to 0:
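$$\overline{DT} = \frac{1}{Q} \sum_{n=1}^{Q} DT^{*}(n),$$

$$DT(k) = \begin{cases} DT^{*}(k), & \text{if } DT^{*}(k) \ge \overline{DT} \\ 0, & \text{else} \end{cases} \tag{50}$$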
where DT(k) is the distortion tolerance of macroblock k and Q is the number of macroblocks, which is 22*18 for a CIF sequence. The distortion tolerance values of one frame are normalized to the range [0, 10]. The result for frame no. 1 of the Stefan sequence is shown in Figure 12.
Figure 12. Result of visual complexity computation
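The thresholding of Eq. (50) and the normalization to [0, 10] can be sketched as follows. The text does not state how the normalization is carried out, so dividing by the frame maximum is an assumption made here for illustration; the function name is also hypothetical.

```python
import numpy as np

def distortion_tolerance(dt_star, upper=10.0):
    """Apply Eq. (50) to per-macroblock values DT*(k), then scale to [0, upper]."""
    dt_star = np.asarray(dt_star, dtype=float)
    # Average of DT*(n) over all Q macroblocks of the frame.
    mean_dt = dt_star.mean()
    # Eq. (50): keep values at or above the average, set the rest to 0.
    dt = np.where(dt_star >= mean_dt, dt_star, 0.0)
    peak = dt.max()
    if peak == 0:
        return dt
    # Assumed normalization: scale so the frame maximum equals `upper`.
    return dt * (upper / peak)
```

For a CIF frame, `dt_star` would be the 18 x 22 array of per-macroblock values, and the returned array has the same shape with below-average macroblocks set to 0.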