Chapter 3. Proposed Method
3.5. Linear Combination
where the numerator terms stand for the three features, respectively. The step described above is quite simple. Moreover, since the three maps have already been clamped to the same range at the competitive normalization stage, this combination process requires almost no computational effort.
3.5.2. Data Driven Combination
However, for a given input image, the salient region might be determined by intensity alone, or by color alone. As for color, some regions might be salient in the red/green channel, while others might be salient in the blue/yellow channel. Hence, we may choose an adaptive combination that changes the weights according to the image characteristics.
Here, we take a data-driven approach in which the conspicuity maps of the I-channel, RG-channel, and BY-channel are summed with channel-dependent weights. max_Ihist, max_RGhist, and max_BYhist denote the largest peaks in the corresponding 3-D histograms. Typically, if a channel possesses a large peak in its feature-pair distribution, that channel is dominated by a specific feature value, and the “unusual” regions usually become more apparent in the conspicuity map. Hence, we assign a larger weight to this channel.
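The weighting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact formula of this thesis: the weight of each channel is assumed to be its largest histogram peak divided by the sum of the three peaks, and the function name `data_driven_combination` and the toy inputs are hypothetical.

```python
def data_driven_combination(maps, hist_peaks):
    """Combine conspicuity maps with weights driven by histogram peaks.

    maps:       dict, channel name -> 2-D list of floats (all the same size)
    hist_peaks: dict, channel name -> largest peak of that channel's
                feature-pair histogram (e.g. max_Ihist, max_RGhist, max_BYhist)
    """
    total = sum(hist_peaks.values())
    if total <= 0:
        raise ValueError("histogram peaks must sum to a positive value")

    # ASSUMPTION: each weight is the channel's peak normalized by the sum of
    # all peaks, so a channel with a larger peak receives a larger weight.
    weights = {name: peak / total for name, peak in hist_peaks.items()}

    names = list(maps)
    rows = len(maps[names[0]])
    cols = len(maps[names[0]][0])

    # Pixel-wise weighted sum of the conspicuity maps.
    saliency = [
        [sum(weights[n] * maps[n][r][c] for n in names) for c in range(cols)]
        for r in range(rows)
    ]
    return saliency, weights


# Toy 2 x 2 conspicuity maps (hypothetical values).
example_maps = {
    "I":  [[1.0, 0.0], [0.0, 0.0]],
    "RG": [[0.0, 1.0], [0.0, 0.0]],
    "BY": [[0.0, 0.0], [0.0, 1.0]],
}
example_peaks = {"I": 60, "RG": 30, "BY": 10}
example_saliency, example_weights = data_driven_combination(example_maps, example_peaks)
```

With these toy inputs the I-channel, which has the largest peak, contributes most to the combined map, while the BY-channel contributes least.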
Chapter 4. Experimental Results
Both a computer simulation and a subjective experiment were performed to verify the performance of the proposed algorithm. In the computer simulation, the proposed algorithm was coded in Matlab without code optimization and tested on a PC with an Intel® Core™2 Duo CPU running at 3 GHz. In the subjective experiment, an eye tracker was used to record the eye fixation points of 20 subjects viewing 30 sample images. Figure 4-1 shows the eye tracker, which was borrowed from Prof. Chen-Chao Tao of the Department of Communication and Technology, NCTU. As Figure 4-1 shows, the eye tracker looks just like a normal LCD monitor. At the bottom of the monitor are infrared emitters and sensors. The eye tracker uses infrared and near-infrared non-collimated light to create a corneal reflection (CR). By detecting the strong reflectance from the observer’s pupils, the eye tracker can locate the observer’s eyes and then deduce their gaze point [15].
Figure 4-1 The eye tracker we used for the experiment
The subjects included both men and women. At the beginning of the experiment, all subjects were asked to sit comfortably on a chair and to look freely at the displayed image. The distance between the subject and the screen was about 50 to 70 cm. Each image was shown for only 3 seconds in order to capture intuitive eye movements with little influence from each person’s internal state. Between images, there was a short 3-second break. The eye fixation data of all 20 subjects were averaged and compared with the results of the computer simulation.
Figure 4-2 Eye fixation experimental settings
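The averaging of fixation data over subjects can be illustrated with a short Python sketch. The thesis does not describe how the eye tracker software builds its heat map, so the Gaussian spreading of each fixation, the parameter `sigma`, and the function name `fixation_heat_map` are assumptions made only for this example.

```python
import math


def fixation_heat_map(fixations_per_subject, width, height, sigma=1.0):
    """Average per-subject fixation points into a single heat map.

    fixations_per_subject: list with one entry per subject; each entry is a
                           list of (x, y) fixation coordinates in pixels
    width, height:         size of the heat map in pixels
    sigma:                 ASSUMED spread (in pixels) of each fixation
    """
    heat = [[0.0] * width for _ in range(height)]
    n_subjects = len(fixations_per_subject)
    if n_subjects == 0:
        return heat

    for fixations in fixations_per_subject:
        for fx, fy in fixations:
            for y in range(height):
                for x in range(width):
                    d2 = (x - fx) ** 2 + (y - fy) ** 2
                    # Each fixation adds a Gaussian blob; dividing by the
                    # number of subjects averages the maps over subjects.
                    heat[y][x] += math.exp(-d2 / (2.0 * sigma ** 2)) / n_subjects
    return heat


# Two hypothetical subjects looking at a 5 x 4 image.
example_heat = fixation_heat_map([[(1, 1)], [(1, 1), (3, 2)]], width=5, height=4)
```

In this toy case both subjects fixate near pixel (1, 1), so the averaged map peaks there, while the location visited by only one subject receives a lower value.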
The computer simulation results of our technique are compared with the human eye fixation data, obtained by averaging the eye fixation data of the 20 subjects, and with the simulation results of two other algorithms. This comparison checks whether our method performs as well as, or better than, the other methods mentioned in Chapter 2. The parameter settings of our algorithm in the computer simulation are listed in Table 4-1.
Table 4-1 Test images and their parameter settings

Image           Scale           Quantization  Stop condition  Edge condition  Execution time
IMG – 1         2 (91 × 61)     25            30              ±3              3.75 s
IMG – 2         2 (92 × 61)     25            30              ±3              2.3 s
IMG – 3         2 (160 × 120)   25            30              ±3              15.78 s
IMG – 4         2 (96 × 64)     25            30              ±3              3.96 s
IMG – 5         3 (50 × 31)     25            50              ±3              0.36 s
IMG – 6         2 (128 × 96)    25            30              ±3              7.66 s
Comparison – 1  2 (100 × 62)    15            15              ±5              2.66 s
Comparison – 2  1 (189 × 150)   15            5               ±3              18.25 s
Comparison – 3  4 (80 × 77)     25            30              ±5              4.52 s
Comparison – 4  2 (100 × 75)    25            30              ±5              6.53 s
Figure 4-3 shows a sample input image and its three conspicuity maps: intensity in Figure 4-3(b), RG in Figure 4-3(c), and BY in Figure 4-3(d). From these three maps, we can see that the intuitively salient objects, the aircraft, pop out in Figure 4-3(b) and (c).
(a) Input image (b) Intensity conspicuity map
(c) RG color conspicuity map (d) BY color conspicuity map
Figure 4-3 A sample input image and its three conspicuity maps
After obtaining the three conspicuity maps shown in Figure 4-3, we apply both combination strategies in order to compare them. Figure 4-4(a) shows the saliency map formed by the naïve combination, whereas Figure 4-4(b) shows the result of the data-driven combination. The naïve combination yields more popped-out regions than the data-driven approach. In Figure 4-4(a), some unwanted regions appear, which can be regarded as noise. In Figure 4-4(b), the output map is more reliable and closer to the human eye fixation heat map.
(a) Naïve combination (b) Data driven combination
(c) Heat map from 20 subjects
Figure 4-4 Resulting saliency maps of Figure 4-3 and heat map of human fixation (IMG - 1)
Figure 4-5 gives another result that clearly shows the superiority of the data-driven combination.
(a) Input image
(b) Naïve combination (c) Data driven combination
Figure 4-5 A further result illustrating the combination stage (IMG - 2)
Based on the above discussion of the combination process, we use the data-driven method as the combination strategy in our saliency map detector. Figure 4-6 to Figure 4-9 show the experimental results for a few natural images. The human eye fixation heat map is presented for comparison.
(a) Input image (b) Saliency map
(c) Heat map
Figure 4-6 Experimental results of natural image (IMG - 3)
In Figure 4-7, which contains faces, the saliency map indeed makes the two faces pop out. The result is consistent with the human eye fixation result, which suggests that human faces tend to be visually salient regions.
(a) Input image (b) Saliency map
(c) Heat map
Figure 4-7 Experimental results of image containing faces (IMG - 4)
(a) Input image (b) Saliency map
(c) Heat map
Figure 4-8 Experimental results of natural image (IMG - 5)
(a) Input image (b) Saliency map
(c) Heat map
Figure 4-9 Experimental results of natural image (IMG - 6)
In Figure 4-10 to Figure 4-13, we compare our method with the subjective experiment, Itti’s method [4], and the Spectral Residual (SR) method [7] on four different images. In each figure, the upper left image is the original image, and the upper right image is the eye fixation data averaged over the 20 subjects, with red indicating the visually salient regions. The detection results of Itti’s method, the SR method, and our method are shown side by side for comparison. In these four cases, the proposed method outperforms both Itti’s method and the SR method in matching the eye fixation data. The results generated by Itti’s method differ somewhat from the eye fixation data, while the results generated by the SR method look more like the output of edge detection. Moreover, the computational complexity of the proposed method is much lower than that of Itti’s method.
Figure 4-10 Experimental results of comparisons with other methods (comparison – 1)
Figure 4-11 Experimental results of comparisons with other methods (comparison – 2)
Figure 4-12 Experimental results of comparisons with other methods (comparison – 3)
Figure 4-13 Experimental results of comparisons with other methods (comparison – 4)
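The comparisons above are visual. As a hedged illustration of how the agreement between a saliency map and the averaged fixation heat map could also be expressed as a single number, the sketch below computes the Pearson correlation coefficient between two maps of equal size. This measure was not used in the experiments reported here, and the function name `pearson_correlation` and the toy maps are hypothetical.

```python
import math


def pearson_correlation(map_a, map_b):
    """Pearson correlation between two 2-D maps of the same size."""
    a = [v for row in map_a for v in row]
    b = [v for row in map_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("maps must be non-empty and of equal size")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)

    # A constant map carries no spatial information; report 0 in that case.
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Toy maps (hypothetical values): the saliency map roughly follows the heat map.
toy_heat = [[0.9, 0.1], [0.2, 0.0]]
toy_saliency = [[0.8, 0.2], [0.1, 0.1]]
toy_score = pearson_correlation(toy_saliency, toy_heat)
```

A value close to 1 would indicate that high-saliency pixels coincide with frequently fixated pixels, while a value near 0 would indicate little linear relationship.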
Chapter 5. Conclusions
In this thesis, we proposed a bottom-up, feature-based technique for salient region detection. The whole process is simple and does not require a training stage.
For system activation, we extract the feature-pair distribution from low-level image data and assign proper weights over this distribution to identify visually salient regions. The proposed algorithm is much simpler than the commonly used Itti’s method. After the activation process, a normalization process based on spatial competition is applied to the conspicuity maps to enhance the signal-to-noise ratio. The conspicuity maps from the different feature channels are then linearly combined in a data-driven manner. The experimental results show that the proposed algorithm can faithfully detect the salient regions in various kinds of images and that the detection results are consistent with subjective observations.
References
[1] E. Niebur and C. Koch, "Control of selective visual attention: Modeling the 'where' pathway," 9th Annual Conference on Neural Information Processing Systems (NIPS), pp. 802-808, 1996.
[2] J. Harel, C. Koch, and P. Perona, "Graph-Based Visual Saliency," Proceedings of Neural Information Processing Systems (NIPS), 2006.
[3] C. Koch and S. Ullman, "Shifts in Selective Visual-Attention - Towards the Underlying Neural Circuitry," Human Neurobiology, vol. 4, pp. 219-227, 1985.
[4] L. Itti, C. Koch and E. Niebur, "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 1254-1259, 1998.
[5] R. A. Rensink, "Seeing, sensing, and scrutinizing," Vision Research, vol. 40, pp. 1469-1487, 2000.
[6] D. Walther and C. Koch, "Modeling attention to salient proto-objects," Neural Networks, vol. 19, pp. 1395-1407, Nov. 2006.
[7] X. Hou and L. Zhang, "Saliency detection: A spectral residual approach," IEEE Conference on Computer Vision and Pattern Recognition, pp. 2280-2287, 2007.
[8] C. Guo, Q. Ma and L. Zhang, "Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform," IEEE Conference on Computer Vision and Pattern Recognition, pp. 2908-2915, 2008.
[9] A.G. Leventhal, "The neural basis of visual function," Vision and visual dysfunction, Boca Raton, CRC Press, vol. 4, 1991.
[10] L. Itti, "Models of Bottom-Up and Top-Down Visual Attention," California Institute of Technology, Jan. 2000. [Ph.D. Thesis]
[11] L. Itti and C. Koch, "A comparison of feature combination strategies for saliency-based visual attention systems," Conference on Human Vision and Electronic Imaging IV, pp. 473-482, 1999.
[12] T.C. Jen, B. Hsieh and S.J. Wang, "Image contrast enhancement based on intensity-pair distribution," IEEE International Conference on Image Processing, vol. 1, pp. I-913-16, Sep. 2005.
[13] S. Engel, X. M. Zhang and B. Wandell, "Colour tuning in human visual cortex measured with functional magnetic resonance imaging," Nature, vol. 388, pp. 68-71, Jul. 1997.
[14] L. Itti and C. Koch, "A saliency-based search mechanism for overt and covert shifts of visual attention," Vision Research, vol. 40, pp. 1489-1506, 2000.
[15] "Eye tracking," Wikipedia, http://en.wikipedia.org/wiki/Eye_tracking