Intrinsic Image Decomposition - Literature Review

1.2 Literature Review

1.2.3 Intrinsic Image Decomposition

In many computer vision applications, it is desirable to decompose the input image into its reflectance and illumination components. Each component has its own uses. Since the reflectance component is determined by the characteristics of the scene, the reflectance image in principle remains constant under different illumination conditions. For applications such as object recognition [CHA 04], pattern classification [LEU 99], scene interpretation [FAN 03], and visual surveillance [KAM 00, MAT 03, TER 99], it is preferable to use reflectance images. The illumination component, which varies with the lighting conditions, can be used for tasks such as illumination assessment [DRE 03, FIN 06], shading analysis [BEL 01, FUN 92, PAR 03], color constancy [FIN 01], and geometric modeling [SHA 03].

Chapter 1 Introduction

Decomposing an image into its reflectance and illumination components is an ill-posed problem [BAR 78]: two unknowns (the illumination and reflectance components) must be derived from a single observation (the input image). Additional information is needed to separate the components. Weiss [WEI 01] used multiple images. Let I_i (i = 1, ..., n) be a set of images taken of a scene under different illumination conditions. Since the reflectance component is assumed to be constant, say R, a set of n equations, I_i = R × L_i (i = 1, ..., n), can be constructed, where L_i is the illumination component of image I_i. However, this set of n equations is still not enough to solve for the n + 1 unknowns (R and L_i). Weiss further introduced a sparseness assumption [SIM 97], which states that the filtered images obtained by applying gradient operators to the input images are sparse (i.e., they contain mostly zeros), so that the histograms of the filtered images can be fitted with a Laplacian function. With this assumption, the decomposition problem becomes solvable. Weiss then estimated R and L_i using a maximum likelihood technique. Yuille et al. [YUI 99] also used as input a set of images taken of an object under different and unknown lighting conditions. A singular value decomposition technique was applied to the images to separate them into components depending on surface characteristics (geometry and albedo) and on illumination conditions. Based on the extracted surface characteristics, a generative model of the object [GEO 00], which approximates the object’s appearance under a restricted range of illumination conditions, was determined.
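As an illustration of the multiple-image idea, the following minimal sketch works in the log domain, where log I_i = log R + log L_i, and takes the median over the n frames of each derivative of log I_i. Under a Laplacian model of the illumination derivatives, this median is the maximum likelihood estimate of the corresponding reflectance derivative. The function name, the use of simple forward differences as the derivative filters, and the toy scene are assumptions made for illustration; the final step of reconstructing R from its estimated derivatives is omitted.

```python
import numpy as np

def reflectance_derivatives(images):
    """Median over frames of the horizontal and vertical log-image derivatives."""
    logs = np.log(np.asarray(images, dtype=float))  # shape (n, H, W); pixels must be > 0
    dx = np.diff(logs, axis=2)                      # horizontal forward difference per frame
    dy = np.diff(logs, axis=1)                      # vertical forward difference per frame
    return np.median(dx, axis=0), np.median(dy, axis=0)

# Toy scene (hypothetical): a fixed reflectance step under three illumination ramps,
# one brightening to the right, one flat, and one darkening to the right.
R = np.ones((4, 6))
R[:, 3:] = 2.0
ramps = [np.linspace(1.0, end, 6) for end in (1.5, 1.0, 0.7)]
images = [R * ramp[None, :] for ramp in ramps]

rx, ry = reflectance_derivatives(images)
```

In this toy scene the illumination derivatives of the three frames are positive, zero, and negative respectively, so their median vanishes and rx coincides with the horizontal derivative of log R.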

Characteristic image decomposition from a single image

Because the techniques of Weiss and Yuille et al. require multiple images, their applicability is somewhat limited. Tappen et al. [TAP 02] proposed a method for recovering intrinsic components from a single image. A set of derivative filters is first applied to the input image, giving rise to a set of derivative images. The pixels of the derivative images are classified as reflectance-related or illumination-related based on their color and intensity. However, this classification alone produced unsatisfactory results.

Tappen introduced a process, called the generalized belief propagation process, to improve the results. A deconvolution process was then applied to the classified derivative images to obtain the intrinsic images of the input image. The Tappen method took about six minutes to classify the pixels and another six minutes to perform the generalized belief propagation process. For use in real-time applications, the computation time of the Tappen method must be reduced.

Recently, Matsushita et al. [MAT 03] introduced an illumination eigenspace into Tappen’s computational framework. The eigenspace, which is built in advance, provides information for categorizing the pixels of derivative images. Since no information is computed during pixel classification, the Matsushita approach can operate in real time.

However, to generate the illumination eigenspace of a scene, Matsushita collected a set of 2048 images of that scene over a period of 120 days. Consequently, Matsushita’s method cannot be applied to time-varying (i.e., dynamic) scenes. To be applicable to dynamic scenes, the information for classifying the pixels of the derivative images must be computed directly from the input image, and for real-time applications, this computation must be efficient.

1.3 The Proposed Techniques

An image usually consists of millions of pixels of different intensities (R, G, and B color values). The meaning of an image lies not simply in the intensities of its pixels, but rather in recognizable groups of pixels that humans see as one entity. This is supported by a theory of human object recognition, recognition-by-components [HOF 84, BIE 87], in which the perceptual recognition of objects is conceived of as a process where an input image is segmented into an arrangement of simple geometric components. In other words, people recognize objects in an image by decomposing their shapes into basic geometric parts. The theory also explains why people can recognize objects even when the pixel intensities change considerably from their original values.

Several modifications to the image in Fig. 1.3(a) are shown in Fig. 1.3(b) - 1.3(h).

In all cases people can still easily recognize the Mona Lisa. Note that the topographical contour image reproduces the look of a topographical map by covering the image with contour lines. The modified images all have different pixel intensities, and even their sizes and scales are not the same. However, humans have no problem recognizing each of them as the Mona Lisa. This can be explained by the recognition-by-components theory: as long as the major components of the Mona Lisa (i.e., the face and eyes) are kept in the modified images, the Mona Lisa can still be recognized no matter how the image has been modified. The modifications above also show that edges carry most of the meaning of the picture, as in Fig. 1.3(f) and (g).

Figure 1.3. Mona Lisa, (a) original picture, (b) gray scale image, (c) pentagon distortion image, (d) partial image, (e) brush stroke image, (f) line drawing by hand [ENC 07], (g) topographical contour image, (h) edge image.

An edge is defined as an imaginary line that separates two regions with different characteristics, e.g., intensity, color, illumination, reflectance, or specular and diffuse reflection.

Accordingly, edge detection is an essential technique for computer vision. Edge detection methods can be categorized [SAL 07] as spatial domain, frequency domain, and multi-resolution analysis methods. Classical spatial domain techniques detect edges by finding either the maximum gradient values (Sobel, Prewitt, Canny [CAN 86], Roberts) or the zero crossings of the second derivative (Laplacian of Gaussian). Edge information resides in the high-frequency region of the Fourier transform of an image.

Fourier analysis reveals global information about edges, and it is not suitable for detecting specific local edges. Multi-resolution analysis uses hierarchical structures to detect edges at different resolutions. Saleem et al. [SAL 07] implemented a multi-resolution edge detector using the wavelet transform, which is less sensitive to noise since noise edges do not appear at low resolutions. Edges in a blurry image [WU 07] can also be better detected using a multi-resolution approach. However, weak or isolated edges may be treated as noise and ignored. In this dissertation, all the edges in the image must be classified in order to compose the characteristic components, and for this purpose a traditional edge gradient mask is the most suitable.
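The spatial-domain gradient approach discussed above can be sketched as follows. The Sobel masks are used here only as one example of a traditional gradient mask; the text does not state which mask the proposed framework uses, and the helper names, the test image, and the threshold of half the maximum magnitude are assumptions made for illustration.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def correlate3x3(image, kernel):
    """Valid-mode 3x3 correlation built from shifted slices (no padding)."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * image[i:i + h - 2, j:j + w - 2]
    return out

def gradient_magnitude(image):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Hypothetical test image: dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

mag = gradient_magnitude(image)
edges = mag > 0.5 * mag.max()   # illustrative threshold, not taken from the text
```

Because no padding is applied, the output is two pixels smaller than the input in each dimension, and only the two output columns that straddle the intensity step respond.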

In this dissertation, based on the idea that edges are the key to recognizing the meaning of images, a computational framework is developed for characteristic image decomposition from a single image using environmental supporting information to classify the edges of the image. The classified edges are separated into desired components, and each component is integrated to form the target characteristic image.

Three applications of this computational framework (interference reflections, highlight reflections, and intrinsic images) are developed in this dissertation.

For interference reflections, a technique for automatically separating the reflection and object components of a single interference image is presented. The key idea of the proposed method is to classify the edges of the interference image as either reflection or object, and to use integration to reconstruct the reflection and object images. The method uses a TV model, a blur measure, and region segmentation results as evidence, combined by a fuzzy integral technique, to classify the edge pixels. Based on the classification results of the edge pixels, an integration method is applied to reconstruct the reflection and object components of the input image. The experimental results have demonstrated that the proposed method can separate a single interference image effectively, with small misadjustments and rapid convergence.
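To make the evidence-fusion step concrete, the following minimal sketch combines three evidence values for one edge pixel using a Sugeno fuzzy integral. The text does not specify which fuzzy integral or which fuzzy measure is used, so the choice of the Sugeno form, the source names ("tv", "blur", "region"), and all numeric values below are hypothetical.

```python
def sugeno_integral(evidence, measure):
    """Sugeno fuzzy integral of evidence values in [0, 1].

    evidence: dict mapping each source to its support for a class.
    measure:  dict mapping frozensets of sources to their importance
              (monotone, with the full set mapped to 1).
    """
    ranked = sorted(evidence, key=evidence.get, reverse=True)
    best = 0.0
    for k in range(1, len(ranked) + 1):
        coalition = frozenset(ranked[:k])            # the k strongest sources
        best = max(best, min(evidence[ranked[k - 1]], measure[coalition]))
    return best

# Hypothetical fuzzy measure over the three evidence sources.
measure = {
    frozenset({"tv"}): 0.4,
    frozenset({"blur"}): 0.3,
    frozenset({"region"}): 0.3,
    frozenset({"tv", "blur"}): 0.7,
    frozenset({"tv", "region"}): 0.7,
    frozenset({"blur", "region"}): 0.6,
    frozenset({"tv", "blur", "region"}): 1.0,
}

# Hypothetical support of each source for the two classes at one edge pixel.
support_reflection = {"tv": 0.9, "blur": 0.6, "region": 0.2}
support_object = {"tv": 0.1, "blur": 0.4, "region": 0.8}

score_reflection = sugeno_integral(support_reflection, measure)
score_object = sugeno_integral(support_object, measure)
label = "reflection" if score_reflection > score_object else "object"
```

With these values the reflection class scores 0.6 and the object class 0.4, so the pixel is labeled as reflection. A single strong source cannot dominate on its own, because each evidence value is capped by the importance of the coalition of sources that support it at least as strongly.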

For highlight reflections, a feature-based technique for separating the specular and diffuse components of a single image is presented. The proposed approach uses Shafer’s dichromatic reflection model, which assumes that the light reflected from a surface point is the sum of a diffuse reflection and a specular reflection. The idea behind the proposed method is to classify the boundary pixels of the input image as specular or diffuse. A fuzzy integral process is proposed to classify the boundary pixels based on their local evidence, including specular and diffuse estimation information. Based on the classification results of the boundary pixels, an integration method is used to reconstruct the specular and diffuse components of the input image. The experimental results demonstrate that the proposed method can perform dichromatic reflectance separation effectively, with small misadjustments and rapid convergence.
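For reference, the additive composition assumed by the dichromatic reflection model can be written as below. The symbols are notation chosen here for illustration rather than taken from the text: I(x) is the color observed at pixel x, m_d(x) and m_s(x) are non-negative geometric scale factors, C_d(x) is the diffuse (body) color, and C_s is the specular (interface) color, which is commonly taken to be the color of the illuminant.

```latex
\[
  \mathbf{I}(x) \;=\; m_d(x)\,\mathbf{C}_d(x) \;+\; m_s(x)\,\mathbf{C}_s
\]
```

Separating the specular and diffuse components then amounts to estimating the two terms on the right-hand side for each pixel.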

For intrinsic images, the proposed approach first convolves an input image with a prescribed set of derivative filters. The pixels of the derivative images are classified as reflectance or illumination according to three measures (chromatic, intensity contrast, and edge sharpness), which are calculated in advance for each pixel of the input image.

Finally, an integration process is applied to the classified derivative images to obtain the intrinsic images of the original image. Both synthetic and real images have been utilized in the experiments. The results reveal the feasibility of the proposed technique in rapidly and effectively decomposing intrinsic images from a single image.
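The classify-then-integrate idea can be illustrated with a one-dimensional analogue. This is only a sketch under strong simplifications: a single magnitude threshold stands in for the three measures, a forward difference stands in for the derivative filters, and a cumulative sum stands in for the integration process. The function name, threshold value, and test signal are hypothetical.

```python
import numpy as np

def decompose_1d(signal, threshold=0.2):
    """Split a positive 1-D signal into reflectance and illumination parts."""
    log_s = np.log(signal)
    d = np.diff(log_s)                          # derivative of the log signal
    is_reflectance = np.abs(d) > threshold      # stand-in classifier: sharp change
    d_refl = np.where(is_reflectance, d, 0.0)   # derivatives assigned to reflectance
    d_illum = d - d_refl                        # the rest is assigned to illumination
    # Integrate each classified derivative signal. The constant of integration
    # is given to the illumination, which is an arbitrary choice.
    log_r = np.concatenate(([0.0], np.cumsum(d_refl)))
    log_l = log_s[0] + np.concatenate(([0.0], np.cumsum(d_illum)))
    return np.exp(log_r), np.exp(log_l)

# Hypothetical scene: a reflectance step under a smooth illumination ramp.
reflectance = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
illumination = np.linspace(1.0, 1.5, 6)
signal = reflectance * illumination

r_hat, l_hat = decompose_1d(signal)
```

The product of the two recovered components reproduces the input exactly. The recovered reflectance step is slightly larger than the true one, because an all-or-nothing classification assigns the whole derivative at the step, including its small illumination part, to the reflectance.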

1.4 Contributions

There are four major contributions from this dissertation:

(i) The computational framework for characteristic image decomposition from a single image

The proposed computational framework, which performs characteristic image decomposition from a single image, has four major steps: boundary generation, information extraction, boundary classification, and image composition. The framework can be applied to many valuable applications, three of which are addressed in this dissertation: interference reflection separation, dichromatic reflection decomposition, and intrinsic image extraction from a single image. The decomposed components are useful for making computer vision algorithms applicable to more general and realistic scenes.

(ii) The solution to reflection image separation by interference image separation

Interference image extraction can separate the objects reflected by a glass cover from the objects behind the glass cover in a single image. The proposed separation technique is valuable to vision applications since it can eliminate annoying reflections in an image. Unlike previous research, the proposed method is fast and fully automatic, and it requires no manual or iterative operations.

(iii) The solution to highlight reflection removal by dichromatic reflection image extraction

The dichromatic reflection image extraction algorithm can eliminate highlight reflections from the input image and help avoid errors caused by highlights in computer vision applications. Unlike previous research, the proposed method requires no color segmentation or iterative operations.

(iv) The solution to shadow removal by intrinsic image extraction

Intrinsic image extraction can extract the illumination and reflectance images from a single input image. In many computer vision applications, it is desirable that the reflectance and illumination components be separated from the input image. The proposed algorithm is useful in computer vision applications that are affected by the illumination conditions. However, this task is far from simple because it is an ill-posed problem. Previous researchers have relied on multiple images to solve it, but the use of multiple images restricts the application domain. To overcome this limitation, the technique proposed in this dissertation uses a single image.
