
CHAPTER 1 INTRODUCTION

1.3 Organization of the Thesis

The thesis is organized as follows. Chapter 1 presents the motivation and some related work. Chapter 2 describes the proposed classified image fusion method, which combines the generated virtual images in an attempt to obtain an image that is well exposed in every region. Chapter 3 presents experimental results that show the effectiveness of the proposed method. Finally, Chapter 4 gives a brief conclusion and discusses future work.


CHAPTER 2

PROPOSED CLASSIFIED IMAGE FUSION METHOD FOR IMAGE CONTRAST ENHANCEMENT

In this chapter, we describe the proposed classified image fusion (CIF) method for image contrast enhancement. Image fusion is the process of extracting relevant information from multiple images of the same scene to obtain a more informative image with better contrast and image quality. Image fusion has numerous applications, such as remote sensing [15-17], medical imaging [17], high dynamic range imaging (HDRI) [17], [18], and multi-focus imaging [17], [19]. In remote sensing and medical imaging, several images captured by different sensors are fused to produce a high-quality image. In HDRI, multiple images of the same scene are captured with different exposure times.

The pixels with distinct luminance values are then fused to yield an image with a wider dynamic range than any of the individual images. In multi-focus imaging, several images, each with some objects in focus, are merged to obtain an image in which all relevant objects are in focus. All of these applications require several images with varying luminance, exposure, or focus to be acquired in advance. However, it is not a simple task for digital cameras or mobile phones to capture several images of the same scene with varying information. Therefore, we propose an algorithm that produces several virtual images for image fusion.

Since the proposed CIF method works on gray images, each input color image is first converted to gray values. The luminance image Y(x, y) is computed from its red, green, and blue components using the following equation:

where R(x, y), G(x, y), and B(x, y) denote red, green and blue color values of the pixel located at (x, y). Then, several virtual images having different intensities are generated.
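Eq. (1) did not survive extraction. The sketch below therefore assumes the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). These are a common choice for this conversion, but the coefficients used in the thesis may differ. The helper name `to_luminance` is hypothetical.

```python
import numpy as np

def to_luminance(rgb):
    """Convert an (..., 3) RGB array to a luminance image Y.

    ASSUMPTION: ITU-R BT.601 luma weights; Eq. (1) may use other coefficients.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

# A 1x2 image holding one white pixel and one pure-red pixel.
print(to_luminance([[[255, 255, 255], [255, 0, 0]]]))
```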

In addition, a multilevel thresholding algorithm is employed to classify the input image pixels into different classes according to their luminance values. Based on the classification result, several relevant virtual images are selected from the generated virtual images. The selected images are then fused by the proposed classified image fusion algorithm, which operates in the discrete wavelet transform (DWT) domain, to obtain a fused image with proper exposure in every region. The flow chart of the proposed image contrast enhancement method is shown in Fig. 1.


2.1 Generation of Virtual Exposure Images

In image fusion, several images are combined to produce an output image of better quality. In image enhancement, however, only one input image is given, and an output image with higher contrast must be produced from it. Therefore, it is necessary to design an algorithm that generates several virtual images so that image fusion techniques can be applied to image contrast enhancement. The proposed CIF method generates several virtual images with distinct luminance by exploiting the concept of exposure, which refers to how much light reaches the image sensor of an image capturing device.

For digital cameras, the shutter speed and the F-stop determine the exposure of a photo. The shutter speed controls how long the shutter stays open, that is, how long light can reach the image sensor. The longer the shutter stays open (the slower the shutter speed), the more light reaches the image sensor. The other factor controlling the exposure of a photo is the F-stop, which controls the size of the aperture.

The aperture is the opening in the camera through which light from the scene passes.

Fig. 1 Flow chart of the proposed CIF system


Modern cameras use a standard F-stop scale: f/1, f/1.4, f/2, f/2.8, f/4, f/5.6, f/8, f/11, f/16, f/22, etc. The scale is an approximately geometric sequence that corresponds to the powers of the square root of 2. In this sequence, each stop halves the light intensity of the previous stop. For example, f/1 allows twice as much light to fall on the image sensor as f/1.4, and four times as much light as f/2.

In this study, we exploit the F-stop concept to generate virtual images such that their pixel luminance values approximate a geometric sequence. In this thesis, the

where Y(x, y) is the gray level of the input image pixel located at (x, y).

From Eq. (2), N brighter images (with k = -N, -N+1, ..., -1) and N darker images (with k = 1, 2, ..., N) than the input image Y are generated. Together with the input image itself (k = 0), this gives 2N+1 virtual exposure images. Fig. 2 and Fig. 3 show some generated virtual exposure images. In the brighter virtual images of Fig. 2, the details of the central building become more apparent, and the contrast is much sharper than in the original image (k = 0). However, the contrast in the sky region becomes weaker in these brighter images. Fig. 3 shows a similar result: the details of the dark foreground objects become clearer in the brighter virtual images.
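Eq. (2) was lost in extraction, so the sketch below does not reproduce it. It uses a HYPOTHETICAL stand-in: Y_k = clip(Y · r^(-k), 0, 255) for a chosen ratio r. This rule has the property described in the text: negative k gives brighter images, positive k gives darker ones, and k = 0 returns the input. The function name and the `ratio` parameter are assumptions.

```python
import numpy as np

def virtual_exposures(Y, N, ratio):
    """Return {k: Y_k} for k = -N..N.

    HYPOTHETICAL rule standing in for Eq. (2): Y_k = clip(Y * ratio**(-k), 0, 255),
    so k < 0 gives brighter images and k > 0 darker ones, as in the text.
    """
    Y = np.asarray(Y, dtype=np.float64)
    return {k: np.clip(Y * ratio ** (-k), 0.0, 255.0) for k in range(-N, N + 1)}

Y = np.array([[10.0, 100.0], [180.0, 250.0]])
stack = virtual_exposures(Y, N=7, ratio=2 ** 0.5)
print(len(stack))            # 2N + 1 = 15 images
print(stack[-2], stack[2])   # with ratio sqrt(2), k = -2 doubles and k = 2 halves Y
```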


Fig. 2 Generated virtual exposure images (a) k = -7 (b) k = -6 (c) k = -5 (d) k = -4 (e) k = -3 (f) k = -2 (g) k = -1 (h) k = 0 (original image) (i) k = 1 (j) k = 2 (k) k = 3 (l) k = 4 (m) k = 5 (n) k = 6 (o) k = 7


Fig. 3 Generated virtual exposure images (a) k = -7 (b) k = -6 (c) k = -5 (d) k = -4 (e) k = -3 (f) k = -2 (g) k = -1 (h) k = 0 (original image) (i) k = 1 (j) k = 2 (k) k = 3 (l) k = 4 (m) k = 5 (n) k = 6 (o) k = 7


2.2 Image Pixel Classification

In the proposed CIF method, the input image pixels are classified into m classes, and pixels in different classes are blended with different image fusion rules. The multilevel thresholding algorithm designed by Liao et al. [20] is used to find m-1 thresholds, denoted by Thd1, Thd2, …, Thdm-1 (Thd1 < Thd2 < … < Thdm-1), that divide the input image pixels into m classes. Let Ω1, Ω2, …, Ωm denote these m classes. Based on the m-1 thresholds, the classes are defined as follows:

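\Omega_1 = \{(x, y) \mid Y(x, y) \le Thd_1\},
\Omega_i = \{(x, y) \mid Thd_{i-1} < Y(x, y) \le Thd_i\}, \quad i = 2, \ldots, m-1,
\Omega_m = \{(x, y) \mid Y(x, y) > Thd_{m-1}\}.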

To determine the proper cluster number m, a metric called the Dunn index (DI) [20] is used to evaluate the classification results. DI favors clustering results with small within-class variance and large between-class variance. The DI for n classes is defined as:

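DI(n) = \min_{1 \le i \le n} \left\{ \min_{1 \le j \le n,\, j \ne i} \left( \frac{\delta(\Omega_i, \Omega_j)}{\max_{1 \le k \le n} \Delta_k} \right) \right\}

where δ(Ωi, Ωj) denotes the distance between classes Ωi and Ωj (between-class variance), and Δk denotes the spread of class Ωk (within-class variance). A higher DI indicates a better clustering result. Based on Eq. (6), the cluster number m is determined by: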

m = \arg\max_{m_{\min} \le n \le m_{\max}} DI(n)        (8)

where mmin and mmax are the minimum and maximum cluster numbers to be examined.

To save computation time, we restrict the cluster number to the range from 3 to 6, that is, mmin = 3 and mmax = 6. This setting is based on the observation that an image typically has at least three classes: dark pixels, bright pixels, and pixels with luminance values in between. Fig. 4 shows three input images and the corresponding pixel classification results.

As shown in Fig. 4(b), the input image in Fig. 4(a) is classified into three classes: the sky region (bright pixels), the background buildings (well-exposed pixels), and the front-central building (dark pixels). As shown in Fig. 4(d), the input image in Fig. 4(c) is classified into four classes: the left side of the sky region, the right side of the sky region, the background mountain, and the dark houses. In Fig. 4(c), the right side of the sky is visibly darker than the left side, so the two sides are separated into different classes. Similarly, the right side of the sky is noticeably brighter than the background mountain and the roofs of the houses, so they are also separated. Finally, the dark trees, parts of the houses, and the wall are assigned to the class of dark pixels. Fig. 4(f) shows the classification result for Fig. 4(e), which is classified into five classes.
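The selection of m in Eq. (8) can be sketched as follows. The sketch makes several assumptions. First, it uses the classic Dunn definitions: the smallest gap between classes divided by the largest class diameter. These may differ from the variance-based δ and Δk used in the thesis. Second, `quantile_thresholds` is a HYPOTHETICAL stand-in for Liao et al.'s multilevel thresholding and is not that algorithm. Third, the boundary convention in `classify` follows the class definition above. All function names are illustrative.

```python
import numpy as np

def classify(Y, thresholds):
    """Label pixels 0..m-1 (for Ω1..Ωm) from sorted thresholds Thd1 < ... < Thd(m-1).

    Class i holds Thd(i-1) < Y <= Thd(i); the boundary convention is an assumption.
    """
    return np.digitize(Y, thresholds, right=True)

def dunn_index(Y, labels, n):
    """Classic Dunn index: smallest gap between classes / largest class diameter.

    Classes are intervals on the luminance axis, so the closest pair of classes
    is always an adjacent pair. A partition with an empty class scores 0.
    """
    groups = [Y[labels == i] for i in range(n)]
    if any(g.size == 0 for g in groups):
        return 0.0
    gap = min(groups[i + 1].min() - groups[i].max() for i in range(n - 1))
    diameter = max(g.max() - g.min() for g in groups)
    return gap / diameter if diameter > 0 else float("inf")

def choose_m(Y, thresholds_for, m_min=3, m_max=6):
    """Eq. (8): the n in [m_min, m_max] with the largest DI(n)."""
    return max(range(m_min, m_max + 1),
               key=lambda n: dunn_index(Y, classify(Y, thresholds_for(Y, n)), n))

def quantile_thresholds(Y, n):
    """HYPOTHETICAL stand-in for Liao et al.'s thresholds: equal-population cuts."""
    return np.quantile(Y, np.arange(1, n) / n)

Y = np.array([10, 11, 12, 120, 121, 122, 240, 241, 242])  # three tight gray-level groups
print(choose_m(Y, quantile_thresholds))
```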


Fig. 4 Input images and classification results. (a) Original gray image (b) Classification result with m = 3 (c) Original gray image (d) Classification result with m = 4 (e) Original gray image (f) Classification result with m = 5


2.3 Selection of Relevant Virtual Exposure Images

Not all of the 2N+1 previously generated images are used in the image fusion process. Only those virtual exposure images that contain some relevant informative regions are chosen for image fusion. Images that are completely under-exposed or completely over-exposed are excluded, so that the fused image is highly informative. To this end, an anchor image is first selected among the 2N+1 virtual exposure images. For each virtual image Yk, the trimmed mean luminance of the image pixels belonging to clusters Ω2, Ω3, …, Ωm-1 is calculated:

That is, the pixels in the darkest class Ω1 and the brightest class Ωm are not considered.

The image whose trimmed mean luminance is closest to gray level 128 (the middle value of the luminance range [0, 255]) is selected as the anchor image:

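Then, the 2M+1 virtual images Yk (k = anc-M, …, anc, …, anc+M) centered at the anchor image are selected as the relevant virtual exposure images for image fusion, where anc is the index of the anchor image. Fig. 5 and Fig. 6 show two examples of selected relevant virtual exposure images. Comparing these two figures with Fig. 2 and Fig. 3 shows that the less informative dark virtual images are excluded from the image fusion process.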

Fig. 5 Selected relevant virtual exposure images (a) k = -9 (b) k = -8 (c) k = -7 (d) k = -6 (e) k = -5 (f) k = -4 (g) k = -3 (h) k = -2 (anchor image) (i) k = -1 (j) k = 0 (k) k = 1 (l) k = 2 (m) k = 3 (n) k = 4 (o) k = 5


Fig. 6 Selected relevant virtual exposure images (a) k = -14 (b) k = -13 (c) k = -12 (d) k = -11 (e) k = -10 (f) k = -9 (g) k = -8 (h) k = -7 (anchor image) (i) k = -6 (j) k = -5 (k) k = -4 (l) k = -3 (m) k = -2 (n) k = -1 (o) k = 0


2.4 Classified Image Fusion

The 2M+1 virtual exposure images Yk (k = anc-M, …, anc, …, anc+M), which have different intensities, are then blended by the proposed classified image fusion method. First, for every virtual exposure image Yk, a weight map Wk is computed that indicates the contribution of each pixel to the final fused image. The weight maps use contrast and well-exposedness as quality measures.

Since the image fusion process is conducted on luminance images, we do not use the saturation measure described in [22]. The contrast measure is exploited to preserve details such as edges and textures in each image. Further, the just-noticeable-difference (JND) model of the human visual system (HVS) [25] is included in the contrast computation to avoid amplifying noise. The well-exposedness measure attempts to find the proper luminance value for each pixel.

The flow chart of the proposed classified image fusion method is shown in Fig. 7.

Fig. 7 Flow chart of the proposed classified image fusion method


2.4.1 Just-Noticeable-Difference (JND) Model of the Human Visual System (HVS)

JND is the threshold of luminance difference that can be perceived by the HVS. This thesis uses the JND model proposed by Chou and Li [25] for quality measure evaluation. In this model, the JND depends on the average background intensity and the spatial non-uniformity. The JND value of the image pixel located at (x, y) is defined as follows:

where J1 models the spatial masking effect and is defined by:

J_1(x, y) = mg(x, y)\,(0.0001\, bg(x, y) + 0.115) + (\lambda - 0.01\, bg(x, y)).

where λ influences the visibility threshold due to the spatial masking effect, and bg(x, y) is the average background intensity computed with the mask B (shown in Fig. 8):

bg(x, y) = \frac{1}{32} \sum_{i=1}^{5} \sum_{j=1}^{5} Y(x - 3 + i,\, y - 3 + j)\, B(i, j).

mg(x, y) is the maximum gradient value in four directions:

mg(x, y) = \max_{k = 1, \ldots, 4} \{ |grad_k(x, y)| \}.

where gradk(x, y) is the weighted average gradient along the direction k:

Fig. 9 The gradient mask used in computing gradk(x, y)

J2 determines the luminance threshold due to background intensity and is defined as follows:

where T0 and γ depend on the viewing distance between the monitor and the observer. T0 denotes the visibility threshold when the background gray level is 0, and γ denotes the slope of the line that models the JND visibility threshold function at higher background luminance. In this thesis, we set T0 = 17, γ = 3/128, and λ = 1/2, as in Chou and Li [25].
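A sketch of the JND computation follows, in the form of Chou and Li's model with T0 = 17, γ = 3/128, and λ = 1/2 as stated above. Several parts are ASSUMPTIONS. The mask values are taken as they are commonly reproduced from Chou and Li and should be checked against Fig. 8 and Fig. 9. The combination JND = max(J1, J2) and the piecewise form of J2 follow that published model, because the thesis's own equations for them were lost. Edge replication at the borders is also assumed.

```python
import numpy as np

# 5x5 masks as commonly reproduced from Chou and Li [25] (ASSUMED; check against
# Fig. 8 for B and Fig. 9 for the four directional gradient masks).
B = np.array([[1, 1, 1, 1, 1],
              [1, 2, 2, 2, 1],
              [1, 2, 0, 2, 1],
              [1, 2, 2, 2, 1],
              [1, 1, 1, 1, 1]], dtype=float)
G = [np.array(rows, dtype=float) for rows in (
    [[0, 0, 0, 0, 0], [1, 3, 8, 3, 1], [0, 0, 0, 0, 0], [-1, -3, -8, -3, -1], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 8, 3, 0, 0], [1, 3, 0, -3, -1], [0, 0, -3, -8, 0], [0, 0, -1, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 3, 8, 0], [-1, -3, 0, 3, 1], [0, -8, -3, 0, 0], [0, 0, -1, 0, 0]],
    [[0, 1, 0, -1, 0], [0, 3, 0, -3, 0], [0, 8, 0, -8, 0], [0, 3, 0, -3, 0], [0, 1, 0, -1, 0]],
)]

def correlate5x5(img, kernel):
    """out(x, y) = sum over i, j = 1..5 of img(x-3+i, y-3+j) * kernel(i, j).

    Borders are handled by edge replication (an assumption).
    """
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    out = np.zeros((h, w))
    for i in range(5):
        for j in range(5):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def jnd(Y, T0=17.0, gamma=3 / 128, lam=0.5):
    """JND(x, y) = max(J1, J2) in the Chou-Li form, with the parameters from the text."""
    Y = np.asarray(Y, dtype=float)
    bg = correlate5x5(Y, B) / 32                                       # background intensity
    mg = np.max([np.abs(correlate5x5(Y, g) / 16) for g in G], axis=0)  # max |grad_k|
    J1 = mg * (0.0001 * bg + 0.115) + (lam - 0.01 * bg)                # spatial masking
    J2 = np.where(bg <= 127,                                           # luminance masking
                  T0 * (1 - np.sqrt(bg / 127)) + 3,
                  gamma * (bg - 127) + 3)
    return np.maximum(J1, J2)
```

On a flat black region the threshold is T0 + 3 = 20 gray levels. On a flat mid-gray region (bg = 128) it drops to about 3, which reflects the HVS's higher sensitivity there.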

In total, 2M+1 contrast maps are computed. Fig. 10 and Fig. 11 show the contrast maps computed from the virtual images shown in Fig. 5 and Fig. 6.


Fig. 10 Computed contrast maps (a) C-9 (b) C-8 (c) C-7 (d) C-6 (e) C-5 (f) C-4 (g) C-3 (h) C-2 (i) C-1 (j) C0 (k) C1 (l) C2 (m) C3 (n) C4 (o) C5


Fig. 11 Computed contrast maps (a) C-14 (b) C-13 (c) C-12 (d) C-11 (e) C-10 (f) C-9 (g) C-8 (h) C-7 (i) C-6 (j) C-5 (k) C-4 (l) C-3 (m) C-2 (n) C-1 (o) C0


2.4.3 Well-exposedness Measure

The well-exposedness measure evaluates how well a pixel is exposed. Mertens et al. [22] used a Gaussian function to model the exposedness of a pixel according to how close its luminance is to the target luminance value 128 (the middle value of the luminance range [0, 255]). That is, pixels with luminance values closer to gray level 128 receive larger weights in the well-exposedness measure, and pixels with luminance values far from 128 receive smaller weights.

Generally, the well-exposedness measure is defined as follows [22]:

E_k(x, y) = \exp\left( -\frac{(Y_k(x, y) - 128)^2}{2\sigma^2} \right)        (21)

where Ek(x, y) is the well-exposedness value of the pixel located at (x, y), and σ is the standard deviation of the Gaussian distribution. σ is set to 0.2 × 255, that is, 0.2 times the luminance range. From Eq. (21), the well-exposedness value lies in the range [0, 1].
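Eq. (21) maps directly to a few lines of NumPy. The sketch exposes the target gray level as a parameter (default 128) so that the same function could be reused with per-class targets. That parameterization and the function name are illustrative choices, not part of [22].

```python
import numpy as np

def well_exposedness(Yk, target=128.0, sigma=0.2 * 255):
    """Eq. (21): Gaussian weight of each pixel's distance from the target gray level."""
    Yk = np.asarray(Yk, dtype=float)
    return np.exp(-((Yk - target) ** 2) / (2 * sigma ** 2))

print(well_exposedness([0, 64, 128, 192, 255]))
```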

Further, the pixel with luminance value closer to gray level 128 will be assigned a original luminance values. Then, distinct target luminance values are defined for


different classes. Intuitively, the target luminance values Yit (i = 1, 2, …, m) can be determined by finding the center of each equally-spaced interval (see Fig. 12 for an example with cluster number m = 3).

Fig. 12 The equally-distributed target luminance values Yit with m = 3.

This method assumes that the whole luminance range (256 gray levels) is divided into m equally spaced intervals of width R = 256/m. It further assumes that the luminance values in each class follow a Gaussian distribution whose width R equals 6σe, where σe is the standard deviation of the Gaussian distribution. Thus, the target luminance value associated with class Ωi is

Y_i^t = \left(i - \frac{1}{2}\right) R.        (22)

However, the equally spaced target luminance values ignore the properties of the input image and should be adjusted to appropriate values. For example, if the input image is dark and has many dark pixels in Ω1, the target luminance value Y1t should be adjusted according to the number of pixels belonging to Ω1. Similarly, if the input image is bright and has many bright pixels in Ωm, the target luminance value Ymt should be adjusted according to the number of pixels belonging to Ωm. An alternative approach takes the image property into account by considering the number of pixels in each class when finding the target luminance values. Let pi denote the probability of the pixels belonging to class Ωi in the input image. Then, the cumulative probability Cum(i) for each class Ωi is

According to the probability of each class, the target luminance value is defined as the middle value of each target luminance range:

Y_i^t = 256 \cdot \frac{Cum(i-1) + Cum(i)}{2}, \quad i = 1, 2, \ldots, m, \quad Cum(0) = 0.
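The probability-based targets can be sketched under one reading of the text. In this reading, each class Ωi occupies the slice [256·Cum(i-1), 256·Cum(i)] of the luminance range, and its target is the middle of that slice. This interpretation and the function name are assumptions. One check supports the reading: with equal class probabilities it reduces to Eq. (22).

```python
import numpy as np

def cumulative_targets(labels, m, levels=256):
    """Y_i^t = levels * (Cum(i-1) + Cum(i)) / 2 with Cum(0) = 0 (assumed reading).

    `labels` holds class indices 0..m-1 (for Ω1..Ωm) of every input pixel.
    """
    p = np.bincount(np.ravel(labels), minlength=m) / np.size(labels)  # p_i
    cum = np.concatenate(([0.0], np.cumsum(p)))                       # Cum(0..m)
    return levels * (cum[:-1] + cum[1:]) / 2

dark = np.array([0] * 70 + [1] * 20 + [2] * 10)   # a dark image: 70% of pixels in Ω1
print(cumulative_targets(dark, 3))
```

In the dark example, the target for Ω1 rises from about 42.7 (equal spacing) to 89.6. The large dark class is therefore pushed toward brighter values.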

Further, the mean luminance value of the pixels belonging to the largest class is used to determine whether an input image is dark or bright. Let Ni denote the number of pixels belonging to class Ωi. The index of the largest class is defined as follows:

Then the mean luminance value of the largest class is computed:

However, if the luminance value of a pixel is near the boundary between two classes, it is hard to determine which class the pixel really belongs to. Consequently, its target luminance value is hard to determine, and its exposedness cannot be evaluated correctly. Therefore, we exploit the concept of fuzzy clustering to determine the probability that a pixel belongs to each class. Let μ_i^o denote the average luminance, in the input image, of the pixels belonging to cluster Ωi:

\mu_i^o = \frac{1}{N_i} \sum_{(x, y) \in \Omega_i} Y(x, y), \quad i = 1, 2, \ldots, m

Then, the probability that a pixel belongs to each class is computed as the likelihood that the pixel value comes from that class. This likelihood is modeled as a Gaussian function:

P_i(x, y) = \exp\left( -\frac{(Y(x, y) - \mu_i^o)^2}{2\sigma_i^2} \right),

where Y is the input image and σi is a standard deviation computed over a limited luminance range. This range is [μ_{i-1}^o, μ_{i+1}^o], the interval between the average luminances of the two neighboring classes, and σi is computed from the luminance values of the pixels that fall inside it. Note that μ_0^o is set to 0 and μ_{m+1}^o is set to 255. The range [μ_{i-1}^o, μ_{i+1}^o] is larger for 1 < i < m, so the corresponding standard deviation is multiplied by 0.75. This makes every σi computed from a range of similar size. Fig. 14 illustrates this concept.

Fig. 14 The luminance range for determining σi with m = 3.

To utilize the fuzzy clustering concept, the well-exposedness value of the pixel in Yk

associated with class Ωi is defined as follows:

m

where Ek(x, y) denotes the well-exposedness value of the pixel located at (x, y) in virtual exposure image Yk. Because different classes have different target luminance values, pixels are adjusted toward different luminance values, and thus the global contrast can be preserved. Fig. 15 and Fig. 16 compare two sets of well-exposedness maps. Fig. 15 uses the exposedness measure of Mertens et al. [22], which has a single target luminance value of 128, and Fig. 16 uses the proposed classified exposedness measure. In Fig. 15, the sky region of the darker, lower-exposure images has larger weight values than that of the brighter images. As a result, the luminance of the sky region decreases in the fused image.

In Fig. 16, however, E-2 of the proposed method has larger weight values in the sky region. As a result, the proposed method preserves the luminance of the sky much better than Mertens's method.

Fig. 17 and Fig. 18 show another example of well-exposedness maps computed by the exposedness measure of Mertens et al. [22] and by the proposed classified exposedness measure.


Fig. 15 Exposedness maps generated by using the exposedness measure proposed by Mertens et al. [22] (a) E-9 (b) E-8 (c) E-7 (d) E-6 (e) E-5 (f) E-4 (g) E-3 (h) E-2 (i) E-1 (j) E0 (k) E1 (l) E2 (m) E3 (n) E4 (o) E5


Fig. 16 Classified well-exposedness maps (a) E-9 (b) E-8 (c) E-7 (d) E-6 (e) E-5 (f) E-4 (g) E-3 (h) E-2 (i) E-1 (j) E0 (k) E1 (l) E2 (m) E3 (n) E4 (o) E5


Fig. 17 Exposedness maps generated by using the exposedness measure proposed by Mertens et al. [22] (a) E-14 (b) E-13 (c) E-12 (d) E-11 (e) E-10 (f) E-9 (g) E-8 (h) E-7 (i) E-6 (j) E-5 (k) E-4 (l) E-3 (m) E-2 (n) E-1 (o) E0


Fig. 18 Classified well-exposedness maps (a) E-14 (b) E-13 (c) E-12 (d) E-11 (e) E-10 (f) E-9 (g) E-8 (h) E-7 (i) E-6 (j) E-5 (k) E-4 (l) E-3 (m) E-2 (n) E-1 (o) E0

The weight maps are normalized so that, at each pixel, the sum of the weights over these 2M+1 weight maps equals 1:

\sum_{k=anc-M}^{anc+M} W_k(x, y) = 1.

Fig. 19 and Fig. 20 show two examples of final weight maps produced by multiplying the proposed contrast measure and the classified exposedness measure.
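The final weights can be sketched as the per-pixel product of the contrast and exposedness maps, normalized over the 2M+1 images so that they sum to 1. The small `eps` keeps the division defined where every product is zero: such pixels receive equal weights. The eps value and the function name are assumptions, not values from the thesis.

```python
import numpy as np

def normalized_weights(contrast, exposedness, eps=1e-12):
    """W_k = C_k * E_k, normalized so the 2M+1 weights sum to 1 at every pixel.

    Inputs are stacks of shape (2M+1, H, W); eps is an assumed small constant.
    """
    W = np.asarray(contrast, dtype=float) * np.asarray(exposedness, dtype=float) + eps
    return W / W.sum(axis=0, keepdims=True)

C = np.array([[[1.0, 0.0]], [[3.0, 0.0]]])   # two images, a 1x2 frame
E = np.array([[[1.0, 0.0]], [[1.0, 0.0]]])
print(normalized_weights(C, E))
```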


Fig. 19 Weight maps Wk of each exposure image Yk generated by using the proposed classified exposedness measure. (a) W-9 (b) W-8 (c) W-7 (d) W-6 (e) W-5 (f) W-4 (g) W-3 (h) W-2 (i) W-1 (j) W0 (k) W1 (l) W2 (m) W3 (n) W4 (o) W5


Fig. 20 Weight maps Wk of each exposure image Yk generated by using the proposed classified exposedness measure. (a) W-14 (b) W-13 (c) W-12 (d) W-11 (e) W-10 (f) W-9 (g) W-8 (h) W-7 (i) W-6 (j) W-5 (k) W-4 (l) W-3 (m) W-2 (n) W-1 (o) W0


2.4.4 Classified Image Fusion in the DWT Domain

Mertens et al. [22] have shown that if images are directly fused in the spatial domain, there will be annoying seams at pixels where weight values change quickly.

To solve this problem, they blend the images at multiple resolutions using image pyramid decomposition. First, a Laplacian pyramid is built for each exposure image and a Gaussian pyramid is constructed for each weight map. Then the coefficients are combined independently at each level. Finally, the combined pyramid is collapsed to obtain the fused image. In this thesis, the virtual images are merged in the discrete wavelet transform (DWT) domain using the fusion method proposed by Malik et al. [26]. This avoids the annoying seams caused by rapid changes of the weight values.

The DWT is a well-known method for multi-resolution decomposition of an image. For a one-dimensional (1-D) DWT, the input signal is filtered by a low-pass filter and a high-pass filter. The low-pass filter retains the coarse information, while the high-pass filter extracts the detail information of the input signal. Both filtered signals are then down-sampled by a factor of two. To apply a 2-D DWT to an image, the 1-D DWT is first applied to each row of the input image. It is then applied again to each column of the two resulting decimated signals. This procedure completes one level of 2-D DWT decomposition and yields four low-resolution subimages, denoted by LL, LH, HL, and HH. The subimage LL preserves the coarse information of the input image, while the subimages LH, HL, and HH correspond to vertical, horizontal, and diagonal details, respectively. The subimage LL can be further decomposed into four subimages by applying the 2-D DWT to it again. Therefore, applying an L-level 2-D DWT to the input image yields 3L+1 subimages.

In this thesis, an L-level 2-D DWT is applied to each virtual exposure image Yk to produce 3L+1 wavelet subimages. Let Y_k^{l,θ} denote the wavelet subimage of Yk with direction θ ∈ {LL, LH, HL, HH} at level l. For each weight map Wk, a Gaussian pyramid is constructed, and W_k^l denotes its level-l subimage. The virtual exposure images are then blended by a weighted sum of the wavelet subimages of all virtual images at each level l (1 ≤ l ≤ L). The weights are the coefficients at the same level of the Gaussian pyramids of the weight maps:

F^{l,\theta}(x, y) = \sum_{k=anc-M}^{anc+M} W_k^l(x, y)\, Y_k^{l,\theta}(x, y).

where F^{l,θ}(x, y) denotes the fused wavelet coefficient of pixel (x, y) with direction θ at level l. The final fused image F(x, y) is obtained by applying the inverse DWT to the fused wavelet subimages F^{l,θ}(x, y).
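The weighted-sum rule above can be sketched under assumptions that the text does not state. The sketch uses an orthonormal Haar wavelet; the wavelet used with Malik et al.'s method is not given here. It requires image sides divisible by 2^L. It also replaces the Gaussian pyramid of each weight map with 2×2 block averaging, a simple stand-in that keeps level-l weights the same size as the level-l subbands and preserves the sum-to-1 property. All function names are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One level of an orthonormal 2-D Haar DWT; returns LL and (LH, HL, HH)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return (a + b + c + d) / 2, ((a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, details):
    """Inverse of haar_dwt2."""
    lh, hl, hh = details
    x = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def downsample(w):
    """2x2 block average: a simple stand-in for one Gaussian pyramid level."""
    return (w[0::2, 0::2] + w[0::2, 1::2] + w[1::2, 0::2] + w[1::2, 1::2]) / 4

def dwt_fuse(images, weights, levels):
    """F^{l,θ} = sum_k W_k^l * Y_k^{l,θ}, then inverse DWT; weights must sum to 1 per pixel."""
    height, width = np.shape(images[0])
    if height % 2 ** levels or width % 2 ** levels:
        raise ValueError("image sides must be divisible by 2**levels")
    lls = [np.asarray(y, dtype=float) for y in images]
    ws = [np.asarray(v, dtype=float) for v in weights]
    fused_details = []
    for _ in range(levels):
        ws = [downsample(v) for v in ws]           # W_k^l matches the level-l subband size
        decomposed = [haar_dwt2(y) for y in lls]
        lls = [ll for ll, _details in decomposed]
        fused_details.append(tuple(
            sum(wk * dets[j] for wk, (_ll, dets) in zip(ws, decomposed))
            for j in range(3)))                    # LH, HL, HH at this level
    fused = sum(wk * ll for wk, ll in zip(ws, lls))  # LL at the coarsest level
    for details in reversed(fused_details):
        fused = haar_idwt2(fused, details)
    return fused
```

A useful property check: fusing identical images with any normalized weights must return the image unchanged.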


2.5 Color Components Reconstruction

Finally, the fused grayscale image F(x, y) is used to reconstruct the color image. Let R, G, and B denote the red, green, and blue components of the original image, respectively. To prevent hue shifts and color desaturation, the color components are reconstructed by the following equation [23]:
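The reconstruction equation from [23] is missing from this text. The sketch below uses a common ratio-based form, C_out = F · (C / Y)^s for C ∈ {R, G, B}. With s = 1, the ratios among R, G, and B are kept, and so is the hue. Whether [23] uses exactly this form is an ASSUMPTION, as are the exponent `s`, the `eps` guard, and the function name.

```python
import numpy as np

def reconstruct_color(rgb, Y, F, s=1.0, eps=1e-6):
    """C_out = F * (C / Y) ** s for each of R, G, B (assumed ratio-based form).

    rgb: (H, W, 3) original color image, Y: (H, W) original luminance,
    F: (H, W) fused luminance; eps avoids division by zero at black pixels.
    """
    rgb = np.asarray(rgb, dtype=float)
    ratio = rgb / (np.asarray(Y, dtype=float)[..., None] + eps)
    out = np.asarray(F, dtype=float)[..., None] * ratio ** s
    return np.clip(out, 0.0, 255.0)

rgb = np.array([[[100.0, 100.0, 100.0], [50.0, 100.0, 150.0]]])
print(reconstruct_color(rgb, Y=np.array([[100.0, 100.0]]), F=np.array([[150.0, 120.0]])))
```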

