Depth Map Fusion from Edge Exploring Thresholding (DFEET)

Chapter 3 Structure and Algorithms

3.2 Algorithm

3.2.2 Depth Map Fusion from Edge Exploring Thresholding (DFEET)

DFEET consists of four steps: (1) deviation correction, (2) edge searching, (3) thresholding, and (4) combination.

First, when the spatial HDDR system is used, the two depth maps are not rendered from the same height. They resemble elemental images that contain disparity. Therefore, one depth map has to be adjusted to the same height as the other. For example, if the depth map from the lower position is taken as the reference, the upper depth map has to be shifted downward so that the objects occupy the same positions. Figure 3-10 illustrates an example of deviation correction. The red and green images are the original image captured from the upper position and the shifted image, respectively. It should be noted that deviation correction is carried out after the conventional depth map is generated. Hence, both the color image and the depth map have to be shifted.

Figure 3-10 Deviation correction

If the temporal HDDR system is applied, deviation correction is unnecessary because the disparity is zero.
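The shift described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name `shift_down`, the zero fill of the vacated rows, and the assumption that the offset is a known whole number of pixels are all hypothetical. The same call would be applied to the color image and to its depth map.

```python
import numpy as np

def shift_down(image, offset):
    """Shift an image array down by `offset` rows; vacated top rows become zero.

    Works for 2-D depth maps and for 3-D color images (rows are axis 0).
    The zero fill is an assumption made for this sketch.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if offset == 0:
        return image.copy()
    shifted = np.zeros_like(image)
    shifted[offset:] = image[:-offset]
    return shifted

# Hypothetical 4x3 depth map captured from the upper position.
upper_depth = np.arange(12).reshape(4, 3)
corrected = shift_down(upper_depth, 1)
```

The zero-filled rows at the top are discarded content, so they carry no depth information after the correction.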

Secondly, edges are often used as an indication of whether an object is in focus. Consequently, we apply an edge filter to estimate the region covered by the depth of field. Edge detection is based on high-pass filtering. However, a high-pass filter such as the Laplacian operator is sensitive to noise, so noise suppression is required beforehand.

Marr and Hildreth proposed the Laplacian of Gaussian (LoG) method to address both considerations. A mask can be created by sampling the following equation [53]:

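$$\nabla^2 G(x,y) = \frac{\partial^2 G(x,y)}{\partial x^2} + \frac{\partial^2 G(x,y)}{\partial y^2} \qquad (38)$$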

where G(x,y) is a Gaussian function. As the name implies, Gaussian smoothing and Laplacian sharpening are performed together. LoG is a common approach to edge detection and is easy to implement with acceptable accuracy, so we use it in our algorithm. After extracting the edges of the elemental images, we convolve the edge image with another identity matrix in order to find a representative focal point.
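The edge-searching step can be sketched as below. Several choices here are assumptions rather than details given in the text: the mask samples the standard closed form of the LoG for an isotropic Gaussian of width `sigma`, the absolute filter response is used as edge strength, and the second convolution uses an all-ones window so that its output measures how much edge content surrounds each pixel. The mask size, `sigma`, and window size are hypothetical values.

```python
import numpy as np

def log_kernel(size, sigma):
    """Sample the Laplacian of Gaussian on a size x size grid (size odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = x ** 2 + y ** 2
    kernel = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return kernel - kernel.mean()  # zero sum: flat regions give no response

def filter_same(image, kernel, pad_mode):
    """Correlate image with kernel; output has the same shape as image."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode=pad_mode)
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def representative_focal_point(gray, log_size=9, sigma=1.4, window=15):
    """Return (row, col) of the pixel whose neighbourhood holds the most edge energy."""
    edges = np.abs(filter_same(gray, log_kernel(log_size, sigma), "edge"))
    density = filter_same(edges, np.ones((window, window)), "constant")
    row, col = np.unravel_index(np.argmax(density), density.shape)
    return int(row), int(col)

# Hypothetical test image: flat background with one textured (in-focus) patch.
image = np.full((64, 64), 128.0)
blocks = (np.indices((16, 16)) // 4).sum(axis=0) % 2
image[30:46, 20:36] = blocks * 255.0
point = representative_focal_point(image)
```

Both kernels are symmetric, so correlation and convolution coincide here. On a texture-less or low-contrast image the maximum of `density` is poorly defined, which is the failure case discussed in the text.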

Figure 3-11 Representative point of a deviation-corrected elemental image

In Figure 3-11, apart from the line at the top caused by deviation correction, the green point marks the position that is richest in edge information, so we regard it as the representative focal point. Conceptually, we consider this point to be the location of the focal plane at capture, but this is not always correct. When the content is texture-less or has low contrast, the representative focal point may lie far from where we expect. This affects the accuracy of the threshold value determined in the next step.

However, as mentioned before, there is an ambiguous region between two depths of field, so some variation of the threshold value is acceptable. Segmentation errors in the depth map occur only when the representative focal points deviate so much that the boundary falls outside the region of ambiguity. In that case, ill-defined objects will not be filtered out by thresholding.

With regard to thresholding, once the representative focal point is found, the corresponding gray level can be read from the depth map. In the same way, the representative focal point and its corresponding gray level can be obtained for every other depth map. As a result, if N depths of field are arranged during capture, N representative focal points are extracted. Subsequently, N-1 threshold gray values are determined by averaging the corresponding gray levels of each pair of adjacent representative focal points, as illustrated in Figure 3-12. Thresholding the depth maps is equivalent to separating the depths of field along the scene.

In the depth map, however, most objects become planar because each of them contains only a few gray levels. Therefore, the exact position of the threshold is not critical.
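The threshold rule can be sketched as below. The function names, the example gray levels, and the half-open band convention (lower threshold included, upper threshold excluded) are assumptions made for illustration only.

```python
import numpy as np

def thresholds_from_levels(levels):
    """Average each pair of adjacent gray levels: N levels give N-1 thresholds."""
    ordered = sorted(levels)
    return [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]

def segment_mask(depth_map, focal_level, levels):
    """Mark the pixels whose gray level lies in the band around focal_level."""
    cuts = [-np.inf] + thresholds_from_levels(levels) + [np.inf]
    band = sorted(levels).index(focal_level)
    return (depth_map >= cuts[band]) & (depth_map < cuts[band + 1])

# Hypothetical gray levels read at three representative focal points.
levels = [200, 40, 120]
cuts = thresholds_from_levels(levels)

# Depth map rendered while focusing at the middle object (level 120).
depth_map = np.array([30, 100, 150, 220])
keep = segment_mask(depth_map, 120, levels)
```

With these values the two thresholds are 80 and 160, so only the gray levels 100 and 150 survive in the depth map focused at the middle object.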

Figure 3-12 Scheme of threshold determination

However, because noise and other factors cause matching errors while the conventional depth maps are generated, the gray levels vary inconsistently among the depth maps. Accordingly, the segments cannot be combined seamlessly. As shown in Figure 3-13, the red, green, and blue parts stand for three segments; when they are composed together, voids (black) and overlaps (yellow and cyan) appear along the boundaries between segments.

Figure 3-13 Segmentation voids

Hence, in the last step, combination, two problems have to be handled: voids and overlaps. For overlaps, the relatively large gray level is selected because it represents the front object point. This is in accordance with real-world experience. Voids are more difficult to handle because values have to be created where none exist. Therefore, we have to rely on the information surrounding the voids. Because smoothing operators would degrade the contrast between the object and its background, we use a median filter to reconstruct the voids.
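The combination step can be sketched as follows, under stated assumptions: overlaps take the largest gray level among the segments that cover a pixel, and each void is replaced by the median of the non-void pixels in its 3x3 neighbourhood. Whether the void pixel itself enters the median, the window size, and the choice of the lower median for even counts (so that only existing gray levels are assigned) are not specified in the text and are choices of this sketch.

```python
import numpy as np

def combine_segments(segments, masks):
    """Fuse thresholded depth maps: largest gray level on overlaps, NaN on voids."""
    values = np.stack(segments).astype(float)
    valid = np.stack(masks)
    fused = np.where(valid, values, -np.inf).max(axis=0)
    fused[~valid.any(axis=0)] = np.nan
    return fused

def fill_voids_median(fused, max_passes=10):
    """Replace NaN voids by the median of their valid 3x3 neighbours."""
    out = fused.copy()
    for _ in range(max_passes):
        voids = np.isnan(out)
        if not voids.any():
            break
        snapshot = out.copy()
        for r, c in zip(*np.nonzero(voids)):
            patch = snapshot[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            neighbours = np.sort(patch[~np.isnan(patch)])
            if neighbours.size:
                out[r, c] = neighbours[(neighbours.size - 1) // 2]
    return out

# Hypothetical segments: a front object (200, partly 180) and a background (50).
front = np.array([[200, 200, 0], [200, 0, 0], [0, 0, 0]])
back = np.array([[0, 180, 0], [0, 0, 50], [50, 50, 50]])
fused = combine_segments([front, back], [front > 0, back > 0])
result = fill_voids_median(fused)
```

In this small example the overlap at the top centre keeps 200, and both voids are filled with the background level 50 because background pixels dominate their neighbourhoods.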

Figure 3-14 Three situations during reconstruction (a) ideal image (b) voided image

There are three situations during reconstruction, corresponding to the three blocks in Figure 3-14. First, the red block shows a void pixel that is surrounded by its correct object points, so the median filter assigns the same gray level, as depicted in Figure 3-15 (a): after sorting, the central pixel G0 is replaced by G1. Likewise, the blue block shows a void surrounded by points of the background that we want to reconstruct. However, the background of a depth map varies gradually, so the median filter introduces small mistakes, as shown in Figure 3-15 (b). Although the background points G2~G4 are in the majority, G0 is mistakenly replaced with G2 if the normal direction of the camera is parallel to the extension of depth, whereas it should be replaced with G3. Nevertheless, the result differs by only a few gray levels if the background is free of noise, i.e. subtracting G2 from G3 gives almost zero. In the third situation, however, dramatic errors arise, especially for regions of small aspect ratio. In the example in Figure 3-15 (c), the central G0 should be replaced with a background point, but it is assigned G1 instead. This error leads to region growing, which distorts the shape of the object.

Figure 3-15 Examples of median filtering of three situations of reconstruction (a) object reconstruction (b) background reconstruction (c) error reconstruction

Chapter 4 Experiments and Results

Based on the system and algorithm described in Chapter 3, the experimental results are presented here in four parts. First, we show how image quality influences depth map rendering. Subsequently, HDDR depth maps are rendered by stacking two and three depths of field, respectively, and compared with the result captured at the largest f-number of our camera (F/22). Moreover, the feasibility of a lens array is verified in the last part by considering the actual pitch of each lenslet. Finally, the characteristics of the temporal and spatial HDDR systems are summarized.

4.1 Depth Maps of Under-exposed and Blurred Images

As mentioned in Chapter 3, we do not use a large f-number to extend the depth of field because of its low light efficiency. If the captured intensity is to be maintained, a dim environment or fast-moving objects lead to a dilemma in choosing the exposure time. Therefore, we first examine how under-exposed or blurred images affect the depth maps. Figure 4-1 shows the color elemental images captured with a large f-number (F/22). Although the exposure is reduced only to one tenth of the adequate value, the image contrast decreases dramatically. Even so, the exposure time of (d)~(f) is still several dozen times longer than that of our spatial HDDR system. Note that (g)~(i) are adjusted versions of (d)~(f), provided only in case the image information in (d)~(f) is not clear enough. Because an under-exposed image carries insufficient image information, it is difficult to render a good depth map. As shown in Figure 4-2, all the objects are ill-defined in the depth map. Because the ambient light is not uniform, the result for the rear objects is worse than that for the front object.

Although the blur of out-of-focus images is not exactly equivalent to that of ghost images caused by motion, both result in soft edges. If we regard edges as an important feature, the two cases are similar to some degree. Consequently, we first use a circular disk to generate the defocused images [54]. By rendering the depth map from the defocused image as shown in Figure 4-3, the error trend can be simulated. A larger object leads to more inaccurate pixels, so the error is calculated as the ratio of the number of wrong pixels to the size of the object; the result is shown in Figure 4-4. In the experiment, we also use blurred elemental images to test whether many mismatches appear in the rendered depth map. Three blurred elemental images are shown in Figure 4-5. Although most of the color information is maintained, so that a human observer can still roughly identify the shapes and locations of the objects, the fuzziness of the objects is far beyond that in the color images.
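The simulation described above can be sketched in two small helpers. The text does not give the exact definitions, so the following are assumptions: the disk is a normalised binary mask whose `diameter` is given in pixels, a pixel counts as wrong when its gray level differs from the ground truth by more than `tolerance`, and the object size is the pixel count of a supplied object mask.

```python
import numpy as np

def disk_kernel(diameter):
    """Normalised circular disk used as a defocus point spread function."""
    half = diameter // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    mask = (x ** 2 + y ** 2) <= (diameter / 2.0) ** 2
    return mask / mask.sum()

def error_rate(rendered, ground_truth, object_mask, tolerance=0):
    """Ratio of wrong depth pixels to the size of the object in pixels."""
    difference = np.abs(rendered.astype(int) - ground_truth.astype(int))
    return (difference > tolerance).sum() / object_mask.sum()

# Hypothetical 10x10 ground truth with a 4x4 object at gray level 200.
truth = np.zeros((10, 10), dtype=np.uint8)
truth[3:7, 3:7] = 200
rendered = truth.copy()
rendered[3, 3:7] = 0          # four object pixels lost at the blurred edge
rate = error_rate(rendered, truth, truth > 0)
kernel = disk_kernel(5)
```

Normalising by the object size keeps the figure comparable between small and large objects, which is the motivation stated in the text.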

Figure 4-1 Elemental images under F/22 with different exposure time (a)(b)(c) three perspectives with adequate EV, (d)(e)(f) one tenth of adequate EV, (g)(h)(i) adjusted images of (d)(e)(f) respectively

Figure 4-2 Rendered depth map from under-exposed elemental images

Figure 4-3 Blurred image and its depth map (a)(d) ground truth (b)(e) 21-pixel variance (c)(f) 41-pixel variance

Figure 4-4 Error rate versus the variance of disk

Figure 4-5 Three out-of-focus elemental images (a) left perspective (b) central perspective (c) right perspective

Figure 4-6 Rendered depth map from blurred elemental images

Even with soft matching in DERS, the tolerance is still limited. Consequently, once the blur spot of an object point grows, the details that could serve as features blend together and become hard to distinguish.

According to the results of simulation and experiment, the importance of a clear and crisp image has been corroborated. Moreover, increasing the f-number to extend the depth of field is not always workable, so the following sections present the results of our HDDR system, which extends the range of depth without increasing the f-number.

4.2 HDDR Depth Map Rendering of Two Depth of Field

In this section, we apply the temporal HDDR system to confirm that the depth of field can be extended even when a small f-number is used for capture. To keep the verification simple, we did not use a pitch equal to the size of a lenslet. Instead, a pitch of 1 cm is chosen so that the disparity does not exceed the limit of DERS.

First, six elemental images with two objects located at 87 cm and 150 cm are captured at F/2.8, as shown in Figure 4-7. The two objects are focused individually in the two sets of elemental images. Therefore, each of the two rendered depth maps in this experiment includes only one well-defined object. Moreover, a striped floor is used to supply plentiful matching candidates; the influence of a more complicated floor is tested at the end of Chapter 4. As Figure 4-8 and Figure 4-9 show, the distinct results of the two depth maps agree with our postulation that well-outlined contours of objects cannot be rendered unless the objects are well focused. Moreover, this effect corresponds to the concept of high dynamic range (HDR) images: an HDR image adjusts the range of luminance, while the HDDR system arranges the range of depth of field. Subsequently, we stack all the parts together.

Figure 4-7 Six elemental images of three perspectives and two focal positions captured under F/2.8 (a)(c)(d) focus at foreground (b)(e)(f) focus at background

Figure 4-8 Two rendered depth maps (a) focus at foreground (b) focus at background

To fuse these two maps, our target is simply to preserve the well-defined objects. Figure 4-10 illustrates the fusion process, in which red and green stand for the two depth maps after thresholding. Because the temporal method is used, deviation correction is unnecessary. We therefore start with edge searching: the gray levels corresponding to the two representative focal points are obtained by looking up the corresponding positions in the depth maps, as shown in Figure 4-10 (a)(b). Subsequently, we average the two gray values to determine the threshold and filter out the ill-outlined objects, as shown in Figure 4-10 (c).

Figure 4-9 Details of objects in rendered depth maps (a)(c) in focus (b)(d) out of focus

Finally, the HDDR depth map is reconstructed via median filtering, and the result is shown in Figure 4-11 (a). Two objects with crisp edges are extracted in the HDDR depth map, and it is comparable with the depth map rendered from elemental images captured at a large f-number, shown in Figure 4-11 (b). However, there are some imperfections on the surfaces of the objects, i.e. darker spots, which might be caused by noise. Although the HDDR depth map is generated by a relatively complex method compared with increasing the f-number, the capturing time in this experiment is reduced by a factor of 32 even for the temporal HDDR system. If the spatial HDDR system is used, the capturing time can be 64 times shorter than that of the larger f-number. In the following section, we use three depths of field to exceed the working range of the largest f-number (F/22) of our camera.
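The factors of 64 and 32 are consistent with a simple exposure model, sketched below. The model is an interpretation and is not stated in the text: exposure time is assumed to scale with the square of the f-number at fixed scene brightness and sensitivity, f-numbers are rounded to nominal full stops, the spatial system is assumed to need a single exposure, and the temporal system one exposure per depth of field.

```python
import math

def stops_between(small_f, large_f):
    """Number of full stops between two f-numbers (each stop halves the light)."""
    return round(2 * math.log2(large_f / small_f))

def capture_time_ratio(small_f, large_f, captures=1):
    """How many times shorter the total capture is at small_f than at large_f.

    `captures` is the number of sequential exposures taken at small_f:
    1 for a spatial system, N for a temporal system with N depths of field.
    """
    return 2 ** stops_between(small_f, large_f) / captures

spatial = capture_time_ratio(2.8, 22)               # single exposure
temporal_two = capture_time_ratio(2.8, 22, 2)       # two focal positions
temporal_three = capture_time_ratio(2.8, 22, 3)     # three focal positions
```

Under these assumptions F/2.8 and F/22 are six stops apart, which gives 64 for a single exposure, 32 for two sequential exposures, and about 21 for three.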

Figure 4-10 Experiment images during fusion process (a)(b) finding representative focal point (c) two depth maps fusion after thresholding

Figure 4-11 Depth maps rendered by (a) HDDR system (b) large f-number (F/22)

4.3 HDDR Depth Map Rendering of Three Depth of Field

In the previous section, even though an f-number about 8 times smaller was used to capture the elemental images, the rendering of the depth map was not restricted by the shallow depth of field; according to the experimental result, our HDDR depth map is almost identical to the depth map rendered with the larger f-number. In this section, we therefore use one additional depth of field to exceed the range of the large f-number (F/22).

Figure 4-12 Elemental images of three focal positions captured under F/2.8 (a)(b)(c) focus at first object (d)(e)(f) focus at middle object (g)(h)(i) focus at the last object from left, central and right perspective respectively

In this section, we place an additional object, a small butterfly, at 35 cm to examine the feasibility in the nearer region; the positions of the other two objects are similar to those in the previous experiment, at 76 cm and 152 cm respectively. First, the same f-number, F/2.8, is applied to capture nine elemental images covering three focal positions, as shown in Figure 4-12. Subsequently, each set of three is input into DERS to render a depth map. Figure 4-13 again illustrates that an object is well-defined in the depth map as long as it is in focus. From the result in Figure 4-14, we can further verify that blur is one of the factors that govern the accuracy of depth map rendering. From the degradation of the contours, it is clear that when an object is distant from the depth of field, for example the last object while focusing at the first object, the result becomes worse because the object is blurred more. Likewise, the first object is ill-defined, especially when we focus at the last object. Because the depth of field is a function of object distance, it shrinks when the objects are placed closer to the camera. This particularly benefits our HDDR system in the near field, because the f-number of conventional cameras has an upper limit.
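The dependence of depth of field on object distance can be illustrated with the standard thin-lens formulas. The focal length (50 mm) and the acceptable circle of confusion (0.03 mm) below are hypothetical values chosen only for illustration; they are not the parameters of the camera used in the experiment, so the absolute numbers should not be compared with the measured working range.

```python
def depth_of_field_mm(focus_mm, f_number, focal_mm=50.0, coc_mm=0.03):
    """Near and far limits of acceptable sharpness for a thin lens.

    focal_mm and coc_mm are illustrative assumptions, not measured values.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    scale = focus_mm * (hyperfocal - focal_mm)
    near = scale / (hyperfocal + focus_mm - 2 * focal_mm)
    if focus_mm >= hyperfocal:
        return near, float("inf")
    return near, scale / (hyperfocal - focus_mm)

def dof_width_mm(focus_mm, f_number):
    """Width of the in-focus range around the focus distance."""
    near, far = depth_of_field_mm(focus_mm, f_number)
    return far - near

# Object distances from the experiment (35, 76 and 152 cm), captured at F/2.8.
widths = [dof_width_mm(distance, 2.8) for distance in (350.0, 760.0, 1520.0)]
```

With these assumed parameters the in-focus range grows steadily from the nearest to the farthest object, and for any given distance it is wider at F/22 than at F/2.8, which matches the qualitative argument in the text.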

Figure 4-13 Three rendered depth maps (a) focus at first object (b) focus at middle object (c) focus at the last object

Following the fusion steps elaborated in Chapter 3, three representative focal points have to be detected because there are three depths of field, as shown in Figure 4-15. Once the corresponding gray levels are found, each threshold value is calculated by averaging two adjacent ones. Owing to noise, thresholding brings about an imperfect combination, as illustrated in Figure 4-16. Many voids lie along the boundaries. In addition, some isolated regions remain after thresholding, such as the blue and cyan spots in the magnified image. Unfortunately, these redundant spots cannot be eliminated in the subsequent process because they should have been cut out during thresholding. Hence, when the HDDR depth map is reconstructed, they leave darker spots around the first object.

Figure 4-14 Details of objects in rendered depth maps (a)(d)(g) focus at the first object (b)(e)(h) focus at the middle object (c)(f)(i) focus at the last object

Figure 4-15 Experiment images of finding three representative focal points

As shown in Figure 4-17 (a), the points surrounding the first object in the HDDR depth map are worse than those in its original depth map, for two reasons. One is described in the previous paragraph. The other is that the lighter regions are noise from the second depth map. Because the quality of an edge is judged by contrast, the object looks ill-outlined if its background is cluttered.

Figure 4-16 Experiment results of fusion and its details

Figure 4-17 Depth maps rendered by (a) HDDR system (b) large f-number (F/22)

Figure 4-18 Comparison of the first object in color image and depth map of (a)(c) HDDR system (b)(d) large f-number (F/22)

Therefore, when we compare the HDDR depth map with the rendered depth map shown in Figure 4-17 (b), the discrepancy between the two depth maps is lessened. However, even at the largest f-number of our camera, the scene still cannot be captured entirely in focus. As shown in Figure 4-18 (b), the veined wings are actually blurred, so the first object in the rendered depth map is slightly fuzzy. This result shows that our HDDR system not only surpasses the dynamic range of capture with a large f-number, but can also be extended by stacking more depths of field to render an even higher dynamic depth range.

To quantify the working range of the different focal designs, we use Figure 4-4 to judge the range with an acceptable degree of blur. We set the focal plane at 150 cm and measure the variance of a black-and-white edge. Figure 4-19 illustrates the concept of the point spread function: an ideal point image spreads as it moves away from the focal plane, and the smaller the f-number, the faster it spreads. Because we want the error rate of the rendered depth map to be less than 10%, the upper bound of the variance is around 10 pixels. Accordingly, the working range can be determined as shown in Figure 4-20.

Figure 4-19 Variance of different F/# versus depth

The working range of the HDDR system is counted from the first object to the terminal wall (200 cm). The working range increases with larger f-number, but the working range of HDDR using a small f-number (F/2.8) is even wider than that of the largest f-number (F/22) of our camera. Furthermore, the exposure time is also minimized, as illustrated in Figure 4-21. An exposure time around 21 times shorter benefits the capture of instantaneous moments. If the spatial HDDR system is implemented, the exposure time is reduced further and stays the same when more depths of field are stacked.

Figure 4-20 Working range of different focal designs

Figure 4-21 Exposure time of different focal designs

To conclude, as long as the rendered depth map is less vulnerable to noise, the performance of the HDDR depth map will be better. However, compared to the largest f-number of our camera (F/22),
