Hole-filling - Overview of View Synthesis Reference Software Algorithm

CHAPTER 3 OVERVIEW OF VIEW SYNTHESIS REFERENCE SOFTWARE ALGORITHM

3.4 Hole-filling

Holes that remain flagged as “final-hole” after blending can be handled by different methods in view synthesis. FTV uses an advanced inpainting method [15] for the general warping mode and simple linear interpolation for the 1D horizontal shift mode. Müller et al. [9] extrapolate only the background color into holes by examining the depth values on the two sides of the hole border, because the foreground has the larger depth value and the background the smaller one. Oh et al. [16] proposed a depth-based inpainting method that also fills holes with background color only. Whatever the method, these final holes cannot be seen from any reference view and can only be filled by reference to surrounding pixels, so it is sufficient to fill them plausibly rather than exactly.

However, inpainting is a frame-based image processing operation and is more complex to implement in hardware. We therefore apply a simple bi-linear interpolation, which acts as a 2D low-pass filter with geometric distance weighting on the pixels carrying the final-hole flag. In this thesis, this bi-linear interpolation is implemented block by block, as shown in Fig. 3-5.

Fig. 3-5 Bi-linear interpolation of hole-filling
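As a rough illustration of distance-weighted hole-filling, the following Python sketch replaces each flagged pixel with a weighted mean of the non-hole pixels in a window centred on it. It is a minimal sketch under stated assumptions, not the block-based hardware scheme of Fig. 3-5: the function name `fill_holes`, the inverse Manhattan-distance weights, and the default window dimensions are all illustrative choices that do not come from the thesis.

```python
def fill_holes(image, hole, block_h=5, block_w=9):
    """Fill flagged pixels with a distance-weighted mean of valid neighbours.

    image: 2D list of luma values; hole: 2D list of booleans (True = final-hole).
    block_h and block_w are hypothetical window dimensions (odd numbers).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]          # leave the input untouched
    rh, rw = block_h // 2, block_w // 2
    for y in range(rows):
        for x in range(cols):
            if not hole[y][x]:
                continue
            num = den = 0.0
            for yy in range(max(0, y - rh), min(rows, y + rh + 1)):
                for xx in range(max(0, x - rw), min(cols, x + rw + 1)):
                    if hole[yy][xx]:
                        continue             # only non-hole pixels contribute
                    # Assumed weighting: inverse Manhattan distance.
                    # The distance is never 0 because the centre is a hole.
                    w = 1.0 / (abs(yy - y) + abs(xx - x))
                    num += w * image[yy][xx]
                    den += w
            if den > 0.0:                    # window held at least one valid pixel
                out[y][x] = num / den
    return out
```

A hole whose window contains no valid pixel is left unchanged in this sketch, which mirrors the remark that a window that is too small cannot fill large holes.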

The block size is related to the hole size and is a key factor in both internal memory size and performance. When the block is too small, the larger holes at the frame border may not be filled; when it is too big, the buffer becomes large and the interpolated texture becomes noisy. Table 3-2 shows the performance of several sequences under different block sizes. The sequence “Ballet” has larger holes, so its performance improves as the block size increases.

The sequences “BookArrival,” “LoveBird1” and “Kendo” have smaller holes, so that when

Fig. 3-6 Performance of bi-linear interpolation for hole-filling with different block sizes

Table 3-2 Performance of hole-filling using bi-linear interpolation with different block sizes: Y-PSNR (dB)

Block size   Ballet     Breakdancers   BookArrival   Lovebird1   Newspaper   Kendo
5x3          33.18638   33.06250       36.41172      31.80200    30.67576    33.00001
9x5          33.20828   33.16606       36.37078      31.80157    30.67691    32.99998
13x7         33.21609   33.17187       36.35280      31.80039    30.67858    32.99997
17x9         33.21837   33.16193       36.34878      31.79952    30.67945    32.99996
21x11        33.22026   33.14299       36.34814      31.79897    30.67974    32.99996

[Fig. 3-6 panels: PSNR plotted against block size BHxBW]

Chapter 4

Proposed architecture

Our objective is to implement a real-time view synthesis (VS) engine that follows the VSRS algorithm for HD1080p (1920x1080) frames. There are three main challenges in implementing the VSRS algorithm. First, for general applications, 3D warping requires much more hardware complexity than the horizontal shift method, especially in storage cost. This is because the cameras may be rotated, so the disparities between views are not purely horizontal, as shown in Fig. 2-2. The data storage therefore grows from 1D to 2D, and its control becomes complicated.

Second, the VSRS algorithm uses two steps of 3D warping, one for depth mapping and the other for texture mapping. The main advantage of two-step warping is that the warped depth map can be post-filled for better texture mapping. In addition, because reverse warping is indexed by the target view, the two views synthesized from different reference views can be processed in parallel at the blending and hole-filling steps. However, the additional synthesized depth map increases data storage and access, so internal memory and bandwidth utilization become critical in the architecture design.

The third challenge is hole-filling. As described in Section 3.4, we choose the simple bi-linear interpolation with a block size of 9x5 for this step. Data storage and access are still a challenge, however, because the remaining holes lie at irregular and discontinuous positions.

Our architecture design focuses on solving these three challenges. The resulting architecture adopts two frame-level pipeline stages, each with a hierarchical column-level pipeline.

4.1 Two frame-level pipelining stages

Because depth mapping uses forward warping and texture mapping uses reverse warping, the two are performed at different positions, and the depth mapping must store the warped depth of the virtual view in a reorder buffer for the subsequent texture mapping. The size of this reorder buffer is on the order of the disparity range if the videos are rectified with no rotation; it grows to multiple rows if the videos have rotation. Take “Ballet” in Fig. 4-1 as an example: the region of the depth map from row 0 to row 30 in the reference view is forward warped to out-of-order positions in the target view. The first 20 rows of the reference view are warped outside the frame range, which means that the first complete row of the virtual view is collected only after 20 rows have been warped. We therefore need a buffer of frame width by 20 rows, which is 40.96KB, to hold the previously mapped depth, and up to 108KB for HD1080p.

Fig. 4-1 Warped depth map, row 0 to row 30, of “Ballet”: (a) the reference view and (b) the virtual view.
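The reorder buffer sizes quoted above can be reproduced with a short calculation. The sketch below rests on two assumptions that are inferred from the quoted figures rather than stated in the text: each buffered position takes 2 bytes, and the number of buffered rows scales with the frame height when moving from 1024x768 to 1920x1080. The function name is hypothetical.

```python
BYTES_PER_POSITION = 2  # assumption: inferred from 1024 * 20 * 2 = 40960 bytes


def reorder_buffer_bytes(frame_width, rows):
    """Size of a buffer holding `rows` full-width rows of warped depth."""
    return frame_width * rows * BYTES_PER_POSITION


ballet_bytes = reorder_buffer_bytes(1024, 20)       # "Ballet", 1024x768

# Assumption: the 20 rows scale with frame height for HD1080p.
rows_hd = 20 * 1080 / 768
hd_bytes = reorder_buffer_bytes(1920, rows_hd)

print(ballet_bytes / 1000, "KB for Ballet")
print(hd_bytes / 1000, "KB for HD1080p")
```

Under these assumptions the two results match the 40.96KB and 108KB given in the text.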

To eliminate this reorder buffer, we propose an architecture with two frame-level pipeline stages, which performs the depth mapping and the texture mapping in different stages. Fig. 4-2 shows the schedule of the proposed two frame-level architecture.

The warped depth is stored in the external memory at the 1st stage and read back at the 2nd stage for texture mapping.

With the proposed two-stage architecture, Table 4-1 shows that the total bandwidth increases because of the additional accesses to the warped depth map. With a 64-bit bus at a working frequency of 200MHz, the bus utilization is 39.375% for a video throughput of 30 frames per second (fps). The 64-bit bus width in this analysis is the one used in our design.

Note that in Fig. 4-2 the warped depth maps are written by the 1st stage and read by the 2nd stage simultaneously. This requires ping-pong-like external memories for the warped depth maps: one is written with the warped depth values of frame i while the other is read for the warped depth values of frame i-1.

Fig. 4-2 Two frame-level pipeline and the access between external memory

Table 4-1 Total bandwidth of the two frame-level stage architecture

Data                                            One frame-level stage   Two frame-level stages
Depth map (Left, Right)                         2 frames (read)         2 frames (read)
Depth map (Left to virtual, Right to virtual)   -                       2 frames (read, write)
Texture (YUV, L, R)                             3 frames (read)         3 frames (read)
Texture (YUV, virtual)                          1.5 frames (write)      1.5 frames (write)
Total bandwidth (2MB/frame)                     13MB                    21MB
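The totals in Table 4-1 and the quoted bus utilization follow from simple arithmetic, sketched below in Python. The sketch counts the warped depth row as two frames written plus two frames read, which is the reading that reproduces the 21MB total. The function and variable names are illustrative only.

```python
FRAME_MB = 2.0  # size of one frame, as given in Table 4-1

# Frame accesses per synthesized frame, taken row by row from Table 4-1.
one_stage_frames = 2 + 3 + 1.5            # depth read, texture read, texture write
two_stage_frames = 2 + (2 + 2) + 3 + 1.5  # plus warped depth: 2 written, 2 read


def bus_utilization(frames_accessed, fps=30, bus_bits=64, clock_mhz=200):
    """Fraction of peak bus bandwidth needed for the given access count."""
    demand_mb_per_s = frames_accessed * FRAME_MB * fps
    peak_mb_per_s = (bus_bits / 8) * clock_mhz   # bytes per cycle * Mcycles/s
    return demand_mb_per_s / peak_mb_per_s


print(one_stage_frames * FRAME_MB, "MB per frame, one stage")
print(two_stage_frames * FRAME_MB, "MB per frame, two stages")
print(100 * bus_utilization(two_stage_frames), "% bus utilization")
```

The two-stage case needs 21MB x 30 = 630MB/s against a peak of 8 bytes x 200MHz = 1600MB/s, which gives the 39.375% stated in the text.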

4.2 Scan-column warping order

A frame-sized Z-buffer (depth buffer) is usually needed in depth mapping [8]. In 3D world coordinates, if foreground and background objects are projected to the same position in the image plane, the foreground objects occlude the background objects. The Z-buffer stores all warped depth values during depth mapping so that depths can be compared to handle this occlusion problem.

For on-the-fly warping, the size of the shift window is proportional to the horizontal search range (SR_H) and the vertical search range (SR_V). Furthermore, the search range differs among scenes and increases with frame size. For example, for “Ballet”, whose frame size is 1024x768, SR_H is 55 and SR_V is 197 for camera 5 and camera 4. Hence the total Z-buffer is at least 21.67KB, and up to 57.13KB for HD1080p.
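The Z-buffer figures can be checked in the same way. The following sketch assumes 2 bytes per window entry and assumes that both search ranges scale linearly with the frame dimensions; neither assumption is stated in the text, but together they reproduce the quoted sizes to within rounding. The function name is hypothetical.

```python
def shift_window_bytes(sr_h, sr_v, bytes_per_entry=2):
    """Z-buffer size for a window of sr_h by sr_v entries.

    bytes_per_entry=2 is an assumption inferred from the quoted 21.67KB.
    """
    return sr_h * sr_v * bytes_per_entry


ballet_bytes = shift_window_bytes(55, 197)   # "Ballet", cameras 5 and 4

# Assumption: SR_H scales with frame width and SR_V with frame height.
scale = (1920 / 1024) * (1080 / 768)
hd_bytes = ballet_bytes * scale

print(ballet_bytes / 1000, "KB for Ballet")
print(round(hd_bytes / 1000, 2), "KB for HD1080p (approximate)")
```

The “Ballet” value is exactly 21.67KB, and the scaled value lands within about 0.01KB of the quoted 57.13KB.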

The Z-buffer can be eliminated when the cameras are arranged in a straight line: the foreground then occludes the background correctly within the same scan-line if we scan from left to right for right-view warping and from right to left for left-view warping. This warping method is called the depth-compatible order method, and its necessary constraint is that the epipolar lines are parallel to the scan-lines. It is the scan-line order for cameras in a precisely parallel configuration, and the Z-buffer can be omitted in this case.
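The idea that a suitable scan direction makes depth comparison unnecessary can be illustrated on a single scan-line. The sketch below is a toy model under stated assumptions, not the thesis design: each pixel is shifted horizontally by its own integer depth value (larger value = nearer), and `shift_sign` is a hypothetical parameter for the shift direction. Which sign corresponds to left-view or right-view warping depends on the camera setup and is not fixed here.

```python
def warp_row_zbuffer(depth, shift_sign):
    """Reference result: resolve collisions by explicit depth comparison."""
    width = len(depth)
    out = [None] * width
    for x, d in enumerate(depth):
        xt = x + shift_sign * d               # toy model: disparity = depth value
        if 0 <= xt < width and (out[xt] is None or d > out[xt]):
            out[xt] = d                       # nearer pixel (larger value) wins
    return out


def warp_row_ordered(depth, shift_sign):
    """Same warp without comparison: the last write simply overwrites."""
    width = len(depth)
    out = [None] * width
    # Two pixels collide when x1 + s*d1 == x2 + s*d2, so the nearer pixel
    # lies on the side opposite to the shift direction. Visiting that side
    # last lets the foreground overwrite the background.
    xs = range(width - 1, -1, -1) if shift_sign > 0 else range(width)
    for x in xs:
        xt = x + shift_sign * depth[x]
        if 0 <= xt < width:
            out[xt] = depth[x]
    return out


row = [1, 1, 1, 4, 4, 1, 1, 1, 1, 1]          # a foreground object at x = 3..4
for sign in (1, -1):
    assert warp_row_ordered(row, sign) == warp_row_zbuffer(row, sign)
```

In this toy model the ordered scan gives the same row as the Z-buffer version for both shift directions, which is the property that allows the buffer to be dropped for parallel cameras.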

When the cameras are rotated, Morvan [4] derived the occlusion-compatible scanning order for non-rectified images from the epipolar geometry, as shown in Fig. 4-3. C and C’ are the camera locations of the virtual view and the reference view; Pb and Pf are both projected to p in the virtual view, and the epipole e’ is the projection of C onto the reference plane.

The scanning order in the reference view should run along the epipolar line from the frame border to the epipole e’, so that the foreground Pf correctly occludes the background Pb in the virtual view.

Fig. 4-3 Occlusion-compatible scanning order revised from [4]

However, calculating the epipolar lines adds hardware computational cost. In the following, we analyze different scan-order approaches to eliminate this cost. If the epipolar lines lie in the reference right view as in Fig. 4-4(a), the original scan-line order fails, because one epipolar line is warped across different scan-lines and the correct scan order along the epipolar line is destroyed, as shown in Fig. 4-4(b). A similar situation occurs in Fig. 4-4(d) and (e) for the reference left view. Fig. 4-5(a) shows an example of a scan-line order error.

We find that the warping order can be changed to a scan-column order, which does not induce occlusion errors even without accurate epipolar lines, because the order is adapted from

Fig. 4-4 Warping order analysis, panels (a) to (f).

Fig. 4-5 Warped depth maps without Z-buffer with (a) scan-line order and (b) scan-column order

However, the location of the epipole determines the scan order along the epipolar line. Hence, for cases in which the epipole lies inside the visible frame, as in Fig. 4-6, our scan-column order must be modified according to the epipole position.
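Whether the epipole falls inside the visible frame can be tested by projecting the virtual camera centre C into the reference image. The following sketch assumes a standard pinhole model with a 3x4 projection matrix for the reference camera; the matrix values, function names and frame size are hypothetical examples and do not come from the thesis or from any test sequence.

```python
def project(P, X):
    """Apply a 3x4 projection matrix P to a homogeneous 3D point X."""
    return [sum(P[r][c] * X[c] for c in range(4)) for r in range(3)]


def epipole_in_frame(P_ref, C_virtual, width, height):
    """Return (epipole, inside) for the reference view.

    P_ref: 3x4 projection matrix of the reference camera (assumed known).
    C_virtual: 3D position of the virtual camera centre.
    """
    x, y, w = project(P_ref, list(C_virtual) + [1.0])
    if abs(w) < 1e-12:
        return None, False       # epipole at infinity: parallel epipolar lines
    ex, ey = x / w, y / w
    return (ex, ey), (0 <= ex < width and 0 <= ey < height)


# Hypothetical reference camera: focal length 1000, principal point (512, 384),
# placed at the world origin with no rotation.
P_ref = [[1000.0, 0.0, 512.0, 0.0],
         [0.0, 1000.0, 384.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]

print(epipole_in_frame(P_ref, (1.0, 0.0, 0.0), 1024, 768))  # sideways baseline
print(epipole_in_frame(P_ref, (0.0, 0.0, 1.0), 1024, 768))  # forward baseline
```

A purely sideways baseline puts the epipole at infinity, which corresponds to the parallel case where plain scan order suffices, while a forward displacement puts it inside the frame, which is the case that requires the modified order.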