
Source document: Analysis and Design of a View Synthesizer (視點合成器分析與設計)

CHAPTER 3 OVERVIEW OF VIEW SYNTHESIS REFERENCE SOFTWARE ALGORITHM

3.4 Hole-filling

Holes that remain flagged as “final-hole” after blending can be handled by different methods in view synthesis. FTV uses an advanced inpainting method [15] for the general warping mode and simple linear interpolation for the 1D horizontal shift mode. Müller et al. [9] extrapolate only the background color into holes by examining the depth values on the two sides of the hole border, since the foreground has a larger depth value and the background a smaller one. Oh et al. [16] proposed a depth-based inpainting method that also fills holes with only background color. Whatever the method, these remaining final-holes cannot be seen from any reference view and can only be filled with reference to the surrounding pixels, so it is enough for the holes to be filled naturally rather than exactly.

However, inpainting is a frame-based image processing technique and is more complex to implement in hardware. Thus, we apply a simple bi-linear interpolation, which performs a 2D low-pass filter with geometric distance weighting on the pixels flagged as final-hole. In this thesis, we implement this bi-linear interpolation in a block-based manner, as shown in Fig. 3-5.

Fig. 3-5 Bi-linear interpolation of hole-filling
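The block-based filling can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `fill_holes`, the inverse city-block-distance weights, and the list-of-lists image layout are assumptions made for illustration. The text only states that the filter is a 2D low-pass with geometric distance weighting over a BH x BW block.

```python
def fill_holes(img, hole, bh=9, bw=5):
    """Fill flagged pixels with a distance-weighted average of the
    non-hole pixels inside a bh x bw block centred on the hole.

    img  : 2D list of numbers (one colour component)
    hole : 2D list of bools, True where the pixel is a final-hole
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    ry, rx = bh // 2, bw // 2
    for y in range(h):
        for x in range(w):
            if not hole[y][x]:
                continue
            num = den = 0.0
            for yy in range(max(0, y - ry), min(h, y + ry + 1)):
                for xx in range(max(0, x - rx), min(w, x + rx + 1)):
                    if hole[yy][xx]:
                        continue  # only valid pixels contribute
                    # Assumed weighting: inverse city-block distance.
                    wgt = 1.0 / (abs(yy - y) + abs(xx - x))
                    num += wgt * img[yy][xx]
                    den += wgt
            if den > 0:  # a hole wider than the block stays unfilled
                out[y][x] = num / den
    return out


# Toy usage: a flat 5x5 image with a single hole in the middle.
image = [[7.0] * 5 for _ in range(5)]
flags = [[False] * 5 for _ in range(5)]
image[2][2], flags[2][2] = 0.0, True
filled = fill_holes(image, flags, bh=3, bw=3)
print(filled[2][2])
```

When no valid pixel falls inside the block, the hole is left untouched, which corresponds to the case where the block is too small for the hole.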

The block size, however, is related to the hole size and is a key factor in internal memory size as well as in performance. When the block is too small, the larger holes at the frame border may not be filled; when the block is too big, the buffer becomes large and the interpolated texture becomes noisy. Table 3-2 shows the performance of some sequences under different block sizes. The sequence “Ballet” has larger holes, so its performance improves as the block size increases.

The sequences “BookArrival,” “LoveBird1” and “Kendo” have smaller holes, so that when


Fig. 3-6 Performance of bi-linear interpolation for hole-filling in different block size

Table 3-2 Performance of hole-filling using bi-linear interpolation with different block sizes, Y-PSNR (dB)

Block size   Ballet     Breakdancers   BookArrival   Lovebird1   Newspaper   Kendo
5x3          33.18638   33.06250       36.41172      31.80200    30.67576    33.00001
9x5          33.20828   33.16606       36.37078      31.80157    30.67691    32.99998
13x7         33.21609   33.17187       36.35280      31.80039    30.67858    32.99997
17x9         33.21837   33.16193       36.34878      31.79952    30.67945    32.99996
21x11        33.22026   33.14299       36.34814      31.79897    30.67974    32.99996

(Fig. 3-6 plots: PSNR versus block size BHxBW, six panels.)

Chapter 4

Proposed architecture

Our objective is to implement a real-time view synthesis (VS) engine corresponding to the VSRS algorithm for a frame size of HD1080p (1920x1080). There are three main challenges in implementing the VSRS algorithm. First, for general applications, 3D warping requires much more hardware complexity, especially in storage cost, than the horizontal shift method. This results from cameras with rotation: the disparities between views are not only in the horizontal direction, as shown in Fig. 2-2. Hence data storage increases from 1D to 2D, and data control becomes complicated.

Second, the VSRS algorithm uses two steps of 3D warping, one for depth mapping and the other for texture mapping. The main advantage of two-step warping is that the warped depth map can be post-filled for better texture mapping. In addition, because reverse warping proceeds in the index order of the target view, the two synthesized views from different reference views can be processed in parallel in the blending and hole-filling steps. However, data storage and access increase because of the additional synthesized depth map, so internal memory and bandwidth utilization become critical in the architecture design.

The third challenge is hole-filling. As described in Chapter 3.4, we choose simple bi-linear interpolation with a block size of 9x5 for this step. Data storage and access remain a challenge, however, because the remaining holes lie at irregular and discontinuous positions.

Our architecture design focuses on solving the above three challenges. The resulting architecture adopts two frame-level pipeline stages, each with a hierarchical column-level pipeline.

4.1 Two frame-level pipelining stages

Because depth mapping (forward warping) and texture mapping (reverse warping) are performed at different positions, the depth mapping must store the warped depth of the virtual view in a reorder buffer for the subsequent texture mapping. The size of this reorder buffer is at the disparity level if the videos are rectified with no rotation. On the other hand, it grows to multiple rows if the videos are captured with rotation. For the example of “Ballet” in Fig. 4-1, the region of the depth map from row 0 to row 30 in the reference view is forward warped to out-of-order positions in the target view. The first 20 rows of the reference view are warped outside the frame range, which means that the first whole row of the virtual view is collected only after 20 rows have been warped. We therefore need a buffer of frame width by 20, which is 40.96KB, to hold the previously mapped depth, and up to 108KB for HD1080p.
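The two buffer sizes quoted above can be reproduced with a short calculation. This is only a sketch under stated assumptions: the text does not explain the factor of 2 (it could reflect two reference views or two bytes per sample), nor that the row count scales with frame height; both are assumed here because together they match the 40.96KB and 108KB figures. The function name `reorder_buffer_bytes` is hypothetical.

```python
def reorder_buffer_bytes(width, height, rows_at_768=20, factor=2):
    """Estimate the reorder buffer size in bytes.

    rows_at_768 : rows that must be buffered for a 768-row frame ("Ballet")
    factor      : assumed multiplier of 2 (not explained in the text)
    """
    rows = rows_at_768 * height / 768  # assumed: rows scale with frame height
    return width * rows * factor


ballet_bytes = reorder_buffer_bytes(1024, 768)
hd_bytes = reorder_buffer_bytes(1920, 1080)
print(ballet_bytes / 1000, "KB for 1024x768")
print(hd_bytes / 1000, "KB for 1920x1080")
```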

Fig. 4-1 Warped depth map, row 0 to row 30, of “Ballet”: (a) the reference view and (b) the virtual view.

To eliminate this reorder buffer, we propose an architecture with two frame-level pipeline stages, which performs the depth mapping process and the texture mapping process in different stages. Fig. 4-2 shows the schedule of the proposed two frame-level architecture. The warped depth is stored in the external memory in the 1st stage and read in the 2nd stage for texture mapping.

 

With the proposed two-level architecture, Table 4-1 shows that the total bandwidth increases because of the additional accesses to the warped depth map. With a 64-bit bus at a working frequency of 200MHz, the bus utilization is 39.375% for a video throughput of 30 frames per second (fps). For this analysis, we use a bus width of 64 bits in our design.

Note that in Fig. 4-2 the warped depth maps are written by the 1st stage and read by the 2nd stage simultaneously. This means that ping-pong-like external memories are used for the warped depth maps: one is written with the warped depth values of frame i while the other is read for the warped depth values of frame i-1.

Fig. 4-2 Two frame-level pipeline and the access between external memory

Table 4-1 Total bandwidth of two frame-level stages

                                                  Architecture
Data                                              One frame-level stage   Two frame-level stage
Depth map (Left, Right)                           2 Frame (Read)          2 Frame (Read)
Depth map (Left to virtual, Right to virtual)     -                       2 Frame (Read, Write)
Texture (YUV, L, R)                               3 Frame (Read)          3 Frame (Read)
Texture (YUV, virtual)                            1.5 Frame (Write)       1.5 Frame (Write)
Total bandwidth (2MB/frame)                       13MB                    21MB
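The totals in Table 4-1 and the 39.375% bus utilization can be checked with a few lines of arithmetic. In this sketch the dictionary keys and function names are hypothetical, and reading "2 Frame (Read, Write)" as two frames written plus two frames read is an assumption, made because it reproduces the 21MB total.

```python
FRAME_MB = 2  # one HD1080p frame, rounded to 2MB as in Table 4-1

# Frames transferred per synthesized frame (hypothetical key names).
one_stage = {
    "depth_ref_read": 2,
    "texture_ref_read": 3,
    "texture_virtual_write": 1.5,
}
# Assumed reading: the warped depth is written once and read once per view.
two_stage = dict(one_stage, warped_depth_write=2, warped_depth_read=2)


def total_mb(frames):
    """Total external-memory traffic per frame in MB."""
    return sum(frames.values()) * FRAME_MB


def bus_utilization(mb_per_frame, fps=30, bus_bits=64, clock_mhz=200):
    """Fraction of the peak bus bandwidth needed at the given frame rate."""
    peak_mb_per_s = bus_bits / 8 * clock_mhz  # 8 bytes x 200 MHz = 1600 MB/s
    return mb_per_frame * fps / peak_mb_per_s


print(total_mb(one_stage), total_mb(two_stage))
print(bus_utilization(total_mb(two_stage)))
```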

4.2 Scan-column warping order

Usually a frame-size Z-buffer (depth buffer) is needed in depth mapping [8]. In 3D world coordinates, if foreground and background objects are projected to the same position in the image plane, the foreground objects occlude the background objects. The Z-buffer stores all warped depth values during depth mapping so that depths can be compared to handle this occlusion problem.

For on-the-fly warping, the size of the shift window is proportional to the horizontal search range (SR_H) and the vertical search range (SR_V). Furthermore, the search range differs among scenes and increases with the frame size. For example, for “Ballet”, whose frame size is 1024x768, SR_H is 55 and SR_V is 197 for camera 5 and camera 4. Hence the total Z-buffer is at least 21.67KB and is up to 57.13KB for HD1080p.
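The Z-buffer figures can be reproduced approximately as follows. This is a sketch under assumptions the text does not state: a factor of 2 per window entry (for instance two views or two bytes per sample) and a search window that scales with the frame area. The function name `z_buffer_bytes` is hypothetical.

```python
def z_buffer_bytes(sr_h, sr_v, factor=2):
    """Shift-window Z-buffer size in bytes; `factor` is an assumed multiplier."""
    return sr_h * sr_v * factor


ballet_z = z_buffer_bytes(55, 197)            # "Ballet", 1024x768
area_scale = (1920 * 1080) / (1024 * 768)     # assumed scaling to HD1080p
hd_z = ballet_z * area_scale

print(ballet_z / 1000, "KB for 1024x768")
print(round(hd_z / 1000, 2), "KB for 1920x1080")
```

The first value matches the 21.67KB in the text exactly; the second lands within rounding distance of the quoted 57.13KB.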

The Z-buffer can be eliminated when the cameras are arranged in a straight line: the foreground correctly occludes the background within the same scan-line if we scan from left to right for right-view warping and from right to left for left-view warping. This warping method is called the depth-compatible order method; its necessary constraint is that the epipolar lines are parallel to the scan-lines. This is the scan-line order under a precisely parallel camera configuration, and the Z-buffer can be omitted in this case.
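A small sketch shows why a suitable scan order can replace the Z-buffer on one scan-line. It assumes a purely horizontal shift in which pixels move left by a disparity that grows with the depth value (larger value means foreground, as in Chapter 3.4). The names `warp_row`, `warp_row_zbuffer`, and `toy_disparity` are hypothetical, as is the toy depth row.

```python
def warp_row(depth, disparity_of, order):
    """Forward-warp one scan-line without a Z-buffer: the last write wins."""
    out = [None] * len(depth)            # None marks a hole
    for x in order:
        tx = x - disparity_of(depth[x])  # assumed: pixels shift left
        if 0 <= tx < len(depth):
            out[tx] = depth[x]
    return out


def warp_row_zbuffer(depth, disparity_of):
    """Reference result: keep the largest depth value (foreground) per target."""
    out = [None] * len(depth)
    for x in range(len(depth)):
        tx = x - disparity_of(depth[x])
        if 0 <= tx < len(depth) and (out[tx] is None or depth[x] > out[tx]):
            out[tx] = depth[x]
    return out


def toy_disparity(d):
    """Toy mapping: background (10) does not move, foreground (200) moves 4."""
    return d // 50


row = [10, 10, 10, 10, 10, 10, 200, 200, 10, 10]
left_to_right = warp_row(row, toy_disparity, range(len(row)))
right_to_left = warp_row(row, toy_disparity, reversed(range(len(row))))
reference = warp_row_zbuffer(row, toy_disparity)

print(left_to_right == reference)   # scan order compatible with occlusion
print(right_to_left == reference)   # background overwrites the foreground
```

For this shift direction, foreground pixels that collide with background pixels always lie to their right in the reference row, so a left-to-right scan writes them last. Reversing the shift direction reverses the required scan direction.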

When the cameras are rotated, Morvan [4] derived the occlusion-compatible scanning order for non-rectified images from the epipolar geometry, as shown in Fig. 4-3. C and C’ are the camera locations of the virtual view and the reference view; Pb and Pf are both projected to p in the virtual view, and the epipole e’ is the projection of C onto the reference plane. The scanning order in the reference view should run along the epipolar line from the frame border toward the epipole e’, so that the foreground Pf correctly occludes the background Pb in the virtual view.

Fig. 4-3 Occlusion-compatible scanning order revised from [4]

However, calculating the epipolar lines incurs additional hardware computational cost, so in the following we analyze different scan-order approaches to eliminate this cost. If the epipolar lines lie in the reference right view as in Fig. 4-4(a), the original scan-line order fails, since an epipolar line is warped across different scan-lines and the correct scan order along the epipolar line is ruined, as shown in Fig. 4-4(b). A similar situation occurs in Fig. 4-4(d), (e) for the reference left view. An example of the scan-line order error is shown in Fig. 4-5(a).

We find that the warping order can be changed to a scan-column order that does not induce occlusion errors even without the accurate epipolar lines, because the order is adapted from

Fig. 4-4 Warping order analysis, panels (a) to (f).

Fig. 4-5 Warped depth maps without Z-buffer with (a) scan-line order and (b) scan-column order

However, the location of the epipole determines the scan order along the epipolar line; hence, for cases where the epipole lies inside the visible frame as in Fig. 4-6, our scan-column order must be modified according to the epipole position.

Fig. 4-6 Epipole lies inside frame


4.3 Analysis of bus efficiency and bandwidth in warping process

Fig. 4-7 Forward warping example of a column in the reference view

For the proposed frame-level pipelining architecture, the out-of-order warping also increases the number of requests between the core and the bus, due to writing the warped depth in the 1st frame-level stage and reading the warped texture in the 2nd frame-level stage. Because the scan-column order is adopted in this design, both the depth map and the texture are arranged so that a column is stored in a row of the external memory. With this data arrangement, Fig. 4-7 shows a column of the reference view being forward warped to the synthesized view. The warped positions are continuous in the synthesized view if their depth values are the same. But pixels with different depths are mapped to different columns, so they are stored in different rows of the external memory. The bus transfer is then inefficient, since only part of the data in a transfer is valid, which further increases the required bus cycles. Although the total bandwidth is sufficient with a 64-bit bus, the inefficient transfers increase bus utilization and degrade the overall performance.

The transmission efficiency is related to depth continuity, which depends on the sequence. Moreover, if we attempt to improve the efficiency of data transmission, the input and output (I/O) buffers must grow to collect more data for reordering. Fig. 4-8 and Table 4-3 show the analysis of bus efficiency with different I/O buffer sizes for the sequences “Breakdancers,” “Ballet,” “BookArrival,” and “Lovebird1”. The detailed data for “Breakdancers” are shown in Table 4-2. Note that we set the bus width to 64 bits, and each sequence is run for one frame. In Table 4-2, for the bus transmission mode, single mode means that less than 8 bytes of data are transmitted, while burst mode means that more than 8 bytes are transmitted. The request times are incremented by one whenever a depth discontinuity occurs. The transmit times are incremented by one for single mode and by the burst length for burst mode. In addition, the maximum length is the maximum continuity in depth. The average bus efficiency is calculated by dividing the frame size by the product of the transmit times and the bus width in bytes.

Table 4-2 shows that the average bus efficiency increases as the buffer size increases. However, the efficiency cannot reach the maximum value of 100% unless the whole frame has the same depth value, which does not occur. Therefore, with scan-column order warping, we choose an I/O buffer equal to the frame height for higher bus efficiency. Table 4-3 shows that the average efficiency reaches 88.7%. This means the data are ready when a column access is complete. This concept can be further extended to the column-level pipelining architecture, which is described in Chapter 4.5.

Table 4-2 “Breakdancers,” Camera 5 to Camera 4: analysis of bus efficiency for different I/O buffer sizes

Buffer size   Single mode   Burst mode   Request   Transmit   Max length   Average bus
(Byte)        (Byte)        (Byte)       times     times      (Byte)       efficiency
8             151712        0            151712    151712     8            0.648
16            94899         27664        108731    122563     16           0.802
32            72799         48140        87488     120939     32           0.813
64            61048         57206        76440     118254     64           0.831
128           55581         60750        71009     116331     128          0.845
256           52790         62245        68269     115035     137          0.855
768           51097         63395        66412     114492     187          0.859
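The efficiency column of Table 4-2 follows from the definition given in the text: frame size divided by the product of transmit times and bus width in bytes. The sketch below recomputes it, assuming one byte per sample and a 1024x768 frame for “Breakdancers” (the frame size is not stated for this sequence in the text); the function name `bus_efficiency` is hypothetical.

```python
FRAME_BYTES = 1024 * 768   # assumed frame size, one byte per sample
BUS_BYTES = 8              # 64-bit bus

# Transmit times from Table 4-2, keyed by I/O buffer size in bytes.
transmit_times = {
    8: 151712,
    16: 122563,
    32: 120939,
    64: 118254,
    128: 116331,
    256: 115035,
    768: 114492,
}


def bus_efficiency(frame_bytes, transmits, bus_bytes=BUS_BYTES):
    """Useful bytes divided by bytes actually moved over the bus."""
    return frame_bytes / (transmits * bus_bytes)


efficiency = {
    size: round(bus_efficiency(FRAME_BYTES, count), 3)
    for size, count in transmit_times.items()
}
for size, value in efficiency.items():
    print(size, value)
```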

Table 4-3 Bus efficiency for buffer sizes from 8 bytes to the frame width, for different sequences

Buffer   Bus efficiency                                                  Average
8        0.648   0.662   0.623   0.628   0.774   0.848   0.800   0.783   0.721
16       0.802   0.807   0.774   0.775   0.906   0.972   0.916   0.896   0.856
32       0.813   0.810   0.779   0.791   0.917   0.955   0.901   0.887   0.857
64       0.831   0.826   0.793   0.812   0.925   0.957   0.904   0.887   0.867
128      0.845   0.838   0.803   0.825   0.935   0.961   0.909   0.892   0.876
256      0.855   0.846   0.807   0.834   0.941   0.967   0.915   0.899   0.883
768      0.859   0.850   0.809   0.838   0.946   0.973   0.919   0.904   0.887

4.4 Analysis of bandwidth and memory size in hole-filling process

As discussed in Chapter 3.4, we select bi-linear interpolation with a 9x5 window for hole-filling. The position of a final-hole is determined from the depth map processed by median filtering and the two synthesized hole-maps processed by dilation, as shown in the blend mode of Table 3-1. However, the locations of holes are irregular and differ among frames and scenes. Because of this random-position characteristic, the run-time synthesized output has to be stored in the internal memory for hole-filling. To minimize internal memory usage, the data of the whole frame can be stored in the external memory. But to lessen the overall bandwidth, the data should be stored in an internal buffer with the size of several columns. This is a trade-off between external memory bandwidth and internal memory usage, and two approaches are proposed in the following sub-chapters.

4.4.1 Frame-level buffering with vertically dynamic reuse

To keep the internal memory small, the bi-linear interpolation for hole-filling can be processed in another frame-level stage. But the bandwidth utilization rises if the interpolation fetches every pixel of every block. For example, a 9x5 bi-linear interpolation needs an additional bandwidth of 45x2MB for an HD1080p frame. If fetching is restricted to pixels flagged as final-hole, a hole table recording the hole positions is needed. As shown in Table 4-4, the hole count is up to about 1% of a frame. With this percentage, the bandwidth is up to 0.9MB and the hole buffer needs 55KB of memory for an HD1080p frame.
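The two bandwidth figures in this paragraph follow from simple arithmetic, sketched here with the 2MB-per-frame rounding used in Table 4-1. The function name `hole_fill_bandwidth_mb` is hypothetical, and the 55KB hole-buffer size is not derived because the text does not give the table entry format.

```python
def hole_fill_bandwidth_mb(bh, bw, frame_mb=2.0, hole_ratio=1.0):
    """Extra traffic when a bh x bw block is fetched for each processed pixel.

    hole_ratio : fraction of the frame's pixels that are actually processed
    """
    return bh * bw * frame_mb * hole_ratio


every_pixel = hole_fill_bandwidth_mb(9, 5)                   # 45 x 2MB
holes_only = hole_fill_bandwidth_mb(9, 5, hole_ratio=0.01)   # about 1% holes

print(every_pixel, "MB when every pixel is filtered")
print(round(holes_only, 3), "MB when only final-holes are filtered")
```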

The bandwidth can be reduced by data reuse. Because holes are randomly positioned, the hole-position (HP) and the hole-height (HH) are pre-calculated in the blending step and stored in a hole-index table. In the hole-filling step, texture data are fetched into the filter kernel according to the stored HP and HH, and the center pixel is then filtered as shown in Fig. 4-9. This process is performed row by row until the holes are completely filled. For holes that belong to the same index, the data for filtering can be reused, so the bandwidth is reduced by 53.903% on average, as shown in Table 4-4.
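The saving from vertical reuse can be illustrated with a simple fetch count. This is a sketch of one possible reading, not the thesis scheme: it assumes that one hole-index entry (HP, HH) describes a vertical run of HH holes in a single column, and that consecutive kernel positions share all but one kernel row. The function name `fetched_pixels` is hypothetical, and the resulting ratio is not meant to reproduce the measured 53.903%.

```python
def fetched_pixels(hole_height, bh=9, bw=5, reuse=True):
    """Pixels fetched to filter a vertical run of `hole_height` holes.

    Without reuse, a full bh x bw block is fetched for every hole.
    With reuse, each step down the run fetches only one new kernel row.
    """
    if reuse:
        return (hole_height + bh - 1) * bw
    return hole_height * bh * bw


for hh in (1, 4, 16):
    plain = fetched_pixels(hh, reuse=False)
    shared = fetched_pixels(hh, reuse=True)
    print(hh, plain, shared, round(1 - shared / plain, 3))
```

Under these assumptions an isolated hole gains nothing, and the saving grows with the run height, which is why the measured average depends on the hole statistics of each sequence.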

Fig. 4-9 Filling process with dynamic vertical reuse for a 9x9 bi-linear filter, panels (a) to (e).

Although the bandwidth is low, the number of bus requests is high. Because the kernel is fetched row by row while the texture data in external memory are stored by column, filling a hole requires as many bus requests as the block size.

4.4.2 Column-level buffering

The other method stores the data in the internal memory at run time. In the scan-column-order warping process, the blended texture is stored in column buffers; hence we call this strategy column-level buffering. Once all buffers are full, the interpolation kernel is full and the interpolation starts immediately. No additional bandwidth is needed. However, the number of column buffers equals the block width, and both the texture and flagged final-holes
