

4.3 Hardware-Efficient Disparity Estimation Algorithm

4.3.1 Design Challenges in High-Quality Algorithm

In the HQ-DE algorithm, the main design challenges are the high memory cost and the high computational complexity. They are explained as follows.

1. High Memory Cost in Belief Propagation

High memory cost is the fatal disadvantage of BP-based algorithms. A BP-based algorithm must store both the cost cube and the messages. Our low-memory-cost approach in Section 3.2.2 significantly reduces the memory cost, but the cost remains proportional to the disparity range DR, even if the block-based [36] or tile-based [29] method is adopted. For example, if the block size is 32×32, DR is 128, and each entry is 1 byte, the memory requirement is 131 Kbytes for the cost cube and 524 Kbytes for the messages. Such a large memory space cannot be afforded in internal memory, and placing the data in external memory would incur high bandwidth. Thus, to address the high memory cost directly, we need a new optimization algorithm whose memory requirement is independent of the disparity range and whose results approximate those of BP-M.
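The figures in this example can be reproduced with a short calculation. The sketch below is only illustrative: the function name and parameters are hypothetical, and the factor of four for the messages is an assumption (one message per direction on a 4-connected grid) that happens to match the 524 Kbytes quoted above.

```python
# Memory needed by block-based BP for one block, following the example in the text.
# Assumption (not stated in the source): messages are stored for 4 directions
# (up, down, left, right), each with one entry per pixel and per disparity.

def bp_block_memory(block_w, block_h, disparity_range,
                    bytes_per_entry=1, message_directions=4):
    """Return (cost_cube_bytes, message_bytes) for one block."""
    cost_cube = block_w * block_h * disparity_range * bytes_per_entry
    messages = cost_cube * message_directions
    return cost_cube, messages

cost_cube, messages = bp_block_memory(32, 32, 128)
print(cost_cube, messages)  # 131072 524288, i.e. about 131 Kbytes and 524 Kbytes

# Halving the disparity range halves both figures, which shows the
# proportionality to DR that the text identifies as the core problem.
print(bp_block_memory(32, 32, 64))  # (65536, 262144)
```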

2. Large Image Buffers

Figure IV-29(a) shows the pixels required for computing a target aggregated cost. For the target aggregated cost, the ADSW cost aggregation step aggregates 5×7 matching costs in low resolution. These 5×7 matching costs are computed by the SSAD matching cost step using 10×28 pixels in high resolution. Therefore, computing a target cost needs 280 pixels in the target-view image, and these pixels span 28 image rows.

Figure IV-29 Image buffer required by the SSAD and ADSW steps: (a) required pixels for computing a target aggregated cost, (b) image buffers for one matching cost row

Because of this data dependency, all 28 rows of the three view images should be buffered in internal memory so that the external bandwidth is minimized. However, such multiple-row image buffers are too large. For example, for 1920×1080 sequences, the memory requirement for the image buffers would be 1920×28×3 pixels (i.e., 483 Kbytes for YUV444 format). On the other hand, if the SSAD matching costs are stored for data reuse, the memory requirement is proportional to the disparity range DR and would be 960×7×128 (i.e., 860 Kbytes) for a DR of 128. In addition, if the image pixels are accessed from external memory at run time, the image buffer could be reduced to the “used pixels” region, but the external bandwidth would be 1920×1080×7×3 pixels/frame (i.e., 130.6 Mbytes/frame for YUV444 format).
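The three data configurations can be compared with a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, YUV444 is taken as 3 bytes per pixel (8 bits per component), and each stored matching cost is taken as 1 byte.

```python
# Buffer and bandwidth estimates for the three data configurations in the text.
# Assumptions: YUV444 = 3 bytes per pixel, 1 byte per stored matching cost.

YUV444_BYTES_PER_PIXEL = 3

def image_buffer_bytes(width, rows, views, bytes_per_pixel):
    """Option 1: buffer all required rows of every view on chip."""
    return width * rows * views * bytes_per_pixel

def cost_buffer_bytes(low_res_width, cost_rows, disparity_range, bytes_per_cost=1):
    """Option 2: store SSAD matching costs for reuse (grows with DR)."""
    return low_res_width * cost_rows * disparity_range * bytes_per_cost

def external_bytes_per_frame(width, height, reload_factor, views, bytes_per_pixel):
    """Option 3: re-read pixels from external memory at run time."""
    return width * height * reload_factor * views * bytes_per_pixel

buffers = image_buffer_bytes(1920, 28, 3, YUV444_BYTES_PER_PIXEL)
costs = cost_buffer_bytes(960, 7, 128)
traffic = external_bytes_per_frame(1920, 1080, 7, 3, YUV444_BYTES_PER_PIXEL)

print(buffers)  # 483840 bytes
print(costs)    # 860160 bytes
print(traffic)  # 130636800 bytes per frame
```

Each option trades on-chip memory against external traffic, and none of them is small at this resolution, which is the point the paragraph makes.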

Figure IV-29 labels: (a) target aggregated cost, 5×7 low-resolution matching costs for ADSW, 10×28 high-resolution pixels for SSAD; (b) right-view, center-view, and left-view image buffers, each 28 rows high, with used-pixel regions 10 and 10+127 pixels wide.

To sum up, whichever data configuration is applied, the image data required by the SSAD and ADSW steps incur either a large image buffer or high external bandwidth. Thus, we should simplify the SSAD and ADSW steps in the HE-DE algorithm to reduce the image buffer.

3. High Computational Complexity in Filtering

In the HQ-DE algorithm, there are many filter-based processes, such as the bilateral filter, joint bilateral upsampling, window vote, and the ADSW cost aggregation. These processes suffer from high computational complexity because of their large window sizes. Table IV-3 lists all the filter-based processes in the HQ-DE algorithm. Among them, the bilateral filter (BF) in the NMR step uses an 11×11 window at high resolution, and the WVote step requires the largest window size of 15×15. Because of their large window sizes, these processes account for a high percentage of the computation, as analyzed in Figure IV-19. Thus, we decrease the window sizes of the filter-based processes in the HE-DE algorithm, provided that the disparity quality is preserved.

Table IV-3 Window sizes of filter-based processes in HQ-DE algorithm

| Step | Computation | Frame Resolution | Window Size |
| --- | --- | --- | --- |
| No-Motion Registration (NMR) | BF | High | 11×11 |
| Adaptive Support-Weight Cost Aggregation (ADSW) | BF | Low | 7×5 |
| Occlusion Handling (OCC) | Vote | High | 9×9 |
| Joint Bilateral Upsampling (JBU) | JBF | High | 7×7 |
| Window Vote (WVote) | Vote | High | 15×15 |
| Still-Edge Preservation (SEP) | Median | High | 3×3 |
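As a rough illustration of why window size dominates the cost, the number of taps read per output pixel can be tabulated from Table IV-3. This is only a first-order proxy: it ignores the per-tap arithmetic of each filter type and the difference between high and low frame resolution. The dictionary and function names are hypothetical.

```python
# Taps per output pixel for each filter-based step, from Table IV-3.
# A first-order proxy only: per-tap cost and frame resolution are ignored.

HQ_DE_WINDOWS = {
    "NMR": (11, 11),
    "ADSW": (7, 5),
    "OCC": (9, 9),
    "JBU": (7, 7),
    "WVote": (15, 15),
    "SEP": (3, 3),
}

def taps(window):
    """Number of samples visited for one output pixel."""
    width, height = window
    return width * height

tap_counts = {step: taps(window) for step, window in HQ_DE_WINDOWS.items()}
largest = max(tap_counts, key=tap_counts.get)

for step, count in tap_counts.items():
    print(step, count)
print("largest window:", largest)  # WVote, with 225 taps per output pixel
```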

4. Irregular Computation in Occlusion Handling

The final design challenge is the irregular computation in the occlusion handling step. This step first detects the occlusion region by the left-right check (LRC) method, and then extends the occlusion region for background and foreground. Finally, it fills the occlusion regions by the modified window vote method. The irregular computation lies in the occlusion extension process, which must extend the occlusion region until the foreground is touched. This irregular computation is not compatible with the other raster-scan computations and is not suitable for implementation in a high-throughput pipelined architecture. Thus, we develop a new regular occlusion handling method in the HE-DE algorithm.
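For contrast with the irregular extension process, the detection stage is a regular per-pixel test. The sketch below shows a generic left-right check on one image row; it is not the thesis implementation. The sign convention (left pixel x matches right pixel x - d), the threshold of 1, and treating out-of-range matches as occluded are all assumptions made for illustration, and the function name is hypothetical.

```python
# Generic left-right check (LRC) on a single row of disparities.
# Assumptions (not from the source): left pixel x matches right pixel x - d,
# a mismatch larger than `threshold` marks an occlusion, and a match that
# falls outside the row is also marked as occluded.

def left_right_check(disp_left, disp_right, threshold=1):
    """Return a list of booleans, True where the left-view pixel fails the LRC."""
    width = len(disp_left)
    occluded = []
    for x, d in enumerate(disp_left):
        x_right = x - d
        if x_right < 0 or x_right >= width:
            occluded.append(True)
        else:
            occluded.append(abs(d - disp_right[x_right]) > threshold)
    return occluded

# Consistent disparities pass; the last pixel disagrees with its match and fails.
print(left_right_check([0, 0, 0, 2], [0, 0, 0, 0]))  # [False, False, False, True]
```

Each output depends only on one pixel and its match, so the test fits a raster-scan pipeline. The extension stage has no such fixed access pattern, because how far it must scan depends on where the foreground lies.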

In summary, regarding the high memory cost, BP-M needs a frame-scale memory space to store the cost cube and messages for the whole frame, and the cost cube calculation requires a large image buffer at run time. Regarding the high computational complexity, the filter-based computation uses excessively large window sizes, and the occlusion handling computation is irregular when extending the occlusion region. Therefore, the proposed HE-DE algorithm focuses on overcoming these design challenges.

4.3.2 Proposed Algorithm Flow

Figure IV-30 shows the main flow of the proposed HE-DE algorithm for the center view. The same flow can also be applied to the side views. For the cost cube calculation, we propose a new window-based SSAD method to replace the block-based SSAD and ADSW steps in the HQ-DE algorithm. The new method reduces the image buffers from 28 image rows to 5 image rows. For the temporal cost calculation, the same method as in the HQ-DE algorithm is adopted.

Note that this algorithm removes the inter-view cost calculation step of the HQ-DE algorithm for high parallelism, because that step creates a data dependency between the center view and the side views. In other words, with the inter-view cost calculation step, the center-view disparity map must be computed first, and the side-view disparity maps are computed later. Moreover, to support this computing order, the three-view input data would have to be loaded three times for the matching cost calculation. Therefore, we remove the inter-view cost calculation from our algorithm flow and handle the inter-view consistency in the occlusion handling step.

With the computed cost cube, we propose the cost diffusion method to compute the low-resolution disparity maps. The proposed cost diffusion method replaces BP-M and makes the memory requirement independent of the disparity range.


Figure IV-30 Flow of the HE-DE algorithm for center view

For the occlusion handling step, the new regular method is performed in low resolution and considers the inter-view consistency at the same time. Finally, the low-resolution disparity maps are scaled and refined by the JBU, window vote, and still-edge preservation steps. To reduce the computational complexity of the filter-based processes, we decrease the window sizes of all filters to 5×5, under the condition of no observable quality degradation.

The design challenges in the HQ-DE algorithm mentioned above are solved by the following methods in the proposed HE-DE algorithm.