
II. RELATED WORKS

2.2. GENERAL ALGORITHM FLOW

Fig. 2.4 shows the general framework for disparity estimation algorithms proposed by Scharstein and Szeliski [52]. In this framework, two images are first captured and rectified to serve as the inputs, and the expected result is the disparity map. Disparity estimation algorithms can be roughly classified into two categories within this framework: the local approach and the global approach [52], [53]. The local approach consists only of the matching cost calculation and the cost aggregation, whereas the global approach additionally executes an optimization operation. Finally, the disparity map is refined by the disparity refinement step, which is an optional process for computing fractional disparities and other post-processing. The literature on each step of the general framework is briefly reviewed as follows.

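[Figure: target view and reference view → Matching Cost Calculation → Cost Aggregation → Disparity Selection/Optimization → Disparity Refinement → target-view disparity map]

Fig. 2.4 General framework of disparity estimation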

2.2.1. Matching Cost Calculation

To find the best correspondence pair, the matching cost serves as the essential quantitative measure. Fig. 2.5 gives an example of the matching cost calculation. In this figure, multiple reference pixels are marked as correspondence candidates, and a matching cost is computed between the target pixel and each of them. The disparity range DR is determined by the nearest and farthest objects in the scene and represents the number of correspondence candidates. As a result, each target pixel produces DR matching costs. To obtain the overall disparity map, the matching costs of all target pixels have to be calculated, and together they form a disparity image space. Fig. 2.6 shows a disparity image space, which contains the spatial dimensions x and y and the disparity dimension d. Overall, storing all matching costs of an entire frame requires H×W×DR entries of memory, where H and W are the frame height and width.
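As a rough illustration of the disparity image space described above, the following sketch fills an H×W×DR cost volume with absolute-difference costs. The function name `build_cost_volume`, the nested-list image layout, and the choice of an infinite cost for candidates outside the reference image are assumptions made for this example only.

```python
def build_cost_volume(target, reference, disparity_range):
    """Return volume[y][x][d] = |I_tar(x, y) - I_ref(x - d, y)|."""
    height, width = len(target), len(target[0])
    # Assumption: a candidate that falls outside the reference image
    # receives an infinite cost so that it is never selected.
    out_of_image = float("inf")
    volume = []
    for y in range(height):
        row = []
        for x in range(width):
            costs = []
            for d in range(disparity_range):
                if x - d >= 0:
                    costs.append(abs(target[y][x] - reference[y][x - d]))
                else:
                    costs.append(out_of_image)
            row.append(costs)
        volume.append(row)
    return volume


# One-row example: the target row equals the reference row shifted by one pixel.
target = [[10, 20, 30, 40]]
reference = [[20, 30, 40, 50]]
volume = build_cost_volume(target, reference, disparity_range=3)
print(volume[0][3])  # DR costs of the target pixel at x = 3
```

The volume holds H×W×DR entries, which is the memory requirement stated above; in this toy input the cost at d = 1 is zero wherever the candidate exists.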

[Figure: a target pixel at (x, y) in the target-view image and its DR reference pixels in the reference-view image; each pair of correspondences yields one matching cost]

Fig. 2.5 Matching costs of a target pixel and its correspondence candidates

[Figure: disparity image space with axes x, y, and d; cost planes of size W×H for d = 0, 1, 2, ..., DR-1]

Fig. 2.6 Disparity image space

Many match measurements [3]-[52], as listed in Table 2-1, can be used to compute the costs in the disparity image space. These measurements can be classified into pixel-based and block-based approaches. In the pixel-based approach, the absolute difference (AD) and the square difference (SD) compute the matching cost from a single target pixel and a single reference pixel. To eliminate the sampling sensitivity [1], the half pixels can be further considered in the pixel dissimilarity measure (PDM). In the block-based approach, a target block and a reference block are used to compute the matching cost instead, as shown in Fig. 2.7. Among the block-based measures, the statistical approach called normalized cross correlation (NCC) reduces the sensitivity to radiometric gain and bias by using the block mean and variance. The Rank transform replaces each pixel value with its rank among the support pixels, and the rank values are then used to compute the matching costs. The Census transform converts the pixel intensity into a census bitstream consisting of the intensity comparison results between the center pixel and the support pixels; the Hamming distance between two census bitstreams then gives the matching cost. In summary, since Rank and Census transform the original pixels from the color domain into another domain, they resist the radiometric distortion between views much better.

[Figure: a target block centered at (x, y) and a reference block centered at (x-d, y); the support pixels (u, v) lie within radius r of the block center]

Fig. 2.7 Block-based matching cost calculation

Table 2-1 Different matching cost measurements

Block-based

Normalized Cross Correlation (NCC):
$$\frac{\sum_{|x-u|\le r,\,|y-v|\le r}\left[I_{tar}(u,v)-\bar{I}_{tar}\right]\left[I_{ref}(u-d,v)-\bar{I}_{ref}\right]}{\sqrt{\sum_{|x-u|\le r,\,|y-v|\le r}\left[I_{tar}(u,v)-\bar{I}_{tar}\right]^{2}\left[I_{ref}(u-d,v)-\bar{I}_{ref}\right]^{2}}}$$

Rank:
$$\left|I'_{tar}(x,y)-I'_{ref}(x-d,y)\right|,\quad\text{where } I'(m,n)=\sum_{|m-u|\le r,\,|n-v|\le r}\left[I(m,n)>I(u,v)\right]$$

Census:
$$\mathrm{Hamming}\left(I'_{tar}(x,y),\,I'_{ref}(x-d,y)\right),\quad\text{where } I'(m,n)=\mathrm{bitstream}_{|m-u|\le r,\,|n-v|\le r}\left(I(m,n)>I(u,v)\right)$$

Pixel-based

Absolute Difference (AD):
$$\left|I_{tar}(x,y)-I_{ref}(x-d,y)\right|$$

Square Difference (SD):
$$\left(I_{tar}(x,y)-I_{ref}(x-d,y)\right)^{2}$$

Pixel Dissimilarity Measure (PDM):
$$\min\left\{\left|I_{tar}(x,y)-I_{ref}(x-d,y)\right|,\ \left|I_{tar}(x,y)-I_{ref}^{+}\right|,\ \left|I_{tar}(x,y)-I_{ref}^{-}\right|\right\}$$
where $I_{ref}^{+}$ and $I_{ref}^{-}$ are the neighboring half pixels of $I_{ref}(x-d,y)$.
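The Census cost of Table 2-1 can be sketched as follows. This is a minimal example rather than a reference implementation: the function names are hypothetical, images are nested lists of intensities, and the caller is assumed to keep the support window inside both images. The reference image in the example is the target shifted by one pixel with a gain of 2 and a bias of 5, which is an illustrative radiometric distortion.

```python
def census_transform(image, x, y, radius):
    """Census bitstream of pixel (x, y): one bit per support pixel."""
    bits = []
    for v in range(y - radius, y + radius + 1):
        for u in range(x - radius, x + radius + 1):
            if (u, v) == (x, y):
                continue  # the center pixel is not compared with itself
            # Table 2-1 convention: the bit is 1 when I(center) > I(support).
            bits.append(1 if image[y][x] > image[v][u] else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two bitstreams differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def census_cost(target, reference, x, y, d, radius=1):
    """Matching cost between target (x, y) and reference (x - d, y)."""
    return hamming_distance(census_transform(target, x, y, radius),
                            census_transform(reference, x - d, y, radius))


target = [[1, 5, 3, 7, 2],
          [4, 9, 6, 8, 1],
          [2, 3, 7, 5, 9]]
# Reference view: target shifted by one pixel, with gain 2 and bias 5.
reference = [[2 * row[min(x + 1, len(row) - 1)] + 5 for x in range(len(row))]
             for row in target]

cost_true = census_cost(target, reference, x=2, y=1, d=1)
cost_wrong = census_cost(target, reference, x=2, y=1, d=0)
print(cost_true, cost_wrong)
```

Because the gain and bias preserve the ordering of intensities, the census bitstreams at the true disparity are identical and the cost stays at zero, which is the robustness to radiometric distortion mentioned in the text.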

2.2.2. Cost Aggregation

The main purpose of cost aggregation is to gather the costs of the neighboring pixels in a window around the center pixel for further processing. The assumption behind cost aggregation is that neighboring pixels tend to have the same disparity, so gathering the matching costs from the neighbors can increase the reliability of the matching cost. Therefore, the neighboring costs are accumulated for the center pixel in the cost aggregation step by the following equation,

$$C_{aggr}(x,y,d)=\frac{\sum_{(u,v)\in win(x,y)}C(u,v,d)\times W_{aggr}(u,v)}{\sum_{(u,v)\in win(x,y)}W_{aggr}(u,v)}$$

where C is the initial matching cost and C_aggr is the aggregated matching cost. In this equation, each initial cost C(u, v, d) in an aggregation window of size r is accumulated with the weight W_aggr(u, v) into the target cost C_aggr(x, y, d). In addition, the accumulated value is normalized by the sum of the weights. The computational complexity of this step is O(H×W×DR×r²), which grows with the area of the aggregation window.
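The aggregation equation can be sketched for the simplest case of a uniform weight. The function name `aggregate_costs` and the clipping of the window at the image border are assumptions of this example; the four nested loops over the frame and the window, repeated for every disparity, make the O(H×W×DR×r²) cost visible.

```python
def aggregate_costs(cost, radius):
    """Uniform-weight aggregation of cost[y][x][d] over a square window."""
    height, width, drange = len(cost), len(cost[0]), len(cost[0][0])
    aggregated = [[[0.0] * drange for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for d in range(drange):
                weighted_sum = 0.0
                weight_sum = 0.0
                # Assumption: the window is clipped at the image border.
                for v in range(max(0, y - radius), min(height, y + radius + 1)):
                    for u in range(max(0, x - radius), min(width, x + radius + 1)):
                        w = 1.0  # uniform weight W_aggr(u, v)
                        weighted_sum += cost[v][u][d] * w
                        weight_sum += w
                # Normalize by the sum of the weights.
                aggregated[y][x][d] = weighted_sum / weight_sum
    return aggregated


# A 1x3 frame with DR = 2.
cost = [[[4, 0], [0, 6], [2, 0]]]
aggregated = aggregate_costs(cost, radius=1)
print(aggregated[0][1])
```

Replacing the constant `w` with a position- or color-dependent weight gives the other aggregation schemes discussed in this section.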

Fig. 2.8 shows different cost aggregation methods with different weight distributions. The uniform weight shown in Fig. 2.8(a) uses a constant weight and a fixed r for every support pixel. However, the uniform weight over-blurs the disparity map around small objects when r is too large and yields incorrect disparities in textureless regions when r is too small. Therefore, to obtain better disparity results, r can be adjusted dynamically according to the image content, as shown in Fig. 2.8(b). The Gaussian weight approach in Fig. 2.8(c), which gives the pixels near the window center higher weights, is another commonly used way of deciding the aggregation weights. However, the disparity accuracy of these methods remains limited by the fixed window shape, such as a square or a circle.

To adaptively change the window shape, the adaptive polygon weight approach [4], [5] uses the 8-direction or 4-direction configuration shown in Fig. 2.8(d) to fit the object shape. Similarly, the cross-based weight approach [6] adopts the multiple cross lines concept shown in Fig. 2.8(e). In these two methods, the support region is grown from the window center until its boundary encounters a dissimilar pixel. Unfortunately, these two methods do not perform well on highly textured images because their support regions must be contiguous.

These problems can be solved by the adaptive support-weight (ADSW) approach [7], since all support pixels are considered and their weights are decided by bilateral filter kernels. The weights of ADSW are defined as

$$W_{aggr}(u,v)=W_{tar}(u,v)\times W_{ref}(u-d,v)$$

where W_tar is the weight from the target-view window and W_ref is the weight from the reference-view window. W_tar and W_ref are each computed by the bilateral filter kernels listed below,

$$W(u,v)=f\left(\lVert(x,y)-(u,v)\rVert\right)\,g\left(\lVert I(x,y)-I(u,v)\rVert\right)$$

where f is the spatial kernel on the position distance and g is the range kernel on the color distance. With these two kernels, the aggregation weight is larger when the support pixel is nearer to the center pixel and when its color is more similar to that of the center pixel.
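The ADSW weight can be sketched as below. The text does not fix the form of the kernels f and g, so the exponential kernels and the parameters `gamma_spatial` and `gamma_color` are assumptions of this example, as are the function names and the single-channel images.

```python
import math


def bilateral_weight(image, x, y, u, v, gamma_spatial=5.0, gamma_color=10.0):
    """W(u, v) = f(position distance) * g(color distance) around center (x, y)."""
    spatial_distance = math.hypot(x - u, y - v)
    color_distance = abs(image[y][x] - image[v][u])
    f = math.exp(-spatial_distance / gamma_spatial)  # assumed spatial kernel
    g = math.exp(-color_distance / gamma_color)      # assumed range kernel
    return f * g


def adsw_weight(target, reference, x, y, u, v, d):
    """W_aggr(u, v) = W_tar(u, v) * W_ref(u - d, v)."""
    return (bilateral_weight(target, x, y, u, v)
            * bilateral_weight(reference, x - d, y, u - d, v))


# One image row with an intensity edge between x = 2 and x = 3.
target = [[10, 10, 10, 200, 200]]
reference = [[10, 10, 200, 200, 200]]  # same row shifted by d = 1

w_same_side = adsw_weight(target, reference, x=2, y=0, u=1, v=0, d=1)
w_across_edge = adsw_weight(target, reference, x=2, y=0, u=3, v=0, d=1)
print(w_same_side, w_across_edge)
```

Both support pixels are one pixel away from the center, but the one across the intensity edge receives a far smaller weight, so its cost contributes little to the aggregation.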

Compared to the adaptive polygon weight and cross-based weight approaches, the adaptive support-weight shown in Fig. 2.8(f) fits the object shape better in highly textured regions, but at the expense of significantly higher computational complexity. This complexity can be addressed by the integral histogram approach [8], the iterative aggregation with a small window [9], and the data reuse approach in VLSI design [10]. Overall, with well-defined weights, the cost aggregation step produces a more reliable matching cost C_aggr, which benefits the subsequent disparity selection and optimization.

[Figure: weight-distribution panels for the cost aggregation methods listed in the caption]

Fig. 2.8 Different cost aggregation methods: (a) uniform weight, (b) uniform weight with adaptive window size, (c) Gaussian weight, (d) adaptive polygon weight, (e) cross-based weight, and (f) adaptive support-weight

2.2.3. Disparity Selection/Optimization

After the initial costs have been aggregated, the disparity map can be computed by one of two methods. The most common and simplest one is winner-take-all (WTA), which decides the disparity directly by choosing, for each target pixel, the reference pixel with the minimum cost as the best correspondence. The other one, disparity optimization, takes the aggregated costs of the entire frame and computes the disparity map through energy minimization. The evaluation results in [48] demonstrated that the latter derives more precise disparity maps.
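The WTA selection amounts to an arg-min over the disparity dimension of the aggregated costs. A minimal sketch, assuming the costs are stored as `aggregated[y][x][d]` and that ties are resolved in favor of the smallest disparity:

```python
def winner_take_all(aggregated):
    """Pick, for every pixel, the disparity with the minimum aggregated cost."""
    return [[min(range(len(costs)), key=costs.__getitem__) for costs in row]
            for row in aggregated]


aggregated = [[[3.0, 1.0, 2.0], [0.5, 0.7, 0.9]]]
disparity_map = winner_take_all(aggregated)
print(disparity_map)
```

Each pixel is decided independently of its neighbors, which is what distinguishes WTA from the frame-level optimization techniques.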

Techniques such as dynamic programming (DP), graph-cut (GC), and belief propagation (BP) are commonly adopted for disparity optimization. In short, the main concept behind these techniques is to transform the disparity optimization problem into an energy minimization problem.

The energy function can generally be formulated as

$$E(d)=E_{data}(d)+\lambda E_{smooth}(d)$$

where E_data is the data term that penalizes the dissimilarity of a correspondence pair, E_smooth is the smoothness term that penalizes the disparity inconsistency between two neighboring pixels, and λ weights the smoothness term against the data term. In addition, d stands for the set of selected disparities for the entire frame. In short, the optimization technique searches for the disparity set d that minimizes the total energy E.
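To make the energy function concrete, the sketch below evaluates E(d) for a given disparity map. The text leaves the smoothness term generic, so the absolute disparity difference over 4-connected neighbor pairs used here is an assumption of this example, as is the function name `energy`.

```python
def energy(disparity, cost, smoothness_weight):
    """E(d) = E_data(d) + lambda * E_smooth(d) for a whole disparity map."""
    height, width = len(disparity), len(disparity[0])
    data_term = 0.0
    smooth_term = 0.0
    for y in range(height):
        for x in range(width):
            # Data term: matching cost of the selected disparity.
            data_term += cost[y][x][disparity[y][x]]
            # Smoothness term (assumed form): |d_p - d_q| over right and
            # bottom neighbors, so each 4-connected pair is counted once.
            if x + 1 < width:
                smooth_term += abs(disparity[y][x] - disparity[y][x + 1])
            if y + 1 < height:
                smooth_term += abs(disparity[y][x] - disparity[y + 1][x])
    return data_term + smoothness_weight * smooth_term


cost = [[[0, 5], [4, 3], [0, 5]]]
e_wta = energy([[0, 1, 0]], cost, smoothness_weight=1.0)     # per-pixel minimum
e_smooth = energy([[0, 0, 0]], cost, smoothness_weight=1.0)  # constant disparity
print(e_wta, e_smooth)
```

With λ = 1 the constant disparity map has the lower total energy even though its data term is higher, which illustrates how the smoothness term can overrule an isolated per-pixel minimum.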

The principles of some well-known optimization techniques are briefly described below.

(1) Dynamic Programming

DP is a well-known optimization algorithm that can be used in disparity estimation by mapping the disparity estimation into a shortest path problem. In DP, the optimization process is executed row by row to find the optimal result of each scanline.

Fig. 2.10(a) illustrates how the shortest path problem can be solved by the DP optimization technique. In this figure, the position of a node corresponds to a coordinate in the x-d plane, and the shortest path runs from x = 0 to x = W-1. The path incurs a matching penalty on each node and a smoothness penalty on each edge. During the DP optimization process, two steps called forward accumulating and backward tracing are executed in order to find the path with the minimum penalty. In the first step, the penalties are accumulated in the forward direction to find the moving direction for each node, as shown in Fig. 2.10(b). Afterward, the backward tracing shown in Fig. 2.10(c) finds the minimum penalty path with the help of the moving direction map produced by the forward accumulating step.

However, the most critical issue of the DP technique is the streak artifact in the disparity map caused by the row-by-row processing. To eliminate the streak artifacts, Ohta and Kanade [11] performed the DP in a 3-D space that consists of both the original intra-scanline space and an additional inter-scanline space. In addition, the tree-based DP algorithms [12]-[14] use a tree structure to connect scanlines and thus remove the streak artifacts.

Fig. 2.10 Illustration of the dynamic programming optimization technique: (a) graph model in the DP approach, (b) forward accumulating, and (c) backward tracing
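The two DP steps described above can be sketched for a single scanline as follows. The function name `scanline_dp`, the layout `row_costs[x][d]`, and the linear smoothness penalty λ|d - d'| on each edge are assumptions of this example.

```python
def scanline_dp(row_costs, smoothness_weight):
    """Minimum-penalty disparity path along one scanline."""
    width, drange = len(row_costs), len(row_costs[0])
    accumulated = [list(row_costs[0])]
    moving_direction = [[0] * drange]  # entry for x = 0 is unused

    # Forward accumulating: best predecessor and accumulated penalty per node.
    for x in range(1, width):
        acc_row, dir_row = [], []
        for d in range(drange):
            candidates = [accumulated[x - 1][p] + smoothness_weight * abs(d - p)
                          for p in range(drange)]
            best = min(range(drange), key=candidates.__getitem__)
            acc_row.append(row_costs[x][d] + candidates[best])
            dir_row.append(best)
        accumulated.append(acc_row)
        moving_direction.append(dir_row)

    # Backward tracing: follow the moving-direction map from the best end node.
    d = min(range(drange), key=accumulated[-1].__getitem__)
    path = [d]
    for x in range(width - 1, 0, -1):
        d = moving_direction[x][d]
        path.append(d)
    path.reverse()
    return path


row_costs = [[0, 5], [4, 3], [0, 5]]
path_smooth = scanline_dp(row_costs, smoothness_weight=2)
path_free = scanline_dp(row_costs, smoothness_weight=0)
print(path_smooth, path_free)
```

Since every scanline is optimized on its own, nothing couples the paths of adjacent rows, which is the origin of the streak artifacts discussed above.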

(2) Graph-Cut

The key concept of the GC optimization technique is to convert the disparity selection problem into the max-flow/min-cut problem [15]. In addition, the associated optimization algorithms can be adopted as well to generate more accurate disparity maps. Fig. 2.11 shows an example of the min-cut/max-flow formulation for disparity estimation, in which there are H×W×DR nodes arranged in a 6-connected grid. The matching cost and smoothness cost defined on each edge can be regarded as pipes whose capacities differ according to the costs. In this illustration, water flows from the source node to the sink node through the pipes. The min-cut is the cut surface across the edges that has the minimum total capacity, and the max-flow is the maximum flow allowed from the source to the sink. By the max-flow/min-cut theorem, these two problems are equivalent: the value of the maximum flow equals the capacity of the minimum cut. As a result, the disparity map can be obtained directly from the resultant cut surface.

[Figure: grid of 6-connected nodes between a source node and a sink node, separated by a cut surface; axes labeled H and DR]

Fig. 2.11 Illustration of graph-cut optimization technique

The widely used techniques for solving the min-cut/max-flow problem are push-relabel [16] and augmenting path [17], and their computational complexities depend heavily on the number of label candidates (i.e., the disparity range DR in disparity estimation). Consequently, a large disparity range makes these techniques suffer from extremely high computational complexity.
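The relation between max-flow and min-cut can be checked on a toy graph with a basic augmenting-path solver. The sketch below uses breadth-first augmenting paths (the Edmonds-Karp variant), which is not necessarily the algorithm of [17]; the graph, the node names, and the function name are made up for illustration and do not model a disparity graph.

```python
from collections import deque


def max_flow_min_cut(capacity, source, sink):
    """Return (max-flow value, set of nodes on the source side of a min cut)."""
    # Residual graph with a reverse edge for every forward edge.
    residual = {}
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break  # no augmenting path is left
        # Bottleneck capacity along the path found by the search.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    return flow, set(parent)


capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 1}, "b": {"t": 3}, "t": {}}
flow, source_side = max_flow_min_cut(capacity, "s", "t")
cut_capacity = sum(cap for u in source_side
                   for v, cap in capacity[u].items() if v not in source_side)
print(flow, cut_capacity, sorted(source_side))
```

The nodes still reachable from the source in the final residual graph form one side of a minimum cut, and the capacity of the edges leaving that side equals the flow value.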

Boykov et al. proposed the swap method [18] and an efficient augmenting path algorithm [19] to reduce the computational complexity of GC. In the swap method, the optimization is performed disparity by disparity, and each iteration considers only one new disparity. In addition, Chou et al. [20] proposed a fast algorithm based on the swap method that predicts the disparities and skips part of the optimization process early. On the other hand, the computational speed of the push-relabel approach depends on the order in which the nodes are processed. Accordingly, Cherkassky and Goldberg [21] proposed a highest-label order, which achieves more efficient computation than the typical FIFO order. In addition, Delong and Boykov [22] proposed the block-based graph cut algorithm to increase the parallelism of the push-relabel method.

In summary, due to its irregular computation and low parallelism, the GC technique is not suitable for acceleration by GPU programming or VLSI design, even though it can achieve more accurate disparity results.

(3) Belief Propagation

Sun et al. [24] were the first to apply the BP approach to the disparity estimation problem, deriving more accurate disparity maps by optimizing the energy in the graph model shown in Fig. 2.12. In this figure, a node represents a pixel and all nodes are connected in a 4-connected grid. During the optimization process, the matching costs of each node are diffused to the neighboring nodes through messages, iteration by iteration; this diffusion mechanism is called message passing. After several iterations, the disparity of each node is determined by aggregating its matching costs and incoming messages.

[Figure: grid of nodes, each holding matching costs and exchanging messages with its neighbors]

Fig. 2.12 Illustration of belief propagation approach

The most critical issue of the BP approach is its high computational complexity, O(H×W×DR²×T), caused by the message passing. Here, T is the iteration count. The DR² factor results from the convolution in each message update, and T should be more than 10. Therefore, Felzenszwalb and Huttenlocher [25] proposed the hierarchical BP (HBP) and the linear-time message passing to reduce the computation of message passing. The HBP speeds up the convergence of the disparities, and the linear-time message passing reduces the complexity of the convolution from O(DR²) to O(DR). Szeliski et al. [26] proposed the max-product loopy belief propagation (BP-M) to reduce the iteration count. Moreover, since the BP approach is highly parallel, it is well suited to acceleration by GPU programming and VLSI design [27]-[33]. Unfortunately, the high memory cost (4×H×W×DR) for storing the matching costs and messages of the entire frame is the main hardware design issue. To solve this problem, the bipartite grid [25] and the sliding approach [34] were proposed to lighten the memory access penalty, and the predictive coding scheme [35] can be applied for message compression.
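The difference between the O(DR²) and O(DR) message updates can be sketched for one message in the min-sum form. The example assumes a linear smoothness cost λ|d_p - d_q|, for which the two-pass update is exact; the function names are hypothetical, and `h` stands for the sender's matching costs plus its other incoming messages.

```python
def message_quadratic(h, smoothness_weight):
    """m[q] = min over p of h[p] + lambda * |p - q|, in O(DR^2) operations."""
    n = len(h)
    return [min(h[p] + smoothness_weight * abs(p - q) for p in range(n))
            for q in range(n)]


def message_linear(h, smoothness_weight):
    """Same message in O(DR) using a forward and a backward pass."""
    m = list(h)
    for q in range(1, len(m)):           # forward pass
        m[q] = min(m[q], m[q - 1] + smoothness_weight)
    for q in range(len(m) - 2, -1, -1):  # backward pass
        m[q] = min(m[q], m[q + 1] + smoothness_weight)
    return m


h = [5, 1, 4, 9, 2]
slow = message_quadratic(h, smoothness_weight=2)
fast = message_linear(h, smoothness_weight=2)
print(slow, fast)
```

Both functions return the same message, but the second touches each disparity only twice, which is where the reduction from O(DR²) to O(DR) comes from under this smoothness model.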

In summary, the DP algorithm can easily achieve real-time processing but suffers from streak artifacts, and the methods that remove these artifacts additionally introduce irregular computation. As for the 2-D optimization techniques, although the GC technique derives highly accurate disparity maps, its irregular computational process significantly limits hardware acceleration.

2.2.4. Disparity Refinement

In the final step, post-processing methods such as occlusion handling, object consistency enhancement, and temporal consistency enhancement are usually applied to further refine the disparity maps. These methods are briefly described as follows.

(1) Occlusion Handling

An occlusion occurs when an object point is visible in one view and invisible in the other view. Therefore, the pixels in an occlusion region have no correspondences in the other view. In general, incorrect disparities appear in the occlusion regions and further induce artifacts in the view synthesis. To deal with the occlusion problem, occlusion detection is first applied to locate the occlusion regions, and an occlusion filling mechanism then fills them, generally with the background disparities. The basic methods for occlusion detection, which rest on different assumptions, are surveyed in [45]. The left-right check (LRC) assumes that the two pixels of a correspondence pair should have the same disparity. The occlusion constraint (OCC) assumes that a disparity gap between two pixels in one view results in an occlusion region in the other view. The order constraint (ORD) assumes that two pixels keep the same order as their correspondences in the other view. Among these techniques, the LRC is widely applied in disparity refinement [6], [40], while the OCC and the ORD are usually incorporated into the disparity optimization step [15], [24]. With the detected occlusion pixels, the disparities in the occlusion regions can be directly replaced by reliable background disparities in the occlusion filling step.
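The LRC and a simple occlusion filling can be sketched as follows. Several conventions here are assumptions of this example: a target pixel x corresponds to the reference pixel x - d, the reference-view disparity map stores the disparity of each reference pixel, and the background is taken to be the smaller of the nearest valid disparities on the left and right of an occluded pixel. The function names are hypothetical.

```python
def left_right_check(disp_target, disp_reference, tolerance=0):
    """Flag target pixels whose disparity disagrees with their correspondence."""
    occluded = []
    for y, row in enumerate(disp_target):
        flags = []
        for x, d in enumerate(row):
            xr = x - d  # corresponding column in the reference view
            consistent = (0 <= xr < len(row)
                          and abs(disp_reference[y][xr] - d) <= tolerance)
            flags.append(not consistent)
        occluded.append(flags)
    return occluded


def fill_occlusions(row, flags):
    """Replace flagged disparities of one row by a background disparity."""
    filled = list(row)
    for x, bad in enumerate(flags):
        if not bad:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1) if not flags[i]), None)
        right = next((row[i] for i in range(x + 1, len(row)) if not flags[i]), None)
        candidates = [c for c in (left, right) if c is not None]
        if candidates:
            # Assumption: the background has the smaller disparity.
            filled[x] = min(candidates)
    return filled


disp_target = [[0, 1, 1, 2]]
disp_reference = [[1, 1, 0, 0]]
occluded = left_right_check(disp_target, disp_reference)
filled_row = fill_occlusions(disp_target[0], occluded[0])
print(occluded, filled_row)
```

A nonzero `tolerance` can be used when the two disparity maps are expected to differ by small estimation errors.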

(2) Object Consistency Enhancement

Within an object, the disparities are usually identical or change smoothly. However, textureless regions usually cause incorrect disparities and thus degrade the disparity maps. Therefore, the plane fitting approach [40] is commonly adopted by high-performance disparity estimation algorithms [36], [37], [44] to remove the disparity noise. In the plane fitting approach, watershed segmentation, mean-shift clustering, or k-means clustering is usually applied first to compute the segment information. Based on the segment information, a 3-D plane is fitted to the disparities in each segment by linear regression. Besides the plane fitting method, the regional voting method [6] can also refine the disparity maps well. Compared to plane fitting, regional voting is much simpler because it does not need the segment information.

(3) Temporal Consistency Enhancement

Most previous disparity estimation algorithms focused on still images [48]. These algorithms did not consider temporal consistency, which causes problems when they are applied to view synthesis for video sequences. Because the disparity maps are generated independently for each frame, they exhibit flicker artifacts if the temporal consistency is not handled well. In addition, disparities without temporal consistency treatment are unstable in the occlusion and textureless regions. As a result, the flicker artifacts propagate to the view synthesis results and become visible.

Intuitively, the neighboring frames can be taken into account in the disparity estimation to address the temporal consistency issue. In the previous work [41]-[43], a disparity flow in the spatial and temporal domains is constructed by buffering several disparity frames, and different smoothing methods are then applied to the disparity flow. On the other hand, since adjacent frames are available, the temporal BP algorithm [38] executes the BP optimization on a 6-connected grid graph in which two additional connections link each node to the previous and next frames. In addition, the DERS algorithm of 3DVC [45]-[47] adds temporal costs to the matching costs according to the previous disparity.

In summary, the disparity refinement step can significantly reduce the inconsistent disparity problem and thus improve the view synthesis quality in 3DTV applications.
