Introduction - A Disparity Estimation Algorithm Based on Region-Based Variable Window Size and Adaptive Weights

Stereo matching is one of the most actively studied topics in computer vision.
The task is to estimate the disparity map from a pair of rectified images of the same
scene taken from different viewpoints. The disparity of a pixel is the horizontal
displacement between corresponding pixels in the left image and the right image.
The process of finding the disparity is referred to as stereo matching or disparity
estimation. In recent years, a large number of algorithms have been proposed to solve
this problem. Stereo matching has been widely adopted by applications such as 3D video
conferencing and free viewpoint TV [1]. It will remain an attractive topic as the 3D
video market develops over the next few years.

According to the work published by Scharstein and Szeliski [2], a stereo matching
algorithm generally consists of the following four steps: matching cost computation,
cost (support) aggregation, disparity computation, and disparity refinement.
Figure 1-1 depicts the flow of a stereo matching algorithm.

Figure 1-1 General flow of a stereo matching algorithm.


The first step computes the initial matching costs of all disparity candidates for
each pixel using a cost measure such as absolute difference (AD), gradient-based
measures, or non-parametric transforms like rank and census [3]. Among these
measures, the AD cost is the most commonly used in stereo matching methods because
of its simplicity. The AD cost in RGB color space of a pixel p with respect to a
disparity d can be defined as:

$$C(p, d) = \sum_{c \in \{R, G, B\}} \left| I_L^c(p) - I_R^c(q) \right| \qquad (1)$$

where I_L and I_R represent the left and right images, and q denotes the pixel in the
right image obtained by shifting p horizontally by d. Using the AD cost as the
matching penalty is intuitive; however, it is not robust to global radiometric
changes. In an evaluation by Hirschmüller and Scharstein [4], the census transform
gave the best overall results across stereo matching methods, since it compares the
relative ordering of pixel intensities rather than the intensity values themselves.
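The two cost measures discussed above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this thesis: the function names `ad_cost`, `census_3x3`, and `hamming`, the 3x3 census window, and the toy images are all assumptions made for the example, and q is taken to be the pixel at (x - d, y).

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def ad_cost(left: List[List[RGB]], right: List[List[RGB]],
            x: int, y: int, d: int) -> int:
    """AD cost of left pixel p = (x, y) against right pixel q = (x - d, y)."""
    p = left[y][x]
    q = right[y][x - d]  # assumption: the caller keeps x - d inside the image
    return sum(abs(a - b) for a, b in zip(p, q))


def census_3x3(gray: List[List[int]], x: int, y: int) -> List[int]:
    """One bit per neighbor: 1 if the neighbor is darker than the center."""
    center = gray[y][x]
    return [int(gray[y + dy][x + dx] < center)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two census bit strings differ."""
    return sum(u != v for u, v in zip(a, b))


# One-row toy images: the right row is the left row shifted by one pixel.
left_row = [[(10, 20, 30), (40, 50, 60), (70, 80, 90)]]
right_row = [[(40, 50, 60), (70, 80, 90), (0, 0, 0)]]
cost_d1 = ad_cost(left_row, right_row, x=1, y=0, d=1)  # correct disparity
cost_d0 = ad_cost(left_row, right_row, x=1, y=0, d=0)  # wrong disparity

# A uniform brightness offset of 20 changes AD but not the census ordering.
patch = [[10, 50, 30],
         [70, 40, 20],
         [90, 60, 80]]
brighter = [[v + 20 for v in row] for row in patch]
ad_gap = abs(patch[1][1] - brighter[1][1])
census_gap = hamming(census_3x3(patch, 1, 1), census_3x3(brighter, 1, 1))
```

In this toy case the AD cost is 0 at the correct disparity and 90 at the wrong one, while the brightness offset produces an AD gap of 20 but a census Hamming distance of 0, which is the robustness to radiometric change noted above.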

Second, the cost aggregation step gathers the costs in a support window, which is
usually a square window. The underlying assumption is that surrounding pixels with
colors similar to that of the center pixel are strongly correlated with it. The
aggregated cost of pixel p can be calculated as:

$$C_{agg}(p, d) = \frac{\sum_{i \in W_L} cost(i, d) \times w(p, i)}{\sum_{i \in W_L} w(p, i)} \qquad (2)$$

where cost(i, d) is the initial matching cost obtained from the previous step, and
w(p, i) is the weight relating pixel p to its neighbor i in the current support
window W_L. A simple schematic diagram of cost aggregation is depicted in Figure 1-2.
Cost aggregation reduces the matching ambiguity and noise in the initial cost volume,
which arise because a single-pixel cost carries too little information.

Figure 1-2 Cost aggregation at pixel p with its neighbors i
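The normalized weighted sum used for cost aggregation can be illustrated as follows. This is only a sketch for a single pixel p and a single disparity d; the function name `aggregate_cost`, the dictionary layout, and the numeric values are assumptions chosen for the example.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]


def aggregate_cost(cost: Dict[Pixel, float],
                   weight: Dict[Pixel, float]) -> float:
    """Weighted average of the initial costs over the support window.

    `cost[i]` is the initial cost of window pixel i at a fixed disparity d,
    and `weight[i]` is the weight w(p, i) of that pixel.
    """
    numerator = sum(cost[i] * weight[i] for i in cost)
    denominator = sum(weight[i] for i in cost)
    return numerator / denominator


# Three window pixels; the last one is an outlier (e.g. across an object edge).
window_cost = {(0, 0): 10.0, (1, 0): 20.0, (2, 0): 90.0}

uniform = {(0, 0): 1.0, (1, 0): 1.0, (2, 0): 1.0}
adaptive = {(0, 0): 1.0, (1, 0): 1.0, (2, 0): 0.0}  # outlier suppressed

agg_uniform = aggregate_cost(window_cost, uniform)    # plain box average
agg_adaptive = aggregate_cost(window_cost, adaptive)  # outlier ignored
```

With uniform weights the outlier pulls the aggregated cost up to 40.0, whereas giving it zero weight yields 15.0. This is the motivation for choosing the weights w(p, i) carefully.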

With the aggregated costs, the disparity map can be computed by the winner-take-all
(WTA) process, which selects the disparity candidate with the minimal aggregated
cost. The WTA method can be expressed as:

$$D(p) = \arg\min_{d} C_{agg}(p, d) \qquad (3)$$

where d is a disparity candidate within the disparity search range. Other
optimization methods exist, such as the graph-cut approach introduced by Boykov
et al. [5].
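A minimal sketch of the WTA selection for one pixel is given below. The function name `winner_take_all` and the sample costs are hypothetical; ties are resolved in favor of the smallest disparity, which is a choice made for this example only.

```python
from typing import Sequence


def winner_take_all(agg_costs: Sequence[float]) -> int:
    """Return the disparity index d with the smallest aggregated cost.

    `agg_costs[d]` holds C_agg(p, d) for one pixel p over the search range.
    """
    return min(range(len(agg_costs)), key=lambda d: agg_costs[d])


# Aggregated costs of one pixel for the candidates d = 0, 1, 2, 3.
costs_for_p = [35.0, 12.5, 18.0, 40.0]
best_d = winner_take_all(costs_for_p)
```

Here `best_d` is 1, the candidate with the lowest cost. Applying the same selection independently at every pixel yields the disparity map.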

Finally, the disparity refinement step further refines the disparity map by
correcting errors caused by outlier pixels or image noise. A simple refinement tool
is a 3x3 median filter. Figure 1-3 illustrates how a median filter works. In
addition, the left-right consistency check [6] is an effective refinement technique
for dealing with occluded regions. It is described in Section 3.7.1.

Figure 1-3 A 3x3 median filter
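The 3x3 median filter can be sketched as follows. This is an illustrative version rather than the refinement code of this thesis: the name `median_filter_3x3` is hypothetical, and leaving border pixels unchanged is an assumption made to keep the example short.

```python
from typing import List


def median_filter_3x3(disp: List[List[int]]) -> List[List[int]]:
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    height, width = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [disp[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of the 9 sorted values
    return out


# A flat disparity patch with one outlier in the center.
noisy = [[5, 5, 5],
         [5, 60, 5],
         [5, 5, 5]]
filtered = median_filter_3x3(noisy)
```

The isolated outlier value 60 is replaced by 5, the median of its neighborhood, while the surrounding pixels are left as they were.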

Based on the above steps, disparity estimation algorithms can be roughly classified
into two types: local algorithms and global algorithms. Local methods focus on the
matching cost computation and cost aggregation steps, whereas global methods
emphasize the disparity computation step.

Local algorithms estimate the disparity of each pixel independently within a support
window. The matching costs are aggregated over the window, and the disparity
candidate with the minimal cost is then selected for the pixel by the WTA rule. Local
algorithms have low computational complexity and storage requirements, so they are
generally adopted by real-time applications. Recent research has shown that a
well-selected support window can give a high-quality result. Yoon et al. [7] proposed
a local approach with adaptive support weights (ADSW), which in effect applies a
support window of arbitrary shape. As a result, ADSW can produce results approaching
those of global methods, though it requires considerable execution time. Later,
Tombari et al. proposed a segment-based support method [8] that assigns different
weights to the pixels in a support window based on segmentation information.
Recently, the mini-census adaptive support weight approach (MCADSW) [9] was proposed;
it has lower complexity and handles brightness bias better than the original ADSW.
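As a rough illustration of the adaptive support weight idea, the weight of a neighbor can be made to decay with both its color difference and its spatial distance from the center pixel. The sketch below follows the commonly cited exponential form of the ADSW weight, but it is a simplification under stated assumptions: it measures color distance directly in RGB, the function name `support_weight` is hypothetical, and the default values of `gamma_c` and `gamma_p` are arbitrary placeholders rather than tuned parameters from [7].

```python
import math
from typing import Tuple


def support_weight(color_p: Tuple[float, float, float],
                   color_i: Tuple[float, float, float],
                   pos_p: Tuple[int, int],
                   pos_i: Tuple[int, int],
                   gamma_c: float = 7.0,    # placeholder color bandwidth
                   gamma_p: float = 36.0    # placeholder spatial bandwidth
                   ) -> float:
    """Weight of neighbor i with respect to center p.

    The weight decays exponentially with the color distance and with the
    spatial distance between the two pixels.
    """
    delta_c = math.dist(color_p, color_i)  # Euclidean color distance
    delta_g = math.dist(pos_p, pos_i)      # Euclidean spatial distance
    return math.exp(-(delta_c / gamma_c + delta_g / gamma_p))


center_color, center_pos = (100.0, 100.0, 100.0), (5, 5)

w_self = support_weight(center_color, center_color, center_pos, center_pos)
w_similar = support_weight(center_color, (102.0, 100.0, 99.0),
                           center_pos, (6, 5))
w_different = support_weight(center_color, (200.0, 30.0, 30.0),
                             center_pos, (6, 5))
```

A neighbor with nearly the same color keeps a weight close to 1, while an equally close neighbor with a very different color receives a weight near 0. Such weights effectively carve an arbitrarily shaped support region out of a square window.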

In global algorithms, on the other hand, the cost aggregation step is simple and the
emphasis is on the disparity computation step. Global approaches formulate stereo
matching as the minimization of an energy function, and the minimizer determines the
disparity map. The energy function often includes a data term and a neighborhood
(smoothness) term. Efficient optimizers such as graph-cut [5] and belief propagation
are employed to minimize the energy function. The cooperative optimization based on
region segments proposed by Wang and Zheng [10] is the state of the art among global
approaches according to the Middlebury evaluation [11]. This algorithm uses regions
as matching objects and defines the corresponding energy function with constraints on
the data term, smoothness, and occlusion. In general, global methods produce more
accurate results than common local methods, but they suffer from high computational
complexity and can hardly be used in real-time implementations.
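The energy-minimization view can be made concrete with a toy example on a three-pixel scanline. The sketch below is not the formulation of [5] or [10]: it assumes a generic energy made of a data term plus `lam` times the absolute disparity difference between adjacent pixels, and it finds the minimum by brute-force enumeration, which is feasible only for tiny problems. All names and numbers are hypothetical.

```python
from itertools import product
from typing import List, Sequence


def energy(disp: Sequence[int], data_cost: List[List[float]],
           lam: float) -> float:
    """Data term plus a weighted smoothness term on a 1D scanline."""
    data = sum(data_cost[x][d] for x, d in enumerate(disp))
    smooth = sum(abs(a - b) for a, b in zip(disp, disp[1:]))
    return data + lam * smooth


# data_cost[x][d]: matching cost of pixel x at disparity d (2 candidates).
data_cost = [[0.0, 5.0],
             [4.0, 3.0],   # noisy pixel: slightly prefers d = 1
             [0.0, 5.0]]
lam = 2.0

# Per-pixel WTA ignores the neighbors and follows the noisy data term.
wta = tuple(min(range(2), key=lambda d: row[d]) for row in data_cost)

# Exhaustive search over all labelings stands in for a real optimizer.
best = min(product(range(2), repeat=len(data_cost)),
           key=lambda disp: energy(disp, data_cost, lam))
```

Per-pixel WTA returns (0, 1, 0) with energy 7.0, while the minimum-energy labeling is the smooth solution (0, 0, 0) with energy 4.0. Practical optimizers such as graph-cut or belief propagation search for such a minimum without enumerating every labeling, which is where the high computational cost of global methods comes from.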

The algorithm proposed in this thesis is a local algorithm. We modify the MCADSW
algorithm [9] by adopting a robust matching cost measure and using segment
information to determine a variable support window size for cost aggregation.
Moreover, the region information is also used in our region-based cross voting scheme
to improve the quality of the disparity refinement result.

The rest of this thesis is organized as follows. Section 2 reviews related work on
local stereo matching methods. Section 3 describes our motivation and the proposed
algorithm step by step. The experimental results are presented in Section 4. Finally,
Section 5 concludes this thesis.
