Stereo matching is one of the most actively studied topics in computer vision.
The task is to estimate the disparity map from a pair of rectified images of the same
scene taken from different viewpoints. The disparity of a pixel is the horizontal
displacement between corresponding pixels in the left image and the right image.
The process of finding the disparity is referred to as stereo matching or
disparity estimation. In recent years, a large number of algorithms have been
proposed to solve the problem. Stereo matching algorithms have been widely
adopted in applications such as 3D video conferencing and free-viewpoint TV [1].
Stereo matching will remain an attractive topic as the 3D video market develops over
the next few years.
According to the work published by Scharstein and Szeliski [2], a stereo
matching algorithm generally consists of the following four steps: matching cost
computation, cost (support) aggregation, disparity computation and disparity
refinement. Figure 1-1 depicts the flow of a stereo matching algorithm.
Figure 1-1 General flow of a stereo matching algorithm.
The first step computes the initial matching costs of all disparity candidates for
each pixel using a cost measure such as absolute difference (AD), gradient-based
measures, or non-parametric transforms like rank and census [3]. Among these
measures, the AD cost is the most commonly used in stereo matching methods
because of its simplicity. The AD cost in RGB color space of a pixel p with respect to a
disparity d can be defined as:
$$C_{AD}(p, d) = \sum_{c \in \{R, G, B\}} \left| I_L^c(p) - I_R^c(q) \right| \tag{1}$$
where I_L and I_R represent the left and right images, and q is the pixel in the right
image that corresponds to p under disparity d. Using the AD cost as the matching
penalty is intuitive; however, it is not robust to global radiometric changes. In an
evaluation by Hirschmüller and Scharstein [4], the census transform achieved the best
overall results among the matching costs tested, since it compares the relative
ordering of pixel intensities instead of the intensities themselves.
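
The contrast between the two cost measures can be illustrated with a minimal sketch.
It is not the implementation used in this thesis. It assumes that images are stored as
nested lists, that the match of a left pixel at column x lies at column x - d in the
right image, and that the census window is 3x3 with clamped borders; the function
names `ad_cost`, `census_3x3`, and `hamming` are hypothetical.

```python
def ad_cost(left, right, x, y, d):
    """AD cost of left pixel (x, y) at disparity d, summed over R, G, B.

    Assumption: the candidate match in the right image is at column x - d.
    """
    xr = x - d
    if xr < 0:
        return float("inf")  # candidate falls outside the right image
    return sum(abs(a - b) for a, b in zip(left[y][x], right[y][xr]))


def census_3x3(gray, x, y):
    """3x3 census signature: one bit per neighbor, set if neighbor < center."""
    h, w = len(gray), len(gray[0])
    center = gray[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            # Clamp coordinates so border pixels reuse the nearest valid pixel.
            ny = min(max(y + dy, 0), h - 1)
            nx = min(max(x + dx, 0), w - 1)
            bits = (bits << 1) | (1 if gray[ny][nx] < center else 0)
    return bits


def hamming(a, b):
    """Census matching cost: number of differing bits in two signatures."""
    return bin(a ^ b).count("1")


# A uniform brightness offset changes the AD cost but not the census signature.
gray = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[v + 50 for v in row] for row in gray]
census_cost = hamming(census_3x3(gray, 1, 1), census_3x3(brighter, 1, 1))
```

Because the census signature depends only on the ordering of intensities in the
window, the offset of 50 leaves it unchanged, whereas an AD cost between the same two
patches would be nonzero.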
Second, the cost aggregation step gathers the costs within a support window, which is
usually square. The underlying assumption of cost aggregation is that
surrounding pixels with similar colors are strongly correlated with the center pixel.
The aggregated cost of pixel p can be calculated by the following function:
$$C_{agg}(p, d) = \frac{\sum_{i \in W_L} \mathrm{cost}(i, d) \times w(p, i)}{\sum_{i \in W_L} w(p, i)} \tag{2}$$
where cost(i, d) is the initial matching cost obtained from the previous step, and
w(p, i) is the weight relating pixel p to its neighbor i in the current support
window W_L. A simple schematic diagram of cost aggregation is
depicted in Figure 1-2. Cost aggregation reduces the matching ambiguity and
noise that remain in the initial cost volume because a single pixel provides too
little information.
Figure 1-2 Cost aggregation at pixel p over its neighbors i
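
The aggregation formula can be sketched as a weighted average over a square window.
The weight function here is only an assumption chosen for illustration (an
exponential of the intensity difference, so that similar pixels count more); the
parameters `radius` and `gamma` and the `cost[y][x][d]` layout are hypothetical and
are not taken from the thesis.

```python
import math


def aggregate_cost(cost, image, x, y, d, radius=1, gamma=10.0):
    """Weighted average of cost[i][d] over a square window centered at (x, y).

    cost[y][x][d] holds the initial matching costs; image[y][x] is a gray value.
    """
    h, w = len(image), len(image[0])
    num = 0.0
    den = 0.0
    # The window is clipped at the image border.
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            # Assumed weight: close to 1 for similar intensities, near 0 otherwise.
            wgt = math.exp(-abs(image[ny][nx] - image[y][x]) / gamma)
            num += cost[ny][nx][d] * wgt
            den += wgt
    return num / den  # den >= 1 because the center pixel always has weight 1


# One image row: the right neighbor differs strongly from the center pixel,
# so its large cost contributes almost nothing to the aggregated cost.
image = [[0, 0, 100]]
cost = [[[0.0], [0.0], [90.0]]]
agg = aggregate_cost(cost, image, x=1, y=0, d=0)
```

With a constant image every weight equals 1 and the function reduces to a plain box
average, which corresponds to a fixed square support window.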
With the aggregated costs, the disparity map can be computed simply by the
winner-take-all (WTA) process, which selects the disparity candidate with the
minimal aggregated cost. The WTA method can be expressed as
$$D(p) = \arg\min_{d} C_{agg}(p, d) \tag{3}$$
where d is a disparity candidate within the disparity search range. Other
optimization methods can also be used, such as the graph-cut method introduced by
Boykov et al. [5].
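
The WTA rule amounts to an argmin over the disparity axis of the aggregated cost
volume. The following is a minimal sketch; the `cost_volume[y][x][d]` layout and the
function names are assumptions, and ties are resolved in favor of the smallest
disparity only because that is how Python's `min` behaves.

```python
def winner_take_all(agg_costs):
    """Return the disparity index with the minimal aggregated cost."""
    return min(range(len(agg_costs)), key=lambda d: agg_costs[d])


def disparity_map(cost_volume):
    """Apply WTA independently to every pixel of cost_volume[y][x][d]."""
    return [[winner_take_all(costs) for costs in row] for row in cost_volume]


# Two pixels, three disparity candidates each.
volume = [[[5.0, 2.0, 7.0], [1.0, 4.0, 4.0]]]
disp = disparity_map(volume)
```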
Finally, the disparity refinement step further refines the disparity by correcting
the errors caused by outlier pixels or image noise. A simple refinement tool is a 3x3
median filter; Figure 1-3 illustrates how a median filter works. In addition, the
left-right consistency check [6] is an effective refinement technique for dealing with
occluded regions. It is introduced in Section 3.7.1.
Figure 1-3 A 3x3 median filter
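
Both refinement tools can be sketched in a few lines. The median filter below leaves
border pixels unchanged, which is an assumption. The consistency check is written in
one commonly used form (compare the left disparity with the right disparity at the
matched column) and is not necessarily the variant described in Section 3.7.1; the
tolerance `tol` and the use of `None` for rejected pixels are hypothetical choices.

```python
def median_filter_3x3(disp):
    """Replace each interior pixel by the median of its 3x3 neighborhood."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]  # border pixels are copied unchanged
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(
                disp[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle of the nine sorted values
    return out


def left_right_check(disp_left, disp_right, tol=1):
    """Keep a left disparity only if the right map agrees at the matched pixel."""
    out = []
    for y, row in enumerate(disp_left):
        checked = []
        for x, d in enumerate(row):
            xr = x - d  # assumed position of the match in the right image
            ok = 0 <= xr < len(row) and abs(d - disp_right[y][xr]) <= tol
            checked.append(d if ok else None)  # None marks an unreliable pixel
        out.append(checked)
    return out


# A single outlier in a flat region is removed by the median filter.
noisy = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
smoothed = median_filter_3x3(noisy)
```

Pixels rejected by the consistency check are typically occluded or mismatched and
would be filled in by a later step.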
Based on the above steps, disparity estimation algorithms can be roughly
classified into two types: local algorithms and global algorithms. Local methods focus
on the matching cost computation and cost aggregation steps, whereas global
methods emphasize the disparity computation step.
Local algorithms estimate the disparity of each pixel independently within a
support window. The matching costs are aggregated over the window, and then the
disparity candidate with the minimal cost is selected for the pixel by the WTA
rule. Local algorithms have low computational complexity and storage requirements,
so they are generally adopted in real-time applications. Recent research has shown
that a well-selected support window can give high-quality results. The local approach
with adaptive support weights (ADSW), proposed by Yoon et al. [7], effectively
applies a support window of arbitrary shape. As a result, ADSW can achieve
results comparable to those of global methods, but at the cost of considerable
execution time. Later, Tombari et al. proposed a segment-based support method [8] that
assigns different weights to the pixels in a support window based on
segmentation information. More recently, the mini-census adaptive support weight
approach (MCADSW) [9] achieved lower complexity and better handling of
brightness bias than the original ADSW.
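
The idea of an adaptive support weight can be sketched as follows. This is an
illustration in the spirit of ADSW rather than the exact formulation of [7]: it
assumes that the weight decays exponentially with both the color distance and the
spatial distance between the center pixel and a neighbor, and the constants `gamma_c`
and `gamma_g` are hypothetical values.

```python
import math


def support_weight(color_p, color_q, pos_p, pos_q, gamma_c=10.0, gamma_g=10.0):
    """Weight of neighbor q for center p; 1.0 when q is identical to p."""
    # Euclidean distance between the two color vectors.
    dc = math.sqrt(sum((a - b) ** 2 for a, b in zip(color_p, color_q)))
    # Euclidean distance between the two pixel positions.
    dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(pos_p, pos_q)))
    return math.exp(-(dc / gamma_c + dg / gamma_g))


# A neighbor of similar color keeps a high weight; a neighbor across a color
# edge is suppressed, which in effect shapes the support window.
w_similar = support_weight((100, 100, 100), (102, 100, 100), (5, 5), (6, 5))
w_edge = support_weight((100, 100, 100), (200, 30, 30), (5, 5), (6, 5))
```

Since every neighbor needs its own weight for every center pixel, the cost of such a
scheme grows with the window area, which is consistent with the execution time noted
above.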
In global algorithms, on the other hand, the cost aggregation step is simple, and
the emphasis is on the disparity computation step. Global approaches
formulate the stereo matching problem as an energy function and
minimize it to determine the disparity map. The energy function often includes a data
term and a neighboring (smoothness) term. Efficient optimizers such as graph cut [5]
and belief propagation are employed to minimize the energy function. The cooperative
optimization based on region segments proposed by Wang and Zheng [10] is the
state of the art among global approaches according to the Middlebury evaluation [11].
This algorithm uses regions as matching objects and defines the corresponding energy
function with constraints on the data term, smoothness, and occlusion. In general,
global methods produce more accurate results than common local methods, but they
suffer from high computational complexity and can hardly be used in real-time
implementations.
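
A generic energy of this kind can be written down in a few lines. The sketch below
only evaluates the energy of a given disparity map and does not minimize it. It
assumes a data term that sums the matching costs of the chosen disparities and a
smoothness term that sums absolute disparity differences between 4-connected
neighbors; the weight `lam` and the `cost_volume[y][x][d]` layout are hypothetical,
and the energy of [10] additionally contains an occlusion constraint that is not
modeled here.

```python
def energy(disp, cost_volume, lam=1.0):
    """Data term plus lam times the smoothness term for a disparity map."""
    h, w = len(disp), len(disp[0])
    # Data term: matching cost of the disparity selected at each pixel.
    data = sum(cost_volume[y][x][disp[y][x]] for y in range(h) for x in range(w))
    # Smoothness term: penalize disparity jumps between right and lower neighbors.
    smooth = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                smooth += abs(disp[y][x] - disp[y][x + 1])
            if y + 1 < h:
                smooth += abs(disp[y][x] - disp[y + 1][x])
    return data + lam * smooth


# Two pixels, two disparity candidates each.
volume = [[[0, 5], [4, 0]]]
e_wta = energy([[0, 1]], volume, lam=5.0)     # per-pixel minimum, one jump
e_smooth = energy([[0, 0]], volume, lam=5.0)  # higher data cost, no jump
```

With a large `lam` the smooth map has the lower energy even though each pixel's own
cost is not minimal, which is how a global formulation differs from the per-pixel WTA
decision.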
The algorithm proposed in this thesis is a local algorithm. We
modify the MCADSW algorithm [9] by adopting a robust matching cost measure
and using segment information to determine a variable support size for cost
aggregation. Moreover, the region information is also used in our region-based
cross voting scheme to improve the quality of the disparity refinement result.
The rest of this thesis is organized as follows. Section 2 introduces the details of
related work on local stereo matching methods. Section 3 describes our
motivation and the proposed algorithm step by step. The experimental results are
presented in Section 4. Finally, Section 5 concludes this thesis.