Fast Graph Cuts Algorithm for Disparity Estimation

National Chiao Tung University

Department of Electronics Engineering and Institute of Electronics

Master's Thesis

Student: Cheng-Wei Chou

Advisor: Hsueh-Ming Hang

A Thesis

Submitted to the Department of Electronics Engineering and Institute of Electronics, College of Electrical and Computer Engineering, National Chiao Tung University, in Partial Fulfillment of the Requirements for the Degree of Master in Electronic Engineering

June 2010

Hsinchu, Taiwan, Republic of China

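Abstract (translated from the Chinese)

Disparity estimation is one of the key elements of a 3D video processing system. Many techniques have been proposed to compute the disparity map, and the graph cut algorithm is widely recognized as one of the better disparity estimation schemes. However, the graph cut algorithm has a very high computational complexity. In this thesis, we propose a fast graph cut algorithm for disparity estimation. Two acceleration techniques are proposed: one is an early termination rule, and the other prioritizes the search order of the α-β swap pairs. Our simulation results show that, compared with the original method, the proposed algorithm saves 68% of the computing time on average, while the quality of the disparity map remains almost the same as that of the original method. As another acceleration technique, we adopt a multi-resolution approach. We first down-sample the original images and estimate the disparity on the low-resolution images to produce a low-resolution disparity map. We then up-sample the low-resolution disparity map and use it as the initial value for disparity estimation at the original resolution. We test several down-sampling and up-sampling methods and find the best combination. Our simulations show that the multi-resolution graph cut algorithm uses only 16% of the original computing time, while the increase in bad pixels is only 1%. The last topic we study is disparity estimation using pictures taken by multiple cameras. Preliminary observations show some interesting results, and further experiments are needed to exploit the advantages of this topic.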

Fast Graph Cuts Algorithm for Disparity Estimation

Student: Cheng-Wei Chou

Advisor: Dr. Hsueh-Ming Hang

Department of Electronic Engineering &

Institute of Electronics

National Chiao Tung University

Abstract

Disparity estimation is one of the critical elements in a 3D video processing system. Many techniques have been proposed to calculate the disparity map from a pair of images, and the graph cut (GC) algorithm is recognized as one of the better disparity estimation schemes. However, GC has a very high computational complexity.

In this thesis, we propose a fast GC algorithm for disparity estimation. Two acceleration techniques are suggested: one is an early termination rule, and the other prioritizes the α-β swap pair search order. Our simulations show that the proposed fast GC algorithm reduces the computing time by 68% on average compared with the original GC scheme. Meanwhile, its disparity estimation performance is about the same as that of the original GC.

Another acceleration technique we adopt is the multi-resolution approach. The original images are down-sampled and a low-resolution disparity map is first estimated. Then, the low-resolution disparity map is up-sampled and used as the initial value for estimating the disparity map of the original images. Several down-sampling and up-sampling filters are tested to find the best combination. Our simulations show that the multi-resolution GC (MRGC) algorithm uses only 16% of the original computing time, while the bad pixel percentage increases by only 1%. The last topic we investigate is disparity estimation using multi-camera pictures. The initial exploration shows some interesting results. Further investigation is needed to take full advantage of multiple images recorded by a camera array.

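Acknowledgments

For the completion of this master's thesis, I would first like to thank my advisor, Professor Hsueh-Ming Hang. Through his patient guidance during my research, I learned both the methods of doing research and the attitude it requires. Beyond teaching us in our professional field, he also often showed concern for our health. Over these two years I learned from him not only professional knowledge but also how to treat people and handle matters, and I believe these experiences will be a treasure in my life. I am very grateful to my senior labmate 蔡彰哲 for his guidance during the research. From setting the research goal to finishing the experiments, he always took great pains to discuss with me and offer his opinions, and whenever I ran into a problem he led me to find the answer. I am also very grateful to 大師, who gave me much information and help both in the professional field and in thesis writing; under his patient teaching I grew a great deal during my master's studies. In addition, I thank 家揚 and 雄哥 for their help, and the many senior and junior labmates and classmates for their company, which made my life in the master's program more colorful. Finally, I thank my parents and my elder sister for silently supporting me throughout my studies. Your warmth is the driving force that moves me forward.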

List of Figures

Fig. 2-1 Stereo image geometry

Fig. 2-2 Process of the general stereo correspondence algorithms

Fig. 2-3 Disparity space image

Fig. 2-4 An illustration of the shiftable window

Fig. 2-5 Stereo matching using dynamic programming

Fig. 2-6 Program structure of the Middlebury platform

Fig. 3-1 A simple example of the graph and the minimum cut (the red line)

Fig. 3-2 A simple example of push-relabel algorithm

Fig. 3-3 An example of a directed weighted graph

Fig. 3-4 An example of the graph for a 1D image

Fig. 3-5 Properties of a cut on the graph

Fig. 3-6 The change of disparity map after an α-β swap

Fig. 3-7 (a) An example of the graph with multiple terminals (b) An induced graph by a multiway cut (dotted lines indicate cut edges)

Fig. 3-8 Flowchart of GC

Fig. 4-1 Disparity distribution of the test image pair ‘sawtooth’ after the -th iteration of the outer loop ( )

Fig. 4-2 E and in the energy minimization process of the test image pair ‘sawtooth’

Fig. 4-3 RMS_error and Bad_pixel in the energy minimization process of ‘sawtooth’

Fig. 4-4 Disparity distribution of ‘sawtooth’ after the 1st outer-loop iteration

Fig. 4-5 Flow chart of our proposed fast GC

Fig. 4-6 Plot of computing time

Fig. 4-7 Plot of rms_error_all

Fig. 4-8 Plot of bad_pixels_all

Fig. 4-9 Disparity maps of ‘Map’

Fig. 4-10 Disparity maps of ‘Sawtooth’

Fig. 4-11 Disparity maps of ‘Tsukuba’

Fig. 4-12 Disparity maps of ‘Venus’

Fig. 5-1 Flowchart of MRGC

Fig. 5-2 An example of 4 to 1 pixel-skip down-sampling

Fig. 5-3 An example of disparity map up-sampling and scaling

Fig. 5-4 The up-sampling method of H.264

Fig. 5-5 The disparity pair candidates in neighborhood graph cuts (a) (b)

Fig. 5-7 Flowchart of GC for multi-camera pictures

Fig. 5-8 The scaling and shifting moves

Fig. 5-9 Plot of computing time comparison

Fig. 5-10 Plot of rms_error_all

Fig. 5-11 Plot of bad_pixels_all

Fig. 5-12 Disparity maps of ‘Map’

Fig. 5-13 Disparity maps of ‘Sawtooth’

Fig. 5-14 Disparity maps of ‘Tsukuba’

Fig. 5-15 Disparity maps of ‘Venus’

Fig. 5-16 Disparity maps of ‘Sawtooth’

List of Tables

Table 2-1 Match metrics for correspondence matching [6]

Table 2-2 Descriptions of error metrics [5]

Table 2-3 Test data [5]

Table 3-1 Weight information of the edges

Table 4-1 The experiment environment setting

Table 4-2 Comparison of computing time

Table 4-3 Comparison of rms_error_all

Table 4-4 Comparison of bad_pixels_all

Table 4-5 Comparison of bad_pixels_nonocc

Table 4-6 Comparison of bad_pixels_textureless

Table 4-7 Comparison of bad_pixels_discont

Table 5-1 Computing time comparison

Table 5-3 Comparison of bad_pixels_all


Chapter 1 Introduction

1.1 Background

Recently, the ISO/IEC Moving Picture Experts Group (MPEG) initiated a standardization process on free viewpoint television (FTV) [1]. As a new type of interactive video system, FTV can synthesize three-dimensional (3D) scenes at nearly any (virtual) viewpoint, hence its name. An FTV system typically consists of modules for multi-view video capture, image correction, depth map estimation, data coding/decoding, and view synthesis. In particular, the disparity estimation module is an indispensable component of an FTV system, and it is used in both multi-view scene analysis and synthesis.

With the help of epipolar geometry [2], the general stereo correspondence problem is simplified to disparity estimation (DE) [3][4][5] under the assumption of a dense camera array. Here, disparity refers to the location difference of the corresponding objects along the epipolar lines on the two recorded images. For years, most researchers have focused on improving the accuracy of DE rather than on its speed. With the emergence of 3D TV and FTV, we focus on fast DE algorithms in this study.

According to Scharstein and Szeliski [5], a stereo correspondence algorithm generally consists of four steps: 1) matching cost calculation, 2) cost aggregation, 3) disparity computation and optimization, and 4) disparity refinement. In this work, we use the absolute intensity difference (with Birchfield and Tomasi's sampling-insensitive dissimilarity measure) [5] between the corresponding feature points as the initial matching cost. The cost aggregation collects the initial matching cost by using a moving average filter in a square window (box filter). Once the aggregated costs are computed, the disparity computation and optimization module determines which discrete set of disparities best represents the scene surface depth. Finally, the disparity refinement step increases the disparity accuracy to sub-pixel precision.

Stereo algorithms can be categorized into local and global approaches [5]. The local approach focuses on the cost calculation and aggregation. The winner-take-all (WTA) method, which chooses the disparity with the lowest aggregated cost at each pixel, is a simple disparity computation and optimization method used in the local approach. In the global approach, we further consider the disparity smoothness among neighboring pixels. More sophisticated algorithms, such as dynamic programming, simulated annealing, belief propagation (BP) and graph cut (GC), have been suggested to offer better DE results.

1.2 Motivation and Contributions

Among the global approaches, the result qualities of BP and GC are similar. According to Tappen and Freeman [7], the computational time of the synchronous BP is much larger than that of GC, but the accelerated BP uses only 80% of GC's computational time. In this study, we choose GC because it can be accelerated and may have advantages in computation. In addition, to our knowledge, fast graph cuts algorithms are rare in the literature. Therefore, we propose methods to accelerate the GC algorithm.

GC and its variations [8]-[11] generally show admirable performance in their disparity estimation quality. However, GC requires a huge amount of processing time. Owing to its good DE performance, GC is chosen as the target algorithm for speed-up. The major contributions of this thesis include the following items.

(1) An early termination process is proposed to save computing time.

(2) We save computing time by prioritizing the α-β swap pair sequence.

(3) The computing time increases greatly with the image size and the disparity range. We use multi-resolution graph cuts to reduce the computational complexity.

(4) We attempt to improve the disparity map for the multi-camera array.

1.3 Organization of the Thesis

In Chapter 2, we briefly introduce the background of computational stereo. In Chapter 3, we describe the graph cuts process that minimizes the energy function in the DE problem. Chapter 4 describes our proposal of early termination and of optimizing the α-β swap pair sequence. Simulations are conducted and a significant saving in computing time is shown. Chapter 5 discusses the multi-resolution graph cuts algorithm and our initial investigation of disparity estimation on multi-camera array pictures. Finally, a brief summary and remarks on future work are given in Chapter 6.

Chapter 2 Introduction of Computational Stereo

2.1 Overview

The concept of stereo correspondence is to find, for a point in one view, the corresponding point in the image of the other view. Based on epipolar geometry, the general stereo correspondence problem is simplified to disparity estimation under the assumption of a dense camera array. The search region of the corresponding features between the left and the right images can thus be reduced to the epipolar lines. The goal of a stereo correspondence algorithm is to produce a univalued function in disparity space, $d(x, y)$, that best describes the depth information of the surfaces in the scene.

2.2 Epipolar Geometry

Fig. 2-1 shows a typical stereo image geometry in 3D space. Two pinhole cameras view a 3D scene from different viewpoints. We place a virtual image plane in front of each camera. The intersections between the baseline, which connects the two cameras, and the two image planes are called epipoles. The plane formed by any point in space and the baseline is an epipolar plane. The epipolar plane intersects each camera's image plane, and the intersections form lines called the epipolar lines. Any point P on the epipolar plane is projected onto the epipolar lines. Therefore, if two epipolar lines belong to the same epipolar plane, each point observed on one epipolar line must be observed on the other epipolar line. We can use this property to reduce the search range from the whole image to an epipolar line. This geometric property is called the epipolar constraint.

Fig. 2-1 Stereo image geometry

2.3 The General Structure of Matching Algorithm

According to Scharstein and Szeliski [5], the stereo correspondence algorithms generally consist of four parts:

1. Initial matching cost computation,

2. Cost aggregation,

3. Disparity computation and optimization,

4. Disparity refinement.

Fig. 2-2 shows the general procedure of a stereo correspondence algorithm. The output is the disparity map of an image pair input. The details of the four parts will be discussed in the following sections.

Fig. 2-2 Process of the general stereo correspondence algorithms

2.3.1 Initial Matching Cost Computation

The matching cost represents the dissimilarity between two pixels in our correspondence problem. The range of the disparity candidates is called the disparity range. The initial disparity space image (Fig. 2-3) consists of the matching cost values over all pixels and all disparities.

Fig. 2-3 Disparity space image

The most popular pixel-based match metrics are the squared intensity difference (SD) and the absolute intensity difference (AD). The correspondence problem is to find the best match between the candidate pixel and the reference pixel in the support region. In addition, Birchfield and Tomasi proposed a matching cost that is insensitive to image sampling [12][13]. Several match metrics are listed in Table 2-1 [6]. In our platform, the parameter match_fn selects the matching cost function we use. The general formula of matching cost computation can be written as

$$C_0(x, y, d) = \rho\big(I_L(x, y),\, I_R(x - d, y)\big) \qquad (2.1)$$

where $I_L$ and $I_R$ represent the left (reference) and the right images, respectively, and $\rho$ denotes the match metric selected by match_fn.
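As an illustration of how the disparity space image is filled, the following minimal sketch computes the absolute intensity difference (AD) for every pixel and every disparity candidate. The function name, the list-of-rows image layout, the sign convention (the left pixel at x is compared with the right pixel at x - d) and the out-of-range cost are assumptions made for this example; they are not taken from the platform code.

```python
def absolute_difference_dsi(left, right, max_disparity, out_of_range_cost=255):
    """Build a disparity space image C[y][x][d] from two grayscale images.

    left, right: lists of rows of integer intensities with equal size.
    The left image is the reference; pixel (x, y) is compared with
    pixel (x - d, y) of the right image (assumed sign convention).
    """
    height, width = len(left), len(left[0])
    dsi = []
    for y in range(height):
        row = []
        for x in range(width):
            costs = []
            for d in range(max_disparity + 1):
                if x - d >= 0:
                    costs.append(abs(left[y][x] - right[y][x - d]))
                else:
                    # No matching pixel exists in the right image.
                    costs.append(out_of_range_cost)
            row.append(costs)
        dsi.append(row)
    return dsi


# One scanline whose content is shifted by exactly one pixel.
left = [[10, 20, 30, 40]]
right = [[20, 30, 40, 50]]
dsi = absolute_difference_dsi(left, right, max_disparity=1)
```

In this toy scanline the cost at disparity 1 is zero wherever a match exists, which is what a correct match should look like in the disparity space image.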

2.3.2 Cost Aggregation

After computing the matching cost, we aggregate the costs of nearby pixels. Because the disparity values of neighboring pixels are usually consistent, we select a support window and add up the costs within it. The cost aggregation can be formulated as

$$C(x, y, d) = \sum_{(i, j) \in W} w(i, j)\, C_0(x + i, y + j, d) \qquad (2.2)$$

where $C_0$ is the initial matching cost calculated in the previous step and $W$ is the support window. The function $w$ indicates the relative weight of the neighboring pixels contributing to the aggregated cost. Although cost aggregation reduces the effect of noise, it blurs object edges when it aggregates costs across different objects. Therefore, designing a good aggregation scheme is an important topic. In our platform, the parameter aggr_fn selects the aggregation method we use. Several aggregation methods are described below:

Box filter: Use a separable moving average filter (add one right/bottom value, subtract one left/top). The choice of window size affects both the performance and the computation time. If we want to implement a real-time matcher, the window size is a vital factor.

Table 2-1 Match metrics for correspondence matching [6]

Match Metric

Normalized Cross-Correlation

Sum of Squared Differences

Normalized Sum of Squared Differences

Sum of Absolute Differences

Mutual Information

In the definitions, the intensity values of the left and right images are indexed by pixel, the mean intensity is taken over the support window, a probability density function is used for mutual information, and the disparity value is the matching offset.

Binomial filter: The weighting function is a separable FIR (finite-duration impulse response) filter. We use the coefficients 1/16{1, 4, 6, 4, 1} from Burt and Adelson's Laplacian pyramid [14].

Minimum filter: The weighting function is a sliding window with a location bias (Fig. 2-4). We use a box filter whose center is not necessarily the candidate pixel, but the candidate pixel must be included in the shifted window. Such a window is called a shiftable window [15]. We choose the minimum aggregated cost among all the shiftable windows in the pre-selected range. The shiftable window can avoid aggregating costs across an object boundary.
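The "add one right/bottom value, subtract one left/top" idea of the box filter can be sketched as a running sum. This is a minimal illustration and not the platform's BoxFilter.cpp; the function names are hypothetical, the window is simply clipped at the image border, and the result is an un-normalized sum rather than an average.

```python
def box_filter_1d(values, radius):
    """Moving sum over a window of 2*radius+1 samples, clipped at the borders."""
    n = len(values)
    out = [0] * n
    # Window sum for the first position: indices 0 .. radius.
    window = sum(values[: radius + 1])
    out[0] = window
    for i in range(1, n):
        if i + radius < n:
            window += values[i + radius]        # add one value on the right
        if i - radius - 1 >= 0:
            window -= values[i - radius - 1]    # subtract one value on the left
        out[i] = window
    return out


def box_filter_2d(cost_slice, radius):
    """Separable box filter: filter every row, then every column."""
    rows = [box_filter_1d(row, radius) for row in cost_slice]
    cols = [box_filter_1d(list(col), radius) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


row_sums = box_filter_1d([1, 2, 3, 4, 5], radius=1)
plane_sums = box_filter_2d([[1, 1], [1, 1]], radius=1)
```

Because each output reuses the previous window sum, the cost per pixel does not depend on the window size, which is why the box filter is attractive for real-time matching.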

Fig. 2-4 An illustration of the shiftable window

2.3.3 Disparity Computation and Optimization

The disparity map can be obtained from the original matching cost or the aggregated cost. In addition, disparity computation methods can be categorized into two types: the local and the global approaches. These two approaches are described as follows.

In a local method, the matching cost and the cost aggregation are the key components. The simplest local method is the winner-take-all (WTA) algorithm, in which the disparity of each pixel is determined by minimizing the matching cost or aggregated cost $C(x, y, d)$ over the disparity search range $D$, that is,

$$d(x, y) = \arg\min_{d \in D} C(x, y, d) \qquad (2.3)$$

Moreover, the disparity of each pixel is independently calculated.
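The WTA rule amounts to an independent arg-min per pixel. The sketch below assumes the cost volume is stored as nested lists indexed as C[y][x][d]; the function name is hypothetical, and ties are resolved in favor of the smallest disparity.

```python
def winner_take_all(dsi):
    """Return the disparity map d[y][x] = argmin over d of C[y][x][d]."""
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in dsi
    ]


# Two pixels, two disparity candidates each.
disparity_map = winner_take_all([[[10, 0], [3, 5]]])
```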

In a global method, we define an energy function, which includes a data term and a smooth term. The data term includes the cost function discussed previously, and the smooth term represents the smoothness penalty of the disparity map. The details of the energy function will be discussed in Section 3.4.1. One of the earlier global optimization methods is dynamic programming (DP) [16]. The dynamic programming scheme optimizes the energy function of each scanline independently. Fig. 2-5 shows stereo matching using dynamic programming for a pair of corresponding scanlines. We select the minimum-cost path through the matrix of all pairwise matching costs. The lowercase array represents the intensities along a scanline of the left image, and the uppercase array represents the intensities along the corresponding scanline of the right image. The matches are indicated by M, and the partially occluded points are indicated by L or R, depending on whether the point is visible only in the left image or only in the right image. In this example, the disparity range is 0-4, which is indicated by the non-shaded boxes. The shaded boxes are disparities outside this range. Although dynamic programming can optimize the horizontal global information, the vertical correlation is not considered. The disparity maps produced by dynamic programming may exhibit horizontal streaks, which reduce the subjective quality of the synthesized image.

Fig. 2-5 Stereo matching using dynamic programming (axis label: right scanline, uppercase array)

To consider the vertical and horizontal information simultaneously, energy minimization using the so-called graph-cuts technique has been proposed. This optimization algorithm performs well in disparity estimation. Unfortunately, the computation and storage requirements of the graph-cuts algorithm are enormous. The details of graph cuts will be discussed in sections 3.2 and 3.3, and we will propose our algorithm for reducing the computation time in Chapter 4.

2.3.4 Disparity Refinement

Most stereo correspondence algorithms produce an integer disparity map. The integer disparity is not good enough for some applications that require good-quality synthesized images. To improve the synthesis result, many algorithms apply a disparity refinement stage, in which a sub-pixel disparity map is computed. This increases the resolution of the disparity map with a little additional computation. However, the goal of this research is to decrease the computation while maintaining the quality of the disparity map. Because we do not focus on the synthesis result, we disable the disparity refinement stage in the reference software (platform) by setting the Boolean variable refine_subpix to false.

2.4 A Taxonomy Evaluation

2.4.1 Overview of the Platform

The source code downloaded from the Middlebury website is used as our platform [5]. Fig. 2-6 depicts the program structure of the Middlebury platform. In the beginning, the main file calls the interpreCommandLine function, which is included in the StereoIO file. In the interpreCommandLine function, the first step initializes all the parameters (StereoParameters.cpp), and then we adjust the parameter values specified by the configuration file or the command line (ParameterIO.cpp). The next step executes the stereo matching program (StereoMatcher.cpp), which is the primary part of the platform. The stereo matching can be divided into four components as depicted in Fig. 2-2. They are initial matching cost computation, cost aggregation, disparity computation (optimization), and disparity refinement. At the end, the program evaluates the quality by comparing the computed disparity map with the ground truth disparity map (StcEvaluate.cpp).

Fig. 2-6 Program structure of the Middlebury platform (files listed level by level):

main.cpp

StereoIO.cpp

StereoParameters.cpp, ParameterIO.cpp, StereoMatcher.cpp, StcEvaluate.cpp

StcPreProcess.cpp, StcRawCosts.cpp, StcAggregate.cpp, StcOptimize.cpp, StcRefine.cpp

BoxFilter.cpp, StcDiffusion.cpp, MinFilter.cpp

StcOptDP.cpp, StcGraphcut.cpp, StcSimulAnn.cpp, StcOptSO.cpp


2.4.2 Quality Metrics

To evaluate the quality of a disparity map computed by a stereo algorithm, we compute the following two quality metrics in our platform:

1. The RMS (root-mean-square) error between the computed disparity map $d_C(x, y)$ and the ground truth map $d_T(x, y)$ can be written as

$$R = \left( \frac{1}{N} \sum_{(x, y) \in \Omega} \left| d_C(x, y) - d_T(x, y) \right|^2 \right)^{1/2} \qquad (2.4)$$

where $R$ is the RMS error, $N$ is the total number of pixels and $\Omega$ denotes the image area.

2. The percentage of bad matching pixels can be written as

$$B = \frac{1}{N} \sum_{(x, y) \in \Omega} \big( \left| d_C(x, y) - d_T(x, y) \right| > \delta_d \big) \qquad (2.5)$$

where $B$ is the percentage of bad pixels and $\delta_d$ is a disparity error tolerance. We simply adopt the default setting of $\delta_d$ in the published platform in our experiment.

Besides computing the two quality metrics over the whole image, we also compute them over three different kinds of regions: textureless regions, occluded regions and depth discontinuity regions. Their symbols and descriptions are listed in Table 2-2.
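The two metrics can be sketched in a few lines. The function names and the `tolerance` argument are illustrative assumptions (the value 1.0 used below is only an example setting, not a statement about the platform default), and the bad-pixel function returns a fraction that can be multiplied by 100 to obtain a percentage.

```python
import math


def rms_error(computed, truth):
    """Root-mean-square disparity error over all pixels."""
    squared = [
        (c - t) ** 2
        for row_c, row_t in zip(computed, truth)
        for c, t in zip(row_c, row_t)
    ]
    return math.sqrt(sum(squared) / len(squared))


def bad_pixel_fraction(computed, truth, tolerance):
    """Fraction of pixels whose absolute disparity error exceeds the tolerance."""
    flags = [
        abs(c - t) > tolerance
        for row_c, row_t in zip(computed, truth)
        for c, t in zip(row_c, row_t)
    ]
    return sum(flags) / len(flags)


computed = [[1, 2], [3, 7]]
truth = [[1, 2], [3, 4]]
r = rms_error(computed, truth)
b = bad_pixel_fraction(computed, truth, tolerance=1.0)
```

Restricting the two list comprehensions to the pixels of a region mask would give the per-region variants (non-occluded, textureless, discontinuity) in the same way.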


Table 2-2 Descriptions of error metrics [5]

PARAMETER NAME SYMBOL DESCRIPTION

rms_error_all RMS disparity error

rms_error_nonocc RMS disparity error (no occlusions)

rms_error_occ RMS disparity error (at occlusions)

rms_error_textureless RMS disparity error (textureless)

rms_error_textured RMS disparity error (textured)

rms_error_discont RMS disparity error (discontinuities)

bad_pixels_all Bad pixel percentage

bad_pixels_nonocc Bad pixel percentage (no occlusions)

bad_pixels_occ Bad pixel percentage (at occlusions)

bad_pixels_textureless Bad pixel percentage (textureless)

bad_pixels_textured Bad pixel percentage (textured)

bad_pixels_discont Bad pixel percentage (discontinuities)

2.4.3 Test Data

We download the test data from the Middlebury website. The test data set we use is shown in Table 2-3. The four sequences, Map, Tsukuba, Sawtooth, and Venus, are the most commonly used ones for quality evaluation.

Table 2-3 Test data [5]

Columns: Map, Sawtooth, Tsukuba, Venus

Rows: Image size; Input; Ground truth; Occlusion and discontinuities; Occlusion and textureless


Chapter 3 Energy Minimization by Graph Cuts

3.1 Overview

In disparity estimation, the graph cuts and the belief propagation algorithms are generally recognized as the better global optimization methods. Unfortunately, their computation time is very high. Because we focus on the graph cuts method, we describe only the graph cuts procedure in this chapter. The maximum flow/minimum cut (max-flow/min-cut) problem and an algorithm that solves it are introduced [17][18]. Based on graph cuts, two algorithms have been proposed for solving the stereo correspondence problem by minimizing the energy function, namely α-β swap and α-expansion [8].

3.2 Max-Flow and Min-Cut Problem

First, we glance at graph theory. In Fig. 3-1, we show a simple example of a graph $G = (V, E)$. A directed graph is defined as a set of nodes (vertices) $V$ and a set of directed edges $E$ that connect the nodes. In the graph, a source terminal is denoted by $s$ and a sink terminal by $t$. A cut $C$ is a set of edges such that the two terminals become separated on the induced graph $G(C) = (V, E \setminus C)$. A cut can also be represented by $(S, T)$, which produces a partition of $V$ into $S$ and $T$, with $s \in S$ and $t \in T$.

A minimum cut is a cut whose cost is the minimum over all possible cuts of $G$. The minimum cut problem can be solved by finding a maximum flow from the source $s$ to the sink $t$. In other words, the maximum source-to-sink flow is equal to the cost of the minimum cut in $G$. Maximum flow can be considered the maximum "amount of water" that can be sent from the source to the sink, and the cost of an edge can be considered the capacity of a directed "pipe".

The algorithms that solve the maximum flow problem can be classified into two groups: the Ford-Fulkerson algorithm [19] and the push-relabel algorithm [20]. The Ford-Fulkerson algorithm examines the whole residual network to find an augmenting path. The algorithm begins with no flow and runs iteratively. At each iteration, the flow is increased by finding an augmenting path from the source to the sink in the residual network. The process repeats until no further augmenting path can be found, at which point the flow is the maximum flow.

If we use a breadth-first search to compute the augmenting path in the Ford-Fulkerson algorithm, the bound on the running time can be improved to $O(|V| |E|^2)$, where $|V|$ is the number of nodes and $|E|$ is the number of edges in the graph. The Ford-Fulkerson algorithm implemented in this way is called the Edmonds-Karp algorithm. Push-relabel algorithms look only at a node's neighbors in the residual network and process one node at a time. Compared with the Ford-Fulkerson algorithm, push-relabel algorithms are local methods, and a simple implementation runs in $O(|V|^2 |E|)$ time. Besides, unlike the Ford-Fulkerson algorithm, the push-relabel algorithms do not maintain the flow-conservation property throughout their execution. Therefore, we use push-relabel algorithms in this research instead of the Ford-Fulkerson method. We describe the details of push-relabel algorithms in the next section.
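For comparison with the push-relabel method used in this research, the breadth-first augmenting-path scheme (Edmonds-Karp) can be sketched as follows. The capacity-matrix representation, the function name and the small example graph are assumptions made for illustration only.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Maximum flow for a graph given as a square capacity matrix."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    max_flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            return max_flow, residual      # no augmenting path is left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Augment the flow and update the residual network.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        max_flow += bottleneck


# Nodes: 0 = source, 1 and 2 = inner nodes, 3 = sink.
capacity = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
flow_value, _ = edmonds_karp(capacity, source=0, sink=3)
```

In this example both edges leaving the source become saturated, so the flow value equals the cost of the cut that isolates the source.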

3.3 Push-Relabel Algorithm

In a push-relabel algorithm, the directed edges correspond to pipes as in the Ford-Fulkerson algorithm, but the interpretation of the nodes is different. The nodes, which can be regarded as pipe junctions, have two parameters. One is the accumulation of fluid, and the other is the node height. The node heights determine how flow is pushed: we only push flow from a higher node to a lower node.

A push-relabel algorithm uses two basic operations: the push operation and the relabel operation. The push operation pushes excess flow from a node to one of its neighbors. The relabel operation increases the height of a node. We describe the flow of the push-relabel algorithm with a simple example, which is shown in Fig. 3-2. In Fig. 3-2 (a), a graph with nodes and directed edges is shown. The number at each edge represents the cost (the capacity of the pipe). In the beginning, the parameters of each node are initialized as in Fig. 3-2 (b). The height of the source is set to the number of nodes in the graph, and the height of every other node is initialized to zero. In Fig. 3-2 (c), we employ the first push operation, which saturates all outgoing edges from the source $s$. The flow follows the direction of the edges. When an edge is saturated with flow along its original direction, we change the direction of the edge. Also, the flow accumulations of nodes V1 and V2 are changed to one and two. Because the flow accumulations of nodes V1 and V2 cannot be pushed to another node after the first push operation, we must increase their heights, and this processing is called the relabel operation. The relabel operation is depicted in Fig. 3-2 (d). After the relabel operation, we can push the flow accumulations of V1 and V2 to the lower nodes, as shown in Fig. 3-2 (e). We perform the push operation and the relabel operation repeatedly until there is no accumulation of flow in any node except for the sink $t$, and the final graph is depicted in Fig. 3-2 (f).
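The push and relabel operations can be sketched with the generic (unoptimized) push-relabel scheme below. It is a minimal illustration under assumed names and a capacity-matrix representation, not the implementation used in the platform; it selects any overflowing node, pushes along an edge to a node exactly one level lower if such an edge exists, and relabels otherwise.

```python
def push_relabel(capacity, source, sink):
    """Generic push-relabel maximum flow on a square capacity matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    height = [0] * n
    excess = [0] * n
    height[source] = n                      # the source starts at height n
    for v in range(n):                      # saturate all edges leaving the source
        c = capacity[source][v]
        if c > 0:
            flow[source][v] = c
            flow[v][source] = -c
            excess[v] += c
            excess[source] -= c

    def residual(u, v):
        return capacity[u][v] - flow[u][v]

    def overflowing():
        for u in range(n):
            if u != source and u != sink and excess[u] > 0:
                return u
        return None

    u = overflowing()
    while u is not None:
        pushed = False
        for v in range(n):
            # Push only "downhill", to a node exactly one level lower.
            if residual(u, v) > 0 and height[u] == height[v] + 1:
                delta = min(excess[u], residual(u, v))
                flow[u][v] += delta
                flow[v][u] -= delta
                excess[u] -= delta
                excess[v] += delta
                pushed = True
                break
        if not pushed:
            # Relabel: lift u just above its lowest residual neighbor.
            height[u] = 1 + min(
                height[v] for v in range(n) if residual(u, v) > 0
            )
        u = overflowing()
    return excess[sink]


capacity = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
max_flow = push_relabel(capacity, source=0, sink=3)
```

The loop ends when no node other than the source and the sink holds excess flow, which mirrors the stopping condition described above.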

The final graph provides the minimum cut. A minimum cut separates the original graph into two parts $S$ and $T$, such that $S$ = {all nodes reachable from source $s$} and $T = V - S$.
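Once a maximum flow is known, the source side of a minimum cut can be collected by a simple search over edges that still have residual capacity. The sketch below assumes a residual-capacity matrix as input; the function name and the three-node example are illustrative.

```python
def source_side(residual, source):
    """Nodes reachable from the source through edges with residual capacity."""
    reachable = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, cap in enumerate(residual[u]):
            if cap > 0 and v not in reachable:
                reachable.add(v)
                stack.append(v)
    return reachable


# Residual capacities after sending one unit along s -> a -> t in a graph
# with capacities s->a = 1 and a->t = 2 (nodes: 0 = s, 1 = a, 2 = t).
residual = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
]
S = source_side(residual, 0)
T = set(range(len(residual))) - S
```

Here the saturated edge s -> a is the only edge from S to T, so it is the minimum cut of this small graph.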

3.4 Energy Minimization using Graph Cuts

Disparity estimation using graph cuts outperforms many other optimization methods. Before explaining how to compute the disparity map using graph cuts, we introduce the general form of the energy function. In addition, we introduce a method that minimizes the energy function by graph cuts, the α-β swap [8]. Finally, because there are generally more than two disparity values, we need to build the graph many times, which is called multiway cuts [9].

3.4.1 The General Form of Energy Function

The general form of energy function can be written as

$$E(f) = E_{data}(f) + \lambda E_{smooth}(f) \qquad (3.1)$$

where $\lambda$ is a parameter that controls the effect of the smooth energy term.

The data energy term, $E_{data}(f)$, represents the dissimilarity between the left and the right images when the disparity map is $f$. The form of the data energy term is typically

$$E_{data}(f) = \sum_{p \in P} D_p(f_p) \qquad (3.2)$$

where $P$ is the set of pixels, $f_p$ is the disparity value of pixel $p$, and $D_p(f_p)$ represents the cost when the disparity value of pixel $p$ is $f_p$. We can select different matching cost functions and different aggregation methods to generate the cost array, as explained in sections 2.3.1 and 2.3.2. The data energy term is the summation of all costs when a disparity map is given.

The smooth energy term, $E_{smooth}(f)$, measures the extent to which $f$ is not piecewise smooth. If there are non-smooth regions in the disparity map, penalties are added to the total energy. The smooth energy term therefore makes the disparity map smoother everywhere. The smooth energy term typically has the form

$$E_{smooth}(f) = \sum_{\{p, q\} \in N} V_{p,q}(f_p, f_q) \qquad (3.3)$$

where $N$ is the set of interacting pairs of neighboring pixels, and $V_{p,q}$ is an interacting function, which has many different forms. In our platform, we use the Potts model as our interacting function. The Potts model can be represented as

$$V_{p,q}(f_p, f_q) = K \cdot T(f_p \neq f_q) \qquad (3.4)$$

where $T(\cdot)$ is 1 if its argument is true and 0 otherwise, and $K$ is a penalty constant. This model encourages disparity maps consisting of several regions, where pixels in the same region have equal disparity values. We also call it a piecewise constant model.
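To make the energy function concrete, the following sketch evaluates the total energy of a given disparity map with the Potts model. The 4-connected neighborhood, the cost layout cost[y][x][d] and the argument names (`lam` for the weight of the smooth term, `penalty` for the Potts constant) are assumptions of this example.

```python
def potts_energy(disparity, cost, lam, penalty):
    """Total energy: data term plus lam times the Potts smooth term."""
    height, width = len(disparity), len(disparity[0])
    data = sum(
        cost[y][x][disparity[y][x]]
        for y in range(height)
        for x in range(width)
    )
    smooth = 0
    for y in range(height):
        for x in range(width):
            # Count each horizontal and vertical neighbor pair once.
            if x + 1 < width and disparity[y][x] != disparity[y][x + 1]:
                smooth += penalty
            if y + 1 < height and disparity[y][x] != disparity[y + 1][x]:
                smooth += penalty
    return data + lam * smooth


# Two pixels in one row with different disparities:
# data = 1 + 2, smooth = one discontinuity with penalty 3.
energy_value = potts_energy([[0, 1]], [[[1, 5], [4, 2]]], lam=2, penalty=3)
```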


the impact of energy function. We next describe how we minimize the energy function by graph cuts in section 3.4.2.

3.4.2 The α-β Swap Method

First, we review some basic facts about graphs that are used to minimize the energy function of disparity estimation. A directed weighted graph $G = (V, E)$ consists of a set of nodes $V$ and a set of directed edges $E$ that connect them. The nodes are usually composed of the pixel nodes and the terminal nodes. The pixel nodes correspond to the pixels in the image, and the terminal nodes correspond to the disparity values that we can assign to pixels. In Fig. 3-3, we show a simple example of an image with two disparity values. The two terminals are usually called the source, s, and the sink, t. For clarity, we provide a simpler illustration in Fig. 3-4, whose pixel nodes are arranged in 1D. The set of pixels in the 1D image is $P_{\alpha\beta} = P_\alpha \cup P_\beta$, where $P_\alpha$ and $P_\beta$ are the sets of pixels currently labeled $\alpha$ and $\beta$. The weight information of the edges is shown in Table 3-1. The edges can be classified into two groups: t-links and n-links. The t-links (terminal links) connect each pixel node $p$ to the terminals $\alpha$ and $\beta$, and they are called $t_p^\alpha$ and $t_p^\beta$, respectively. The n-links (neighbor links) connect each pair of pixels $p$, $q$ that are neighbors ($\{p, q\} \in N$). The symbol of an n-link is $e_{\{p,q\}}$.

Fig. 3-3 An example of a directed weighted graph

Table 3-1 Weight information of the edges

edge weight for

From Table 3-1, we know that the weight of edge $t_p^\alpha$ is $D_p(\alpha)$, which is the data energy term. $D_p(\alpha)$ is the cost value of pixel $p$ when it is assigned the disparity value $\alpha$. The edge $t_p^\beta$ is similarly defined. The weight of edge $e_{\{p,q\}}$ is $V_{p,q}(\alpha, \beta)$, which is the smooth energy term. $V_{p,q}$ is the interacting function mentioned previously. When the graph is constructed, we find the minimum cut of the graph by the push-relabel algorithm. We discuss the properties of a cut with a simple example in Fig. 3-5. If the cut of Fig. 3-5 (a) is the minimum cut, we assign the same disparity value to pixels $p$ and $q$. The case of Fig. 3-5 (b) is similar to (a). If the cut of Fig. 3-5 (c), which includes the n-link $e_{\{p,q\}}$, is the minimum cut, we assign the disparity values $\alpha$ and $\beta$ to the pixels $p$ and $q$, respectively.

After finding the minimum cut of the graph, we can assign a disparity value to each pixel in $P_{\alpha\beta}$. This way of adjusting the disparity map is called the α-β swap. Fig. 3-6 (a) is an example of the initial disparity map. Although the disparity map is always a gray-level picture, we fill the region of one disparity value with red (darkest) color for illustration. After an α-β swap, the disparity map is modified to Fig. 3-6 (b). The disparity map in (b) is much smoother than that in (a) in the regions of $\alpha$ and $\beta$, because we minimize the energy function by graph cuts. However, a disparity map may contain more than two disparity values. We cannot simply employ the graph cuts once and, therefore, we introduce the multiway cut algorithm to solve this problem.

Fig. 3-5 Properties of a cut on the graph


3.4.3 Multiway Cut Algorithm

Because the number of disparity values is greater than two, there are multiple terminals in the graph, as shown in Fig. 3-7. The α-β swap method processes only two terminals at a time; therefore, we must use the multiway cut algorithm to solve the problem. The multiway cut algorithm proceeds as follows.

(1) Start with an initial disparity map, which may be given by the WTA method.

(2) Randomize the disparity sequence, and pick two disparity values as the terminals of the graph. Note that only the pixels currently assigned one of these two values become pixel nodes in the graph.

(3) Employ the algorithm introduced previously to find the minimum cut.

(4) Repeat step (3) for every possible pair of disparity values.

(5) Repeat steps (3) and (4) until the energy no longer changes.

After the multiway cut algorithm finishes, we reach a sub-optimal total energy of the disparity map. Fig. 3-8 shows the flowchart of the GC algorithm. Step (1) corresponds to S1, and step (2) is composed of S2 and S3. Step (3) is equal to S4. In addition, steps (4) and (5) are the inner and outer loops, respectively.
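The five steps of the multiway cut procedure can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this thesis: the image is a 1D chain of pixels, the smoothness term is a Potts penalty `lam` (an assumption), and the minimum-cut step is replaced by an exhaustive search over the α/β relabelings, which is feasible only for toy inputs. All function names are hypothetical.

```python
import itertools
import random


def total_energy(labels, data_cost, lam):
    """Data term plus an assumed Potts smoothness term on a 1D pixel chain."""
    data = sum(data_cost[p][d] for p, d in enumerate(labels))
    smooth = sum(lam for a, b in zip(labels, labels[1:]) if a != b)
    return data + smooth


def best_swap(labels, alpha, beta, data_cost, lam):
    """Stand-in for the min-cut step: try every alpha/beta relabeling of the
    pixels that currently carry alpha or beta and keep the lowest energy."""
    idx = [p for p, d in enumerate(labels) if d in (alpha, beta)]
    best, best_e = list(labels), total_energy(labels, data_cost, lam)
    for choice in itertools.product((alpha, beta), repeat=len(idx)):
        cand = list(labels)
        for p, d in zip(idx, choice):
            cand[p] = d
        e = total_energy(cand, data_cost, lam)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def multiway_cut(data_cost, lam, seed=0):
    rng = random.Random(seed)
    n_disp = len(data_cost[0])
    # Step (1): winner-takes-all (WTA) initialization.
    labels = [min(range(n_disp), key=lambda d: row[d]) for row in data_cost]
    energy = total_energy(labels, data_cost, lam)
    while True:  # Step (5): outer loop
        order = list(range(n_disp))
        rng.shuffle(order)  # Step (2): randomize the disparity sequence
        improved = False
        for alpha, beta in itertools.combinations(order, 2):  # Step (4)
            new_labels, new_e = best_swap(labels, alpha, beta, data_cost, lam)  # Step (3)
            if new_e < energy:
                labels, energy, improved = new_labels, new_e, True
        if not improved:  # the energy did not change in a full pass
            return labels, energy
```

For example, `multiway_cut([[0, 5, 5], [0, 5, 5], [4, 3, 5], [0, 5, 5], [5, 5, 0]], lam=2)` starts from the WTA labeling `[0, 0, 1, 0, 2]` and removes the isolated label in the middle, because the smoothness penalty outweighs the small data-cost advantage of that pixel.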


Fig. 3-7 (a) An example of the graph with multiple terminals (b) An induced graph by a multiway cut (dotted lines indicate cut edges)


Chapter 4 The α-β Swap Algorithm Speed-Up and Early Termination

4.1 Overview

Among the global optimization algorithms, graph cuts (GC) and its variations [8]-[11] generally show very good disparity estimation quality. Therefore, GC is chosen as the target algorithm for speed-up. The flowchart of the original GC, shown in Fig. 3-8, was discussed in Chapter 3. In this chapter, we propose a fast GC algorithm for disparity estimation. Two accelerating techniques are suggested: one is the early termination rule, and the other prioritizes the α-β swap pair search order.

4.2 Early Termination of Energy Minimization Process

In this section, we examine the energy minimization process of GC to determine an early termination threshold. First, we define a term to be used in the following discussion. Fig. 4-1 shows the probability distribution of all possible disparity values for a test image pair after the i-th outer-loop iteration. This probability distribution (sequence) forms a vector, $P_{i\text{-th}}$. We measure the similarity between two vectors by their inner product, and thus $\theta_i$, the angle between $P_{(i-1)\text{-th}}$ and $P_{i\text{-th}}$, is defined by (4.1).

$$\theta_i = \cos^{-1}\left(\frac{P_{(i-1)\text{-th}} \cdot P_{i\text{-th}}}{\lVert P_{(i-1)\text{-th}} \rVert \, \lVert P_{i\text{-th}} \rVert}\right) \qquad (4.1)$$

Fig. 4-2 shows the values of E and θ after each outer-loop iteration on the test image ‘sawtooth’, and Fig. 4-3 shows the RMS_error and Bad_pixel of the corresponding disparity map. In the energy minimization process, GC monotonically decreases E. However, the quality metrics fluctuate slightly as E approaches its minimum. The other test image pairs show similar results. When the decrease in E is small, further iterations do not necessarily improve the quality, even though the energy level can still be lowered slightly. Therefore, we suggest an early termination mechanism to save computation. The optimization process terminates when the angle between $P_{(i-1)\text{-th}}$ and $P_{i\text{-th}}$ is smaller than a given threshold $\theta_{\text{threshold}}$. That is, when the change between $P_{(i-1)\text{-th}}$ and $P_{i\text{-th}}$ is small, the iteration stops.

Fig. 4-1 Disparity distribution $P_{i\text{-th}}$ of the test image pair ‘sawtooth’ after the i-th iteration
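The early termination test can be illustrated with a short sketch. It assumes that the disparity map is stored as a list of rows of integer disparities and that θ is measured in degrees, as on the axis of Fig. 4-2; the function names and the threshold argument are hypothetical and chosen only for this example.

```python
import math


def disparity_distribution(disparity_map, num_disparities):
    """Fraction of pixels assigned to each disparity value."""
    counts = [0] * num_disparities
    for row in disparity_map:
        for d in row:
            counts[d] += 1
    total = sum(counts)
    return [c / total for c in counts]


def angle_degrees(p_prev, p_curr):
    """Angle between two distribution vectors, from their inner product."""
    dot = sum(a * b for a, b in zip(p_prev, p_curr))
    norm = math.sqrt(sum(a * a for a in p_prev)) * math.sqrt(sum(b * b for b in p_curr))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos_theta))


def should_terminate(p_prev, p_curr, theta_threshold):
    """Stop iterating when the distribution barely changes between iterations."""
    return angle_degrees(p_prev, p_curr) < theta_threshold
```

Because the distributions have only non-negative entries, the angle always lies between 0 and 90 degrees, and a value near 0 means that very few pixels changed their disparity in the last outer-loop iteration.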

Fig. 4-2 E and θ in the energy minimization process of the test image pair ‘sawtooth’ (horizontal axis: WTA and Run 0 to Run 7; vertical axes: energy and θ in degrees)

Fig. 4-3 RMS_error and Bad_pixel in the energy minimization process of ‘sawtooth’ (horizontal axis: WTA and Run 0 to Run 7; vertical axes: Bad_pixel in percent and RMS_error)

4.3 Prioritizing the α-β Swap Pair Sequence

The original GC scheme randomly selects a disparity pair from the disparity candidate set and then performs the α-β swap for all possible disparity values. However, let α and β be a chosen disparity pair. If (α, β) is the best disparity pair for only a few nodes, this specific α-β swap has a limited impact on minimizing the total energy, yet it still consumes computing power. Therefore, if we find an effective strategy to prioritize the α-β swap pairs, i.e., the better matched pairs are tested first, the total energy converges faster.

After each run of the outer loop, we obtain , , , and for each node. Consequently, we have the disparity probability (the number of nodes with a specific disparity divided by the number of all pixel nodes), , , and for each disparity value. Fig. 4-4 shows the disparity distribution of the test image pair ‘sawtooth’. Part (a) is the probability after the first iteration (denoted by $P_{1\text{-st}}$). Part (b) is the probability difference between the first and the final iterations (denoted by $P_{\text{final}} - P_{1\text{-st}}$), which represents the probability difference that remains to be adjusted by the iterative algorithm. Typically, there are only a few objects in an image; therefore, the disparity distribution is dominated by a few disparity values. Often, the dominant disparity values show up after the first couple of iterations; that is, their probabilities are higher than those of the other disparities. Motivated by this observation, we use the disparity probability distribution as a clue for choosing the final disparity values. In this section, we prioritize the disparity pairs according to their probability, although the other attributes such as , , and may also be used for prioritization. We discuss the differences between them in the next section.

The benefits of prioritizing the α-β swap pair search order come mainly from the early iterations of the outer loop. With correctly prioritized disparity candidates, we reach the final goal much faster and thus save computation.
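A possible way to build the prioritized pair sequence is sketched below. The text specifies only that the pairs are ordered from high probability to low; enumerating the pairs in the order induced by the sorted disparity values is our assumption for this example, and the function name is hypothetical.

```python
from itertools import combinations


def prioritized_pairs(probabilities):
    """Return disparity pairs (alpha, beta), most probable disparities first.

    probabilities[d] is the fraction of pixel nodes currently assigned
    disparity d after the latest outer-loop iteration.
    """
    order = sorted(range(len(probabilities)),
                   key=lambda d: probabilities[d],
                   reverse=True)
    # Pairs that involve the dominant disparity values are generated first.
    return list(combinations(order, 2))
```

With probabilities `[0.1, 0.6, 0.3]`, the disparities are ranked 1, 2, 0, so the swap sequence becomes (1, 2), (1, 0), (2, 0).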

Fig. 4-4 Disparity distribution of ‘sawtooth’ after the 1st outer-loop iteration: (a) $P_{1\text{-st}}$; (b) $P_{\text{final}} - P_{1\text{-st}}$

Fig. 4-5 shows the flow chart of our proposed fast GC, which combines the prioritized α-β swap pair sequence with the early termination technique (section 4.2). The modifications to the original GC scheme are steps 2, 7, and 8. In step 2, we prioritize the disparity pair search order based on their disparity probabilities. We then perform the α-β swap sequentially according to the priority order (probability) of the disparity pairs. In step 7, we also calculate the θ value. In the extra step 8, we check whether θ is larger than a given θ_threshold. If so, we run another iteration of the outer loop. Otherwise, we terminate the optimization process. The value of

S1: Determine the disparity d_WTA by WTA, calculate E_WTA = E_total(d_WTA), and let d = d_WTA, E = E_WTA.

S2: Sort the disparity d according to its probability and prioritize the disparity pairs (α,β) from high probability to low.

S3: For each disparity pair (α,β) (inner loop):

S4: Find d_Inner = arg min E_total(d') among d' within one α-β swap of d, and E_Inner = min E_total(d').

S5: Inner loop end.

S6: If E_Inner < E, continue with S7 (Y); otherwise end (N).

S7: Let d = d_Inner, E = E_Inner, and calculate θ.

S8: If θ > θ_threshold, run the outer loop again (Y); otherwise end (N).

Fig. 4-5 Flow chart of our proposed fast GC

4.4 Simulation Results and Discussions

In this section, we describe the experiment environment setting and present the simulation results for different criteria. The improvement achieved by our proposed algorithm is also shown. In addition, we analyze and discuss the simulation results.

4.4.1 Experiment Environment Setting

We implement our proposed algorithm and test it on the test bed retrieved from the Middlebury stereo vision web page. Four test image pairs, ‘Map’, ‘Sawtooth’, ‘Tsukuba’, and ‘Venus’ (all with ground truth disparity maps), are used. Our simulation platform is a PC with an Intel Core2Quad Q6600 and 4 GB of RAM. The performance of the original GC and our proposed fast GC is measured by six metrics: computing time, $R$ (rms_error_all), $B$ (bad_pixels_all), $B_{\bar{O}}$ (bad_pixels_nonocc), $B_{\bar{T}}$ (bad_pixels_textureless), and $B_D$ (bad_pixels_discont). Table 4-1 shows the important parameters in our implementation. The simulation results of the original GC with the same setting are close to those on the Middlebury web page.

Table 4-1 The experiment environment setting

Parameter          Value                   Meaning
match_fn           1                       Absolute difference
match_max          1000                    No truncation
match_interval     1                       BT
opt_fn             4                       GC
aggr_iter          0                       No aggregation
opt_smoothness     20                      Weight of smoothness term (λ)
opt_grad_thresh    8                       Threshold for magnitude of intensity gradient [22]
opt_grad_penalty   4 (Map, Tsukuba)        Smoothness penalty factor if gradient is too small [22]
                   2 (Sawtooth, Venus)

4.4.2 Simulation Results

This section presents in detail the simulation results of prioritizing the α-β swap pair sequence according to the disparity probability (the number of nodes with a specific disparity divided by the number of all nodes), , , and for each disparity value. In addition, we present the simulation results with early termination of the energy minimization process (ET) separately.

Table 4-2 Comparison of computing time

Method            Computing time (sec)                      Average
                  Map      Sawtooth   Tsukuba   Venus       Improvement (%)
Original          83.52    156.39     93.88     131.16
Probability       41.45    121.13     38.02     96.86       36.02
                  55.66    127.50     73.98     117.86      19.35
                  50.83    101.89     39.27     85.97       40.22
                  48.75    110.24     44.63     84.45       38.04
ET                38.31    65.30      75.78     71.58       46.02
Probability+ET    20.61    50.75      24.73     53.88       67.74
+ET               30.38    60.30      38.25     81.30       54.78
+ET               21.97    40.41      25.67     70.99       65.79
+ET               21.19    39.45      24.72     69.83       66.62

Improvement is $(1 - T_x / T_{\text{original}}) \times 100\%$, where $T_x$ denotes the computing time of method “x” and $T_{\text{original}}$ denotes the computing time of the original method.
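As a quick check of this definition, the sketch below recomputes the average improvement from the per-image computing times of Table 4-2. As far as we can tell from the tabulated numbers, the reported values are reproduced when the ratio is taken between the computing times averaged over the four image pairs; the dictionary and function names are hypothetical.

```python
# Computing times in seconds for (Map, Sawtooth, Tsukuba, Venus), from Table 4-2.
TIMES = {
    "Original": (83.52, 156.39, 93.88, 131.16),
    "Probability": (41.45, 121.13, 38.02, 96.86),
    "ET": (38.31, 65.30, 75.78, 71.58),
    "Probability+ET": (20.61, 50.75, 24.73, 53.88),
}


def average_improvement(method, baseline="Original"):
    """One minus the ratio of the average computing times, in percent."""
    t_x = sum(TIMES[method]) / len(TIMES[method])
    t_orig = sum(TIMES[baseline]) / len(TIMES[baseline])
    return (1.0 - t_x / t_orig) * 100.0


for name in ("Probability", "ET", "Probability+ET"):
    print(name, round(average_improvement(name), 2))
```

The printed values agree with the last column of Table 4-2 for these three methods.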

Fig. 4-6 Plot of computing time (in seconds) for Map, Sawtooth, Tsukuba, and Venus

Table 4-3 Comparison of rms_error_all

Method            Map     Sawtooth   Tsukuba   Venus    Average
Original          4.10    1.48       1.30      1.48
Probability       4.11    1.49       1.28      1.45     -0.75
                  3.85    1.49       1.28      1.45     -7.25
                  4.02    1.48       1.28      1.45     -3.25
                  3.85    1.47       1.28      1.49     -6.75
ET                4.05    1.49       1.30      1.39     -3.25
Probability+ET    4.12    1.49       1.28      1.45     -0.50
+ET               3.85    1.48       1.28      1.44     -7.75
+ET               4.01    1.48       1.28      1.45     -3.50
+ET               3.85    1.43       1.28      1.49     -7.75

where denotes the RMS error of method “ ” and denotes the RMS error of the original method.

Fig. 4-7 Plot of rms_error_all

Table 4-4 Comparison of bad_pixels_all

Method            Map     Sawtooth   Tsukuba   Venus    Average
                  (%)     (%)        (%)       (%)
Original          5.45    3.94       4.16      3.50
Probability       5.53    3.93       4.20      4.50     0.28
                  5.48    4.03       4.18      3.43     0.02
                  5.48    3.91       4.12      3.62     0.02
                  5.40    3.98       4.31      4.41     0.26
ET                5.43    3.96       4.16      3.50     0.00
Probability+ET    5.47    3.96       4.20      4.50     0.27
+ET               5.47    4.03       4.18      3.43     0.02
+ET               5.47    3.94       4.12      3.62     0.03
+ET               5.40    4.01       4.31      4.40     0.27

where denotes the percentage of bad pixels of method “ ” and denotes the percentage of bad pixels of the original method.

Fig. 4-8 Plot of bad_pixels_all

Table 4-5 Comparison of bad_pixels_nonocc

Method            Map     Sawtooth   Tsukuba   Venus    Average
                  (%)     (%)        (%)       (%)
Original          0.38    1.34       2.00      1.87
Probability       0.41    1.34       2.03      2.81     0.250
                  0.35    1.42       2.03      1.77     -0.005
                  0.38    1.34       1.96      1.97     0.015
                  0.32    1.38       2.16      2.77     0.260
ET                0.38    1.36       2.00      1.85     0.000
Probability+ET    0.40    1.36       2.04      2.82     0.258
+ET               0.35    1.43       2.02      1.77     -0.005
+ET               0.38    1.38       1.96      1.97     0.025
+ET               0.33    1.41       2.15      2.77     0.268

where denotes the percentage of non-occlusion bad pixels of method “ ” and denotes the percentage of non-occlusion bad pixels of the original method.

Table 4-6 Comparison of bad_pixels_textureless

Method            Map     Sawtooth   Tsukuba   Venus    Average
                  (%)     (%)        (%)       (%)
Original          0.00    0.24       1.09      2.76
Probability       0.00    0.26       1.16      5.04     0.79
                  0.00    0.26       1.15      2.63     -0.02
                  0.00    0.25       1.11      3.83     0.37
                  0.00    0.30       1.40      5.14     0.92
ET                0.00    0.24       1.09      2.50     -0.06
Probability+ET    0.00    0.26       1.16      5.04     0.79
+ET               0.00    0.26       1.15      2.63     -0.02
+ET               0.00    0.28       1.11      3.83     0.38
+ET               0.00    0.35       1.39      5.14     0.93

where denotes the percentage of textureless bad pixels of method “ ” and denotes the percentage of textureless bad pixels of the original method.

Table 4-7 Comparison of bad_pixels_discont

Method            Map     Sawtooth   Tsukuba   Venus    Average
                  (%)     (%)        (%)       (%)
Original          3.82    6.23       9.87      7.22
Probability       4.55    6.55       10.15     7.28     0.35
                  3.76    6.42       10.03     6.80     -0.38
                  3.85    6.48       9.77      6.63     -0.07
                  3.62    6.53       10.79     6.76     0.24
ET                3.85    6.28       9.87      6.62     -0.27
Probability+ET    4.49    6.54       10.15     7.28     0.46
+ET               3.76    6.43       10.00     6.66     -0.40
+ET               3.94    6.65       9.76      6.57     0.02
+ET               3.79    6.49       10.79     6.80     0.24

where denotes the percentage of bad pixels near discontinuities of method “ ” and denotes the percentage of bad pixels near discontinuities of the original method.

Fig. 4-9 Disparity maps of ‘Map’ (panels: ground truth, original, Probability+ET, and +ET)

Fig. 4-11 Disparity maps of ‘Tsukuba’ (panels: Probability+ET and +ET)

4.4.3 Analysis and Discussions

Table 4-2 and Fig. 4-6 show the computing time of the original GC and the proposed methods. The improvement is calculated as one minus the ratio of their computing times. Among the methods without early termination, prioritizing the α-β swap pairs according to gives the best improvement. However, when early termination is employed, prioritizing the α-β swap pairs according to probability has the least computing time. In addition, early termination alone achieves an improvement of 46%.

Table 4-3 and Fig. 4-7 show the rms_error_all of the original GC and the proposed methods. The improvement is calculated from their average performance on the four image pairs. The rms_error_all decreases slightly in our methods. Prioritization according to and improves rms_error_all more than the others, with or without early termination. Table 4-4 and Fig. 4-8 show the bad_pixels_all of the original GC and the proposed methods. Prioritization according to probability and is worse than the others. The degradation is dominated by ‘Venus’. However, the performance is still about the same as that of the original GC.

Table 4-5, Table 4-6, and Table 4-7 show that different methods perform differently for different types of images. For example, prioritization according to is the best performer in the textureless region. In conclusion, if we can tolerate a slight degradation of quality, we choose the probability prioritized method, which saves the most computing time. On the other hand, if we accept a slightly higher computing time in exchange for better quality, we can choose the prioritized method. In addition, the prioritized method is a good choice, because it greatly reduces the computing time and only slightly degrades the quality.

Fig. 4-9, Fig. 4-10, Fig. 4-11, and Fig. 4-12 show the disparity maps of the ground truth, original GC, probability+ET, and +ET. The disparity maps generated by the proposed methods are generally very close to the disparity map produced by the original GC. After examining them closely, we find that the disparity map of +ET is slightly better than that of probability+ET.

Chapter 5 Multi-Resolution Graph Cuts and Disparity Estimation for Multi-Camera Array

5.1 Overview

In disparity estimation, the graph cuts and the belief propagation algorithms provide better disparity map quality. Unfortunately, their computation time is very high. In this chapter, we use multi-resolution graph cuts to reduce the computing time. Then, we estimate the disparity maps when the multi-camera array is in use.

5.2 Disparity Estimation using Multi-Resolution Graph Cuts

In section 3.3, we described the push-relabel algorithm, which solves the max-flow/min-cut problem. The worst-case running time of this algorithm is , where is the number of nodes and is the number of edges in the graph. Because and increase with the image size, the running time of the graph cuts algorithm increases greatly with the image size. In addition, the α-β swap method constructs a graph of two terminals (disparity values) at a time. If the disparity range is , we construct graphs in total (all combinations). Therefore, the running time also increases greatly with the disparity range when we use the α-β swap method.
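The growth of the number of two-terminal graphs with the disparity range can be illustrated with a small calculation. It assumes that one graph is built per unordered pair of disparity values in each pass over all pairs; the function name and the example ranges of 20 and 10 values are hypothetical.

```python
from math import comb


def num_swap_graphs(num_disparities):
    """Number of unordered disparity pairs, i.e. n(n-1)/2 graphs per pass."""
    return comb(num_disparities, 2)


print(num_swap_graphs(20))  # full-resolution example: 190 pairs
print(num_swap_graphs(10))  # halved disparity range: 45 pairs
```

Halving the disparity range therefore reduces the number of graphs per pass by roughly a factor of four, in addition to the savings from the smaller image size.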

We can employ the multi-resolution graph cuts (MRGC) technique to reduce the computation time. Fig. 5-1 shows the flowchart of MRGC. We first use image down-sampling to generate the low-resolution images. The right-side path in Fig. 5-1 is the same as the original GC method except for the image size and the disparity range. If the disparity range of the original GC is in the original resolution, the disparity range is in the low-resolution image. After the low-resolution disparity map is obtained, we return to the original resolution and up-sample the disparity map. The up-sampled map then becomes the initial disparity map for the neighborhood graph cuts.

The main components that differ from the original GC method are the image down-sampling, the disparity map up-sampling and scaling, and the neighborhood graph cuts operations. We describe the details in the following sub-sections.

5.2.1 Image Down-Sampling

In this section, we describe the two down-sampling methods that we use in MRGC. Fig. 5-2 shows an example of a simple down-sampling method. The sampling factor is 2 for both the width and the height. This method simply skips every other pixel in each dimension. Because no pre-filtering is applied, the low-resolution images after down-sampling may suffer from aliasing. Therefore, we try another method to down-sample the original images: we use a sliding window, whose coefficients are given below, for pre-filtering.

(5.1)

Fig. 5-2 (simple down-sampling example): the 8×8 input

4 5 5 4 5 8 8 9
3 4 5 5 6 8 9 8
3 4 5 5 6 8 8 9
3 3 8 9 5 5 9 8
4 3 8 7 7 3 5 5
4 3 8 6 7 3 5 6
5 4 8 8 8 4 3 5
3 3 4 5 5 3 4 4

is down-sampled to the 4×4 output

4 5 5 8
3 5 6 8
4 8 7 5
5 8 8 3

Because the coefficients of the window are all integers and their sum is 16, we can shift right by four bits instead of performing a division. This filter does not increase the computing time much compared with the simple method. We compare the two methods by simulation in section 5.4.
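The two down-sampling variants can be sketched as follows. The skip method reproduces the numerical example of Fig. 5-2. For the pre-filter, the 3×3 kernel below is only a hypothetical stand-in with integer coefficients that sum to 16, chosen so that the four-bit right shift applies; it should not be read as the window of (5.1). Clamping at the image border is also an assumption, and the function names are hypothetical.

```python
def downsample_skip(image):
    """Keep every other row and every other column (sampling factor 2)."""
    return [row[::2] for row in image[::2]]


# Hypothetical integer kernel whose coefficients sum to 16 (an assumption,
# not necessarily the window of formula (5.1)).
KERNEL = ((1, 2, 1),
          (2, 4, 2),
          (1, 2, 1))


def prefilter(image):
    """Low-pass filter the image; the division by 16 is a 4-bit right shift."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc >> 4
    return out


def downsample_filtered(image):
    """Pre-filter first, then skip every other sample."""
    return downsample_skip(prefilter(image))
```

Applying `downsample_skip` to the 8×8 input of Fig. 5-2 yields the 4×4 output shown in the figure.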

5.2.2 Disparity Map Up-sampling and Scaling

In this section, we also describe two methods to up-sample the low-resolution disparity map. Fig. 5-3 shows a simple up-sampling method that directly duplicates each pixel value to its neighbors and multiplies the disparity value by 2. The derived disparity map then becomes the initial disparity map for the neighborhood graph cuts. Because the simple method may produce blocky disparity maps, we can reduce the artifacts by performing some type of linear or bilinear interpolation in up-sampling. However, the interpolation process increases the computation time, and its quality improvement on the disparity map is uncertain.

Fig. 5-3 (simple up-sampling and scaling example): the 4×4 low-resolution disparity map

4 5 5 8
3 5 6 8
4 8 7 5
5 8 8 3

is up-sampled and scaled to the 8×8 disparity map

8 8 10 10 10 10 16 16
8 8 10 10 10 10 16 16
6 6 10 10 12 12 16 16
6 6 10 10 12 12 16 16
8 8 16 16 14 14 10 10
8 8 16 16 14 14 10 10
10 10 16 16 16 16 6 6
10 10 16 16 16 16 6 6
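The duplication-based up-sampling and scaling of Fig. 5-3 can be written in a few lines. The sketch assumes a disparity map stored as a list of rows; using the same factor for the spatial up-sampling and for the disparity scaling follows the factor-2 case described above, and the function name is hypothetical.

```python
def upsample_and_scale(disparity, factor=2):
    """Duplicate each sample into a factor x factor block and scale its value.

    A disparity measured on the low-resolution grid corresponds to `factor`
    times as many pixels on the original grid, hence the multiplication.
    """
    out = []
    for row in disparity:
        wide = [d * factor for d in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out
```

Applied to the 4×4 map of Fig. 5-3, this function returns the 8×8 map shown in the same figure.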

We attempt to employ the up-sampling method of H.264 [23], which is shown in Fig. 5-4. The pixel can be obtained from pixels , , , , , and by the formula below.

(5.2)

We can use pixels , , , , , and to interpolate the pixel similarly.

(5.3)

The coefficients of the interpolation filter are , which mimic the sinc function. After the up-sampling interpolation process, we multiply the disparity values by 2. In section 5.4, we compare the two methods based on the simulation results.
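A one-dimensional sketch of six-tap interpolation is given below. It assumes that the method in question is the standard H.264 half-sample luma filter, whose taps are (1, −5, 20, 20, −5, 1) with a divisor of 32 and a rounding offset of 16. The border clamping, the clipping of negative results, and the function names are our assumptions for this example.

```python
# Six-tap half-sample filter of H.264 (assumed to be the filter meant here).
TAPS = (1, -5, 20, 20, -5, 1)


def half_sample(row, i):
    """Interpolate the value midway between row[i] and row[i + 1]."""
    n = len(row)
    acc = 0
    for k, c in enumerate(TAPS):
        j = min(max(i - 2 + k, 0), n - 1)  # clamp indices at the borders
        acc += c * row[j]
    # Round, divide by 32 with a 5-bit shift, and keep the result non-negative.
    return max(0, (acc + 16) >> 5)


def upsample_row(row):
    """Double the length of a row by inserting interpolated half samples."""
    out = []
    for i, v in enumerate(row):
        out.append(v)
        out.append(half_sample(row, i))
    return out
```

Applying `upsample_row` to every row and then to every column gives a two-dimensional up-sampling by a factor of 2, after which the disparity values are multiplied by 2 as described above.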

5.2.3 Neighborhood Graph Cuts

In the original graph cuts, the number of graphs constructed for the α-β swap equals the total number of combinations of disparity pairs selected from the disparity range. The neighborhood graph cuts method reduces the number of graphs constructed. Unlike the original graph cuts, we use the disparity map obtained from the up-sampling and scaling process as the initial disparity map. We assume that the disparity value of each pixel differs from its neighboring disparity values only by 1. Therefore, we reduce the number of disparity pair combinations in the α-β swap to reduce the computing time. Fig. 5-5 shows the disparity pair combinations of the neighborhood graph cuts. The gray nodes are the disparity values obtained from the scaling. The arrows show the values that each disparity value can change to; that is, we select these two disparity values to do the α-β swap.

Fig. 5-5 The disparity pair candidates in neighborhood graph cuts (a) (b)

In Fig. 5-5(a), the search range of neighborhood graph cuts is . In Fig. 5-5(b), the search range is and their combination is nearly two times more than . Note that we cannot do an α-β swap with the same value. In section 5.4, we compare the performance of these two methods and show the computing time saved by using MRGC.

5.3 Disparity Estimation in Multi-Camera Array

In this section, we propose a disparity estimation method for multi-camera pictures. We pick “sawtooth” and “venus” as our test data for the multi-camera experiment. Both test data sets include nine images captured by nine cameras (Fig. 5-6). We call them im0, im1, …, and im8. These images are captured by cam0, cam1, …, and cam8, respectively.

Fig. 5-6 Multi-camera array

In the original method, we compute im2’s disparity map relative to im6. In the proposed method, im4 is added. Fig. 5-7 shows the flowchart of our GC algorithm for multi-camera pictures. First, im4’s disparity map relative to im6 is computed by using the graph cuts algorithm. We use it to predict the disparity map of im2 relative to im6, because the optical geometry tells us that im2’s disparity map relative to im6 is a shifted and scaled version of im4’s disparity map relative to im6. The results of scaling and shifting are shown in Fig. 5-8. This predicted disparity map is used as the initial disparity map and is refined by the graph cuts algorithm. In section 5.4, we show the simulation results. The improvement of this method is not significant.

Fig. 5-7 Flowchart of GC for multi-camera pictures


5.4 Simulation Results and Discussions

In this section, we show the simulation results of MRGC and of the multi-camera pictures, and then discuss the possible causes of these results. The experiment environment setting is the same as in section 4.4. In MRGC, the scaling factor is 2.

5.4.1 Multi-Resolution Graph Cuts

We first explain the symbols that appear in the following tables. MRGC( ) indicates the multi-resolution graph cuts whose search range of neighborhood graph cuts (NGC) is . Similarly, the search range of NGC of MRGC( ) is . In MRGCD, the down-sampling method is the low-pass filter described in formula (5.1). Likewise, MRGCU denotes that the up-sampling method uses the H.264 up-sampling filter. In MRGCDU, both the down-sampling and up-sampling processes adopt the aforementioned filters.

Table 5-1 shows that the computing time improvement by MRGC ranges from 81% to 92%. MRGC( ) runs a little longer than MRGC( ), because the search range is wider. Table 5-2 and Table 5-3 show the image quality comparison of the different methods. Although the computation time of MRGC( ) is slightly larger than that of MRGC( ), its quality is much better. In addition, the down-sampling method of (5.1) is better than the simple sample-skip method. However, if we replace the simple disparity map duplication method by the H.264 interpolation method, the computing time gets higher and the bad_pixels_all becomes worse. This may be due to the fact that the discontinuous regions are critical for the initial disparity map, and the H.264 up-sampling method blurs the initial disparity map.

Fig. 5-12, Fig. 5-13, Fig. 5-14, and Fig. 5-15 show the disparity maps of the different methods. The disparity map looks much smoother when the original image is down-sampled by formula (5.1). In addition, the disparity map of MRGC( ) is much better than that of MRGC( ). MRGC( ) produces a disparity map close to that of the original GC method.

Hierarchical graph cuts [24] is one of the few fast graph cuts algorithms found in the literature. According to [24], the computing time of the hierarchical graph cuts is about 25% of that of the original GC on the test image “Tsukuba”. MRGC is faster than the hierarchical graph cuts: our method takes about 16% of the computing time of the original GC. In addition, the quality of the hierarchical graph cuts is not discussed in [24], so we cannot assess the quality degradation of this method.
