Observations - Synthesis-Quality-Oriented Depth Map Refinement for MPEG Free-Viewpoint Video

Equation (3.4) provides a non-stationary model of the expected per-pixel synthesis distortion. It suggests that depth errors at different pixels contribute differently to the overall synthesis distortion. According to the equation, the distortion caused by the depth error $n_p$ is determined by several factors measured at $p$: the depth-error variance, the intensity variation, the (ground-truth) depth value, and the position of the virtual camera. Further insight into the combined effects of these factors can be gained from Figure 3.3, which plots the ratio of $Z_p$ to $\sigma_n(p)$ as a function of $E\{\varepsilon_p \mid Z_p, Z_q\}$ under various settings of $Z_p$, $Z_q$, and $\sigma_g^2(p)$ that simulate smoothly or rapidly changing depth and intensity fields. In the experiment, $\sigma_n(p)$ was varied to identify the highest level of error variance at which the specified distortion is reached; the result was then used to compute $Z_p/\sigma_n(p)$. Intuitively, this ratio, which we call the depth-error sensitivity, characterizes how sensitive a pixel is to its depth error in terms of the resulting synthesis distortion. A higher ratio (sensitivity) implies that even a small depth error can lead to considerable distortion.

From the figure, several important observations can be made:

1. Compare the curves produced with different settings of $\sigma_g^2(p)$. The larger the value of $\sigma_g^2(p)$, the more sensitive pixel $p$ is to its depth error; that is, when depth errors occur in areas with vertical edges or fine texture details, their effects on synthesis quality are more apparent. This observation is also corroborated by [7].

2. Compare parts (a), (c), and (e) with parts (b), (d), and (f). When a pixel corresponds to a farther clipping plane, it exhibits a lower depth-error sensitivity. In this case, the pixel has a larger depth value $Z_p$, and, according to Equation (2.1), the resulting geometry distortion is less significant.

3. Compare part (e) with parts (a) and (c) (or (f) with (b) and (d)). When a pixel $p$ is ill-warped to $q'$, the resulting synthesis error is less observable if $Z_q$ is much greater than $Z_p$ (and hence than $\hat{Z}_p$). This can be explained using the example shown in Figure 3.4, where $q_1$ and $q_2$ denote, respectively, the inverse projections of $q'$ for the two extreme cases $Z_{q_1} \gg \hat{Z}_p$ and $Z_{q_2} \ll \hat{Z}_p$. Since $Z_{q_1} \gg \hat{Z}_p$ while $Z_p \gg Z_{q_2}$, the artifact is more noticeable when a depth error causes warping to substitute a background pixel for a foreground pixel, which explains the less significant change in intensity when $Z_q \gg Z_p$.

Figure 3.4: A geometrical interpretation of the effect of $Z_q$ on depth-error sensitivity.

4. Observe the reciprocal relation between $\sigma_n^2(p)/Z_p^2$ and $c^2$ in Equation (3.4). It suggests that when a pixel $p$ is warped to a virtual view farther away from the reference view, it becomes more sensitive to depth errors.

These observations remain valid for other camera configurations, except that the effects of the intensity variation and the camera arrangement must be considered jointly by evaluating $E\{(\nabla I_R(p) \cdot c)^2\}$.
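To make Observations 2 and 4 concrete, the sketch below computes the horizontal warping error that a fixed depth error causes under the textbook parallel-camera relation $d = fB/Z$. This relation is an assumption standing in for Equation (2.1), whose exact form is not reproduced in this section. The function name, the focal length `f` (in pixels), the `baseline` (distance between the reference and virtual cameras), and the numeric values are all illustrative; `z` and `depth_error` are in the same physical unit.

```python
def warp_shift(f: float, baseline: float, z: float, depth_error: float) -> float:
    """Absolute disparity error (pixels) caused by a depth error.

    Assumes the parallel-camera relation d = f * B / Z as an illustrative
    stand-in; this is not claimed to be the thesis's Equation (2.1).
    """
    d_true = f * baseline / z
    d_coded = f * baseline / (z + depth_error)
    return abs(d_true - d_coded)


if __name__ == "__main__":
    f = 1000.0  # hypothetical focal length in pixels
    # Same depth error (0.1) at increasing depth: the shift shrinks (Observation 2).
    for z in (1.0, 2.0, 4.0):
        print(f"Z={z}: shift={warp_shift(f, 0.05, z, 0.1):.3f} px")
    # Same depth and error, larger baseline: the shift grows linearly (Observation 4).
    for b in (0.05, 0.10):
        print(f"B={b}: shift={warp_shift(f, b, 2.0, 0.1):.3f} px")
```

Under this simplified geometry, the same depth error displaces a near pixel by several pixels but a far pixel by only a fraction of a pixel, and doubling the baseline doubles the displacement.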


CHAPTER 4 Synthesis-Quality-Oriented Depth Refinement Scheme

The MPEG FTV framework [9] treats the transmitted depth images as deterministically specifying the depth information of the reference images, and the effects of compressing the depth images are neglected when virtual views are rendered. As seen from the analysis in §3, depth errors can cause disturbing synthesis artifacts, especially in areas with sharp edges or fine texture details. To tackle this problem, we propose to regard both the received view images and the received depth images as sources of information about the ground-truth depth of the scene, and we provide ways to detect and refine unreliable depth values.

4.1 System Architecture

To make our algorithm easier to understand, Figure 4.1 depicts the system block diagram, highlighting the data communicated between functional blocks.

As shown, to use network bandwidth economically, both reference images $\{I_1, I_2\}$ and their respective per-pixel depth maps $\{D_1, D_2\}$ are compressed prior to transmission. These data are decoded and reconstructed at the receiver side before they are used to create virtual views. The prime symbols in the figure distinguish the coded view and depth images from their original sources.

Figure 4.1: System Block Diagram.

Recognizing that depth-image compression may give rise to depth errors, we introduce a depth refinement mechanism at the receiver side. The objective is to improve synthesis quality by refining the depth values of those pixels that are highly sensitive to depth errors (which we call unreliable pixels). The process consists of two sequential steps, (1) the detection of unreliable pixels and (2) the refinement of their depth values, both of which need access to the coded view and depth images. To make their performance robust against compression effects, additional control parameters are transmitted to the receiver as side information. The settings of these parameters are determined at the sender side by evaluating, over the range of all possible choices, the detection and refinement quality as perceived by the receiver. The details are elaborated in the following sections.

4.2 Reliability Detection

The detection process at the receiver side aims to discover unreliable pixels, i.e., those that are highly sensitive to depth errors and hence require higher fidelity in their depth values in order to minimize rendering errors. From the theoretical analysis in §3, a pixel is likely to be unreliable if it is located in a region with large intensity variation, or if it lies on a near clipping plane. Although both facts could be used jointly to form detection criteria, we consider only intensity variation, because view images are generally better compressed than their depth representations, which makes the intensity information more reliable for decision-making.

To quantify intensity variation, we apply the Gaussian derivative operator to compute the gradient at every pixel of the view images. A pixel $p$ is considered unreliable, and its depth value deserves refinement, if the magnitude $\|\nabla I'_R(p)\|$ of its gradient exceeds a given level $T_G$.¹ According to Observation 1 in §3, such a pixel is highly sensitive to depth errors and hence requires higher precision in its depth value. Clearly, the value of $T_G$ plays a pivotal role in determining the detection accuracy. Because the signal statistics are non-stationary, we propose to adapt $T_G$ on a frame-by-frame basis; this is realized by transmitting its value as frame-level side information.
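The detection rule can be sketched as follows: filter each row of the decoded reference image with a sampled first derivative of a Gaussian and flag pixels whose gradient magnitude exceeds $T_G$. Following the footnote for parallel cameras, only the horizontal (x) component is computed. The Gaussian scale `sigma`, the 3-sigma kernel support, the edge-replicating border handling, and the function names are assumptions for illustration, not settings taken from the thesis.

```python
import numpy as np


def gaussian_derivative_kernel(sigma: float) -> np.ndarray:
    """Sampled first derivative of a normalized 1-D Gaussian (3-sigma support)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    return -x / sigma ** 2 * g


def detect_unreliable(image: np.ndarray, t_g: float, sigma: float = 1.0) -> np.ndarray:
    """Boolean mask of pixels whose horizontal gradient magnitude exceeds t_g."""
    kernel = gaussian_derivative_kernel(sigma)
    radius = len(kernel) // 2

    def row_derivative(row: np.ndarray) -> np.ndarray:
        # Replicate border pixels so the image edges do not create false gradients.
        padded = np.pad(row, radius, mode="edge")
        return np.convolve(padded, kernel, mode="valid")

    grad_x = np.apply_along_axis(row_derivative, 1, image.astype(float))
    return np.abs(grad_x) > t_g
```

For a vertical step edge, only the few columns around the step are flagged. Raising `t_g` shrinks the flagged band, which is the trade-off the per-frame choice of $T_G$ has to balance.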

When determining the value of $T_G$ for a particular frame, we wish to strike a good balance between the hit rate and the false-alarm rate. The best setting of $T_G$, denoted by $T_G^*$, should make the subset of pixels $S(T_G^*) = \{p : \|\nabla I'_R(p)\| > T_G^*\}$ contain as many unreliable pixels as possible while keeping the number of reliable ones to a minimum.

To find $T_G^*$, we first associate each plausible choice of $T_G$, together with the corresponding set of pixels $S(T_G)$, with a matching score $M(T_G)$ that weights the hit rate against the false-alarm rate.

We then choose, among all possible choices, the one that yields the highest matching score, i.e., $T_G^* = \arg\max_{T_G} M(T_G)$. This approach can be interpreted as evaluating, at the sender side, the detection quality as perceived by the receiver.
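The exact form of $M(T_G)$ is not reproduced in this section, so the sketch below assumes a hypothetical score, the hit rate minus `lam` times the false-alarm rate, purely to illustrate the argmax selection of $T_G^*$. The boolean array `unreliable` stands for the sender-side ground truth of which pixels are truly unreliable; `lam`, the rate definitions, and the function names are assumptions.

```python
from typing import Sequence

import numpy as np


def matching_score(grad_mag: np.ndarray, unreliable: np.ndarray,
                   t_g: float, lam: float = 1.0) -> float:
    """Hypothetical M(T_G): hit rate minus lam times false-alarm rate."""
    selected = grad_mag > t_g  # the set S(T_G)
    n_unreliable = max(int(np.count_nonzero(unreliable)), 1)
    n_reliable = max(int(np.count_nonzero(~unreliable)), 1)
    hit_rate = np.count_nonzero(selected & unreliable) / n_unreliable
    false_alarm_rate = np.count_nonzero(selected & ~unreliable) / n_reliable
    return hit_rate - lam * false_alarm_rate


def best_threshold(grad_mag: np.ndarray, unreliable: np.ndarray,
                   candidates: Sequence[float], lam: float = 1.0) -> float:
    """T_G* = argmax of the matching score over the candidate thresholds."""
    scores = [matching_score(grad_mag, unreliable, t, lam) for t in candidates]
    return candidates[int(np.argmax(scores))]
```

Ties are resolved in favor of the first (smallest) candidate because `np.argmax` returns the first maximum. Only the selected $T_G^*$ would need to be sent as frame-level side information.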

To compute the matching score, it is necessary to decide whether a hit or a false alarm occurs. This is accomplished by evaluating the per-pixel synthesis distortion $\varepsilon_p$ at the sender side, with $I_1$ and $I_2$ (or the reverse) used in place of $I_R$ and $I_T$, respectively (cf. Equation (3.1)). Specifically, if $\varepsilon_p$ is greater than or equal to a threshold, indicating that the depth associated with pixel $p$ may be unreliable, a hit is identified; otherwise, a false alarm is signaled. Ideally, this threshold should be set to zero according to the Lambertian condition; in practice, however, a non-zero value was used to compensate for camera noise and illumination differences between view images. The parameter settings that yield the best synthesis quality (in terms of PSNR) are searched exhaustively at the sender side. Note that they need not be transmitted to the receiver.

¹ With a parallel camera configuration, only the x component of the gradient is computed and
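The sender-side labeling and parameter search can be sketched as below. `count_hits` splits the pixels selected by the detector into hits ($\varepsilon_p$ at or above the threshold `tau`) and false alarms, and `exhaustive_search` returns the best candidate setting. The `evaluate` callable is a hypothetical placeholder for running detection, refinement, and synthesis and then measuring PSNR; it is not an API from VSRS or any other tool.

```python
from typing import Callable, Iterable, Tuple, TypeVar

import numpy as np

P = TypeVar("P")


def count_hits(eps: np.ndarray, selected: np.ndarray, tau: float) -> Tuple[int, int]:
    """Split the selected pixels into hits (eps >= tau) and false alarms."""
    suspicious = eps >= tau  # synthesis distortion at or above the threshold
    hits = int(np.count_nonzero(selected & suspicious))
    false_alarms = int(np.count_nonzero(selected & ~suspicious))
    return hits, false_alarms


def exhaustive_search(candidates: Iterable[P], evaluate: Callable[[P], float]) -> P:
    """Return the candidate setting with the highest quality score (e.g. PSNR)."""
    return max(candidates, key=evaluate)
```

Because the search runs only at the sender, its cost does not burden the receiver. This is consistent with the text's remark that these settings need not be transmitted.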

4.3 Depth Refinement

Once all the unreliable pixels have been found, the next step is to refine their depth values.

Because depth refinement is performed by the receiver, it must be computationally simple and efficient. For this reason, we adopt a candidate-based disparity estimation scheme to derive depth from the received view images. As in most block-based algorithms, a constant disparity is searched for each 7 × 7 block of pixels centered on an unreliable pixel $p$ by minimizing the error between the two view images after disparity compensation. However, unlike those techniques, which usually examine a large number of disparities, ours restricts the search to the disparities that correspond to an integer depth value in the interval $[\hat{Z}_p - R_p, \hat{Z}_p + R_p]$.

On the one hand, this constraint is an expedient adopted for complexity reasons; on the other hand, it prevents the simple block-based search from selecting an improper disparity.
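The candidate-restricted search can be sketched as follows for one unreliable pixel. Several details are assumptions, not specifics from the thesis. The views are taken as rectified with parallel cameras, so reference pixel (y, x) appears at (y, x - d) in the other view. The block error is taken as the sum of absolute differences (SAD), since the text says only "the error". Depth levels are assumed to be 8-bit. `depth_to_disparity` is a hypothetical callable standing in for the camera-dependent mapping from a depth level to a disparity, which is not reproduced here.

```python
from typing import Callable

import numpy as np


def refine_depth(ref: np.ndarray, other: np.ndarray, y: int, x: int,
                 z_hat: int, r_p: int,
                 depth_to_disparity: Callable[[int], float],
                 half: int = 3) -> int:
    """Pick the integer depth in [z_hat - r_p, z_hat + r_p] whose disparity
    gives the lowest SAD between (2*half+1)^2 blocks (7x7 when half=3)."""
    h, w = ref.shape
    y0, y1 = max(y - half, 0), min(y + half + 1, h)
    x0, x1 = max(x - half, 0), min(x + half + 1, w)
    block = ref[y0:y1, x0:x1].astype(float)

    best_z, best_cost = z_hat, np.inf
    # Assumes 8-bit depth levels, so the candidates are clipped to [0, 255].
    for z in range(max(z_hat - r_p, 0), min(z_hat + r_p, 255) + 1):
        d = int(round(depth_to_disparity(z)))
        # Assumed geometry: ref pixel (y, x) appears at (y, x - d) in `other`.
        if x0 - d < 0 or x1 - d > w:
            continue  # the displaced block would leave the image
        cand = other[y0:y1, x0 - d:x1 - d].astype(float)
        cost = float(np.abs(block - cand).sum())
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z
```

Only $2R_p + 1$ candidates are evaluated per pixel, instead of the full disparity range. If every candidate falls outside the image, the decoded depth $\hat{Z}_p$ is kept.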

Although reducing the number of search candidates simplifies the disparity search, two issues remain: how to determine a proper value of $R_p$ for each unreliable pixel, and how to signal this information efficiently. As described previously, $R_p$ determines the maximum modification of $\hat{Z}_p$ that depth refinement can make, i.e., it controls the strength of refinement. Our analysis found that the depth-error sensitivity of a pixel is related to its ground-truth depth value, implying that the adaptation of $R_p$ should depend on $\hat{Z}_p$ (which approximates $Z_p$).


Figure 4.2: A sample result of the proposed depth refinement algorithm: (a)(d) the original depth image, (b)(e) the decoded depth image, and (c)(f) the refined depth image.

To trade off quality against overhead, we divide the set $S(T_G^*)$ into $N$ disjoint subsets $s_i(T_G^*)$, $1 \le i \le N$, each of which is assigned a refinement search range $r_i$. A uniform quantizer operating on the received depth $\hat{Z}_p$ assigns each unreliable pixel in $S(T_G^*)$ to one of the $N$ subsets. The best settings of $\{r_i\}_{i=1}^{N}$ are then searched exhaustively at the sender side and transmitted to the receiver as side information.
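The grouping step can be sketched as a uniform quantizer over the decoded depth range, followed by a lookup of the per-subset range $r_i$ and an exhaustive sender-side search over all $N$-tuples $\{r_i\}$. The 8-bit depth range, the 0-based subset index (the text uses $1 \le i \le N$), the candidate range values, and the `evaluate` callable (a placeholder for the sender's quality measurement) are assumptions for illustration.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def subset_index(z_hat: int, n_subsets: int, z_max: int = 255) -> int:
    """Uniform quantizer on the decoded depth (0-based; the text uses 1..N)."""
    step = (z_max + 1) / n_subsets
    return min(int(z_hat // step), n_subsets - 1)


def search_range(z_hat: int, ranges: Sequence[int]) -> int:
    """R_p for a pixel: the range r_i of the subset its decoded depth falls into."""
    return ranges[subset_index(z_hat, len(ranges))]


def best_ranges(choices: Sequence[int], n_subsets: int,
                evaluate: Callable[[Tuple[int, ...]], float]) -> Tuple[int, ...]:
    """Exhaustive search over all N-tuples {r_i}; cost grows as len(choices)**N."""
    return max(product(choices, repeat=n_subsets), key=evaluate)
```

Only the $N$ selected values $r_i$ need to be signaled, rather than one range per pixel. The exponential growth of the exhaustive search keeps $N$ small in practice.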

Figure 4.2 shows a sample result of our refinement process. Observe that depth compression introduces blocking artifacts into the decoded depth image (see parts (b) and (e) of Figure 4.2). With depth refinement, these artifacts are largely removed (see parts (c) and (f)); note the clear object boundaries, which are simply not visible in the decoded depth image. Interestingly, the refinement can even recover some details that were removed by the enforcement of depth smoothing (compare parts (a) and (d) with parts (c) and (f)).


CHAPTER 5 Experiments

Extensive simulations were carried out to evaluate the performance of the proposed scheme, and the results were compared with those of [7] and [8]. All refinement schemes were implemented with the MPEG committee software VSRS 2.1 [10]. All experiments used DERS 2.0 [10] to generate depth images and JMVC 3.0.1 [11] to encode the multi-view videos and their depth images. The average PSNR of synthesized images was computed over the first 100 frames of each test sequence. In implementing the method described in [7], we used the magnitude of synthesis errors, rather than manually generated edge maps, to distinguish pixels of different categories. For a fair comparison, all threshold values used in [7] and [8] were determined by optimizing the quality of the synthesized images. Tables 5.1 and 5.2 detail the depth estimation settings and the encoder settings, respectively.
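The evaluation metric can be sketched as follows. The text states only that the average PSNR was computed over the first 100 frames. This sketch therefore assumes single-channel 8-bit frames (e.g., luma only) and an arithmetic mean of per-frame PSNR values rather than PSNR of the mean MSE. Both choices, and the function names, are assumptions.

```python
from typing import Sequence

import numpy as np


def psnr(reference: np.ndarray, synthesized: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB between two frames of equal shape."""
    mse = np.mean((reference.astype(float) - synthesized.astype(float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def average_psnr(ref_frames: Sequence[np.ndarray], syn_frames: Sequence[np.ndarray],
                 n_frames: int = 100) -> float:
    """Mean per-frame PSNR over the first n_frames frame pairs."""
    pairs = list(zip(ref_frames, syn_frames))[:n_frames]
    return float(np.mean([psnr(r, s) for r, s in pairs]))
```

Averaging PSNR values and averaging MSE before conversion can give slightly different numbers, so the choice should match whatever convention the compared results use.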

Figures 5.1, 5.2, and 5.3 compare the PSNR of the various schemes as the depth QP varies from 22 to 44. The curves labeled MPEG FTV were produced without depth refinement. To show the effect of reference quality, Figure 5.1 presents the results obtained with high-quality references (QP = 22), Figure 5.2 their low-quality counterparts (QP = 31), and Figure 5.3 those obtained with QP = 44. All three refinement schemes outperform MPEG FTV on all test sequences, and, as expected, the improvement is greatest when the depth images are coarsely quantized. Moreover, our scheme achieves the highest gain of all the schemes, an average PSNR improvement of 1.2 dB over MPEG FTV. The results are consistent across the different test conditions.

Table 5.1: Depth Estimation Settings. Columns (a) to (c) represent the smoothing coefficient, precision, and search level, respectively.

| Sequence | SL-SR | NL-NR | SearchRng (Min-Max) | DisparityRng (Min-Max) | (a) | (b) | (c) |
|---|---|---|---|---|---|---|---|
| Lovebird1 | 6-7 | 5-8 | 4-90 | 1-110 | 4 | 4 | 2 |
| Newspaper | 4-5 | 3-6 | 26-88 | 20-90 | 4 | 2 | 2 |
| Alt Moabit | 9-8 | 10-7 | 1-33 | 1-32 | 1 | 2 | 2 |
| Book Arrival | 9-8 | 10-7 | 30-70 | 30-70 | 2 | 2 | 2 |
| Door Flowers | 9-8 | 10-7 | 12-38 | 10-0 | 2 | 4 | 2 |
| Leaving Laptop | 9-8 | 10-7 | 15-33 | 15-33 | 2 | 4 | 4 |
| Dog | 39-41 | 38-42 | 1-20 | 0-20 | 4 | 4 | 1 |
| Pantomime | 39-40 | 38-41 | 0-20 | 0-20 | 2 | 4 | 4 |

Table 5.2: Encoder Settings.

| Parameter | Setting |
|---|---|
| Reference Frame | 2 |
| Intra Period | 15 |
| CABAC | on |
| 8x8 Transform | on |
| BasisQP | 22, 25, 28, 31, 35, 38, 41, 44 |
| Inter-view Prediction | on |
| Search Mode | 4 (Fast Search) |
| Motion Search Range | ±32 |

Figures 5.4, 5.5, and 5.6 further compare the subjective quality of the synthesized images.

Part (a) illustrates what can happen when incorrect depth information is used for view synthesis. Parts (b) through (d) show the results obtained by correcting the depth with each of the three schemes just described (i.e., [7], [8], and ours). As can be seen, "ghost effects" appear around object boundaries if the depth is not refined; in comparison, the visual results with depth refinement are considerably improved. Our scheme even produces a result that is very close in appearance to the ground-truth view image. The reason for this superior performance can be seen in Figure 5.7, which visualizes the unreliable pixels detected by the three schemes. As expected, our scheme tends to correct more depth pixels located in areas with fine texture details or vertical edges, namely, those that crucially affect synthesis quality.

Figure 5.1: PSNR of synthesized images as a function of the depth and reference QP. The reference view images are coded with QP=22.

Figure 5.2: PSNR of synthesized images as a function of the depth and reference QP. The reference view images are coded with QP=31.

Figure 5.3: PSNR of synthesized images as a function of the depth and reference QP. The reference view images are coded with QP=44.

Figure 5.4: Subjective quality comparison of synthesized images: (a) MPEG FTV (without depth refinement), (b) Tanimoto [7], (c) Sung [8], and (d) the proposed scheme. The depth QP of the Door Flowers sequence is set to 44.

Figure 5.5: Subjective quality comparison of synthesized images: (a) MPEG FTV (without depth refinement), (b) Tanimoto [7], (c) Sung [8], and (d) the proposed scheme. The depth QP of the Newspaper sequence is set to 44.

Figure 5.6: Subjective quality comparison of synthesized images: (a) MPEG FTV (without depth refinement), (b) Tanimoto [7], (c) Sung [8], and (d) the proposed scheme. The depth QP of the Dog sequence is set to 44.

Figure 5.7: Pixels whose depth values are judged unreliable: (a) Tanimoto [7] (category 2), (b) Sung [8], and (c) the proposed scheme. From top to bottom, the rows show the Door Flowers, Newspaper, and Dog sequences, respectively.


CHAPTER 6 Conclusion

To alleviate the coding effects of depth images, this thesis proposed a synthesis-quality-oriented depth refinement scheme. The approach is distinguished by refining only those depth pixels that are likely to cause noticeable synthesis artifacts. Along the way, we developed an analytical model to establish criteria for reliability detection and to form guidelines for depth refinement. Since both steps operate on decoded information, additional side information is transmitted to make them robust against compression effects. Experimental results show that our scheme achieves the highest PSNR gain among the compared state-of-the-art methods. It also produces results that are visually similar to the ground-truth images.

This work is still at an early stage. Neither the detection scheme nor the refinement scheme fully utilizes all the factors suggested by the per-pixel synthesis distortion model.

Further improvements can therefore be expected. Possible extensions include more sophisticated disparity search, spatio-temporal consistency, and signal restoration techniques. In addition, the analytical model may find application in the development of depth compression algorithms.

cts of depth images, we prop ment scheme. The approa

ne only those images

of depth images, we pr t scheme. The app

l t th images, w

Th deptht g

Bibliography

[1] C. Fehn, R. Barre, and R. S. Pastoor, “Interactive 3-DTV: Concepts and Key Technologies,” Proceedings of the IEEE, vol. 94, pp. 524–538, March 2006.

[2] C. Fehn, “A 3D-TV Approach Using Depth-Image-Based Rendering (DIBR),” Proceedings of Visualization, Imaging, and Image Processing, September 2003.

[3] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-Quality Video View Interpolation Using a Layered Representation,” ACM Transactions on Graphics, vol. 23, pp. 600–608, August 2004.

[4] A. Smolic, K. Muller, K. Dix, P. Merkle, P. Kauff, and T. Wiegand, “Intermediate View Interpolation based on Multiview Video plus Depth for Advanced 3D Video Systems,” IEEE Int’l Conf. on Image Processing, October 2008.

[5] E. Cooke, P. Kauff, and T. Sikora, “Multi-view Synthesis: A Novel View Creation Approach for Free Viewpoint Video,” Signal Processing: Image Communication, vol. 21, pp. 476–492, July 2006.

[6] P. Merkle, A. Smolic, K. Muller, and T. Wiegand, “Multi-view Video plus Depth Representation and Coding,” IEEE Int’l Conf. on Image Processing, October 2007.


[7] M. Tanimoto, T. Fujii, M. P. Tehrani, M. Wildeboer, and H. Furihata, “Error Cancellation in Free-viewpoint Image Generation for FTV,” ISO/IEC JTC1/SC29/WG11, MPEG09/M16607, April 2009.

[8] J. Sung, Y. J. Jeon, J. H. Lim, and B. M. Jeon, “Improving View Synthesis Results based on Depth Quality Measure,” ISO/IEC JTC1/SC29/WG11, MPEG09/M16417, April 2009.

[9] “Applications and Requirements on 3D Video Coding,” ISO/IEC JTC1/SC29/WG11, MPEG09/N10570, April 2009.

[10] M. Tanimoto, T. Fujii, and K. Suzuki, “Reference Software of Depth Estimation and View Synthesis for FTV/3DV,” ISO/IEC JTC1/SC29/WG11, MPEG08/M15836, October 2008.

[11] “Text of ISO/IEC 14496-10:2008/FDAM 1 Multiview Video Coding,” ISO/IEC JTC1/SC29/WG11, MPEG09/N9978, July 2008.
