
A Synthesis-Quality-Oriented Depth Refinement Scheme for MPEG Free Viewpoint Television (FTV)

4.3 Depth Refinement

After we discover all the unreliable pixels, our next step is to refine their depth values. Because depth refinement is performed by the receiver, its operation must be computationally simple and efficient.

For this reason, we adopt a candidate-based disparity estimation scheme to derive depth from the received view images. As in most block-based algorithms, a constant disparity is searched for each block of pixels (of size 7 × 7), centered on an unreliable pixel p, by minimizing the error between the two view images after disparity compensation. However, unlike those techniques, which usually require examining a large number of disparities, ours restricts the search to only those disparities that correspond to an integer depth value in the interval $[\hat{d}_p - \Delta,\ \hat{d}_p + \Delta]$, where $\hat{d}_p$ denotes the received depth value of p and $\Delta$ the refinement search range. On the one hand, this constraint is an expedient that keeps complexity low; on the other hand, it prevents the simple block-based search from settling on an improper disparity.
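The restricted search described above can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes rectified grayscale views with purely horizontal disparity, a sum-of-absolute-differences (SAD) matching error, and the commonly used inverse-depth mapping from 8-bit depth levels to disparity. The function names, the camera parameters (`focal`, `baseline`, `z_near`, `z_far`), and the tie-breaking rule are all hypothetical choices made for the example.

```python
import numpy as np

def depth_to_disparity(depth, focal, baseline, z_near, z_far):
    """Map an 8-bit depth level to a horizontal disparity in pixels.

    Assumes depth levels quantize 1/z uniformly between 1/z_far and 1/z_near.
    """
    inv_z = depth / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal * baseline * inv_z

def refine_pixel(left, right, y, x, d_hat, delta, cam, half=3):
    """Pick the integer depth in [d_hat - delta, d_hat + delta] whose
    disparity minimizes the SAD of the (2*half+1)^2 block centered on (y, x)."""
    h, w = left.shape
    y0, y1 = y - half, y + half + 1
    x0, x1 = x - half, x + half + 1
    if y0 < 0 or x0 < 0 or y1 > h or x1 > w:
        return d_hat  # block leaves the image: keep the received depth
    block = left[y0:y1, x0:x1].astype(np.float64)

    lo, hi = max(0, d_hat - delta), min(255, d_hat + delta)
    # Visit candidates closest to d_hat first so that ties keep the
    # smallest modification of the received depth (an assumed policy).
    candidates = sorted(range(lo, hi + 1), key=lambda c: abs(c - d_hat))

    best_depth, best_cost = d_hat, np.inf
    for cand in candidates:
        disp = int(round(depth_to_disparity(cand, **cam)))
        xs0, xs1 = x0 - disp, x1 - disp
        if xs0 < 0 or xs1 > w:
            continue  # compensated block falls outside the other view
        cost = np.abs(block - right[y0:y1, xs0:xs1]).sum()
        if cost < best_cost:
            best_depth, best_cost = cand, cost
    return best_depth
```

Because several neighboring depth levels can round to the same integer disparity, the number of distinct block comparisons is often smaller than the number of depth candidates; caching the cost per disparity would be a natural optimization.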

Although reducing the number of search candidates simplifies the disparity search, two issues remain: how to determine a proper value of the search range $\Delta$ for each unreliable pixel, and how to signal this information efficiently.

As described previously, the value of the search range $\Delta$ determines the maximum modification of the received depth $\hat{d}_p$ that can be caused by depth refinement; that is, it controls the strength of refinement. Our analysis found that the depth error sensitivity of a pixel is related to its ground-truth depth value, implying that the adaptation of $\Delta$ should refer to the value of $\hat{d}_p$ (which is an approximation of the ground-truth depth $d_p$). As a trade-off between quality and overhead, we divide the set $S$ of unreliable pixels into $K$ disjoint subsets $S_k$, $1 \le k \le K$, each of which is assigned a refinement search range $\Delta_k$. A uniform quantizer that operates on the received depth $\hat{d}_p$ categorizes each unreliable pixel in $S$ into one of the $K$ subsets. The best settings of $\{\Delta_k\}_{k=1}^{K}$ are then searched exhaustively at the sender side and transmitted to the receiver as side information.
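The categorization and the sender-side search can be sketched as below. This is a hedged illustration only: the text does not specify the quantizer step, the set of candidate ranges, or the cost being minimized, so the 256-level depth assumption, the `candidates` list, and the `synthesis_cost` callback are hypothetical stand-ins.

```python
from itertools import product

import numpy as np

def subset_index(d_hat, num_subsets, levels=256):
    """Uniformly quantize received depth levels into subset indices 0..K-1."""
    step = levels / num_subsets
    idx = (np.asarray(d_hat) / step).astype(int)
    return np.minimum(idx, num_subsets - 1)

def search_ranges(num_subsets, candidates, synthesis_cost):
    """Sender side: try every assignment of one candidate range per subset
    and keep the assignment with the lowest cost.

    synthesis_cost is a caller-supplied (hypothetical) function that takes a
    tuple of K ranges and returns the resulting synthesis distortion.
    """
    best = min(product(candidates, repeat=num_subsets), key=synthesis_cost)
    return list(best)
```

At the receiver, the range for a pixel would then be looked up as `ranges[subset_index(d_hat, K)]`. Note that the exhaustive search evaluates `len(candidates) ** K` assignments, which is one reason to keep the number of subsets small.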

Figure 6 shows a sample result of our refinement process. Observe that depth compression introduces blocking artifacts in the decoded depth image (see Figure 6 (b)(e)). With depth refinement, most of these artifacts are removed (see parts (c) and (f) of Figure 6); note the clarity of object boundaries that are simply not visible in the decoded depth image. Interestingly, the refinement can even recover some details that were removed by the enforcement of depth smoothing (compare parts (a)(d) and (c)(f) of Figure 6).

Figure 6. A sample result of the proposed depth refinement algorithm: (a)(d) the original depth image, (b)(e) the decoded depth image, and (c)(f) the refined depth image.

5 Experiments

Simulations were carried out to demonstrate the performance of the proposed scheme, and the results were compared with those of [7] and [8]. All the refinement schemes were implemented with the MPEG committee software VSRS 2.1. All experiments used DERS 2.0 to generate depth images and JMVC 3.0.1 to encode multi-view videos and their depth. The average PSNR of synthesized images was computed based on the first 100 frames of each test sequence. In particular, in implementing the method described in [7], we employed the magnitude of synthesis errors rather than manually generated edge maps to distinguish pixels of different categories. For a fair comparison, all the threshold values used in [7] and [8] were determined by optimizing the quality of synthesized images.
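For reference, the quality metric can be sketched as follows. The text does not state whether PSNR is averaged per frame or derived from a pooled mean squared error, nor which color component is measured; this sketch assumes per-frame PSNR on 8-bit single-channel frames, averaged over the first 100 frames. The function names are hypothetical.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two frames of equal shape."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

def average_psnr(ref_frames, syn_frames, num_frames=100):
    """Mean of the per-frame PSNR over the first num_frames frame pairs."""
    pairs = list(zip(ref_frames, syn_frames))[:num_frames]
    return float(np.mean([psnr(r, s) for r, s in pairs]))
```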

Figure 7 compares the PSNR of various schemes when the depth QP is varied from 22 to 44. The curves associated with MPEG FTV were produced without depth refinement. To see the effects of reference quality, parts (a) and (b) show the results generated with high-quality references (QP=22), whereas parts (c) and (d) are their low-quality counterparts (QP=31). It can be seen that all three schemes outperform MPEG FTV in all test sequences and that, as expected, the improvement is greatest when depth images are coarsely quantized. Moreover, ours has the highest gain of all the schemes, with an average PSNR improvement of 1.2 dB over MPEG FTV. The results are consistent across the different test conditions.

Figure 7. PSNR of synthesized images as a function of the depth and reference QP. The reference view images are coded with QP=22 (a)(b) and QP=31 (c)(d).

Figure 8 further compares the subjective quality of synthesized images. Part (a) illustrates what can happen if incorrect depth information is used for view synthesis. Parts (b) through (d) show the results obtained by correcting depth with one of the three schemes just described (i.e., [7], [8], and ours). As can be seen, "ghost effects" appear around object boundaries if the depth is not refined; in comparison, the visual results with depth refinement are considerably improved. Our scheme even produces a result that is very close in appearance to the ground-truth view image. The reason behind the superior performance can be explained with Figure 9, which makes visible the unreliable pixels detected by the three schemes. As expected, our scheme tends to correct more depth pixels located in areas with fine texture details or vertical edges, namely, those that crucially affect synthesis quality.

6 Conclusion

To alleviate the coding effects of depth images, we proposed in this paper a synthesis-quality-oriented depth refinement scheme. The approach is characterized by the unique consideration of attempting to refine only those depth pixels that are likely to cause noticeable synthesis artifacts. In the process, we developed an analytical model to establish criteria for reliability detection and to form guidelines for depth refinement. Since both operate on the decoded information, additional side information is transmitted to make them robust against compression effects. Experimental results show that our scheme has the highest PSNR gain of all the compared state-of-the-art methods. It also produces a result that is visually similar to the ground-truth image. Better performance is expected with the incorporation of a more sophisticated disparity search. In addition, the analytical model may find application in developing depth compression algorithms.

References

[1] C. Fehn, R. Barre, and R. S. Pastoor, “Interactive 3-DTV: Concepts and Key Technologies,” Proceedings of the IEEE, vol. 94, pp. 524-538, March 2006.

[2] C. Fehn, “A 3D-TV Approach Using Depth-Image-Based Rendering (DIBR),” Proceedings of Visualization, Imaging, and Image Processing, September 2003.

[3] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-Quality Video View Interpolation Using a Layered Representation,” ACM Transactions on Graphics, vol. 23, pp. 600-608, August 2004.

[4] A. Smolic, K. Muller, K. Dix, P. Merkle, P. Kauff, and T. Wiegand, “Intermediate View Interpolation based on Multiview Video plus Depth for Advanced 3D Video Systems,” IEEE Int’l Conf. on Image Processing, October 2008.

[5] E. Cooke, P. Kauff, and T. Sikora, “Multi-view Synthesis: A Novel View Creation Approach for Free Viewpoint Video,” Signal Processing: Image Communication, vol. 21, pp. 476-492, July 2006.

[6] P. Merkle, A. Smolic, K. Muller, and T. Wiegand, “Multi-view Video plus Depth Representation and Coding,” IEEE Int’l Conf. on Image Processing, October 2007.

[7] M. Tanimoto, T. Fujii, M. P. Tehrani, M. Wildeboer, and H. Furihata, “Error Cancellation in Free-viewpoint Image Generation for FTV,” ISO/IEC JTC1/SC29/WG11, MPEG09/M16607, April 2009.

[8] J. Sung, Y. J. Jeon, J. H. Lim, and B. M. Jeon, “Improving View Synthesis Results based on Depth Quality Measure,” ISO/IEC JTC1/SC29/WG11, MPEG09/M16417, April 2009.

[9] “Applications and Requirements on 3D Video Coding,” ISO/IEC JTC1/SC29/WG11, MPEG09/N10570, April 2009.

Figure 8. Subjective quality comparison of synthesized images: (a) MPEG FTV (without depth refinement), (b) Tanimoto [7], (c) Sung [8], and (d) the proposed scheme. The depth QP is set to 44.

Figure 9. Pixels whose depth values are judged unreliable: (a) Tanimoto [7] (category 2), (b) Sung [8], and (c) the proposed scheme.

In document: Design of a Digital TV Multimedia Platform Supporting 3-D Stereoscopic Video (II) (pages 59-62)