Chapter 3. VIEW SYNTHESIS
3.4 View Synthesis
After converting the image coordinates into world coordinates, the next step is to shift the world coordinates along the width axis to the other viewpoint. Because the shift is horizontal, the y-axis value stays the same. For points on the horizontal plane, the synthesized x-axis value can be derived by applying equation (2) in reverse.
For points on the vertical plane and on the moving objects, however, the synthesized x-axis value can be converted into the image coordinate more easily.
Similar to Fig. 15, because the shifted width and the depth of p1 are known, (Wp1, Hp1, Dp1) = (w', Hp1, d), we can introduce two points p5: (0, Hp1, d) and p6: (0, Hp1, 0). As in equations (11) and (12), cos(∠p3p4p2) equals cos(∠p5p6p1), and ∠p2p3p4 and ∠p1p5p6 are both right angles. So △p2p3p4 and △p1p5p6 are similar triangles.
Therefore,
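\[
\frac{\overline{p_1p_5}}{\overline{p_5p_6}} = \frac{\overline{p_2p_3}}{\overline{p_3p_4}}
\;\Rightarrow\;
\frac{w'}{d} = \frac{\overline{p_2p_3}}{\overline{p_4p_3}}. \tag{18}
\]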
By equation (14), we can derive the length of p4p3. Finally, by equation (13),
\[
x = \frac{W_{frame}}{2} + \overline{p_2p_3}. \tag{19}
\]
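The projection of a shifted point on the vertical plane or on a moving object can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the function names `shift_view` and `synthesized_x` are hypothetical, the length of p4p3 is taken as an input because equation (14) is not reproduced here, the shift is assumed to subtract the baseline from the width coordinate, and the image x-coordinate is assumed to be the half frame width plus the signed length of p2p3.

```python
def shift_view(width: float, baseline: float) -> float:
    """Shift a world-coordinate width to the other viewpoint.

    Assumption: moving the camera by `baseline` along the width axis
    changes the point's width to `width - baseline`; height and depth
    are unchanged by a horizontal shift.
    """
    return width - baseline


def synthesized_x(w_shifted: float, depth: float, p4p3: float,
                  frame_width: float) -> float:
    """Image x-coordinate of a point with shifted width w' and depth d.

    Similar triangles give p2p3 / p4p3 = w' / d, so p2p3 = w' * p4p3 / d.
    Assumption: x is measured from the left image border, so the signed
    offset p2p3 is added to the horizontal image centre.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    p2p3 = w_shifted * p4p3 / depth
    return frame_width / 2 + p2p3


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the experiments.
    w_new = shift_view(width=2.5, baseline=0.5)
    print(synthesized_x(w_new, depth=10.0, p4p3=50.0, frame_width=640))
```

With this convention a point on the optical axis (w' = 0) always lands on the horizontal image centre, whatever its depth.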
Chapter 4. EXPERIMENTAL RESULTS
In this chapter, we first present the results of [4], [5], and [6] for comparison. In the second section, three video sequences, camera 5 of lovebird1, camera 7 of Alt Moabit, and a homemade hallway video, are used to synthesize stereoscopic views. We present the original view, the synthesized view, and the depth map as subjective results. To show the stereoscopic effect, we compose a red-cyan anaglyph from the original view and the synthesized view. Then, the videos synthesized from lovebird1 and Alt Moabit are compared objectively with the ground truth in PSNR.
4.1 Related Results
4.1.1 Stereoscopic Video Synthesis from a Monocular Video [4]
Zhang et al. proposed three steps to synthesize stereoscopic video from a monocular one. First, they track the camera motion with a camera-tracking algorithm. Then, an optimization algorithm determines the proper base frame to warp for synthesis.
Figure 17- An example of stereoscopic video generation. The input monocular video is taken from the air. (A) shows the recovered base trajectory and a few frames from the base sequence. (B) shows the composed stereo frame [4].
4.1.2 Depth Map Generation for 2D-to-3D Conversion by Short-Term Motion Assisted Color Segmentation [5]
Chang et al. propose an algorithm that combines motion segmentation and color segmentation to generate the depth map. They assign the detected foreground as the nearest region and everything else as the farthest.
Figure 18- Subjective view results. (A) Original Akko & Kayo sequence (B) The depth map of Akko & Kayo (C) Original Children sequence (D) The depth map of Children (E) Original Weather sequence (F) The depth map of Weather [5].
4.1.3 3D Stereoscopic Image Pairs by Depth-Map Generation [6]
Battiato et al. estimate the depth map from the depth gradient constructed from vanishing lines. They then use the work in [13] to assign the sky as the farthest region.
Figure 19- (A) Original outdoor with geometric appearances image (B) Geometric depth map (C) Qualitative depth map through [12] (D) Final depth map (E) Anaglyph image [6].
4.2 Results of Proposed Method
4.2.1 Subjective Results
In the lovebird1 video, the moving objects move closer and closer. Fig. 20 is the frame preceding Fig. 21. The gray level of the moving object is 175 in Fig. 20-C and 185 in Fig. 21-C, which shows that the depth varies with the position of the moving object. Because the distance between the two eyes is small, it is difficult to distinguish the original view from the synthesized view. We therefore adopt the red-cyan anaglyph to exhibit the stereoscopic effect and make the difference easier to see.
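Composing a red-cyan anaglyph can be sketched as below. This is a minimal illustration, not the exact procedure used for the figures: it assumes the common convention that the red channel comes from the left-eye view and the green and blue channels from the right-eye view, represents images as rows of (r, g, b) tuples to stay dependency-free, and uses the hypothetical function name `red_cyan_anaglyph`. Which of the original and synthesized views serves as the left eye is also an assumption.

```python
def red_cyan_anaglyph(left, right):
    """Combine two same-size RGB views into one red-cyan anaglyph.

    `left` and `right` are lists of rows, each row a list of (r, g, b)
    tuples. The output keeps red from `left` and green/blue from `right`.
    """
    if len(left) != len(right) or any(
        len(lrow) != len(rrow) for lrow, rrow in zip(left, right)
    ):
        raise ValueError("both views must have the same size")
    return [
        [(lpix[0], rpix[1], rpix[2]) for lpix, rpix in zip(lrow, rrow)]
        for lrow, rrow in zip(left, right)
    ]


if __name__ == "__main__":
    # Tiny 1x2 example: original view as left eye, synthesized as right eye.
    original = [[(200, 10, 10), (50, 60, 70)]]
    synthesized = [[(20, 120, 130), (55, 65, 75)]]
    print(red_cyan_anaglyph(original, synthesized))
```

Viewed through red-cyan glasses, each eye then receives only the channels taken from its own view, which is what makes the small horizontal disparity visible.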
Figure 20- Camera 5 of lovebird1. (A) The original view, (B) The synthesized view, (C) The depth map of (A), (D) Red-cyan anaglyph.
Figure 21- Another view of camera 5 of lovebird1. (A) The original view, (B) The synthesized view, (C) The depth map of (A), (D) Red-cyan anaglyph.
In Fig. 22-C, the gray level of the walker is 154 and the gray level of the bus is 204. Although the absolute depth values may not be accurate, the depth map still distinguishes the relative depth between the objects.
Figure 22- Camera 7 of Alt Moabit (A) The original view, (B) The synthesized view, (C) The depth map of (A), (D) Red-cyan anaglyph.
Figure 23- Homemade video. (A) The original view, (B) The synthesized view, (C) The depth map of (A), (D) Red-cyan anaglyph.
4.2.2 Subjective Result of Predefined Initial Depth Value
Here we set the initial depth value of the moving objects manually. With a manual initial depth value, the depth map combines the depth of the background and the depth of the moving objects, so the effect of the depth update can be judged more easily. In Fig. 24-A, the initial depth value is assigned. In Fig. 24-B and C, the depth value of the person after the depth update remains similar to that of its foothold. This indicates that the depth update is reliable when the initial depth value is accurate; otherwise, the updated depth is accurate only in a relative sense.
Figure 24- Depth maps with a manually set initial depth value. (A) Frame #1 of Alt Moabit, (B) Frame #40 of Alt Moabit, (C) Frame #60 of Alt Moabit.
4.2.3 Objective Results
In this section, we compare the synthesized view with the ground truth from the multiview video sequences. In Fig. 25, the synthesized video is camera 6 of lovebird1.
The proposed method synthesizes it from camera 5 of lovebird1. According to the intrinsic parameters of the camera, we set the focal length to 2017.8074, and according to the translation parameter, we set the distance between camera 5 and camera 6 to 38.66.
Although the inaccuracy of the moving objects influences the result, the quality of the background view synthesis is kept at an average PSNR of 27.93 dB.
The multiview result is synthesized from camera 5 and camera 8 of lovebird1 by the multiview synthesis tool VSRS [3]. However, the depth map [1] must first be estimated from the left and right views, as shown in Fig. 2, so six views in total are used to estimate the depth and synthesize the virtual view. The average PSNR of the proposed method, which uses only the monocular video, is lower than that of multiview synthesis by only about 3 to 4 dB.
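The objective comparison relies on the standard PSNR definition, which can be sketched as follows. This is a minimal sketch under assumptions: frames are 8-bit single-channel images (for example luma) stored as nested lists, the averaged figure is the mean of per-frame PSNR values, and the names `psnr` and `mean_psnr` are hypothetical. The text does not state which channels were used in the experiments.

```python
import math


def psnr(reference, synthesized, peak=255.0):
    """PSNR in dB between two same-size single-channel frames.

    PSNR = 10 * log10(peak^2 / MSE); identical frames give infinity.
    """
    ref = [v for row in reference for v in row]
    syn = [v for row in synthesized for v in row]
    if not ref or len(ref) != len(syn):
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, syn)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


def mean_psnr(reference_frames, synthesized_frames):
    """Average of the per-frame PSNR values over a whole sequence."""
    if not reference_frames or len(reference_frames) != len(synthesized_frames):
        raise ValueError("sequences must be non-empty and the same length")
    values = [psnr(r, s) for r, s in zip(reference_frames, synthesized_frames)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy 2x2 frames; real input would be the ground-truth camera view
    # and the synthesized view of the same frame.
    truth = [[100, 100], [100, 100]]
    synth = [[101, 99], [100, 102]]
    print(round(psnr(truth, synth), 2))
```

Because PSNR is computed frame by frame, a single frame with a missed large object pulls that frame's value down sharply without affecting the others.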
Figure 25- Camera 6 of lovebird1 synthesized by single-view and multiview synthesis. The red line is synthesized by VSRS, and the blue line is synthesized by the proposed method. The average PSNR of the red line is 31.88 dB, and the average PSNR of the blue line is 27.93 dB.
In Fig. 26, the synthesized video is camera 8 of Alt Moabit. The proposed method synthesizes it from camera 7 of Alt Moabit. The focal length and the distance between camera 7 and camera 8 are set from the camera parameters to 1382.4 and 62.05, respectively. In this video, the quality of the synthesized video is unsteady. The main factor affecting the PSNR is missed detections of moving objects. Since some moving objects in the video are large, missing one of them decreases the PSNR substantially. In frame number 36, the big bus is moving in the frame. The transparent windows of the bus mislead the moving object detection: they are classified as static background, so the PSNR decreases substantially from frame number 36. Apart from that, the PSNR is kept at about 29.44 dB.
The multiview result is synthesized from camera 7 and camera 10 of Alt Moabit by the multiview synthesis tool VSRS [3]. As mentioned above, six views are used to estimate the depth and synthesize the virtual view. The average PSNR of the proposed method from monocular video is lower than that of multiview synthesis by only about 4 dB.
Figure 26- Camera 8 of Alt Moabit synthesized by single-view and multiview synthesis. The red line is synthesized by VSRS, and the blue line is synthesized by the proposed method. The average PSNR of the red line is 33.7 dB, and the average PSNR of the blue line is 29.44 dB.
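As a quick check of the gap between the two methods, the averages reported in the captions of Fig. 25 and Fig. 26 give the following differences. The symbol Δ is introduced here only for illustration.

```latex
\begin{align*}
\Delta_{\text{lovebird1}} &= 31.88 - 27.93 = 3.95\ \text{dB},\\
\Delta_{\text{Alt Moabit}} &= 33.7 - 29.44 = 4.26\ \text{dB}.
\end{align*}
```

Both sequences therefore show a gap of roughly 4 dB between multiview synthesis and the proposed monocular method.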
Chapter 5. CONCLUSION
5.1 Conclusion
In this study, we propose transform algorithms to synthesize a stereoscopic view from monocular video. The fundamental idea is based on the stereoscopic model constructed from the vanishing point. Therefore, preprocessing steps are necessary before view synthesis.
To improve moving object detection, we modify the method in [8]. The improvement from this modification is shown in Fig. 6: the result of background registration is improved. Another preprocessing step is vanishing point detection. We adopt the algorithm in [6] to search for the vanishing lines and the vanishing point.
View synthesis has two parts: background projection and moving object projection. The effect of the proposed transform algorithms is shown subjectively, by constructing a red-cyan anaglyph that exhibits the stereoscopic effect, and objectively, by comparing the synthesized view with the ground truth in PSNR. This differs from other works on view synthesis from a single view. Although the PSNR is lower than that of the view synthesized from multiview video, the proposed method keeps the PSNR difference within about 3 to 4 dB, even though monocular video contains much less information than multiview video.
The proposed method provides novel transform algorithms, and its results can be compared both objectively and subjectively rather than only subjectively.
REFERENCES
[1] Masayuki Tanimoto, Toshiaki Fujii, Kazuyoshi Suzuki, “Multi-view depth map of Rena and Akko & Kayo”, ISO/IEC JTC1/SC29/WG11, M14888, 2008.
[2] Masayuki Tanimoto, Toshiaki Fujii, Kazuyoshi Suzuki, “Experiment of view synthesis using multi-view depth”, ISO/IEC JTC1/SC29/WG11, M14889, 2008.
[3] Cheon Lee, Yo-Sung Ho, “View Synthesis Tools for 3D Video”, ISO/IEC JTC1/SC29/WG11, M15851, 2008.
[4] Guofeng Zhang, Wei Hua, Xueying Qin, Tien-Tsin Wong, and Hujun Bao, “Stereoscopic Video Synthesis from a Monocular Video”, IEEE Transactions on Visualization and Computer Graphics, Vol. 13, No. 4, 2007.
[5] Yu-Lin Chang, Chih-Ying Fang, Li-Fu Ding, Shao-Yi Chien, and Liang-Gee Chen, “Depth Map Generation for 2D-to-3D Conversion by Short-Term Motion Assisted Color Segmentation”, IEEE International Conference on Multimedia and Expo, pp. 1958-1961, 2007.
[6] S. Battiato, A. Capra, S. Curti, M. La Cascia, “3D Stereoscopic Image Pairs by Depth-Map Generation”, Proceedings of the 2nd International Symposium on 3D Data Processing, Visualization, and Transmission, pp. 124-131, 2004.
[7] Shao-Yi Chien, Yu-Wen Huang, Bing-Yu Hsieh, Shyh-Yih Ma, and Liang-Gee Chen, “Fast Video Segmentation Algorithm With Shadow Cancellation, Global Motion Compensation, and Adaptive Threshold Technique”, IEEE Transactions on Multimedia, Vol. 6, No. 5, October 2004.
[8] Shao-Yi Chien, Shyh-Yih Ma, and Liang-Gee Chen, “Efficient Moving Object Segmentation Algorithm Using Background Registration Technique”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 7, July 2002.
[9] Sohaib Khan, Mubarak Shah, “Object Based Segmentation of Video Using Color, Motion and Spatial Information”, IEEE Computer Society Conference on CVPR, pp. II -746-751, 2001.
[10] Andreas Krutz, Matthias Kunter, Mrinal Mandal, Michael Frater, “Motion-based Object Segmentation using Sprites and Anisotropic Diffusion”, Image Analysis for Multimedia Interactive Services, WIAMIS ’07 Eighth International Workshop on, pp. 35, 2007.
[11] Youichi Horry, Ken-ichi Anjyo, Kiyoshi Arai, “Tour Into the Picture: Using a Spidery Mesh Interface to Make Animation from a Single Image”, Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pp. 225-232, 1997.
[12] Karl Kral, “Side-to-Side head movements to obtain motion depth cues: A short review of research on the praying mantis”, Behavioural Processes, Vol. 43, Issue 1, pp. 71-77, April 1998.
[13] D. Comaniciu, P. Meer, “Robust Analysis of Feature Spaces: Color Image Segmentation”, In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, pp. 750-755, June 1997.