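National Chiao Tung University
Department of Electrical and Control Engineering
Master's Thesis
2D-to-3D Still Image Conversion Technique for Stereoscopic Displays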
A Novel 2D to 3D Image Conversion Technique
Applied on Stereo Display
Student: Chun-Yeon Lin
Advisors: Dr. Chin-Teng Lin
Dr. Chi-Cheng Jou
A Thesis
Submitted to Department of Electrical and Control Engineering
College of Electrical Engineering and Computer Science
National Chiao-Tung University
in Partial Fulfillment of the Requirements
for the Degree of Master
in
Electrical and Control Engineering
July 2005
Hsinchu, Taiwan, Republic of China
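2D-to-3D Still Image Conversion Technique for Stereoscopic Displays
Student: Chun-Yeon Lin Advisors: Dr. Chin-Teng Lin, Dr. Chi-Cheng Jou
Institute of Electrical and Control Engineering, National Chiao Tung University
Abstract (Chinese)
This thesis proposes a fully automatic technique for converting a 2D image into a 3D stereo image: a single 2D image is converted into two images, viewed separately by the left and right eyes, for use on a 3D stereo display. The conversion has two parts. The first part encodes the 2D image into a depth map carrying depth information, and the second part decodes the depth map into stereo images that simulate what the left and right eyes observe. Encoding has two steps. The first step is image segmentation based on the properties of 3D stereo images: the image is processed with SSR (Single Scale Retinex) to remove reflection effects, hue, saturation, and intensity are used as features for the conventional FCM (Fuzzy C-means) clustering method, and a size filter then screens the regions by size. The second step is depth extraction: from the depth cues in the image and the depth rules between objects, each object is assigned either a single depth or a gradual depth, which gives a more realistic and comfortable visual effect on a 3D display. Decoding also has two steps. The first step converts the depth map into the left-eye and right-eye images, for which two methods are proposed. The first is the linear shift method, a simplified method that effectively speeds up the conversion. The second is a shift method that simulates human vision and is built on the characteristics of the human eye. Because shifting loses the information of some pixels, the second step is hole compensation: depending on the characteristics of a hole, it is filled by averaging or by mirroring. Through encoding and decoding, a single 2D image automatically yields a stereo image usable on a stereo display. A method for calculating depth accuracy is also established. Finally, the results are compared with the CID (Computed Image Depth) method proposed by SANYO, using both human perception and quantitative data.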
A Novel 2D to 3D Image Conversion Technique
Applied on Stereo Display
Student: Chun-Yeon Lin Advisor: Dr. Chin-Teng Lin Dr. Chi-Cheng Jou
Department of Electrical and Control Engineering National Chiao-Tung University
Abstract
This thesis proposes a novel, fully automatic technique for converting a 2D image into a 3D stereo image: a single 2D image is converted into a left-eye image and a right-eye image for display on a 3D stereo monitor. The conversion is divided into two parts. The first part, the encoder, encodes the 2D image into a depth map that carries the depth information. The second part, the decoder, decodes the depth map into stereo images that simulate the views of the left and right eyes. The encoder has two steps. The first step is image segmentation based on the properties of 3D stereo images. We use SSR (Single Scale Retinex) to reduce the effect of light reflection, use hue, saturation, and intensity as features for the traditional FCM (Fuzzy C-means) clustering method, and then apply a size filter to screen the regions by size. The second step is depth extraction. According to the depth cues and the depth rules between objects, we estimate whether each object has a fixed depth or a gradual depth, which gives a more realistic and comfortable effect on the 3D monitor. The decoder also has two steps. The first step converts the depth map into the left-eye and right-eye images, for which we propose two methods. The first is the linear shift algorithm, a simplified method that effectively increases the conversion speed. The second is the binocular vision shift algorithm, which is based on the properties of human vision. Shifting loses the information of some pixels, so the second step is hole interpolation: according to the properties of each hole, we fill it by averaging or by mirroring. With the encoder and decoder, stereo images for a 3D monitor are generated automatically from a single 2D image. We also establish a method for calculating the accuracy of the estimated depths.
Finally, we compare the results of the proposed method with SANYO's CID (Computed Image Depth) algorithm in terms of human perception and quantitative data.
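Acknowledgements
During my two years of study at the Institute of Electrical and Control Engineering, I would first like to thank my advisors, Dr. Chin-Teng Lin and Dr. Chi-Cheng Jou. They not only guided me carefully in how to do research but also shaped my conduct and my attitude toward learning, which allowed this thesis to be completed smoothly.
Next, I thank the senior students 鶴章, 文昌, 群立, 宇文, 剛維, and 世安, who gave me a great deal of help; my classmates 宗恆, 盈彰, 弘昕, Linda, 裕程, and 瑞正, who accompanied me through graduate life; and the junior students 有德, 庭瑋, 家昇, and 峻谷 for their assistance. I also thank 陳瑩 for her stimulation and encouragement during this time. Before leaving Chiao Tung University, there are too many people and things to thank one by one, so I can only keep my gratitude in my heart.
Finally, I dedicate this thesis to my beloved family. I thank my parents, my sister, and my brother for their support, which let me concentrate on my studies without worry and complete my master's degree.
Contents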
Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background Knowledge and Related Works
2.1 Depth Perception
2.1.1 The Formation of Binocular Images
2.1.2 The Cause of Generating Depth Perception
2.2 The Fundamental of 2D Image on 3D Display
2.3 The Proposed 2D to 3D Image System
2.4 SANYO 2D to 3D Conversion Adaptive Algorithm
2.4.1 MTD (Modified Time Difference Method)
2.4.2 CID (Computed Image Depth Method)
Chapter 3 Depth Map Estimation from 2D Image
3.1 Image Segmentation
3.1.1 The Proposed 2D to 3D Image Segmentation Algorithm
3.1.2 SSR (Single Scale Retinex)
3.1.3 Fuzzy C-means Clustering
3.1.4 Connected Component Searching
3.2 Depth Extraction
3.2.1 The Proposed Depth Extraction Algorithm
3.2.2 Depth Cues
3.2.3 Zero-Plane
3.2.4 Inference the Distance Relation of Each Object
3.2.5 The Index of Depth
Chapter 4 Binocular 3D Image Construction Based on Depth Map
4.1.1 Camera Capturing Method
4.1.2 Linear Shift Algorithm
4.1.3 Binocular Vision Shift Algorithm
4.2 Interpolating Holes Algorithm
Chapter 5 Experimental Results
5.1 Experimental Results of Our Proposed Methods
5.1.1 The 2D to 3D Image Conversion Algorithm by Using Linear Shift Algorithm
5.1.2 The 2D to 3D Image Conversion Algorithm by Using Binocular Vision Shift Algorithm
5.2 Experimental Results of the SANYO Method
Chapter 6 Conclusions and Future Works
List of Tables
Table 2-1 The proposed 2D to 3D image system
List of Figures
Fig. 2-1 The formation of image generating from binocular disparity
Fig. 2-2 The formation of generating depth perception
Fig. 2-3 The simulation of display of 3D space
Fig. 2-4 The stereo image of simulating 3D space by using stereoscope
Fig. 2-5 DTI 2015 XLS
Fig. 2-6 The principle of our 3D display
Fig. 2-7 The principle of MTD
Fig. 2-8 The principle of CID
Fig. 3-1 The flowchart of our 2D to 3D image conversion algorithm
Fig. 3-2 The flowchart of our proposed image segmentation algorithm
Fig. 3-3 The SSR effect on the image. (a) The original image. (b) The image after SSR. (c) The pseudo-color images without SSR. (d) The pseudo-color images with SSR.
Fig. 3-4 The SSR effect on the segmentation image. (a) The original image. (b) The image after SSR. (c) The image segmentation result without SSR. (d) The image segmentation result with SSR.
Fig. 3-5 The connected component effect on the segmentation image
Fig. 3-6 Inference the object is horizontal or vertical by the relation between two objects. (a) Two vertical objects. Determine depths by bottom lines. (b) A vertical object on a horizontal object. Determine depths by the boundaries and areas of regions.
Fig. 3-7 Flowchart of our proposed depth extraction approach
Fig. 3-8 Zero-plane and bottom line
Fig. 3-9 The gradual effect on the depth map
Fig. 3-10 The example of depth index
Fig. 3-11 The example of correct order
Fig. 3-12 The example of wrong order
Fig. 4-1 Illustration of the camera shift capturing method
Fig. 4-2 Camera center capturing method
Fig. 4-3 The results after linear shift algorithm
Fig. 4-4 Simulating the right eye image of binocular vision
Fig. 4-5 The projection point of relative regions
Fig. 4-6 The results after binocular vision algorithm
Fig. 4-7 The flowchart of interpolating by horizontal smooth filter
Fig. 4-8 Interpolating left or right side pixels. (a) Depth map. (b) Interpolating holes.
Fig. 4-9 Illustration of interpolating holes method
Fig. 4-10 The results of interpolating holes method
Fig. 5-1 The 2D to 3D image conversion process of an artificial image by linear shift algorithm
Fig. 5-2 The 2D to 3D image conversion process of a natural image by linear shift algorithm
Fig. 5-3 The 2D to 3D image conversion process of an artificial image by binocular vision shift algorithm
Fig. 5-4 The 2D to 3D image conversion process of a natural image by binocular vision shift algorithm
Fig. 5-5 The 2D to 3D image conversion process of an artificial image by CID algorithm
Fig. 5-6 The 2D to 3D image conversion process of a natural image by CID algorithm
Chapter 1 Introduction
Recently, three-dimensional (3D) images have become increasingly popular in amusement
parks. However, 3D images of the real world are difficult to capture, because a 3D camera is
heavy to carry and its convergence, sharpness, and zoom are difficult to adjust. Consequently,
even though 3D-related technologies have been actively developed and many 3D displays are
used in professional fields, the 3D market has not yet opened up. One reason is that not
enough 3D image software has been provided. This thesis therefore studies the conversion of a
single 2D image into stereo images, with the aim of giving viewers a more realistic experience
on a 3D display.
Some authors have proposed a real-time method for converting ordinary 2D images into 3D
on television [1], [2]. They use the “Computed Image Depth Method” (CID) [1] to estimate
depth from a single image and/or the “Modified Time Difference” (MTD) [2] to convert the
images, choosing between them under different conditions according to the motion content.
Other authors use multiple camera sources to establish “Structure from Motion” (SFM),
defining a system with multiple viewpoints [3], [4]. There are also methods of shape recovery
from shading (SFS) that use an ICA-based reflectance model [5]. Some authors proposed depth
estimation from image structure [6]: a source of information for absolute depth estimation
that is based on the whole scene structure and does not rely on specific objects. Other
authors proposed a technique that generates stereo pair images from a single image source and
its related depth map [7], but some hypotheses must be made and manual processing is needed
to extract the depth map.
“Geometry recovering” (GR) [8] tries to identify basic contest structure such as
polygons, vanishing point, and so on to identify the best 3D spider mesh to move the
effort to recover and to reconstruct the missing data.
The real goal is to reconstruct the binocular view of a scene from a monocularly sampled
source. To do this, we first calculate the parallax value [9]-[11] of each object in the
image. Then we build the left and right eye images and present them to the final user as a
3D perspective view. Our 3D experimental equipment is a DTI 2015 XLS monitor, which uses
barriers so that the left and right eyes of the user see two different images.
We propose a novel automatic 2D to 3D image conversion technique: a 2D image is converted
into left and right eye images that are shown on a 3D stereo monitor. The conversion
technique is divided into two parts. The first part encodes the 2D image into a depth map
that carries the depth information, and the second part decodes the depth map into stereo
images that simulate the views of the left and right eyes.
The encoder is divided into two steps. The first step is image segmentation based on the
properties of 3D stereo images. We use SSR (Single Scale Retinex) to reduce the effect of
light reflection, and we use hue, saturation, and intensity as features for the traditional
FCM (Fuzzy C-means) clustering method. We then set a threshold on the size of the segmented
regions. If a region is smaller than the threshold, it is merged into its neighboring
regions, and this is repeated until every segmented region is at least as large as the
threshold. The second step is depth extraction. According to the depth cues and the depth
rules between objects, we estimate whether each object has a fixed depth or a gradual depth,
which gives a more realistic effect.
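The size-threshold merging described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `size_filter`, the label-grid input, and the merge rule (absorb the smallest region into the 4-connected neighbor that shares the most border pixels) are assumptions made for illustration, and each label is assumed to mark one region.

```python
from collections import Counter

def size_filter(labels, min_size):
    """Merge regions smaller than min_size into a neighbouring region.

    labels: 2D list of integer region labels (assumption: one label per region).
    Assumed merge rule: the smallest region is absorbed by the 4-connected
    neighbour label that shares the most border pixels with it.
    """
    labels = [row[:] for row in labels]  # work on a copy
    height, width = len(labels), len(labels[0])
    while True:
        sizes = Counter(v for row in labels for v in row)
        small = [lab for lab, n in sizes.items() if n < min_size]
        if not small or len(sizes) == 1:
            return labels
        # Pick the smallest region (ties broken by label value).
        target = min(small, key=lambda lab: (sizes[lab], lab))
        # Count how many border contacts the region has with each neighbour.
        border = Counter()
        for y in range(height):
            for x in range(width):
                if labels[y][x] != target:
                    continue
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        if labels[ny][nx] != target:
                            border[labels[ny][nx]] += 1
        winner = max(border, key=lambda lab: (border[lab], -lab))
        labels = [[winner if v == target else v for v in row] for row in labels]

segmented = [[1, 1, 2, 2],
             [1, 3, 2, 2],
             [1, 1, 2, 2]]
merged = size_filter(segmented, min_size=2)  # region 3 has only one pixel
```

Each pass removes one label, so the loop always terminates. A real implementation would probably also compare region features before merging.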
The second part, the decoder, is also divided into two steps. The first step converts the
depth map into the left and right eye images, for which we propose two methods. The first is
the linear shift algorithm, a simplified method that effectively increases the conversion
speed. The second is the binocular vision shift algorithm, which is based on the properties
of human vision. After shifting, the information of some pixels is lost, so the second step
is the “interpolating holes method”. After encoding and decoding, stereo images for a 3D
display are generated automatically from a single 2D image.
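The decoder idea above (shift pixels horizontally according to depth, then fill the holes that the shift leaves) can be sketched on a single image row. This is a simplified illustration, not the algorithm of Chapter 4: the names `linear_shift_row` and `fill_holes_average`, the `gain` parameter, and the convention that a larger depth value means a nearer pixel are all assumptions.

```python
def linear_shift_row(row, depth, gain, direction):
    """Shift each pixel of one row horizontally by round(gain * depth).

    direction is +1 or -1 (one value per eye). Positions that no pixel
    lands on stay None (holes). Pixels are written in order of increasing
    depth value, so pixels assumed nearer overwrite farther ones.
    """
    out = [None] * len(row)
    order = sorted(range(len(row)), key=lambda x: depth[x])
    for x in order:
        nx = x + direction * round(gain * depth[x])
        if 0 <= nx < len(row):
            out[nx] = row[x]
    return out

def fill_holes_average(row):
    """Fill each hole with the average of the nearest valid neighbours."""
    filled = row[:]
    for x, value in enumerate(row):
        if value is not None:
            continue
        left = next((row[i] for i in range(x - 1, -1, -1)
                     if row[i] is not None), None)
        right = next((row[i] for i in range(x + 1, len(row))
                      if row[i] is not None), None)
        known = [p for p in (left, right) if p is not None]
        filled[x] = sum(known) / len(known) if known else 0
    return filled

row = [10, 20, 30, 40, 50, 60]
depth = [0, 0, 2, 2, 0, 0]          # two "near" pixels in the middle
shifted = linear_shift_row(row, depth, gain=1, direction=+1)
repaired = fill_holes_average(shifted)
```

Here the two near pixels move two positions to the right and leave two holes behind them, which are then filled with the average of their valid neighbors. The thesis also mentions mirroring as a fill rule, which is not shown.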
We also implement SANYO's “CID” (Computed Image Depth Method) algorithm, which converts
a single 2D image into 3D images. That algorithm is introduced in section 2.4. In addition,
we establish an index for calculating the accuracy of the depth extraction, and we compare
the results of our proposed method with those of the CID algorithm in terms of human
perception and quantitative data.
The rest of the thesis is organized as follows. Chapter 2 describes the background
knowledge and related works. It introduces depth perception and the fundamentals of showing
2D images on a 3D display, because a 2D/3D software system that is based on human vision and
applied to a 3D display can only be designed once these are understood. The latter part of
chapter 2 describes our 2D/3D system and the 2D/3D conversion adaptive algorithm proposed by
SANYO. Chapter 3 describes the encoder of our 2D/3D conversion algorithm, which estimates
the depth map from a 2D image. Chapter 4 describes the decoder, which constructs the
binocular 3D image from the depth map. The experimental results of our methods and of the
CID algorithm are discussed in chapter 5. Finally, conclusions and future work are given in
chapter 6.
Chapter 2 Background Knowledge and Related Works
To construct a 2D/3D image conversion algorithm, we must know the background of stereo
images. This chapter introduces depth perception and the fundamentals of showing 2D images
on a 3D display. It also introduces the 2D/3D conversion adaptive algorithm proposed by
SANYO, which automatically converts a single 2D image into a 3D-effect image. Other authors
proposed the design of stereo images on a 2D monitor [12], based on optical properties.
Section 2.1 introduces depth perception, which must be defined before stereo images can be
studied. Section 2.2 introduces the principle of stereo display based on 2D images. There
are four kinds of such stereo display: polarization division, time division, wavelength
division, and spatial division. Section 2.3 introduces the hardware of our 2D/3D system and
the depth cues used in it. Section 2.4 introduces the 2D/3D conversion adaptive algorithm
proposed by SANYO. We also implement SANYO's CID algorithm and compare our results with its
results in this thesis.
2.1 Depth Perception
Why do people have stereo perception when they see images? People have considered this
question for a long time. To understand how a stereo image is generated, we must first
define the meaning of depth perception. Around 280 B.C., Euclid explained that when our two
eyes see different images of the same object at the same time, we perceive depth. This
difference is regarded as the important reason for depth perception. Besides, convergence
and the ability of eye accommodation also play important roles in depth perception.
2.1.1 The Formation of Binocular Images
Scientists consider the fact that the left eye and the right eye see different images to be
the most important cause of depth perception. The image on the retina of a single eye is a
2D image and, by itself, does not generate depth perception. A second, different 2D image
must be received on the retina of the other eye. The optic nerves transmit both images to
the brain, and the brain's comparison of the two images produces the depth perception of
which objects are near and which are far.
2.1.2 The Cause of Generating Depth Perception
People know the existence and shape of objects in 3D space from the light they reflect
onto the retina. The basic function of the retina is like the capturing principle of a
camera. Many optic nerve cells are distributed over the retina, so we can know the positions
of the observed objects. However, the retina alone is not enough to perceive the depth of
the observed objects. From the viewpoints of physiology and psychology, the causes of depth
perception are as follows.
(a) Physiological factors:
(1) Binocular disparity:
The distance between the left and right eyes is about 6 cm. From the difference
between the images formed in the two eyes, people can perceive and determine depth.
This is binocular disparity. In Fig. 2-1, the solid lines show the image shapes
received by the left and right eyes, and the dotted lines show the stereo shape of the
pyramid that is composed from them.
Fig. 2-1 The formation of image generating from binocular disparity
(2) Convergence:
Convergence is the angle between the lines of sight of the left and right eyes. The
angle changes with the distance between the observer and the object: the nearer the
object, the larger the convergence, and the farther the object, the smaller the
convergence. Because the angle of the two eyes must be adjusted to the distance of the
object, people can feel the depth of the object. At near distances the change of
convergence contributes significantly to depth perception, especially when it
cooperates with eye accommodation. Beyond about 10 m, the change of convergence is so
small that people cannot perceive the depth of objects from it.
(3) Accommodation:
The lens of the eye corresponds to the focusing lens of a camera. It changes its
curvature so that the image is projected onto the surface of the retina. From the
degree of this change, people can perceive the depth of objects. Generally, when the
object is more than 2 m away, it is difficult to perceive depth in this way.
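The convergence argument above can be checked numerically. The sketch below assumes a simple symmetric geometry that the text does not state explicitly: the object lies straight ahead, so the convergence angle is 2·atan(e / 2d) for eye separation e and distance d. The function name and the sample distances are illustrative.

```python
import math

EYE_SEPARATION_M = 0.06  # about 6 cm, the value quoted in the text

def convergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight for an object straight ahead."""
    return math.degrees(2.0 * math.atan(eye_separation_m / (2.0 * distance_m)))

for d in (0.5, 2.0, 10.0, 20.0):
    print(f"{d:5.1f} m -> {convergence_angle_deg(d):.3f} degrees")
```

Under this assumption the angle is close to 7 degrees at 0.5 m but only about a third of a degree at 10 m, and it changes very little beyond that, which is consistent with the claim that convergence stops being a useful depth cue at large distances.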
(b) Psychological factors:
(1) When the observer moves relative to the objects, the near and far parts of the
scene move differently and generate a disparity. This is called the disparity of
single-eye motion, and it can provide the perception of the depth of objects.
(2) Even for identical objects, people can distinguish near from far by the difference
in the size of their images on the retina.
(3) The arrangement of objects in 3D space gives a perception of depth and stereo.
(4) When people see parallel lines receding into the distance, they can perceive
depth.
(5) When people see uniform objects in the distance, for example a terraced field or a
sloping surface, they can perceive depth from the change in density.
(6) For objects of the same contrast, people can perceive depth and distance from the
decrease of contrast at far distances.
(7) From the shading that light produces on objects, people can feel the stereo shape
of the objects.
(c) The formation of generating depth perception:
When people see a scene, they can roughly divide it into three levels (far, middle, and
near), and the eyes adjust to comfortable angles to observe the objects. Because the two
eyes see from different positions and angles, binocular disparity exists naturally. When the
eyes look at far, middle, and near scenes, the change of the lines of sight can be
classified as follows:
(1) Uncrossed-Parallax.
(2) Zero-Parallax.
(3) Crossed-Parallax.
To simulate the situation of seeing a scene, we take the screen as the reference surface
and present images to the left eye and the right eye individually. Fig. 2-2 illustrates how
the focus of vision changes.
Fig. 2-2 The formation of generating depth perception
(1) Uncrossed-Parallax:
The two eyes do not focus in front of the screen (for a distant scene the lines of
sight are parallel), and the image appears behind the screen.
(2) Zero-Parallax:
The two eyes focus on the screen, and the image appears on the screen.
(3) Crossed-Parallax:
The two eyes focus in front of the screen, and the image appears in front of the
screen.
The three cases generate different stereo effects. Knowing the physical characteristics
of the formation of stereo images and of binocular disparity helps a 2D to 3D image
conversion algorithm simulate binocular vision.
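The three parallax cases can be told apart from the horizontal screen positions of the two image points. The sketch below uses a common sign convention that is an assumption here, not taken from the thesis: parallax = x_right − x_left, positive for uncrossed and negative for crossed. The distance formula follows from similar triangles for a viewer at distance `view_dist` from the screen with eye separation `eye_sep`; both function names are hypothetical.

```python
def classify_parallax(x_left, x_right, tolerance=0.0):
    """Classify the screen parallax of one point.

    x_left / x_right: horizontal screen position of the point in the
    left-eye and right-eye image (same units).
    """
    parallax = x_right - x_left
    if parallax > tolerance:
        return "uncrossed"   # perceived behind the screen
    if parallax < -tolerance:
        return "crossed"     # perceived in front of the screen
    return "zero"            # perceived on the screen

def perceived_distance(parallax, eye_sep, view_dist):
    """Distance from the viewer to the fused point (similar triangles).

    Valid for parallax < eye_sep; parallax == eye_sep means parallel
    lines of sight, i.e. a point at infinity.
    """
    return eye_sep * view_dist / (eye_sep - parallax)

print(classify_parallax(10.0, 12.0))   # uncrossed
print(classify_parallax(10.0, 10.0))   # zero
print(classify_parallax(12.0, 10.0))   # crossed
```

With zero parallax the perceived distance equals the viewing distance, so the point lies on the screen. A crossed parallax equal to the eye separation places the point halfway between the viewer and the screen.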
(d) Monoscopic depth cues:
There are several monoscopic depth cues:
(1) Occlusion:
An object that occludes another is closer.
(2) Shading:
Shadows carry shape information.
(3) Size:
Usually, the larger object is closer.
(4) Linear perspective:
Parallel lines converge at a single point.
(5) Surface texture gradient:
Closer objects show more detail.
(6) Atmospheric effects:
Objects farther away are blurrier.
(7) Brightness:
Objects farther away are dimmer.
2.2 The Fundamental of 2D Image on 3D Display
The last section described why people perceive depth in 3D space. These causes of depth
perception can be applied to showing 2D images on a 3D display. The methods of 3D display
are as follows:
(1) The first method relies on the physiological and psychological factors of the
observer. Even a single 2D image can convey far and near, so it counts as a simulated 3D
display. The left and right eyes see the same image on the display at the same time. Fig.
2-3 is an example of a simulated 3D image. The density of the parallel lines increases to
simulate what people see as a scene recedes into the distance, so it generates a perception
of depth.
Fig. 2-3 The simulation of display of 3D space
(2) Because the left and right eyes receive different images of the same object, observers
feel the relation of far and near. This also counts as a simulated 3D display. The basic
method is to let the left and right eyes see different images at the same position, which
generates the stereo perception. In Fig. 2-4, the left and right eyes see the left and right
images individually, and mirrors and prisms are used to redirect the light. These devices
are known as Wheatstone's and Brewster's stereoscopes.
(a) Wheatstone’s stereoscope
(b) Brewster’s stereoscope
This method simulates how the eyes see a scene in 3D space and uses two 2D images to
achieve the 3D effect. However, the depth perception it provides is limited.
(3) The third method is similar to method (2). However, observers cannot get real depth
perception from a single position, so they must be able to change their viewing position,
and the images they see must change according to that position. In this way people get a
real feeling of 3D space, and this is a real 3D display. To let the left and right eyes
individually see their corresponding images, there are four techniques:
(1) Polarization division:
Before the images are shown, they pass through polarizing plates, which turn them into
two differently polarized images. One image is polarized horizontally, and the other is
polarized vertically. The user wears polarized glasses whose two lenses are polarized
orthogonally. The left lens is a horizontal polarizer that passes only the horizontally
polarized image, and the right lens is a vertical polarizer that passes only the vertically
polarized image.
(2) Time division:
The left and right eye images are displayed at different times. When the left eye image
is displayed, the shutter of the left eye opens and the shutter of the right eye closes.
At the next moment the right eye image is displayed, and the shutter of the right eye opens
while the shutter of the left eye closes. Thus the left eye sees only the left eye image,
and the right eye sees only the right eye image. When the shutters are synchronized with the
displayed stereo images, the two lenses open and close 60 times per second. By the principle
of persistence of vision, the brain combines the left eye image with the right eye image and
composes a stereo image.
(3) Wavelength division:
The observer wears a red lens over the left eye and a green lens over the right eye. The
observer then sees the red image with the left eye and the green image with the right eye.
The method uses different colors for the left and right images to construct a 3D-effect
image.
(4) Spatial division:
The images for the left and right eyes are shown on the display separately in space. The
left eye sees the left eye image on the display, and the right eye sees the right eye image.
For example, a monitor with barriers lets the left eye and the right eye see different
images at the same time.
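Two of the four techniques above amount to simple pixel operations, sketched below. Both functions are illustrative assumptions. The red/green composition follows the text (red for the left eye, green for the right eye). The even/odd column assignment in `interleave_columns` is a hypothetical layout and not the documented format of any particular barrier display.

```python
def red_green_anaglyph(left, right):
    """Wavelength division: red channel from the left-eye image, green
    channel from the right-eye image. Images are 2D lists of (r, g, b)."""
    return [[(lp[0], rp[1], 0) for lp, rp in zip(lrow, rrow)]
            for lrow, rrow in zip(left, right)]

def interleave_columns(left, right):
    """Spatial division: alternate pixel columns between the two views.
    Assumption: even columns show the left image, odd columns the right."""
    return [[lrow[x] if x % 2 == 0 else rrow[x] for x in range(len(lrow))]
            for lrow, rrow in zip(left, right)]

left_img = [[(200, 10, 10), (100, 20, 20)]]
right_img = [[(5, 150, 5), (6, 60, 6)]]
anaglyph = red_green_anaglyph(left_img, right_img)
barrier = interleave_columns(left_img, right_img)
```

Polarization and time division need optical or timing hardware and are not captured by a pixel-level sketch like this one.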
2.3 The Proposed 2D to 3D Image System
The previous sections introduced the physiological and psychological depth cues and the
fundamentals of showing 2D images on a 3D display, including the four kinds of 3D display.
Table 2-1 summarizes our 2D to 3D image system. As depth cues, we use binocular disparity
from physiology and contrast, size, and arrangement from psychology. The display used in our
2D to 3D image system is the DTI 2015 XLS (Fig. 2-5), which uses barriers so that the
observer sees different left and right eye images. Fig. 2-6 shows the principle of our 3D
display. When the observer moves, the left and right eyes see different images.
Table 2-1 The proposed 2D to 3D image system
Depth cues (physiology) | Depth cues (psychology) | Display
Binocular disparity | Contrast, Size | DTI 2015 XLS (spatial division)
Fig. 2-5 DTI 2015 XLS
Fig. 2-6 The principle of our 3D display
2.4 SANYO 2D to 3D Conversion Adaptive Algorithm [1], [2]
The 2D-to-3D image conversion technique using the “Modified Time Difference” (MTD)
method converts 2D images into 3D images by selecting images that form a stereo pair
according to the detected motion of objects in the input image sequence.
The MTD can convert 2D images whose objects have simple horizontal motion, but it is not
suitable for still images or for images whose objects have complicated motion. A new
technique is therefore required for converting these 2D images into 3D images.
The “Computed Image Depth Method” (CID) was developed to solve this problem. The CID can
convert all kinds of 2D images into 3D images and is especially suitable for still images.
It generates the 3D images by computing the depth of each separated area of the input 2D
image from its contrast, sharpness, and chrominance.
These techniques have been implemented in a single-chip LSI for automatic, real-time
2D-to-3D image conversion.
2.4.1 MTD (Modified Time Difference Method)
Fig. 2-7 shows how the MTD converts 2D images into 3D images.
In this sequence of 2D images, a bird is flying from left to right in front of the
mountains. If the fourth field of the sequence is given to the left eye and the second field
is given to the right eye, the motion of the bird is perceived as binocular parallax, and
the bird appears to pop out in front of the mountains. This technique is based on the
principle well known as the Pulfrich effect.
Fig. 2-7 The principle of MTD
The MTD was developed to adapt this technique to automatic, real-time 2D-to-3D image
conversion. In a case such as Fig. 2-7, the images contain objects moving in the horizontal
direction. The result of analyzing the motion vectors determines the time difference between
the left and the right eye images and which eye receives the delayed image. The MTD is
therefore suitable for images with simple horizontal motion, such as an object that spins or
an object that moves horizontally against the background. It does not work well for images
whose objects have complicated motion or no motion.
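The field-selection idea behind the MTD can be sketched as follows. This is only an illustration of the bird example, not SANYO's implementation: the function name `mtd_stereo_pair` and its parameters are hypothetical. The rule for rightward motion follows the example in the text, and the mirrored rule for leftward motion is an assumption made by symmetry.

```python
def mtd_stereo_pair(fields, current, delay, moving_right):
    """Pick a (left_eye, right_eye) pair from sequential fields.

    fields: list of fields in temporal order.
    current: index of the newest field.
    delay: time difference, in fields, between the two eyes.
    moving_right: True if the dominant object motion is left-to-right.
    """
    newest = fields[current]
    delayed = fields[max(current - delay, 0)]
    if moving_right:
        # As in the bird example: newest field to the left eye,
        # delayed field to the right eye.
        return newest, delayed
    # Assumed mirror case for right-to-left motion.
    return delayed, newest

fields = ["field1", "field2", "field3", "field4"]
left_eye, right_eye = mtd_stereo_pair(fields, current=3, delay=2,
                                      moving_right=True)
```

For the bird example this returns the fourth field for the left eye and the second field for the right eye. In a real system the delay and direction would come from motion vector analysis, which is not modeled here.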
2.4.2 CID (Computed Image Depth Method)
The “Computed Image Depth Method” (CID) is proposed for converting still 2D images into
3D images. When we look at a 2D picture, we generally recognize the far-and-near positional
relationship between the objects in it from several kinds of information: how the objects
cover each other, how sharp the images are, how much contrast the images have, what the
images are, what the objects are, and so on. This information is expected to be useful for
2D-to-3D image conversion. The CID therefore uses the sharpness and the contrast of the
input images.
Fig. 2-8 The principle of CID
The CID consists of two processes. The first is the image depth computation process,
which computes the image depth parameters from the contrast, the sharpness, and the
chrominance of the input images. The second is the 3D image generation process, which
generates the 3D images according to the image depth parameters. Fig. 2-8 shows the basic
principle of the CID.
First, the sharpness, contrast, and chrominance values of each separated area of the
input images are detected. The sharpness is the high-frequency component of the luminance
signal of the input images. The contrast is the middle-frequency component of the luminance
signal. The chrominance is the hue and the tint of the color signal of the input images.
Next, adjacent areas with similar colors are grouped according to their chrominance
values, and the image depth computation is based on the grouped areas.
The image depth computation process uses the contrast values and the sharpness values.
In photographs and TV images, near objects generally have higher sharpness and higher
contrast than far objects and the background image.
Therefore, these contrast and sharpness values are inversely proportional to the distance
from the camera to the objects. If only these values are used for the image depth
and bottom of the images. This is because the focused object is generally positioned at the
center of the image, while the ground or the floor is positioned at the bottom, generally
has a flat surface, and yields low contrast and sharpness values. These values are therefore
compensated by the composition of the image. In general images, the composition tends to
place the center or the bottom of the image nearer than the upper side. Each image depth
parameter is thus decided by the average of the sharpness and contrast values of each area,
weighted by the composition of the image. This compensation is a good way to obtain a good
3D effect, but it should be changed according to the application.
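The composition-weighted depth parameter described above can be sketched as follows. The text does not give the actual weights or the exact combination rule, so everything here is an assumption: the linear top-to-bottom weight, its default range, the multiplicative weighting, and the function names. Center weighting is omitted for brevity.

```python
def composition_weight(row, n_rows, top=0.5, bottom=1.0):
    """Hypothetical composition weight: rises linearly from the top row
    to the bottom row, reflecting that lower areas tend to be nearer."""
    if n_rows == 1:
        return bottom
    return top + (bottom - top) * row / (n_rows - 1)

def depth_parameters(sharpness, contrast):
    """Per-area depth parameter (larger = nearer, by assumption).

    sharpness, contrast: 2D lists with one value per separated area.
    The parameter is the average of the two values, scaled by the
    composition weight of the area's row.
    """
    n_rows = len(sharpness)
    return [[composition_weight(r, n_rows) * (s + c) / 2.0
             for s, c in zip(srow, crow)]
            for r, (srow, crow) in enumerate(zip(sharpness, contrast))]

sharp = [[1.0, 1.0], [1.0, 1.0]]
contr = [[1.0, 1.0], [1.0, 1.0]]
params = depth_parameters(sharp, contr)
```

With uniform sharpness and contrast, the bottom row receives a larger parameter than the top row, so it is treated as nearer. A purely multiplicative weight cannot lift areas whose measured values are close to zero, so a practical scheme might add a composition term instead.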
Second, the 3D image generation process generates the left and the right eye images
according to the image depth parameter of each grouped area. If the parameter of an area
indicates near, the left image is made by shifting the input image to the right, and the
right image is made by shifting it to the left. If the parameter of an area indicates far,
both images are made by shifting in the opposite directions. The horizontal shift value of
each separated area is proportional to the 3D effect. Furthermore, when the image depth
parameters change quickly or frequently, the converted images become hard to watch.
Therefore, each shift value is adjusted to reduce quick changes of the image depth
parameters between adjacent areas. As a result of these processes, 3D images that are easy
to watch can be generated.
The CID is especially suitable for converting from still images, because it does not
need any motions of the objects in the images. Of course, the CID can be also adapted for
Chapter 3 Depth Map Estimation from 2D Image
We develop a new architecture for converting a single 2D image into 3D effect images.
This architecture consists of two parts. The first part encodes the image into a depth map,
and the second part decodes the depth map into the left and right eye images. The flowchart
is shown in Fig. 3-1. The first part includes two steps: image segmentation and depth
extraction. After image segmentation, we obtain more complete objects. At the depth
extraction stage, we estimate the distances between objects according to the depth cues. We
also set some rules to infer whether an object has a horizontal or a vertical distribution, so
that the result looks more correct and comfortable on the 3D monitor. After the first part, we
have the 3D coordinate values of the objects in the scene, and thus the image depth
parameters, so we can encode the 2D image with its depth information into the depth map.
In the first section of this chapter, we introduce our image segmentation method for 3D
effect images, together with the image processing methods used in it. In the second section,
we propose our novel depth extraction method. We
Fig. 3-1 The flowchart of our 2D to 3D image conversion algorithm
3.1 Image Segmentation
Image segmentation is a process that partitions an image into the different objects
composing it. Generally, color image segmentation approaches can be divided into the
following categories: statistical approaches, edge detection, region splitting and merging
approaches, methods based on physical reflectance models, methods based on human color
perception, and the approaches using fuzzy set theory [8], [13]. Some authors have used
other classification methods for image segmentation: a neural fuzzy network to classify the
image [14], and on-line ICA-mixture-model-based self-constructing fuzzy neural networks
to segment the image [15]. Other authors incorporate texture information derived from
wavelets into the image segmentation [16].
segmentation [17]. As for color images, the situation is different due to the multifeatures
[18]. Other authors proposed a novel approach to color image segmentation [19]: they
extend the general concept of the histogram and define a homogeneity histogram. Histogram
analysis is applied to both the homogeneity and color feature domains. To calculate the
homogeneity feature, both local and global information is taken into account, and a
hierarchical histogram analysis method based on homogeneity and color features is
employed. We also implemented the color image segmentation proposed in [19]. It performs
well in color image segmentation, but it is not suitable for 3D effect images because it lacks
spatial information. We therefore need additional methods to obtain the spatial information,
and we construct an image segmentation algorithm that is suitable for 3D effect images.
3.1.1 The Proposed 2D to 3D Image Segmentation Algorithm
We propose an image segmentation algorithm that is suitable for 3D effect images. The
flowchart of the proposed algorithm is shown in Fig. 3-2. The method can be divided into
three parts. The first part is preprocessing. Because the image is captured by a CCD camera,
it is affected by the environment. Light reflection is caused by fluorescent lamps and by
sunlight outdoors, and it leads to erroneous segmentation results. We use SSR (Single Scale
Retinex) [20] to reduce this problem. SSR imitates the characteristics of the human retina to
avoid environment lighting effects, and it can separate the light source and the object
reflectance in the image. We will introduce SSR in the next
subsection and compare the image segmentation method using SSR with the image
Fig. 3-2 The flowchart of our proposed image segmentation algorithm
SSR. We will show some examples demonstrating that SSR reduces the light reflection.
This benefits our image segmentation because intensity is one of our features. After SSR
processing, the new image keeps each object's specular rate and presents clear and
undistorted details.
The second part is feature extraction and classification. There are two
important ideas for color segmentation [21], which are uniform chromaticity scale (UCS)
perception can be directly expressed by a Euclidean distance in the color space [18]. The
(L*, a*, b*) and (L*, u*, v*) color spaces are approximately UCS. On the contrary, the
(R, G, B) and (X, Y, Z) color spaces are non-UCS. Achieving an adequate segmentation result
depends on segmentation techniques by detecting similarity among the attributes of image
pixels. The UCS is a mathematical system to match the sensitivity of human eyes with
computer processing. Color has psychological attributes; the perceptual color space is
usually described by hue, saturation and intensity (H, S, I). The following equations convert
colors from RGB to HSI:

\[ H = \begin{cases} \theta & \text{if } B \le G \\ 360 - \theta & \text{if } B > G \end{cases} \tag{3-1} \]

with

\[ \theta = \cos^{-1}\left\{ \frac{\frac{1}{2}[(R-G)+(R-B)]}{[(R-G)^2+(R-B)(G-B)]^{1/2}} \right\} \tag{3-2} \]

\[ S = 1 - \frac{3}{(R+G+B)}[\min(R, G, B)] \tag{3-3} \]

\[ I = \frac{1}{3}(R+G+B) \tag{3-4} \]

The intensity is a measure of total reflectance in the visible region of the spectrum, and it
is an achromatic component of color. The hue is the attribute of color perception denoted by
red, yellow, green, blue, and so on. Saturation is used to describe how pure a color is or how
much white is added to a pure color. Using a perceptual color space for color image
segmentation has two advantages: (1) specifying and controlling color is more suitable for
human intuition than using the primary colors RGB; (2) intensity and chromatic components
can be controlled more easily and independently. Other authors proposed circular
histogram thresholding for color image segmentation [23]. They also concluded that
1-D clustering (hue) may not be as good as 3-D clustering (hue, saturation, intensity).
using hue, saturation, and intensity to be the features is better than using (L, a, b) space. And
using more features (hue, saturation, intensity) is better than using only one feature (hue).
So we choose hue, saturation, and intensity as our features. As the classification method, we
apply a traditional clustering method (the FCM algorithm) to the image features (H, S, I).
With the FCM method, the pixels belonging to a valid class are clustered. A cluster is
determined if the maximum value of the membership function is below the threshold T.
FCM requires the number of clusters to be given in advance, so we need to decide it. We use
CE (classification entropy) to measure the fuzziness of the cluster partition. We run FCM
with 2 to 6 clusters and calculate the CE for each; the partition with the minimum CE is
regarded as a good partition, and we choose the corresponding number of clusters. In our
experimental experience, good image segmentation results are obtained when the number
of clusters is between 2 and 6.
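The (H, S, I) features used above follow from Eqs. (3-1) to (3-4). A minimal per-pixel sketch in Python is given below for illustration. The function name `rgb_to_hsi`, the assumption that the inputs are scaled to [0, 1], and the choice to return a hue of 0 for gray or black pixels (where the hue is undefined) are assumptions of this sketch, not part of the original method.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components assumed in [0, 1]) to (H in degrees, S, I)."""
    total = r + g + b
    intensity = total / 3.0                          # Eq. (3-4)
    if total == 0:
        return 0.0, 0.0, 0.0                         # black: H and S undefined, assumed 0
    saturation = 1.0 - 3.0 * min(r, g, b) / total    # Eq. (3-3)
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity            # gray: hue undefined, assumed 0
    ratio = max(-1.0, min(1.0, numerator / denominator))  # guard against rounding
    theta = math.degrees(math.acos(ratio))           # Eq. (3-2)
    hue = theta if b <= g else 360.0 - theta         # Eq. (3-1)
    return hue, saturation, intensity
```

Under these assumptions, pure red, green, and blue map to hues of 0, 120, and 240 degrees with full saturation.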
Other authors proposed using a self-organizing feature map (SOFM) to select which
features are important in an image [24]. We also tried using SOFM to select our important
features, but to keep our algorithm simple we chose the traditional clustering method, FCM,
which we will introduce later.
After the FCM method, the original image is divided into many regions, each with a
unique label number. However, this produces an under-segmentation problem: different
objects are associated with the same region. This reflects the lack of spatial information.
We solve this problem with the connected component searching method, which we will
introduce later. We also set a threshold on the size of the regions. For 3D effect images, we
need a small number of clusters in order to obtain more complete objects on the 3D monitor.
If the size of a region is less than the threshold, the region is merged into the neighboring
region whose size is smallest. We merge small regions iteratively until all the sizes of the regions
3.1.2 SSR (Single Scale Retinex)
Because the image is captured by a CCD camera, it is affected by light reflection. For
example, the objects in color images can exhibit variations in color saturation with little or
no corresponding variation in luminance. To reduce this problem, we adopt a technique that
imitates the characteristics of the human retina to avoid environment lighting effects. Many
variants of the retinex have been suggested over the years. The last version that Land
proposed is now referred to as the Single Scale Retinex (SSR) [10]. The Single Scale Retinex
for a point (x, y) in an image is defined as:
\[ R_i(x, y) = \log I_i(x, y) - \log[F(x, y) \otimes I_i(x, y)] \tag{3-5} \]

where $R_i(x, y)$ is the retinex output and $I_i(x, y)$ is the image distribution in the $i$th
spectral band. In the above equation the symbol $\otimes$ represents the convolution operator
and $F(x, y)$ is the Gaussian surround function:

\[ F(x, y) = K e^{-(x^2 + y^2)/c^2}, \]

and c is the Gaussian surround constant, analogous to the σ generally used to represent
standard deviation. The Gaussian surround constant c is what is referred to as the scale of
the retinex. The SSR
method can separate the light source and the object specular rate in the image. After SSR
processing, the new image keeps each object's specular rate and presents clear and
undistorted details. Fig. 3-3 (a) and (b) are the original image and the image processed by
the single scale retinex method. Fig. 3-3 (c) and (d) show the pseudo-color images without
SSR and with SSR, respectively. There are many highlighted places in the image without
SSR processing, whereas the highlighted places are concentrated in one area in the image
with SSR processing.
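To make Eq. (3-5) concrete, the following is a minimal Python sketch of SSR for one spectral band. It is written for small test images rather than for speed. The function name, the kernel radius of 3c, the normalization of K so that the surround sums to 1, the clamping of coordinates at the image border, and the small offset `eps` inside the logarithms are all assumptions of this sketch and are not specified in the text.

```python
import math

def single_scale_retinex(img, c=2.0, radius=None, eps=1e-6):
    """img: 2D list of non-negative intensities for one spectral band.
    Returns log I - log(F * I) per pixel, following Eq. (3-5)."""
    if radius is None:
        radius = int(3 * c)  # assumed truncation of the Gaussian surround
    offsets = range(-radius, radius + 1)
    # Gaussian surround F(x, y) = K * exp(-(x^2 + y^2) / c^2)
    kernel = {(dx, dy): math.exp(-(dx * dx + dy * dy) / (c * c))
              for dx in offsets for dy in offsets}
    k_norm = sum(kernel.values())  # K is chosen so the surround sums to 1 (assumption)
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            surround = 0.0
            for (dx, dy), f in kernel.items():
                yy = min(max(y + dy, 0), h - 1)  # clamp at the borders (assumption)
                xx = min(max(x + dx, 0), w - 1)
                surround += f * img[yy][xx]
            surround /= k_norm
            out[y][x] = math.log(img[y][x] + eps) - math.log(surround + eps)
    return out
```

On a uniformly lit patch the output is approximately zero everywhere, while a locally bright spot yields a positive response, which is the sense in which the surround term removes the slowly varying lighting component.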
We also conducted an indoor experiment in which we took a picture of a cup on a table.
Fig. 3-4 (a) is the original image in the intensity channel. There is light reflection on the table
is the image segmentation result without SSR. We can see that the table is divided into more
than one part because of the reflection. And we can see that Fig. 3-4 (d) is the image
segmentation result after SSR. The table after image segmentation is complete, because the
SSR decreases the light reflection.
Fig. 3-3 The SSR effect on the image (a) The original image. (b) The image after SSR. (c) The pseudo-color images without SSR. (d) The pseudo-color images with SSR.
Fig. 3-4 The SSR effect on the segmentation image. (a) The original image. (b) The image after SSR. (c) The image segmentation result without SSR. (d) The image segmentation result with SSR.
3.1.3 Fuzzy C-means Clustering [25]
Clustering essentially deals with the task of splitting a set of patterns into a number of
more-or-less homogeneous classes (clusters) with respect to a suitable similarity measure
such that the patterns belonging to any one of the clusters are similar and the patterns of
different clusters are as dissimilar as possible. The similarity measure used has an important
effect on the clustering results since it indicates which mathematical properties of the data
set, for example, distance, connectivity, and intensity, should be used and in what way they
should be used in order to identify the clusters. In nonfuzzy “hard” clustering, the boundary
of different clusters is crisp, such that one pattern is assigned to exactly one cluster. On the
contrary, fuzzy clustering provides partitioning results with additional information supplied
Consider a finite set of elements $X = \{x_1, x_2, \ldots, x_n\}$ as being elements of the
p-dimensional Euclidean space $R^p$, that is, $x_j \in R^p$, $j = 1, 2, \ldots, n$. The problem is to
perform a partition of this collection of elements into c fuzzy sets with respect to a given
criterion, where c is a given number of clusters. The criterion is usually to optimize an
objective function that acts as a performance index of clustering. The end result of fuzzy
clustering can be expressed by a partition matrix U such that

\[ U = [u_{ij}], \quad i = 1, \ldots, c, \; j = 1, \ldots, n, \tag{3-6} \]

where $u_{ij}$ is a numerical value in [0, 1] and expresses the degree to which the element $x_j$
belongs to the ith cluster. However, there are two additional constraints on the value of $u_{ij}$.
First, the total membership of the element $x_j \in X$ in all classes is equal to 1.0; that is,

\[ \sum_{i=1}^{c} u_{ij} = 1 \quad \text{for all } j = 1, 2, \ldots, n. \tag{3-7} \]

Second, every constructed cluster is nonempty and different from the entire set; that is,

\[ 0 < \sum_{j=1}^{n} u_{ij} < n \quad \text{for all } i = 1, 2, \ldots, c. \tag{3-8} \]

A general form of the objective function is

\[ J(u_{ij}, v_k) = \sum_{i=1}^{c} \sum_{j=1}^{n} \sum_{k=1}^{c} g[w(x_j), u_{ij}] \, d(x_j, v_k), \tag{3-9} \]

where $w(x_j)$ is the a priori weight for each $x_j$, $g[w(x_j), u_{ij}]$ influences the degree of
fuzziness of the partition matrix, and $d(x_j, v_k)$ is the degree of dissimilarity between the
data $x_j$ and the supplemental element $v_k$, which can be considered the central vector of the
kth cluster. The degree of dissimilarity is defined as a measure that satisfies two axioms:

\[ \text{(i)} \; d(x_j, v_k) \ge 0, \qquad \text{(ii)} \; d(x_j, v_k) = d(v_k, x_j). \tag{3-10} \]

With the above background, fuzzy clustering can be precisely formulated as an
optimization problem:

\[ \text{Minimize } J(u_{ij}, v_k), \quad i, k = 1, 2, \ldots, c; \; j = 1, 2, \ldots, n, \tag{3-11} \]

subject to Eqs. (3-7) and (3-8).
One of the widely used clustering methods based on Eq. (3-11) is the fuzzy c-means
(FCM) algorithm developed by Bezdek [1981]. The objective function of the FCM
algorithm takes the form of

\[ J(u_{ij}, v_i) = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^m \, \| x_j - v_i \|^2, \quad m > 1, \tag{3-12} \]

where m is called the exponential weight, which influences the degree of fuzziness of
the membership (partition) matrix. To solve this minimization problem, we first differentiate
the objective function in Eq. (3-12) with respect to $v_i$ (for fixed $u_{ij}$, i = 1,…, c, j = 1,…, n)
and to $u_{ij}$ (for fixed $v_i$, i = 1,…, c) and apply the conditions of Eq. (3-7), obtaining

\[ v_i = \frac{1}{\sum_{j=1}^{n} (u_{ij})^m} \sum_{j=1}^{n} (u_{ij})^m x_j, \quad i = 1, 2, \ldots, c, \tag{3-13} \]

\[ u_{ij} = \frac{\left(1 / \| x_j - v_i \|^2\right)^{1/(m-1)}}{\sum_{k=1}^{c} \left(1 / \| x_j - v_k \|^2\right)^{1/(m-1)}}, \quad i = 1, 2, \ldots, c; \; j = 1, 2, \ldots, n. \tag{3-14} \]

The system described by Eqs. (3-13) and (3-14) cannot be solved analytically. However,
the FCM algorithm provides an iterative approach to approximating the minimum of the
objective function starting from a given position. Besides, there are some validity indices,
such as the partition coefficient (PC) and the classification entropy (CE), which use the
information of the fuzzy membership grades to evaluate the clustering result (N is the number
of data elements):

\[ PC(c) = \frac{1}{N} \sum_{i=1}^{c} \sum_{j=1}^{N} (u_{ij})^2, \qquad CE(c) = -\frac{1}{N} \sum_{i=1}^{c} \sum_{j=1}^{N} u_{ij} \log(u_{ij}). \]

The FCM algorithm is summarized in the following:
FCM (Fuzzy c-Means Algorithm):
Step 1: Select a number of clusters c ($2 \le c \le n$) and an exponential weight m
($1 < m < \infty$). Choose an initial partition matrix $U^{(0)}$ and a termination criterion
$\varepsilon$. Set the iteration index l to 0.
Step 2: Calculate the fuzzy cluster centers $\{v_i^{(l)} \mid i = 1, 2, \ldots, c\}$ by using $U^{(l)}$
and Eq. (3-13).
Step 3: Calculate the new partition matrix $U^{(l+1)}$ by using $\{v_i^{(l)} \mid i = 1, 2, \ldots, c\}$
and Eq. (3-14).
Step 4: Calculate $\Delta = \| U^{(l+1)} - U^{(l)} \| = \max_{i,j} | u_{ij}^{(l+1)} - u_{ij}^{(l)} |$. If
$\Delta > \varepsilon$, then set l = l + 1 and go to Step 2. If $\Delta \le \varepsilon$, then stop.
END FCM
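The four steps above, together with the CE-based choice of the cluster number described in Section 3.1.1, can be sketched in Python as follows. This is an illustrative sketch, not the implementation used in the thesis: the function names, the seeded random initial partition matrix, the `max_iter` cap, and the small floor on squared distances (to avoid division by zero when a sample coincides with a center) are assumptions.

```python
import math
import random

def fcm(data, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means. data: list of equal-length feature tuples.
    Returns (centers, u) where u[i][j] is the membership of data[j] in cluster i."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    # Step 1: random initial partition matrix whose columns sum to 1
    u = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(u[i][j] for i in range(c))
        for i in range(c):
            u[i][j] /= s
    for _ in range(max_iter):
        # Step 2: cluster centers as membership-weighted means
        centers = []
        for i in range(c):
            w = [u[i][j] ** m for j in range(n)]
            w_sum = sum(w)
            centers.append([sum(w[j] * data[j][k] for j in range(n)) / w_sum
                            for k in range(p)])
        # Step 3: new memberships from inverse squared distances
        new_u = [[0.0] * n for _ in range(c)]
        for j in range(n):
            d2 = [max(sum((data[j][k] - v[k]) ** 2 for k in range(p)), 1e-12)
                  for v in centers]
            inv = [(1.0 / d) ** (1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            for i in range(c):
                new_u[i][j] = inv[i] / total
        # Step 4: stop when the largest membership change is at most eps
        delta = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        if delta <= eps:
            break
    return centers, u

def classification_entropy(u):
    """CE = -(1/N) * sum of u_ij * log(u_ij) over all memberships."""
    n = len(u[0])
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / n

def choose_cluster_count(data, candidates=range(2, 7)):
    """Return the candidate cluster number with minimum CE (2 to 6 by default)."""
    return min(candidates, key=lambda c: classification_entropy(fcm(data, c)[1]))
```

In the segmentation pipeline, `data` would hold one (H, S, I) tuple per pixel, and each pixel would be assigned to the cluster with its largest membership.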
3.1.4 Connected Component Searching
After image segmentation, the original image is divided into many regions, each with
an individual label number. However, this produces an under-segmentation problem:
different objects are associated with the same region. Fig. 3-5 (a) shows an
under-segmentation result. In the labeled image there are many black areas that have
different spatial positions. This reflects the lack of spatial information.
Hence, we use the connected component searching method to solve this problem. Connected
components labeling (CCL) of an image is a fundamental step in the segmentation process
and consists in identifying and labeling the separate regions of interest of the
modeling and computer vision and so on. It scans an image and groups its pixels into
components based on pixel connectivity. After the connected component search, objects in
different spatial positions are separated. Fig. 3-5 (b) shows the result. This processing
generates many new labels.
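The relabeling step can be sketched as a breadth-first flood fill over the cluster label image. The sketch below is illustrative only: the function name and the use of 4-connectivity are assumptions, since the text does not state which connectivity is used.

```python
from collections import deque

def relabel_connected_components(labels):
    """labels: 2D list of cluster labels. Returns a 2D list in which every
    4-connected region of equal cluster label receives its own new label."""
    h, w = len(labels), len(labels[0])
    out = [[-1] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if out[y][x] != -1:
                continue  # pixel already belongs to a labeled component
            out[y][x] = next_label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and out[ny][nx] == -1
                            and labels[ny][nx] == labels[cy][cx]):
                        out[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return out
```

Two areas that share a cluster label but are separated by another cluster end up with different labels, which is why the number of labels grows after this step.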
Fig. 3-5 The connected component effect on the segmentation image. (a) The under-segmentation image. (b) The result after connected component searching.
3.2 Depth Extraction
After the image segmentation stage, we assign depths to the objects. Traditionally, the
perception of depth from two-dimensional images has been accomplished by binocular
methods [26], [27], which use binocular depth cues such as disparity. More recently,
researchers have developed monocular approaches [28], [29].
Monocular depth cues, such as blur, have been used to perceive depth. In these approaches,
perception. Researchers have begun looking at integration of binocular and monocular cues.
Several researchers have studied the result of combining cues to perceive depth from 2D
images [30], [31]. Others have studied how these different cues interact in creating a depth
effect [32]-[34]. In these studies, binocular cues have been considered large contributors to
depth perception in 2D images.
We propose a novel depth extraction algorithm that can extract depth from a single 2D
image. Thus, we can encode a 2D image with depth information into a depth map. In the
first sub-section, we present our depth extraction algorithm. In the other sub-sections, we
explain the methods used in it.
3.2.1 The Proposed Depth Extraction Algorithm
According to the characteristics of human vision, people can perceive the depth relations
between objects because of the focused position. There exists a proper balance plane: when
we observe it with the left and right eyes, it lies at the same position in the left and right eye
images. This balance plane is called the “zero-plane”. We propose a method to find the
zero-plane. We apply a mask operation to the image to calculate the variance at each pixel.
We found that the focused area has higher variance than other areas. According to this
characteristic, we determine the zero-plane as the place with the maximum variance in the
picture.
After getting the zero-plane, we give depth values to each object. The depth value is
calculated according to the distance between the bottom of the object and the zero-plane.
All of the areas that are not focused exhibit blur. Due to the arrangement of
objects, we assume that objects under the zero-plane have lower depth values. On the
Fig. 3-6 Inferring whether an object is horizontal or vertical from the relation between two objects. (a) Two vertical objects. Determine depths by bottom lines. (b) A vertical object on a horizontal object. Determine depths by the boundaries and areas of regions.
On the other hand, we estimate whether each object has a horizontal or a vertical
distribution. Because we give a horizontally distributed object a gradual depth and a
vertically distributed object a fixed depth, observers perceive a more correct 3D effect on the
3D display. To estimate the relation, we set some logic rules, some of which are illustrated in
Fig. 3-6. Fig. 3-6a gives two examples of two vertical objects; we assign depths to the two
vertical objects by calculating their bottom lines. Fig. 3-6b gives examples in which one
object has a horizontal distribution and the other has a vertical distribution; we assign
depths to them by the boundaries and the areas of the regions. The rules define the depth
relationship between two objects, but a segmented image may include more than two
objects. We apply the depth relation rules pair by pair until all pairs have been estimated.
Once we have the depths of the objects and have inferred whether they are vertically or
horizontally distributed, we arrange the list of depths and obtain the depth map. Thus, we
obtain the 3D coordinate values of the objects in the scene.
Fig. 3-7 Flowchart of our proposed depth extraction approach.
3.2.2 Depth Cues
In section 2.1, we introduced some depth cues of a 2D image. They are useful in depth
extraction from a single 2D image, and we use some of them. Among the psychological
cues, we use contrast, size, and arrangement. Among the physiological cues, we simulate
binocular disparity to generate the left and right eye images, which we will introduce in the
next chapter.
Generally in the photographs, the near-positioned objects have higher contrast than the
inversely proportional to the distance from the camera to the objects. So we choose
“contrast” as one of our depth cues. But contrast alone is not enough, because it often
occurs that the center of the image appears nearer than both sides, the top, and the bottom of
the image. The cause is that the focused object is generally positioned at the center of the
image, while the ground or the floor, which generally has a flat surface, is positioned at the
bottom of the image, so low contrast and sharpness values are obtained from the bottom. So
we add the factor “arrangement” to depth extraction. We consider the position of the object
in the vertical direction of the image. If the object is at the top of the image, we give it a
farther depth. On the contrary, if the object is at the bottom of the image, we give it a nearer
depth. The factor “size” is also used in depth extraction. We infer whether an object is
vertical or horizontal by using the factors “arrangement” and “size”. We will describe how
the distance relation of each object is inferred in section 3.2.4.
3.2.3 Zero-Plane
We employ the mask operation method to calculate the variance at each pixel, and we
determine the zero-plane as the place with the maximum variance in the picture. This
simulates two eyes focusing on the plane; the property of the zero-plane is that the pixels on
the plane do not shift between the left and right eye images. After image segmentation, we
have the regions of the objects. We call the horizontal line at the bottom of an object its
bottom line. We assign the depth to the object by calculating the distance between the
zero-plane and the bottom line. An example is illustrated in Fig. 3-8. The highest variance is
in the center of the image, so we place the zero-plane on the penguin. And we calculate the depth of mouse by
Fig. 3-8 Zero-plane and bottom line.
3.2.4 Inference of the Distance Relation of Each Object
We set some rules to infer the relation between objects. We estimate whether an
object is horizontal or vertical by the rules shown in Fig. 3-7. We calculate the left, right,
upper, and lower boundaries of the objects, from which we infer whether each object has a
vertical or a horizontal distribution. We assign depths to two vertically distributed objects by
their bottom lines, and we assign depths to a vertically distributed object and a horizontally
distributed object by the boundaries and areas of the regions. If there are more than two
objects, we apply the depth relation rules pair by pair until all pairs have been estimated.
Fig. 3-9 shows examples of fixed and gradual depths. On the 3D monitor, the object with a
fixed depth appears to stand on the object with gradual depths.
Fig. 3-9 The gradual effect on the depth map.
3.2.5 The Index of Depth
We propose an index to determine the accuracy of depth extraction. The principle of
our method is to check the front and rear relations between each object and every other
object, and to calculate the error rate. An example is illustrated in Fig. 3-10. There are five
objects in this image, labeled 1-5. The order of actual depths is 5, 4, 1, 3, and 2. Assume the
order of depths we computed is 5, 4, 1, 2, and 3. Fig. 3-11 illustrates how we compute the
depth accuracy of object 1. We determine whether the other objects are far or near relative
to object 1, and we compare the relations in the actual depths with those in the computed
depths. If they are the same, the depth estimation of the object is correct; if they differ, the
depth estimation of the object is wrong. In Fig. 3-11, objects 2, 3, 4, and 5 are behind
object 1 in both the actual depths and the computed depths, so the depth estimation of
object 1 is correct. In Fig. 3-12, we evaluate the depth of object 4. Object 5 is in front of
object 4 in the actual depths, but it is behind object 4 in the computed depths, so the depth
estimation of object 4 is wrong. After computing the depth accuracy of the five objects, we
find that objects 1, 2, and 3 are correct and objects 4 and 5 are wrong. The depth accuracy of
the image is 60 percent.
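The index can be sketched in a few lines of Python. The sketch reads the two lists in the example (5, 4, 1, 3, 2 and 5, 4, 1, 2, 3) as the depth values of the objects labeled 1 to 5, in that order; this reading is an interpretation, as are the function name and the dictionary representation. Only the pairwise ordering matters, so the sketch does not depend on whether a larger value means nearer or farther.

```python
def depth_accuracy(actual, computed):
    """actual, computed: dicts mapping object label -> depth value.
    An object counts as correct when its front/rear relation to every other
    object is the same in both dicts. Returns (accuracy, wrong_labels)."""
    def relation(depths, a, b):
        # +1, 0 or -1 depending on how the depth of a compares with that of b
        return (depths[a] > depths[b]) - (depths[a] < depths[b])

    labels = list(actual)
    wrong = [a for a in labels
             if any(relation(actual, a, b) != relation(computed, a, b)
                    for b in labels if b != a)]
    return 1.0 - len(wrong) / len(labels), wrong

# Depth values of objects 1-5, read from the example above (interpretation)
actual = {1: 5, 2: 4, 3: 1, 4: 3, 5: 2}
computed = {1: 5, 2: 4, 3: 1, 4: 2, 5: 3}
accuracy, wrong = depth_accuracy(actual, computed)
```

Under this reading, only the relation between objects 4 and 5 differs between the two lists, so `wrong` contains 4 and 5 and `accuracy` is 0.6, in agreement with the 60 percent stated above.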
Fig. 3-11 The example of correct order
Chapter 4 Binocular 3D Image Construction Based on Depth Map
After we have the image depth parameters, we can decode them into the left and right
eye images. We propose two kinds of shift algorithm, inferred from two camera capturing
methods that generate stereo images. Because a shift algorithm generates two images from
one image, there are some holes in the left and right eye images. We need to interpolate
these holes so that observers feel more comfortable. We tested several hole interpolation
algorithms and found that interpolation by a horizontal smoothing filter gives a better effect.
We will introduce the shift algorithms in section 4.1 and the hole interpolation algorithms in
section 4.2.
4.1 Shift Algorithm
After obtaining the depth map, the left and right eye images can be obtained from the
characteristics of stereovision. Namely, obtaining the left and right eye images is based on
the image depth and the hardware parameters of the 3D display. If we do not consider the
hardware parameters of the 3D display, the binocular vision image depends only on the
image depth.
We propose two kinds of shift algorithms, inferred from two camera capturing
methods that generate stereo images. One is the camera shift capturing method and the other
is the camera center capturing method. The camera shift capturing method shifts a camera to
obtain the left and right eye images. The camera center capturing method places the focus
centers of the two images at the same position. Using camera center
linear shift algorithm from camera shift capturing method and binocular vision shift
algorithm from camera center capturing method.
4.1.1 Camera Capturing Method
Basically, a camera is structured like an eye. When we observe objects, the scene
passes through the eyeball (lens) and forms an image on the retina (the negative of a
photo). When we see stereo objects in 3D space, the left and right eyes transmit images to
the brain individually. In other words, the brain composes a stereo image from binocular
parallax and retinal disparity. According to this principle, we shift the camera to simulate
our eyes. Fig. 4-1(a) illustrates how to shift the camera to capture stereo images, and
Fig. 4-1(b) shows the camera shift capturing method.
When the eyes see a scene, the viewing angles of the two eyes focus on the same
position of the object, which generates less binocular parallax. Based on this principle, we
can let the camera simulate our eyes to capture the scene: we place the focus centers of the
left and right images at the same position of the objects. By simulating binocular vision that
focuses on the same position, the parallax between the left and right images is reduced to a
minimum, and the viewer feels more comfortable. Fig. 4-2 is camera
Fig. 4-1 Illustration of the camera shift capturing method. (a) Shifting the camera to capture the stereo image. (b) Camera shift capturing method.
4.1.2 Linear Shift Algorithm
When people see a scene, nearer objects shift more and farther objects shift less
between the two eyes. Based on this principle, we simplify the shift with a linear method.
In the right eye image, if the depth of a pixel is higher than the zero-plane, the pixel shifts
rightwards. On the contrary, if the depth of a pixel is less than the zero-plane, the pixel
shifts leftwards. The number of shifted pixels is directly proportional to the distance
between the depth value of the pixel and the zero-plane. The left eye image is obtained by
shifting in the opposite directions. Fig. 4-3 shows an example of the result of the linear shift
algorithm.
Fig. 4-3 The results after linear shift algorithm.
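The linear shift rule can be sketched for a single image row as follows. The proportionality constant `gain`, the function name, the use of `None` to mark holes, and the rule that the pixel with the smaller depth value wins when two pixels land on the same column are assumptions of this sketch; the text only specifies the shift direction and its proportionality to the distance from the zero-plane.

```python
def linear_shift_row(row, depth_row, zero_depth, gain=0.1, eye="right"):
    """Shift one image row according to the linear shift rule.
    Returns a new row of the same width in which None marks a hole."""
    direction = 1 if eye == "right" else -1  # the left eye shifts the opposite way
    w = len(row)
    out = [None] * w
    stored_depth = [None] * w  # depth of the pixel currently written to each column
    for x in range(w):
        # shift is proportional to the signed distance from the zero-plane
        shift = direction * int(round(gain * (depth_row[x] - zero_depth)))
        tx = x + shift
        if not 0 <= tx < w:
            continue  # pixel moves outside the image
        # on a collision keep the pixel with the smaller depth value (assumption)
        if stored_depth[tx] is None or depth_row[x] < stored_depth[tx]:
            out[tx] = row[x]
            stored_depth[tx] = depth_row[x]
    return out
```

For example, with `row = ['a', 'b', 'c', 'd']`, `depth_row = [0, 0, 10, 10]`, `zero_depth = 0` and the default gain, the right eye row becomes `['a', 'b', None, 'c']`. The `None` entry is the kind of hole that the interpolation step of section 4.2 has to fill.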
4.1.3 Binocular Vision Shift Algorithm
We propose a method that simulates the human vision. The scene projects to the left
is shown in Fig. 4-4. We assume that two eyes focus on the point whose vertical coordinate
is half of the maximum depth of the scene and horizontal coordinate is half of the width. We
can get the projection depth (D’) from the following formulas:
(4-1), (4-2), (4-3),
(4-4), (4-5), (4-6)
Fig. 4-4 Simulating the right eye image of binocular vision. w: image width, dm: scene depth, D: the depth between right eye and scene, b: the distance between two eyes, D’: the projection depth.
Fig. 4-5 The projection point of relative regions. (a) Region 1, (b) Region 2, (c) Region 3.
After getting the projection depth (D’), we can infer the relative point (x, d)
individually in region 1 (Fig. 4-5a), region 2(Fig. 4-5b), and region 3(Fig. 4-5c).
The results are as follows:
Region 1: (4-7) (4-8) (4-9) Region 2: (4-10)