Chapter 1 Introduction
1.4 Organization of the thesis
This thesis aims to provide the background needed to understand the core value of the proposed system. Readers who are unfamiliar with the formation of 3D images or with fundamental optics are advised to begin with chapter 2. Building on the concepts elaborated in chapter 2, the prior art of 3D image capturing is introduced in chapter 1. Note that the image processing used in the prior art is not expounded in detail, since that material might distract readers from the purpose of our design. Moreover, the preface introduces the specific field of 3D technology, which helps readers grasp the motivation and objective stated in chapter 1. Following chapter 1, the optical system and algorithm designed to meet the objective are presented in chapter 3. Subsequently, chapter 4 presents the experimental results, including the captured elemental images, the computation process, and the final depth map, together with the discussion. Finally, chapter 5 gives the conclusion and discusses, as future work, how a liquid crystal lens can benefit the 3D image capturing system.
Chapter 2 Theory
The concept of 3D capturing, as the name implies, consists of 3D image formation and optical capturing. This chapter first introduces how people perceive depth, from both psychological and physiological viewpoints. The optical background of imaging is then covered in the section on microscopy, which is an extreme case of imaging and helps readers become familiar with the relevant optical terms. At the end of the chapter, we expound the idea of the High Dynamic Range (HDR) image, because it is linked to the idea behind our depth rendering system.
2.1 Principle of 3D images
From the viewpoint of evolution, three-dimensional perception is an essential specialization for predators. Predators such as lions or leopards have to plan their hunts, and distance estimation based on vision is one major factor in those plans. The more precisely predators estimate the distance, the more likely they are to catch their prey [20]. Human beings are also predators, and how people perceive information in the third dimension is therefore of interest. Because the images reaching our eyes display only two-dimensional spatial relationships, researchers have summarized many depth cues that the brain is thought to use in the human visual system (HVS) [21][22], as shown in Figure 2-1.
Because human eyes are horizontally separated, the two images falling onto the two eyes differ slightly. The brain uses this difference, called binocular disparity, to compute depth and to fuse the two images into a 3D scene. This mechanism is named stereopsis [23][24] and is one principal way for humans to reconstruct depth. Consequently, if a display can deliver a stereo image pair to the two eyes separately, the brain can combine the pair into a cyclopean image, as shown in Figure 2-2 [25].
Figure 2-1 Depth cues of human visual system
Figure 2-2 Binocular vision: 2D stereo image pair fused into 3D cyclopean image
A quantitative measure of binocular disparity is important for designing both 3D displays and 3D image capturing systems. Current 3D displays, including the parallax type [26] and the lenticular lens type [27], all need to rearrange pixels to produce stereo image pairs, and how the pixels are rearranged depends mainly on the geometric relation between disparity and depth. Likewise, for the reverse process, if a system can shoot at least two images with disparity, the depth information can be computed correspondingly. This reverse process is termed 3D image capturing. Figure 2-3 illustrates the geometry when two cameras fixate at distance u’, i.e., the elemental images of an object at that distance have no disparity.
Figure 2-3 Geometry of binocular vision
By simple geometric triangulation, if the convergence angle is small (i.e., D ≪ u), the disparity d at depth u can be approximated by equation (13), where v’ is the image distance for perfect imaging, and D and F are the separation of the two apertures and the focal length of the lens, respectively [28]. Consequently, the depth sensitivity is limited by the size of the sensing elements: the pixel size must be smaller than the disparity to be detected.
(13)
According to the above equation, we can determine the working range and setup of the capturing system. Conversely, rearranging the same equation allows the depth u to be rendered from the disparity between two elemental images, or among a sequence of them [29].
(14)
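As a hedged illustration of how disparity and depth map onto each other, the sketch below assumes the common small-angle triangulation form d ≈ D·v’·(1/u’ − 1/u) for two apertures separated by D that fixate at distance u’ with image distance v’. This form is an assumption standing in for equations (13) and (14), whose exact expressions may differ, and the function names are hypothetical.

```python
def disparity_from_depth(u, u_fix, D, v_img):
    """Assumed small-angle model: d = D * v' * (1/u' - 1/u)."""
    return D * v_img * (1.0 / u_fix - 1.0 / u)


def depth_from_disparity(d, u_fix, D, v_img):
    """Invert the assumed model to recover the depth u from a disparity d."""
    inv_u = 1.0 / u_fix - d / (D * v_img)
    if inv_u <= 0:
        raise ValueError("disparity is too large for this model")
    return 1.0 / inv_u


def depth_step_per_pixel(u, D, v_img, pixel):
    """First-order depth change near u that shifts the disparity by one pixel."""
    # In the assumed model dd/du = D * v' / u**2, so du = pixel * u**2 / (D * v').
    return pixel * u ** 2 / (D * v_img)
```

Under this assumed model an object at the fixation distance has zero disparity, and the depth step per pixel grows with the square of the depth, which is one way to read the statement that the pixel size limits depth sensitivity.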
In addition, depth cues can be further categorized as physiological or psychological. Physiological cues are related to the motions of the eyes, while psychological cues are interpretations based on previous experience [30]. As a result, the difference between oculomotor and monocular depth cues can be understood in terms of physiology and psychology.
Oculomotor depth cues comprise two mechanisms, accommodation and convergence, as shown in Figure 2-4 [31]. In accommodation, the ciliary muscle and ciliary body control the optical power of the lens and bring about auto-focusing in the near region (~3 m) [32]. In convergence, the oculomotor nerve takes part in controlling the eye movements that help humans fixate on an object. As for monocular depth cues, Figure 2-1 enumerates only some of them, but the answer to whether 3D perception is possible with a single eye can be deduced from those cues. Monocular depth cues indicate only the apparent spatial relationship of objects in a scene, based on what we have built up in our visual memory. In other words, when we view a 2D picture we know which objects are in front (3D information), even though these objects all actually lie on a 2D plane. Although the details are not discussed here, the ideas are not difficult to grasp through individual visual experience [33][34].
Figure 2-4 Oculomotor depth cues: accommodation and convergence
In short, the perception of 3D images arises from a number of more or less independent cues; in other words, it is not attributable to one particular system but results from a process of inference.
2.2 Microscopy
Many things cannot be seen with the unaided eye, yet these tiny objects do affect our lives. Epidemic viruses, for example, can cause coughing or a runny nose. Scientists and engineers have therefore invented several kinds of instruments to investigate this unseen world. Optical, electron, and scanning probe microscopes are the three prominent branches of this technical field, called microscopy [35]. Among the three, the optical microscope is the basis of the system proposed in this thesis. Before examining its mechanism, we review some optical terms.
2.2.1 Optical Terminology
In general optics, the numerical aperture (NA) is defined as the index of refraction n’ (of the medium in which the image lies) times the sine of the half angle U’ of the cone of illumination [36], as shown in Figure 2-5.
$$\mathrm{NA} = n' \sin U' \qquad (15)$$
Figure 2-5 Numerical aperture of an optical system
NA is a dimensionless number used in microscopy to describe how much light from a particular object point is accepted by the system, and it varies as the point moves. Moreover, according to both the Rayleigh criterion proposed by Lord Rayleigh [36] and the Sparrow criterion proposed by Carroll Mason Sparrow [37], the minimum resolvable separation R is inversely proportional to NA, as given for the two criteria by the following equations respectively, where λ is the wavelength of the light source.
(16)
(17)
The Rayleigh criterion expresses the limit that Fraunhofer diffraction places on perfect imaging. It is defined as the condition in which the principal maximum of one diffraction pattern falls on the first minimum of the other. The Sparrow criterion expresses the same concept under a different condition, in which the two central maxima and the minimum between them just coincide, as shown in Figure 2-6. Although the Sparrow criterion provides a stricter and more accurate condition, the Rayleigh criterion is still adequate; in Rayleigh’s own words: “This rule is convenient on account of its simplicity and it is sufficiently accurate in view of the necessary uncertainty as to what exactly is meant by resolution.” [38].
Figure 2-6 Rayleigh and Sparrow criteria for two overlapping diffraction patterns
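To make the inverse dependence on NA concrete, the following minimal sketch evaluates both criteria for a circular aperture. The Rayleigh coefficient 0.61 follows from the first zero of the Airy pattern; the Sparrow coefficient 0.47 is an assumed value (some texts round it to 0.5) and may differ from the one used in equation (17). The function names are hypothetical.

```python
RAYLEIGH_COEFF = 0.61  # 1.22 / 2, from the first zero of the Airy pattern
SPARROW_COEFF = 0.47   # assumed value; some references use 0.5


def rayleigh_limit(wavelength, na):
    """Minimum resolvable separation under the Rayleigh criterion."""
    return RAYLEIGH_COEFF * wavelength / na


def sparrow_limit(wavelength, na):
    """Minimum resolvable separation under the Sparrow criterion (assumed coefficient)."""
    return SPARROW_COEFF * wavelength / na


if __name__ == "__main__":
    wavelength_um = 0.55  # green light, in micrometres
    for na in (0.10, 0.25, 0.65):
        r = rayleigh_limit(wavelength_um, na)
        s = sparrow_limit(wavelength_um, na)
        print(f"NA={na:.2f}  Rayleigh={r:.3f} um  Sparrow={s:.3f} um")
```

Both limits scale in the same way with wavelength and NA, so the choice of criterion changes only the constant factor.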
Like NA, the f-number (f/#), or focal ratio, describes the light-gathering characteristic of an imaging system. The illumination (power per unit area) on the image side is inversely proportional to the image area, and the image area is proportional to the square of the focal length according to the Newtonian form of the thin lens equation. Thus, the ratio of the image area to the aperture area is a measure of the relative illumination, and the square root of this ratio is called the relative aperture, or f-number, given by [36]:
(18)
To confirm the equivalence of NA and f/#, we reconsider the Rayleigh criterion. The intensity distribution E of the diffraction pattern of a point source is governed by the Bessel function of the first kind, where E0 is the central maximum [39].
(19)
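Hence the minimum resolvable separation is R = 1.22 λ (f/#), and comparing this with the Rayleigh criterion expressed in terms of NA gives NA ≈ 1/(2 f/#). In other words, this equivalence holds only when the subtended angle is small enough.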
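The small-angle caveat can be checked numerically. The sketch below assumes simple thin-lens geometry in air, in which the half angle U’ of the image-side cone satisfies tan U’ = 1/(2 f/#); this geometric model and the function names are assumptions made for illustration only.

```python
import math


def na_paraxial(f_number):
    """Small-angle approximation: NA = 1 / (2 * f/#)."""
    return 1.0 / (2.0 * f_number)


def na_thin_lens(f_number, n_image=1.0):
    """NA = n' * sin(U') with tan(U') = 1 / (2 * f/#) (assumed thin-lens geometry)."""
    half_angle = math.atan(1.0 / (2.0 * f_number))
    return n_image * math.sin(half_angle)


def relative_error(f_number):
    """Relative overestimate of NA made by the small-angle approximation."""
    exact = na_thin_lens(f_number)
    return (na_paraxial(f_number) - exact) / exact


if __name__ == "__main__":
    for n in (1.0, 2.0, 4.0, 8.0):
        print(f"f/{n:g}: approx={na_paraxial(n):.4f} "
              f"geometric={na_thin_lens(n):.4f} error={relative_error(n):.2%}")
```

Under this assumption the two values agree closely for slow systems and diverge noticeably as the f-number approaches 1.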
Depth of focus (DOF) is another indispensable term in microscopy: it refers to the longitudinal range in image space within which the image is considered sharp. In contrast, depth of field indicates the range in object space within which all object points are imaged with acceptable sharpness. Although both terms are abbreviated DOF, they differ slightly in whether the range is symmetric about a reference plane along the optical axis. Figure 2-7 shows the depth of field and the depth of focus as the colored regions. From the geometric relations and the Gaussian form of the thin lens equation, and assuming a perfect system (i.e., no aberration and no diffraction), the depth of focus δ is given by:
(22)
where c is the limit on the circle of confusion, taken as the pixel size of the sensor. The corresponding depth of field extends from s_near to s_far [36]:
(23)
(24)
where D is the diameter of the entrance pupil of the lens, f is the focal length of the lens, and s is the nominal distance at which the system is focused.
Figure 2-7 Depth of focus and depth of field
Considering s_far, the depth of field becomes infinite when
(25)
where s_hyp is called the hyperfocal distance of the system, which is commonly used as the focus distance of fixed-focus cameras [40].
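The relations among s, f, D and c can be explored with the sketch below. It uses the standard geometric thin-lens expressions for the near and far limits and for the hyperfocal distance; these forms are assumed here and may differ in detail from equations (23) to (25). The function names are hypothetical.

```python
import math


def depth_of_field(s, f, D, c):
    """Return (s_near, s_far) for a lens focused at distance s.

    s: focus distance, f: focal length, D: entrance pupil diameter,
    c: circle of confusion limit (all in the same length unit).
    """
    spread = c * (s - f)
    s_near = s * f * D / (f * D + spread)
    if spread >= f * D:
        # Focused at or beyond the hyperfocal distance: far limit is at infinity.
        return s_near, math.inf
    s_far = s * f * D / (f * D - spread)
    return s_near, s_far


def hyperfocal_distance(f, D, c):
    """Focus distance at which the far limit first reaches infinity (assumed form)."""
    return f * D / c + f


if __name__ == "__main__":
    f_mm, d_mm, c_mm = 50.0, 12.5, 0.03  # hypothetical lens and pixel size
    print(depth_of_field(2000.0, f_mm, d_mm, c_mm))
    print(hyperfocal_distance(f_mm, d_mm, c_mm))
```

With these assumed forms, focusing at the hyperfocal distance puts the near limit at about half that distance, which is why it suits fixed-focus cameras.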
2.2.2 Optical Microscope
The simplest microscope is a magnifying glass, which adds refractive power to the eye and provides a larger image than the one seen by the unaided eye. It is naturally desirable for the magnifying glass to produce an erect image. Both convex and concave lenses can form an erect image, but only a convex lens creates a magnified one. As a result, the object should be placed within the focal length f of the lens. To describe the performance of microscopes, the magnifying power MP, or angular magnification, is used. It is described as the ratio of the size of the image seen through the optical element or system to the size of the image seen by the unaided eye at the normal viewing distance do, generally taken as 254 mm or 10 inches. MP is more conveniently defined as the ratio of the angles made by the chief rays from the top of the object for the aided (αa) and unaided (αu) eye respectively, as depicted in Figure 2-8 and Figure 2-9.
$$\mathrm{MP} = \frac{\alpha_a}{\alpha_u} \qquad (26)$$
Figure 2-8 An unaided view of an arrow object
Figure 2-9 An aided view through a magnifying glass
Under the paraxial approximation, tan αa ≈ αa and tan αu ≈ αu, hence
(27)
Furthermore, based on the transverse magnification relation and the Gaussian lens formula, MP becomes
(28)
In the most common situation, in which the object is positioned at the focal point, the virtual image is correspondingly at infinity, and for all practical values of l, MP becomes
(29)
It is a pleasing feature that parallel rays allow relaxed, unaccommodated vision. Nevertheless, the MP of the simplest magnifiers is limited to 2X or 3X owing to aberrations, so more complicated magnifiers are designed to reach an MP of up to 20X.
To provide higher MP, the compound microscope, allegedly invented by H. Janssen and his son Z. Janssen [41], combines two optical units: the eyepiece and the objective. As the names imply, the eyepiece is a visual optical instrument that provides a comfortable viewing range and magnifies the image further, while the objective is closest to the object and frequently serves as the aperture stop and entrance pupil of the system, as illustrated in Figure 2-10.
Figure 2-10 A rudimentary compound microscope
Thus, the total MP of the entire system is the product of the transverse linear magnification of the objective, MTo, and the angular magnification of the eyepiece, MAe.
$$\mathrm{MP} = M_{To} \, M_{Ae} \qquad (30)$$
Generally speaking, the tube length, denoted by L, is standardized at 160 mm. If the focal lengths of the eyepiece and the objective are the variables of interest, the total MP can be formulated as
(31)
where do is the standard near point.
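As a rough numerical illustration, the sketch below evaluates the magnifying power of a simple magnifier with the object at the focal point and of a compound microscope. It assumes the common textbook forms MP = do/f and MP = (−L/fo)(do/fe), which may differ in sign convention or detail from equations (29) and (31); the function names are hypothetical.

```python
NEAR_POINT_MM = 254.0   # standard near point d_o stated in the text
TUBE_LENGTH_MM = 160.0  # standard tube length L stated in the text


def magnifier_power(f_mm, d_o=NEAR_POINT_MM):
    """MP of a simple magnifier with the object at its focal point (assumed form)."""
    return d_o / f_mm


def compound_power(f_objective_mm, f_eyepiece_mm,
                   tube_length=TUBE_LENGTH_MM, d_o=NEAR_POINT_MM):
    """Total MP as objective transverse magnification times eyepiece angular magnification."""
    m_objective = -tube_length / f_objective_mm  # negative sign marks an inverted image
    m_eyepiece = d_o / f_eyepiece_mm
    return m_objective * m_eyepiece


if __name__ == "__main__":
    print(magnifier_power(127.0))
    print(compound_power(16.0, 25.4))
```

Because the two factors multiply, shortening either focal length raises the total MP, which is how a compound microscope exceeds the limit of a single magnifier.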
Last but not least, the angular field of view on the image side is a vital factor that specifies the extent of the largest object that can be viewed. The main component governing the field of view is the field stop or, strictly speaking, the exit window, which is the image of the field stop formed by the optical elements following it. The cone angle subtended at the center of the exit pupil by the periphery of the exit window is the angular field of view [38].
In fact, many other components are needed to build a compound microscope, as shown in Figure 2-11 [41]. For example, the condenser is a lens designed to concentrate light onto the specimen. Nonetheless, the key principles have been presented above.
Figure 2-11 Basic optical transmission microscope and its elements
2.3 High Dynamic Range Imaging
Dynamic range is the ratio between the largest and smallest values of a changeable quantity. Humans have a high dynamic range (HDR) in both sight and hearing. People can see objects under weak moonlight as well as under bright sunlight, a dynamic range of about 90 dB. However, they cannot perceive both extremes at the same time, and it takes time to adapt between different visual or auditory conditions. In practice, HDR data require more space to record, in audio as well as in video. Hence, some tricks are used to convey HDR with data recorded in a narrow dynamic range. For instance, program makers do not rely on brightness as the cue for nighttime or daytime scenes. Instead, they use duller colors and blue lighting to imitate the way human eyes perceive scenes at low light levels [42].
In photography, conventional digital image formats such as bmp or jpg (jpeg) generally use 24 bits per pixel. Each pixel contains 3 primary colors with 8 bits each, so the gray levels of each color range from 0 to 255. In other words, the contrast ratio of the images is 256:1, which is sufficient for most scenes. However, a scene under direct sunlight can extend the contrast to 50,000:1.
HDR images possess all the information captured under different exposures and cover a wide range of luminance, so they use more than 12 bits per channel to span the large luminance range between the highlights and the shadows. Typically, an HDR image is created by combining images taken with different exposure times. Imaging technology has made it possible to capture and store HDR images; nevertheless, the output capability of common displays has not kept up with these advances. Therefore, several algorithms have been designed to adjust the luminance range of the real world so that HDR images can be displayed on devices with a lower dynamic range [43].
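The link between contrast ratio and bit depth can be made concrete with a short calculation. The sketch below assumes a linear encoding, in which a contrast ratio of R:1 spans log2(R) photographic stops and needs roughly that many bits per channel; this is a simplification, and the function names are hypothetical.

```python
import math


def stops(contrast_ratio):
    """Number of photographic stops (factors of two) spanned by a contrast ratio."""
    return math.log2(contrast_ratio)


def bits_needed(contrast_ratio):
    """Whole bits per channel needed to span the ratio with a linear encoding (assumed)."""
    return math.ceil(math.log2(contrast_ratio))


if __name__ == "__main__":
    for ratio in (256, 50_000):
        print(f"{ratio}:1 spans {stops(ratio):.2f} stops, "
              f"about {bits_needed(ratio)} bits per channel")
```

Under this assumption the 256:1 range of a conventional image fits in 8 bits per channel, while a sunlit scene at 50,000:1 needs about 16, consistent with HDR formats using more than 12 bits per channel.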
2.3.1 High Dynamic Range Imaging Rendering Algorithm
HDR image rendering algorithms can be categorized into two types: global operators and local operators. Global operators apply the same processing over the whole image based on the image content, while local operators use different mappings according to spatially localized content. Although global operators are faster to compute and easier to implement, local operators allow greater dynamic range compression. These operators are briefly introduced below.
Sigmoidal Transformation: A sigmoid contrast enhancement function S(t), derived from a discrete cumulative normal function, is used to rescale the lightness for gamut mapping. This method was presented by Braun in 1999 and was later modified to compress HDR images using the logarithm of luminance.
(32)
Histogram Adjustment: By incorporating human visual models of glare, spatial acuity, and color sensitivity, the luminance histogram is modified to reproduce the imperfections of human vision. This method was proposed by Ward in 1997.
Figure 2-12 Dodging and burning effect
Photographic Reproduction: Different luminance mappings are applied to highlight and shadow regions to simulate the dodging-and-burning effect of traditional photography. As Figure 2-12 illustrates, dodging decreases the exposure to make a region brighter, while burning increases the exposure to make it darker [44]. This tone mapping technique was presented by Reinhard et al. in 2002.
Bilateral Filtering Technique: An image is decomposed by an edge-preserving spatial filter into a base layer and a detail layer. The base layer contains the large-scale variations. The overall brightness and contrast of the base layer are compressed, and the two layers are subsequently recombined into the final image. This technique, proposed by Durand and Dorsey in 2002, reduces the overall contrast while maintaining the local details of the image.
Local Eye Adaptation: The Naka-Rushton equation is modified to predict the response of cones and rods, and the luminance channel is compressed according to the resulting S-shaped response function. This local eye adaptation method was presented by Ledda et al. in 2004.
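To show what a global operator looks like in practice, the sketch below implements a simplified global tone mapping step in the spirit of the photographic reproduction operator: every luminance value is scaled by a key value divided by the log-average luminance and then compressed with L / (1 + L). It is an illustrative simplification, not the rendering method used in this thesis, and the function names and the default key of 0.18 are assumptions.

```python
import math


def log_average(luminances, eps=1e-6):
    """Log-average (geometric mean) luminance; eps guards against log(0)."""
    total = sum(math.log(eps + lum) for lum in luminances)
    return math.exp(total / len(luminances))


def tone_map_global(luminances, key=0.18):
    """Apply the same mapping to every pixel: scale by key / log-average, then compress."""
    avg = log_average(luminances)
    scaled = [key * lum / avg for lum in luminances]
    # L / (1 + L) maps any non-negative luminance into the range [0, 1).
    return [lum / (1.0 + lum) for lum in scaled]


if __name__ == "__main__":
    scene = [0.01, 1.0, 100.0, 50_000.0]  # hypothetical scene luminances
    print(tone_map_global(scene))
```

Because one curve is applied to the whole image, the mapping is fast and preserves the ordering of luminance values, whereas a local operator would vary the curve with the neighborhood of each pixel.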
The aforementioned algorithms [44] [45] are just a few of those available, but their core idea is the same: adjusting the luminance range of the image to fit conventional displays with a limited contrast ratio. In brief, HDR images provide a more realistic visualization of the real world, closer to what people perceive. If researchers one day overcome the limited capability of displays and the limits of our knowledge of the human visual system, more robust models and operators can be used to improve perceptual accuracy. Finally, Figure 2-13 shows an example of an HDR image rendered from two images captured with different exposures [47].
Figure 2-13 Example of high dynamic range image by tone mapping: (a) +2 stops, (b) -2 stops, (c) HDR image
Chapter 3 Structure and Algorithms
Binocular vision is the foundation of 3D capturing with a lens array: we use the disparity between elemental images to render the depth information. However, the correspondence problem is always an issue for stereo cameras. The lens array in our High Dynamic Depth Range (HDDR) system is therefore modified to have different focal arrangements, and the configurations of the temporal and spatial systems are illustrated in the first part of this chapter. In addition, depth information is extracted by the Depth Estimation Reference Software (DERS), and a post-processing step, Depth map Fusion from Edge Exploring Thresholding (DFEET), is carried out to fuse the depth maps together. The other part of this chapter elaborates on each step of DERS and DFEET. Finally, the limitations of our algorithm are discussed.
3.1 3D Image Capturing with Lens Array
In conventional microscopes, the depth of field decreases as the magnification increases, so the focal plane must be adjusted to observe the specimen clearly. Furthermore, some 3D image capturing techniques are restricted to the near field because of the shallow depth of field. Blurred images cause matching errors when the disparity is computed. Even for the light field camera, the reversibility of light rays is the first hypothesis. As a result, when a source