Chapter 1 Introduction
1.4 Organization
This thesis is organized as follows. Chapter 1 gives an overview of the camera reconstruction system and the concept of the proposed system. Chapter 2 introduces the preliminary techniques used by the two blocks of the proposed system, camera tracking and model reconstruction. Chapter 3 describes in detail the main contribution of this thesis: background removal using 3D feature points, which is adopted as the object extraction process in the proposed system. The experimental results and conclusions are given in Chapter 4 and Chapter 5, respectively.
Chapter 2
Model Reconstruction Using Octree Algorithm
The octree combined with the Marching Cubes algorithm is among the most popular 3D model reconstruction approaches in computer vision and medical image processing. This chapter describes the algorithm in detail.
2.1 Introduction
As mentioned in Section 1.3, the proposed system is based on the shape-from-silhouette algorithm. Among systems using this algorithm, the most commonly used model reconstruction method is the octree algorithm [33]. An octree is a hierarchical tree structure consisting of cubes of various, proportionally related sizes. Representing a 3D model with an octree offers several advantages that follow from the geometric characteristics of cubes.
First, a cube is easy to subdivide sequentially, which makes the structure convenient for programming and calculation. Second, any 3D model can be approximated by a set of cubes of various sizes. Last but not least, once the octree is built it is easy to convert into a triangular-mesh model using the “Marching Cubes” algorithm [23, 24]. The triangular-mesh model is much smoother and closer to the original object, because any mesh in 3D space can be decomposed into triangles.
2.2 Octree
A 2D octree (commonly called a quadtree) is introduced first to simplify the explanation of the octree structure. Instead of cubes, a 2D octree represents a 2D diagram composed of rectangles of various sizes, as given in Figure 2.2.1, where a large rectangle can be further divided into four smaller rectangles. For example, the top rectangle ‘A’ in Figure 2.2.1 is divided into four rectangles ‘A’, ‘B’, ‘E’, and ‘F’. Correspondingly, the 2D octree structure is depicted as a tree of depth 3 in Figure 2.2.2, where each rectangle is called a node. A node of depth r and index i is denoted as $s_i^r$, with r numbered top-down and i indexed sequentially. For example, the top r
and is labeled as IN, ON, or OUT to denote whether the rectangle is inside, on, or outside the object contour. If a rectangle is ON, i.e., on the contour, it is further partitioned into four rectangles; otherwise, it is left unchanged. In Figure 2.2.1, with a curve as the object contour, the gray rectangle $s_A^1$ is ON because it contains the object contour, so it is further divided into four green rectangles $s_A^2$, $s_C^2$, $s_I^2$, and $s_N^2$ of depth 2, where $s_A^2$, $s_C^2$, and $s_I^2$ are also ON.
Figure 2.2.2 2D octree structure of Figure 2.2.1
The formal approach to determining whether the rectangle $s_i^r$ is IN, ON, or OUT is to search for an intersection between the rectangle edges and the object contour. The rectangle is ON if any intersection is detected. However, the formal approach is time-consuming because it checks every point on the edges. A more efficient approach is to check only the vertices $v_{im}^r$, $m = 1, 2, \ldots, 4$. The rectangle $s_i^r$ is labeled IN or OUT if all vertices lie inside or outside the object contour, respectively, and ON otherwise. Since this approach checks only the vertices, it fails in the condition shown in Figure 2.2.3, where the contour is contained inside the rectangle but none of the vertices is inside the contour. Such a failure produces a defect in the octree that reduces the similarity between the octree and the object contour, and the defect becomes more serious for smaller depth r. A solution is to check an additional $2^{R-r} - 1$ checking points, equally distributed on the rectangle edges, and to apply the efficient approach to all the vertices and checking points. The fail condition still cannot be resolved for rectangles of depth R unless the formal approach is adopted, but for large R the effect is relatively small.
Figure 2.2.3 Fail condition for octree
The accuracy of the octree model depends on the maximum depth R, so R must be chosen before creating an octree. From the previous example, it is clear that during octree construction the ON rectangles are divided until they reach depth R, while the IN and OUT rectangles are kept as large as possible.
To demonstrate the effect of the maximum depth R, another example is given in Figure 2.2.4 for R from 4 to 7, where the white region represents the original object shape. The blue rectangles indicate the IN rectangles and the green rectangles indicate the ON rectangles. As R increases, the shape of the octree diagram approximates the original object shape more closely.
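The subdivision described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this thesis: the names `Node`, `classify`, `build`, and `leaves` are hypothetical, the object contour is assumed to be available as an `inside(x, y)` predicate, the root is taken to be at depth 1, and the $2^{R-r} - 1$ checking points are assumed to be placed on each edge.

```python
from dataclasses import dataclass, field

IN, ON, OUT = "IN", "ON", "OUT"


@dataclass
class Node:
    x: float            # lower-left corner of the square
    y: float
    size: float         # edge length
    depth: int          # r, with the root at depth 1
    label: str = ON
    children: list = field(default_factory=list)


def classify(node, inside, max_depth):
    """Label a square from its vertices plus checking points on its edges."""
    # 2**(R - r) segments per edge give 2**(R - r) - 1 interior checking
    # points per edge (assumed reading of the text) plus the two vertices.
    n_seg = 2 ** (max_depth - node.depth)
    points = []
    for k in range(n_seg + 1):
        d = node.size * k / n_seg
        points += [
            (node.x + d, node.y),
            (node.x + d, node.y + node.size),
            (node.x, node.y + d),
            (node.x + node.size, node.y + d),
        ]
    flags = [inside(px, py) for px, py in points]
    if all(flags):
        return IN
    if not any(flags):
        return OUT
    return ON


def build(node, inside, max_depth):
    """Subdivide ON squares until the maximum depth R is reached."""
    node.label = classify(node, inside, max_depth)
    if node.label == ON and node.depth < max_depth:
        half = node.size / 2
        for dx in (0, half):
            for dy in (0, half):
                child = Node(node.x + dx, node.y + dy, half, node.depth + 1)
                node.children.append(build(child, inside, max_depth))
    return node


def leaves(node):
    """Collect the squares that were not subdivided further."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]
```

For instance, `build(Node(0.0, 0.0, 1.0, 1), inside, 4)` with a disk predicate yields large IN and OUT squares away from the contour and depth-4 ON squares along it, matching the behaviour described for Figure 2.2.4.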
nodes, denoted as $s_i^r$ of depth r and index i. Hence, a 3D octree can be represented as
$$T_{3D} = \{ s_1^1 \} \cup \{ s_i^r \mid r = 2, \ldots, R \text{ and } i = 1, 2, \ldots, 8 \}$$
Note that each cube $s_i^r$ has 8 vertices denoted as
$$v_{im}^r = (x_{im}^r, y_{im}^r, z_{im}^r), \quad m = 1, 2, \ldots, 8.$$
Figure 2.2.4 2D octree diagrams for different maximum depths R. Green and blue rectangles represent the ON and IN rectangles, respectively: (a) R = 4, (b) R = 5, (c) R = 6, (d) R = 7
2.3 Construction of 3D Octree
The first step of 3D object reconstruction is to construct its 3D octree diagram Γ. Let the object pose estimated in the previous section be denoted as $\{\Delta, \Theta\}$, where $\Delta = (\delta_x, \delta_y, \delta_z)$ and $\Theta = (\theta_x, \theta_y, \theta_z)$ represent the translation and the rotation of the object with respect to the camera in 3D space. An octree node $s_i^r$ with vertices $v_{im}^r = (x_{im}^r, y_{im}^r, z_{im}^r)$, $m = 1, 2, \ldots, 8$, is then transformed to the vertices $v_{im}^{r\prime} = (x_{im}^{r\prime}, y_{im}^{r\prime}, z_{im}^{r\prime})$ as given by (2.3.1).
After applying (2.3.1) to all vertices of the nodes of the octree diagram Γ, the resulting octree diagram Γ' has the same pose as the object in the image.
After the pose transformation, the octree diagram Γ' is projected onto the image plane by applying the perspective projection (2.3.2) to each vertex of each node of Γ'.
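The two steps above, pose transformation followed by perspective projection, can be illustrated with a short sketch. It should be read as an assumption-laden example and not as the exact form of (2.3.1) and (2.3.2): the rotation order (about x, then y, then z), the pinhole model with focal length `f`, and the function names are all hypothetical choices made for illustration.

```python
import math


def rotation_matrix(theta_x, theta_y, theta_z):
    """3x3 rotation R = Rz * Ry * Rx (assumed convention, angles in radians)."""
    cx, sx = math.cos(theta_x), math.sin(theta_x)
    cy, sy = math.cos(theta_y), math.sin(theta_y)
    cz, sz = math.cos(theta_z), math.sin(theta_z)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform_vertex(v, delta, theta):
    """Rotate a vertex by theta and translate it by delta: v' = R v + delta."""
    rot = rotation_matrix(*theta)
    return tuple(
        sum(rot[row][k] * v[k] for k in range(3)) + delta[row]
        for row in range(3)
    )


def project(v, f):
    """Pinhole perspective projection onto the image plane (assumes z > 0)."""
    x, y, z = v
    return (f * x / z, f * y / z)
```

Applying `transform_vertex` and then `project` to the eight vertices of a cube gives the 2D points that are compared against the object mask.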
inside or outside the object mask, or as ON otherwise.
When processing a series of images, two additional rules are applied to the octree adjustment. First, once a cube is labeled OUT, it is never adjusted again, since a cube cannot lie outside the object from one viewing angle and inside the object from another. Second, an ON cube may be re-labeled as OUT but cannot be re-labeled as IN. Because the 3D octree is reconstructed from 2D image planes, two situations can occur in a later view: an OUT cube may be projected inside or on the object mask, and an ON cube may be projected inside the object mask. The first situation conflicts with the first rule, so the cube must remain OUT. In the second situation, the ON cube only appears to be IN because of the model motion and the projection, so it should be kept as ON.
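The two rules can be summarized as a small label-merging function. This is a sketch under stated assumptions: `update_label` is a hypothetical name, and the handling of a cube that is currently IN is not specified in the text, so the sketch simply adopts the newly observed label in that case.

```python
IN, ON, OUT = "IN", "ON", "OUT"


def update_label(current, observed):
    """Merge a cube's current label with the label observed in a new view."""
    # Rule 1: OUT is final, and any view may carve a cube away.
    if current == OUT or observed == OUT:
        return OUT
    # Rule 2: an ON cube may become OUT (handled above) but never IN.
    if current == ON:
        return ON
    # Assumption: a cube that is currently IN takes the observed label.
    return observed


def label_over_views(initial, observations):
    """Apply update_label across a sequence of per-view observations."""
    label = initial
    for observed in observations:
        label = update_label(label, observed)
    return label
```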
After the octree has been adjusted with all input object masks and object poses, an approximate model of the object is obtained, as shown in Figure 4.
2.4 Model Triangulation
After the 3D octree construction, the ON cubes of the octree diagram O represent an approximate 3D model of the object, and the larger the depth R is, the more accurate the octree diagram becomes. However, the depth R is limited to about 7 by the constraints of computational resources: memory and computation time grow with the third power of the per-axis resolution, which doubles with each additional level of depth. Under this limitation on R, triangulation of the octree not only increases the accuracy of the model but also makes texture mapping much easier. Figure 2.4.1 shows the reconstruction result of a sphere using a 3D octree of maximum depth 3. Although maximum depth 3 is an extreme case that would not be chosen in a real reconstruction system, it is a good example of the difference between the octree model and the triangulation model.
Figure 2.4.1 Reconstruction of a sphere using a 3D octree of maximum depth 3: (a) original model, (b) octree model, (c) triangulation model of (b)
A typical triangulation algorithm called “Marching Cubes”, which is also adopted in this thesis, is often applied to cube-based 3D model reconstruction such as the octree. The “Marching Cubes” algorithm is based on the fact that the real object surface intersects the edges of the ON cubes. By finding the intersection points cube by cube, triangles can be constructed from the geometric relationships of these intersection points. Figure 2.4.2 illustrates the 15 patterns for transforming a cube into triangles in the Marching Cubes algorithm.
Figure 2.4.2 The 15 patterns for transforming a cube into triangles in Marching Cubes. The dotted corners represent corners lying inside the surface [23]
It is difficult to express the 256 geometric relationships of a cube as a mathematical equation, since each vertex of the cube may lie inside or outside the surface. Hence, a general way to implement the Marching Cubes algorithm is to encode the geometric relationships of the cube vertices into a unique index number ranging from 0 to 255. The indexing scheme is shown in Figure 2.4.3. The table below the cube shows the correspondence of the eight bits to the vertices $v_i$, $i = 1 \ldots 8$, of the cube. Each bit of the index number is set to 1 if the corresponding vertex lies inside the surface and to 0 otherwise. The resulting unique index number is then used to find the corresponding triangle structure, which is defined using the edge numbers $e_j$, $j = 1 \ldots 12$, in the look-up table.
Figure 2.4.3 Indexing scheme of Marching Cubes [23]
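The index encoding can be sketched in a few lines. The bit order is an assumption made for illustration (vertex $v_1$ is mapped to the least significant bit); the actual correspondence is the one defined in Figure 2.4.3, and `cube_index` and `is_surface_cube` are hypothetical names.

```python
def cube_index(inside_flags):
    """Pack eight inside/outside flags (v1..v8) into an index in 0..255."""
    if len(inside_flags) != 8:
        raise ValueError("a cube has exactly 8 vertices")
    index = 0
    for bit, inside in enumerate(inside_flags):
        if inside:
            index |= 1 << bit   # assumed order: v1 is the lowest bit
    return index


def is_surface_cube(index):
    """A cube yields triangles only if its vertices are not all in or all out."""
    return 0 < index < 255
```

The index would then be used as the key into the triangle look-up table, which is not reproduced here.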
From Section 2.2.4, every ON cube should have some vertices lying inside the object and others lying outside, so that some edges of the ON cube intersect the object surface. Each of these edges has an inside vertex at one end and an outside vertex at the other, which means that an intersection point exists on the edge. The exact intersection points on these edges can be determined by geometric algorithms such as binary search [37].
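A binary search along one such edge can be sketched as below. The sketch assumes that an `inside(point)` predicate is available (in the reconstruction system this test would be made against the object masks) and that the edge has one inside and one outside endpoint; `find_intersection` and `tol` are hypothetical names.

```python
def find_intersection(p_in, p_out, inside, tol=1e-6):
    """Bisect the edge between an inside and an outside vertex."""
    a, b = tuple(p_in), tuple(p_out)
    # Keep a on the inside and b on the outside while halving the interval.
    while max(abs(a[k] - b[k]) for k in range(3)) > tol:
        mid = tuple((a[k] + b[k]) / 2 for k in range(3))
        if inside(mid):
            a = mid
        else:
            b = mid
    return tuple((a[k] + b[k]) / 2 for k in range(3))
```

Each iteration halves the search interval, so the number of inside tests grows only logarithmically with the required precision.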
After the intersection points are obtained, the Marching Cubes algorithm is applied to construct the triangle mesh model. Figure 2.4.4 gives an example of the triangulation result of the Marching Cubes algorithm.
Figure 2.4.4 Triangulation example [37]: (a) octree model, (b) triangulation model of (a)
Chapter 3
Object Extraction Using 3D Feature Points
Segmentation of the foreground and background of an image is crucial in many computer vision applications. Two existing algorithms that are still under development are introduced in Section 3.1. Based on the structure of the proposed reconstruction system, a new algorithm is proposed to remove the background and extract the target object in the foreground. It utilizes the tracking results of the camera tracking system mentioned in Section 2.1, including the camera poses and the 3D positions of the feature points. The two steps of the algorithm, 3D foreground/background segmentation and object mask generation, are explained in Sections 3.2 and 3.3.
3.1 Existing Image Segmentation Algorithms
3.1.1 Graphical Partitioning Active Contours
“Active Contour” [19, 36] is the most widely used algorithm for object extraction. An active contour, or snake, can be represented using (3.1.1).
$$v(s) = (x(s), y(s)), \quad s \in [0, 1] \qquad (3.1.1)$$
where s is the index of each point belonging to the active contour and v(s) is the position of the point indexed by s. The deformation of an active contour is controlled by an energy function defined by image information such as color and edges, as shown in (3.1.2) [32].
$$E_{snake} = \int \big( E_{internal}(v(s)) + E_{image}(v(s)) + E_{constraint}(v(s)) \big) \, ds \qquad (3.1.2)$$
where $E_{internal}$ represents the smoothness energy defined by the elasticity and stiffness parameters, $E_{image}$ represents the energy defined by image information including color, texture, and edges, and $E_{constraint}$ represents the energy defined by information other than the image, such as an object shape pattern.
iterative process of minimization is required. Many minimization processes for implementing active contours have been proposed. The most studied among them is the “level set method” [29]. Instead of deforming the contour, the level set method assigns a height value c(x,y) to each point (x,y) in the image and uses the height value to determine the location of the contour according to (3.1.3). The height value c(x,y) is iteratively updated by calculating the effect of $E_{snake}$ on point (x,y) until saturation, which means that no c(x,y) changes sign. When the iteration completes, the contour of the target object is obtained.
The GPAC [31] algorithm was recently proposed for foreground/background segmentation of natural pictures, as shown in Figure 3.1.1(a) and 3.1.1(b). However, the segmentation result of the GPAC algorithm in Figure 3.1.2(b) shows an unexpected result for the test image in Figure 3.1.2(a). As Figure 3.1.1 shows, the GPAC algorithm performs well when the foreground and background differ in color tone. When the color of the foreground object is similar to that of the background, the segmentation fails.
Figure 3.1.1 GPAC segmentation results: (a) test image 1, (b) extracted foreground of (a)
Figure 3.1.2 GPAC segmentation results: (a) test image 2, (b) extracted foreground of (a)
3.1.2 Image Segmentation
Unlike GPAC, image segmentation algorithms divide the image into several segments according to a segmentation condition, which is usually determined by color and edge information. The objective of an image segmentation algorithm is to divide the image into segments such that each segment contains pixels with similar features and is as distinct as possible from the other segments. The basic image segmentation algorithm is the watershed algorithm, which simply computes the boundary of the region around each pixel with a local minimum color value. The watershed algorithm tends to over-segment the image and is therefore applied as a pre-process for other segmentation algorithms, as shown in Figure 3.1.3(b). Improvements to the watershed algorithm include merging similar segment areas and using additional information such as texture and edges in the calculation. EDISON [8, 16] is a segmentation algorithm that implements these improvements, and its segmentation result is shown in Figure 3.1.3(c).
From the segmentation result of EDISON in Figure 3.1.3(c), it is obvious that the over-segmentation problem remains unsolved for algorithms based on watershed. Hence, the graph-based image segmentation algorithm [12, 38] is introduced. Instead of image processing theory, this algorithm segments the image based on graph theory. The image to be segmented is treated as a graph $G = (V, E)$, where $v_i \in V$ represents a pixel in the image and $e_{ij} \in E$ represents the edge connecting two neighboring pixels $v_i$ and $v_j$. A weight value $w_{ij} = w(v_i, v_j)$ is calculated for every edge $e_{ij}$ according to the dissimilarity of pixels $v_i$ and $v_j$. With the weight values $w_{ij}$, the graph-based segmentation algorithm is able to segment the image into
$E'_k \cap E'_l = \Phi$ for $k \neq l$, while satisfying the objective of image segmentation mentioned above. A graph-based segmentation algorithm proposed in [12] has been tested, and the result is illustrated in Figure 3.1.3(d).
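The graph construction described above can be illustrated with a short sketch. It is deliberately simplified and is not the algorithm of [12]: the weight is assumed to be the absolute intensity difference of a grayscale image, pixels are connected to their right and lower neighbors only, and segments are formed by merging across edges whose weight does not exceed a fixed `threshold`. The names `build_grid_graph` and `segment` are hypothetical.

```python
def build_grid_graph(image):
    """Return edges (weight, i, j) between horizontally and vertically adjacent pixels."""
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                edges.append((abs(image[r][c] - image[r][c + 1]), i, i + 1))
            if r + 1 < rows:
                edges.append((abs(image[r][c] - image[r + 1][c]), i, i + cols))
    return edges


def segment(image, threshold):
    """Merge pixels joined by low-weight edges; return one segment id per pixel."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for weight, i, j in sorted(build_grid_graph(image)):
        if weight <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(rows * cols)]
```

Because every pixel ends up with exactly one segment id, the resulting segments are pairwise disjoint.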
Although the image segmentation algorithm performs well in dividing an image into image blocks, it cannot determine whether an image block belongs to the foreground or the background. It is impossible to piece the image blocks together into a complete object image, since the algorithm provides no information about the geometry of the object. Therefore, the image segmentation algorithm is not suitable for the reconstruction system either.
Figure 3.1.3 Image segmentation results: (a) test image, (b) result of the watershed algorithm, (c) result of EDISON [16], (d) result of the graph-based segmentation algorithm proposed in [12]
3.2 3D Foreground / Background Separation
3.2.1 Problem Analysis on Separation
To obtain an image sequence containing an object suitable for reconstruction, the image sequence must be taken by a camera orbiting around the target object, as shown in Figure 3.2.1. To satisfy this condition, the shooting environment should be arranged so that the target object is kept at a certain distance from the surrounding background. To reconstruct the 3D information from the image sequence, the camera tracking system is adopted to recover the 3D camera pose and the 3D feature point positions. With the reconstructed 3D feature point positions, the separation between the target object and the background is also restored by the camera tracking system. As a result, the reconstructed feature points are distributed in two groups separated by an empty gap. The group close to the camera is identified as the foreground, while the group far from the camera is identified as the background. The average distance between points of the foreground is much smaller than that of the background, as illustrated in Figure 3.2.2. The objective of separation is to cluster the reconstructed feature points into the two groups, foreground and background.
Figure 3.2.1 Camera motion when capturing image sequence. The camera is always aiming at the object. Illustration generated using Maya PLE 8.5[2]
Figure 3.2.2 Top view of the distribution of the reconstructed 3D feature points of a frame. The circle and arrow at the bottom right represent the position and viewing direction of the camera, respectively. The smaller ellipse indicates the foreground points, while the larger ellipse indicates the background points.
Although there is a clear separation between the two point groups, clustering algorithms such as K-means are not applicable, for two reasons. First, most clustering algorithms attempt to separate 3D data using planes, whereas the boundary between the foreground and background groups is more likely a sphere or an irregular shape than a plane, as shown in Figure 3.2.3. Second, it is unnecessary to find a separation surface that strictly divides the 3D feature points into foreground and background groups. Instead, background removal can be performed by an equivalent process, foreground extraction, which picks the foreground points out of the feature points.
Figure 3.2.3 Top view of the distribution of the reconstructed 3D feature points of the image sequence. The orange arrow indicates the orbit of the camera.
3.2.2 Proposed Foreground Extraction Algorithm
With the reconstructed camera positions and feature point locations, an algorithm for foreground point extraction is proposed for this particular condition. Utilizing the distribution characteristic illustrated in Figure 3.2.2, the proposed algorithm first determines the initial two-point set of the foreground group by finding the two closest points of which at least one is visible to the camera. Visibility is determined by checking the included angle between two vectors, the camera viewing direction and the vector from the camera position to the feature point position, using (3.2.1).
position, the camera viewing direction, and the direction from the camera to the feature point, respectively. The point $P_f$ is considered visible to the camera if $\theta_v$ is smaller than $\theta_c$, which is usually defined as the viewing angle of the camera but can also be set to a small angle such as 10 degrees to narrow down the range of the foreground object.
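The visibility test can be sketched as an angle computation between the viewing direction and the direction to the feature point. The sketch assumes the included angle is obtained from the normalized dot product, which is the usual way to compute it; the function and parameter names are hypothetical, with `p_c`, `d_c`, `p_f`, and `theta_c` standing for the camera position, the camera viewing direction, the feature point position, and the angle limit.

```python
import math


def viewing_angle(p_c, d_c, p_f):
    """Included angle (radians) between d_c and the vector from p_c to p_f."""
    d_f = [p_f[k] - p_c[k] for k in range(3)]
    dot = sum(d_c[k] * d_f[k] for k in range(3))
    norm = math.sqrt(sum(x * x for x in d_c)) * math.sqrt(sum(x * x for x in d_f))
    if norm == 0.0:
        raise ValueError("zero-length direction vector")
    # Clamp to [-1, 1] to guard against rounding before calling acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def is_visible(p_c, d_c, p_f, theta_c):
    """A point is treated as visible if its viewing angle is below theta_c."""
    return viewing_angle(p_c, d_c, p_f) < theta_c
```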
A classification process is then performed to iteratively integrate suitable unclassified feature points into the foreground group. An unclassified feature point is selected and integrated if its distance to any point of the foreground group is smaller than the threshold distance $Th_d$, defined as (3.2.2).
$$Th_d(W, d, d_1, d_2) = \frac{W}{d \cdot \mathrm{mean}(d_1, d_2)} \qquad (3.2.2)$$
where $W = \min(ImageWidth, ImageHeight)$, $d_1$ and $d_2$ represent the distances from the camera to the two points of the initial point set, and d is the expected minimal number of feature points extracted from the 2D projection of the object surface. From (3.2.2), $Th_d$ is proportional to the image resolution, since the pixel distance between two feature points becomes larger at higher image resolution. $Th_d$ is inversely proportional to the distance from the camera to the foreground object, because the object appears smaller in the image and the feature points lie closer together when the object is farther away. $Th_d$ is also inversely proportional to the expected minimal number of feature points, because the more feature points there are on an object surface, the closer together they lie.
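The initialization and the iterative classification can be sketched together as follows. This is an illustrative sketch, not the full procedure of the thesis: it assumes that (3.2.2) reads $Th_d = W / (d \cdot \mathrm{mean}(d_1, d_2))$, which is consistent with the proportionalities stated above, that a per-point visibility flag has already been computed, and that Euclidean distance is used. All function names are hypothetical.

```python
import math


def threshold_distance(width, height, n_expected, d1, d2):
    """Assumed reading of (3.2.2): W / (d * mean(d1, d2))."""
    w = min(width, height)
    return w / (n_expected * (d1 + d2) / 2)


def initial_pair(points, visible):
    """Closest pair of points in which at least one point is visible."""
    best = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if visible[i] or visible[j]:
                dist = math.dist(points[i], points[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
    return best   # None when no point is visible


def extract_foreground(points, seed, th_d):
    """Grow the foreground set until no unclassified point is within th_d."""
    foreground = set(seed)
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in foreground:
                continue
            if any(math.dist(p, points[j]) < th_d for j in foreground):
                foreground.add(i)
                changed = True
    return foreground
```

The outer loop repeats until a full pass adds no point, so chains of nearby foreground points are collected even when they are far from the initial pair.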
The details of the proposed foreground extraction algorithm are presented in Table 3.2.1.
Table 3.2.1 Foreground Extraction Algorithm
Input
Minimal dimension W of the image width and height, expected minimal number of feature points d, 3D camera position Pc, camera viewing direction Dc, camera viewing angle θc, and a set of 3D feature point positions P.
Output
Foreground point set F
Algorithm
1. Initialization: Determine the initial two-point set of the foreground group