
A quality controllable multi-view object reconstruction method for 3D imaging systems

Wen-Chao Chen^a, Hong-Long Chou^b, Zen Chen^a,*

^a Department of Computer Science, National Chiao Tung University, 1001 University Road, Hsinchu 300, Taiwan
^b Altek Corporation, Science-Based Industrial Park, Hsinchu, Taiwan

* Corresponding author. Fax: +886 3 573 1875. E-mail addresses: chaody.cs94g@nctu.edu.tw (W.-C. Chen), hlchou@altek.com.tw (H.-L. Chou), zchen@cs.nctu.edu.tw (Z. Chen).

Article info

Article history: Received 2 July 2009; Accepted 22 February 2010; Available online 21 April 2010.

Keywords: 3D imaging system; Modeling from silhouettes; Octree model; XOR projection error; System performance; Dynamic modeling; Progressive transmission; Multi-camera system

Abstract

This paper addresses a novel multi-view visual hull mesh reconstruction for 3D imaging with a system quality control capability. There are numerous 3D imaging methods, including multi-view stereo algorithms and various visual hull/octree reconstruction methods known as modeling from silhouettes. The octree based reconstruction methods are conceptually simple to implement, while encountering a conflict between model accuracy and memory size. Since the tree depth is discrete, the system performance measures (in terms of accuracy, memory size, and computation time) generally vary rapidly with the pre-specified tree depth. This jumping system performance is not suitable for practical applications; a desirable 3D reconstruction method must have a finer control over the system performance. The proposed method aims at visual quality control along with better management of memory size and computation time. Furthermore, dynamic object modeling is made possible by the new method. Also, progressive transmission of the reconstructed model from coarse to fine is provided. The reconstruction accuracy of the acquired 3D model is measured by the exclusive OR (XOR) projection error between pairs of binary images: the reconstructed silhouettes and the true silhouettes in the multiple views. Interesting properties of the new method and experimental comparisons with other existing methods are reported. The performance comparisons are made under either a comparable silhouette inconsistency or a similar triangle number of the mesh model. It is shown that under either condition the new method requires less memory size and less computation time.

© 2010 Published by Elsevier Inc.

1. Introduction

The three-dimensional imaging technology is an emerging research topic for capturing, processing, and displaying the true 3D information of a scene object. The next generation of 3D imaging systems allows spectators to view from any desired viewpoint, just like in the real world. Two common multi-view solutions for the next generation of 3D imaging systems are: (1) novel view synthesis and (2) multi-view 3D modeling [1–5]. The main challenges of the emerging 3D imaging systems include 3D model reconstruction, video transmission rate, and 3D free viewpoint rendering, etc. The stereoscopic display based on view synthesis is a practical approach for 3D imaging nowadays. The depth-image-based rendering (DIBR) methods for free viewpoint TV (FTV) were introduced in [3–5]. Virtual images were rendered from a small number of views with 3D warping. Many researchers also utilized video interpolation techniques to synthesize novel views without building the 3D shape [6–8]. Although these kinds of methods can provide high quality images by interpolation, the viewing angle is limited by the initial camera positions.

In contrast to view interpolation, multi-view 3D modeling approaches construct the 3D geometry of the scene and, therefore, are more suitable for applications in holographic 3D display or 3DTV systems, which require rendering views from all directions, not just the in-between views. The 3D mesh with texture mapping is the most common approach for producing photorealistic objects. In recent years, there has been an increasing amount of literature on image-based 3D modeling. Multi-view stereo algorithms were proposed to reconstruct high quality 3D models from images captured at multiple viewpoints [9–14]. Seitz et al. [15] used high quality multi-view datasets as the benchmark to evaluate the performances of different reconstruction algorithms based on accuracy and completeness of the reconstructed objects. Multi-view stereo algorithms are generally based on photometric consistency measurement, which is a time consuming procedure.

On the other hand, the visual hull is an alternative approach to 3D modeling using multi-camera systems [16–19]. Franco and Boyer [21] addressed the exact polyhedral visual hull reconstruction by cutting and joining the visual rays passing through silhouettes. Despite the complexity in joining the visual ray segments, the




resultant visual hull is highly accurate in terms of silhouette consistency. However, the constructed polyhedral visual hull tends to produce ill-formed triangles (irregular and narrow planar surface strips) [35]. Liang and Wong presented a simple and efficient way to compute the exact visual hull vertices by an exact intersection computation which replaces the interpolation value used in the conventional marching cubes (MC) approach [35]. This exact visual hull from marching cubes was reported to significantly improve the quality of the reconstructed visual hull, comparable to that of the polyhedral approaches while requiring less computational time. Nevertheless, the method demands a pre-specified subdivision level number for the octree reconstruction. If the level number is too small, the octants generated tend to have a larger volume, resulting in lower accuracy, while if the level number is too large, a tremendous computation time and memory space are required. On the other hand, the parallel execution of the voxel-based visual hull reconstruction is feasible. In [25] Ladikos et al. proposed a method making use of CUDA to perform the reconstruction using kernels which perform the projection of every voxel and gather the occupancy information. They also showed a real-time 3D reconstruction system which used the GPU-based visual hull computation algorithm. There is a website where the information on CUDA is available [26]. Starck et al. [22] matched surface features between views to modify the visual hull and applied a global optimization approach to do a dense reconstruction. Vlasic et al. [23] and de Aguiar et al. [24] acquired a template body mesh, which could be generated by a laser scanner, and then tracked the motion sequences by deforming the non-rigid body mesh shape to match faithfully with a video stream of silhouettes.

This paper proposes a novel octree reconstruction method. The conventional approach often faces a conflict between model accuracy and memory size. Since the tree depth or the subdivision level is discrete, the system performance measures (accuracy, memory size, and computation time) vary rapidly. To remedy this drawback, we modify the conventional method to attain much finer control over the system performance. In the new method the visual quality of the octree reconstructions is controlled by a specified exclusive OR (XOR) projection error upper bound; the resultant XOR projection error reflects the inconsistency between the binary projected silhouette of the reconstructed object and the true object silhouette in each image. The introduction of new types of octants in the new reconstruction method yields a mixture of protrusions and indents on the reconstructed object surface, which is no longer a bounding volume of the true object. Both useful properties and computer simulations of the system performance are presented. Furthermore, dynamic object modeling is made possible based on the new method. Also, progressive transmission of the reconstructed model with an increasing degree of model accuracy is provided. Since the proposed method is fast and has a dynamic and progressive nature, together with a controllable system performance measure, the method is suitable to be incorporated into 3D imaging systems. Comparisons between the proposed reconstruction method and other existing methods are made to illustrate the merits of the new method.

The main features of our work include:

(1) The proposed method does not enforce a maximum tree depth for the octant generation process; instead an XOR projection error upper bound parameter is imposed. This parameter value selection depends roughly on the intended level of detail. Therefore, the user has some control over the visual reconstruction quality.

(2) Under a comparable XOR projection error (i.e., silhouette inconsistency) constraint, the proposed octree reconstruction method is faster than the conventional octree reconstruction method by a factor of 10 to 40. The total processing time of our method, including the conversion of the octant-based octree to a surface mesh representation, is faster by a factor of 2 to 3 than the conventional octree method with the standard marching cubes method for generating the final surface mesh model (the "Conv + MC" method) and the method of generating an exact visual hull from marching cubes based on the standard octree model (the "Conv + ExMC" method, or simply the ExMC method). Furthermore, the reconstructed visual hull contains a triangle number for the surface representation which is less than that of the existing methods.

(3) The proposed method can generally achieve better accuracy, while spending less computation time, through the introduction of new octant types beyond the conventional octree ones.

(4) Due to the application of exact marching cubes, the reconstructed visual hull mesh is relatively smooth, so it does not need further surface smoothing for texture mapping.

The rest of the paper is organized as follows. Section 2 briefly describes the framework of the proposed multi-view capturing and processing integrated system for generating 3D texture mapped objects. Section 3 presents useful properties of the new reconstruction method with regard to its system performance. In Section 4 the experimental setup is described; synthetic and real objects of different geometric complexity are used in the visual hull reconstruction, the system performance measures of the proposed reconstruction method are given, and comparisons of our reconstruction method with other existing methods are provided. Conclusions and further research directions are given in Section 5.

2. The proposed system

To fulfill the requirements of FTV systems or 3DTV applications, a multi-view capturing and processing system generates the photorealistic appearance of 3D dynamic objects from multi-view images based on some image-based reconstruction algorithm.

Fig. 1 shows the overall system organization. Yemez and Schmitt also presented a similar efficient approach to progressively transmit the octree structure and then perform triangulation on the fly [38]. In our system, the preprocessing module first comprises camera calibration, background estimation, color normalization, etc. After capturing a synchronized video sequence, the system applies background subtraction for 3D model reconstruction, and finally the multi-view texture mapping is derived for photorealistic display of the 3D object. Only the octree structure information and multi-view images are transmitted from a server to clients. Here, the focus is placed on the extension of the conventional octree reconstruction and the application of exact marching cubes to convert the octant-based volumetric representation to a surface mesh representation of the object under reconstruction. Also, dynamic modeling and progressive transmission based on the new method are presented. Texture mapping and Z-buffering are implemented with commercial graphics cards.

2.1. A multi-view capturing system

We implement a multi-view capturing system consisting of eight IEEE 1394b synchronized color cameras connected to eight PCs. The PCs are synchronized with an IEEE 1394a daisy chain. The system configuration is shown in Fig. 2. Only a single computer command is needed to trigger the capturing process through the communication interface mechanism. The cameras capture a video sequence at a rate of 30 fps with a frame resolution of 1024 × 768. We also set up a blue screen-like studio of dimension 3 m × 6 m × 2 m, and all the cameras are mounted on the ceiling to acquire images of the objects from different viewing positions. Fig. 3 shows our studio configuration.

A color checker chart is used to estimate the true color information for color normalization, and a custom-made calibration board is used for acquiring the projection matrix of each camera based on Bouguet's method [28]. Although objects are captured in a blue screen environment, foreground extraction is still not easy due to shadows cast by the objects. This paper combines the methods proposed in [30,31] with a manually selected threshold to extract foreground objects for the 3D reconstruction.
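The distortion-based tests of [30,31] are more involved than can be shown here, but the role of the manual threshold can be illustrated with a much cruder sketch; the array shapes and the single Euclidean color-distance test below are our own simplifications, not the cited methods:

```python
import numpy as np

def foreground_mask(frame, background, tau):
    # frame, background: H x W x 3 color images; tau: manually chosen threshold.
    # A pixel is marked foreground when its color deviates from the static
    # background model by more than tau (a stand-in for the brightness and
    # chromaticity distortion tests of [30,31]).
    diff = frame.astype(np.float32) - background.astype(np.float32)
    dist = np.linalg.norm(diff, axis=2)   # per-pixel color distance
    return dist > tau
```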

2.2. A novel octree reconstruction method with a projection error control

In the conventional octree reconstruction a coarse-to-fine octant subdivision process is recursively applied to a bounding root cell of the object under reconstruction. During the subdivision process the black nodes indicate the octants lying inside the object, the white nodes indicate the octants lying outside the object, and the gray nodes indicate the octants lying partially outside and partially inside the object. The octant subdivision process is

Fig. 1. System block diagram.

Fig. 2. A multi-view capturing system.


repeated until no gray octants remain. A typical octree representation of an object model reconstruction is shown in Fig. 4. In the conventional octree reconstruction method, the type of an octant is classified by checking the intersection relation between the projected octant image and the real object silhouette in each view. In order to speed up the intersection test, an octant is approximated by its bounding sphere and the intersection test is executed using precomputed signed distance maps derived from the object silhouettes [27]. The signed Chebyshev distance map for each silhouette view can be generated with a graphics card to reduce computation time using the method proposed by Hoff et al. [32]. The distance value at the center c of the bounding circle of a projected octant image is denoted by DistMap(c). The radius r of the bounding circle of the projected octant image is calculated and then compared with DistMap(c) to determine the intersection relationship. A positive DistMap(c) value indicates the circle center is inside the object silhouette, a negative distance value indicates a circle center outside the silhouette, and a zero value indicates a circle center on the silhouette boundary. The spatial relationship checking is carried out on a view-by-view basis. Table 1 gives the intersection relationship between the projected octant image and its associated silhouette.
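As a concrete illustration, the signed Chebyshev map and the per-view test of Table 1 can be sketched as follows. This is a minimal CPU version (the paper generates the maps on the GPU following Hoff et al. [32]); the array conventions are assumptions of ours:

```python
import numpy as np
from scipy.ndimage import distance_transform_cdt

def signed_chebyshev_map(silhouette):
    # Positive inside the silhouette, negative outside (chessboard metric).
    sil = silhouette.astype(bool)
    inside = distance_transform_cdt(sil, metric='chessboard')
    outside = distance_transform_cdt(~sil, metric='chessboard')
    return inside.astype(np.int32) - outside.astype(np.int32)

def octant_test(dist_map, cx, cy, r):
    # Per-view intersection test of Table 1 for a projected octant whose
    # bounding circle has centre (cx, cy) and radius r (in pixels).
    d = dist_map[int(round(cy)), int(round(cx))]   # DistMap(c); row index = y
    if d >= 0:
        return 'black' if r <= d else 'gray'       # within / intersecting
    return 'white' if r <= -d else 'gray'          # outside / intersecting
```

An octant is finally labeled black only if the test returns black in all views, and white as soon as it returns white in any view, in line with the visual hull being the intersection of the silhouette cones.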

To measure the model accuracy of the reconstructed visual hull O relative to the true object T, we use their binary silhouettes S^O_v and S^T_v across all views (v = 1, ..., M) to define the following average exclusive OR error:

\[ \text{XOR error} = \frac{1}{M} \sum_{v=1}^{M} \left| S^O_v \oplus S^T_v \right| \tag{1} \]

where |S^O_v ⊕ S^T_v| stands for the total number of XOR error pixels between the two binary silhouettes in view v.

Another way to measure the reconstruction model accuracy is to use the percentage of the error pixels of the reconstructed object, given by the following ratio:

\[ \mathrm{Err}(O, T) = \sum_{v=1}^{M} \left| S^O_v \oplus S^T_v \right| \Big/ \sum_{v=1}^{M} \mathrm{Area}(S^T_v) \tag{2} \]

where Area(S^T_v) is the area (i.e., the total pixel number) of the region of the true silhouette. For a good object reconstruction this ratio should be much smaller than one.

Note that the error caused by the approximation with a bounding circle becomes negligible as the subdivision level increases to a sufficiently large final level. Also, the computation of the above XOR projection error uses the actual hexagonal projected images of the reconstructed object octants rather than the bounding circles of the octants.
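Eqs. (1) and (2) reduce to a few lines once the reconstructed and true silhouettes are available as boolean images; the list-of-arrays representation below is an assumption of ours:

```python
import numpy as np

def xor_error(recon_sils, true_sils):
    # Eq. (1): average number of XOR error pixels over the M views.
    diffs = [np.logical_xor(so, st).sum() for so, st in zip(recon_sils, true_sils)]
    return sum(diffs) / len(diffs)

def err_ratio(recon_sils, true_sils):
    # Eq. (2): XOR error pixels relative to the total true silhouette area.
    num = sum(np.logical_xor(so, st).sum() for so, st in zip(recon_sils, true_sils))
    den = sum(st.sum() for st in true_sils)
    return num / den
```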

In the new octree reconstruction method a pre-specified XOR octant projection error bound is denoted by P. There are five types of octants at each resolution level l in the new method: B_{l,N} (black), W_{l,N} (white), GB_{l,N} (gray–black), GW_{l,N} (gray–white), and GG_{l,N} (gray–gray). The black and white octants are defined in the conventional way, and the three new types of gray octants are defined below.

A conventional gray octant G_{l,C} at level l is redefined as a gray–white (GW_{l,N}) octant in the new method if its bounding circle center c has a negative distance value, DistMap(c) < 0, and the black extent of the octant satisfies the constraint 0 < r + DistMap(c) ≤ P in any particular view v ∈ [1, M].

A conventional gray octant G_{l,C} at level l is redefined as a gray–black (GB_{l,N}) octant in the new generalized method if its bounding circle center c has a non-negative distance value, DistMap(c) ≥ 0, and the white extent of the octant satisfies the constraint 0 < r − DistMap(c) ≤ P in all views v ∈ [1, M]. If a gray octant G_{l,C} cannot be redefined as one of the above two, it is redefined as a gray–gray (GG_{l,N}) octant. That is, the white and black extents of a GG_{l,N} octant both exceed the octant projection error bound P.

A relatively large portion of a gray–white octant remains outside the silhouette, whereas a gray–gray octant has significant portions both outside and inside the silhouette.

Fig. 4. A typical octree representation of an object model.

Table 1
Intersection relationship between the projected octant image and its associated silhouette.

DistMap(c) ≥ 0:
  r > DistMap(c) in any view: the projected octant image intersects the silhouette (indicating a gray octant).
  r ≤ DistMap(c) in all views: the projected octant image is within the silhouette (indicating a black octant).
DistMap(c) < 0:
  r > |DistMap(c)| in any view: the projected octant image intersects the silhouette (indicating a gray octant).
  r ≤ |DistMap(c)| in any view: the projected octant image is outside the silhouette (indicating a white octant).


With the new definition of the gray octant types, the octant subdivision scheme of the new generalized reconstruction method is given as follows. In the new generalized octree method, at each level l = 0, 1, ..., L_{N,P} only the GG_{l,N} octants need to be subdivided into eight child octants. The octant subdivision process is performed from level to level until there are no new GG_{l,N} octants at the next (finer) level. The final level L_{N,P} of the new method with a specified XOR octant projection error bound P is defined as the level at which the set of gray–gray octants generated by the new method is empty.

In a previous paper of ours [20] we extended the conventional black octant to include the gray–black (GB) octant so that the octree generated is still a bounding volume of the real object. It has been shown that if a proper XOR projection error bound is selected, the extended method spends less memory space and processing time, while achieving a visual quality comparable to that of the conventional method with a properly fixed subdivision level.

In this paper we generalize the conventional octree method to include both the gray–black (GB) octants and gray–white (GW) octants. The presence of GB octants indicates protrusion of the reconstructed object model and the presence of GW octants indicates indent or shrinkage of the reconstructed object model. In other words, the generalized model produces protrusions and indents on the object surface, and the new generalized model is no longer a bounding volume of the true object. As shall be seen, the generalized octree method reduces the memory space and processing time of the system compared to the aforementioned methods under the comparable silhouette inconsistency condition.

To the best of our knowledge, Erol et al. are the only group concerned with the conditional gray octant split problem [36]. They proposed a condition for determining whether a boundary (i.e., gray) octant is split further or not. The condition depends on the length of the diagonal of the bounding rectangle of the voxel projection and the minimum local feature size (LFS) value inside the bounding rectangle. These two values change with the voxel under consideration, so the boundary voxel's split decision is not simple. In our method the boundary voxel's split depends on a fixed projection error bound P, so it is much easier to verify, and the error bound parameter can be set according to the level of detail wanted. From this point of view the octree reconstruction of our method is generally faster than the adaptive sampling method.

2.3. A dynamic refinement of the reconstructed octree model with respect to a decreasing error bound parameter P

After the octree model is reconstructed by the new reconstruction method with a given octant projection error bound P, the resultant octree model can be dynamically updated without restarting from scratch when the parameter P changes to a smaller value. Namely, for the new smaller parameter P only those gray–black and gray–white octants whose octant XOR projection error is greater than the new P value need to be split. In this way the octree model can be dynamically refined. The refinement process can be recorded as the split of a parent gray–black or gray–white octant into eight child octants, each with a smaller octant XOR error. Forward and backward pointers can be used to log the octant split history.
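A minimal sketch of this refinement step is given below; the node attributes (kind, xor_error, children, parent) and the split_octant callback are hypothetical names of ours, not part of the paper:

```python
def refine(leaves, new_P, split_octant):
    # Dynamic refinement when the bound P is lowered: reuse every leaf
    # whose recorded XOR projection error already meets the new bound and
    # split only the offending gray-black / gray-white leaves.
    refined = []
    for node in leaves:
        if node.kind in ('GB', 'GW') and node.xor_error > new_P:
            children = split_octant(node)   # eight child octants
            node.children = children        # forward pointers ...
            for child in children:
                child.parent = node         # ... and backward pointers
            refined.extend(children)
        else:
            refined.append(node)
    return refined
```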

2.4. A progressive transmission of the reconstructed octree model with a best-first traversal scheme

In the octree reconstruction process the XOR projection error of each generated octant is evaluated. The information on the projection error can be used not only for the dynamic reconstruction (as described above) but also for progressive transmission. For transmission of the reconstructed object model, a priority queue, called priority_queue, is used to store the sorted list of the unprocessed octants in decreasing order of the octant's XOR projection error. Each time an octant subdivision is requested, the first octant in the priority_queue, whose projection error value is currently the largest, is fetched for subdivision and marked as a processed octant with the associated projection error bound P. After the subdivision the eight sub-octants are inserted into the priority_queue based on their XOR projection error values. Also, the octant split history of the progressive model reconstruction is recorded by the forward and backward pointers.

During the model transmission only the gray–black octants and gray–white octants are required to be transmitted for rendering. The progressive transmission is terminated when the first octant in the priority_queue has an error value no greater than the specified error bound P.

2.5. Surface mesh representation and texture mapping

In the progressive transmission mode the reconstructed octree model and one or more multi-view images are transmitted to the client. Then a marching cubes technique is used on the client to obtain the exact intersection of each projected octant edge with the silhouette in each image, if there is an intersection. Among the corresponding intersection points found from all images, the one closest to its silhouette in all views is chosen as the final exact point on the visual hull surface. In this way the octant-based volumetric representation is converted into a surface representation in mesh form. The accuracy of our reconstructed visual hull mesh is

Fig. 5. (a) The bounding spheres for the parent and child octants at two consecutive levels. (b) The top view of the parent and eight child octant images shown with a bounding circle diameter and a number of object cutting edges.

Fig. 6. The hardware setup of the multi-view capturing system with a turntable connected to a PC.


basically similar to that of the "Conv + ExMC" method, except that the former starts with a generalized octant model and the latter with a conventional octant model.
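One way to realize this selection rule is to bisect the octant edge against each silhouette in which the outer endpoint projects outside, and keep the earliest exit, since a point leaves the visual hull as soon as it leaves any silhouette cone. This is a numerical stand-in for the exact 2D intersection computation of [35], with data conventions of our own:

```python
import numpy as np

def projects_inside(dist_map, Pmat, X):
    # True when the 3-D point X falls inside this view's silhouette.
    x = Pmat @ np.append(X, 1.0)
    u, v = x[:2] / x[2]
    return dist_map[int(round(v)), int(round(u))] > 0

def edge_surface_point(a, b, views, iters=25):
    # a: edge endpoint inside the visual hull; b: endpoint outside it.
    # views: list of (dist_map, projection_matrix) pairs.
    t_star = 1.0
    for dist_map, Pmat in views:
        if projects_inside(dist_map, Pmat, b):
            continue                       # the edge never leaves this cone
        lo, hi = 0.0, 1.0                  # bisect the crossing in this view
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if projects_inside(dist_map, Pmat, a + mid * (b - a)):
                lo = mid
            else:
                hi = mid
        t_star = min(t_star, hi)           # hull exit = earliest silhouette exit
    return a + t_star * (b - a)
```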

Care is exercised to prevent the reconstructed mesh from cracking. In a 3D model reconstructed by our method, neighboring octants may be at different levels and of different sizes. In other words, our octant model is not the same as the conventional one, which contains gray octants at the final subdivision level only. Cracks occur when the marching cubes method is applied to octants at different resolution levels. In order to prevent cracks, we currently split non-terminal octants at lower levels into descendant octants all the way to the final level prior to the application of the marching cubes technique, although this may not be necessary. In this way no cracks will occur. The Poisson surface reconstruction algorithm (PSR) developed by Kazhdan et al. [33] can be employed to produce a fairly smooth visual hull mesh. However, PSR is a time consuming technique and is not suitable for real-time 3DTV applications.

For a single-user client only one rendered view is required for the given user viewing angle. Choosing the one or two images captured around the user viewing angle would be appropriate for texture mapping at the client end. Given the viewing angle, the UV texture mapping coordinates of each vertex on a visible triangular mesh can be computed using the known camera projection matrix. Texture mapping can be done in less than 30 ms by many off-the-shelf graphics cards.
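The projection-to-UV step amounts to one matrix product per vertex; a minimal sketch follows, where the bottom-left texture origin (OpenGL-style v-flip) is an assumption of ours:

```python
import numpy as np

def uv_coordinates(vertices, Pmat, width, height):
    # Project mesh vertices with the known 3x4 camera matrix and map the
    # resulting pixel positions into [0, 1] UV texture space.
    V = np.hstack([vertices, np.ones((len(vertices), 1))])   # homogeneous coords
    x = (Pmat @ V.T).T
    px = x[:, :2] / x[:, 2:3]                                # pixel coordinates
    uv = np.empty_like(px)
    uv[:, 0] = px[:, 0] / width
    uv[:, 1] = 1.0 - px[:, 1] / height                       # flip v axis
    return uv
```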

3. Properties of the new generalized octree reconstruction method

In the following, the notation GG_{l,C} stands for a single gray–gray octant generated at level l in the conventional method, or for the entire set of all gray–gray octants generated at level l, depending on the context. The corresponding notation in the new method is GG_{l,N}. The notations for the other octants are similarly defined.

For the object reconstruction one has to specify the dimensions of the root octant (a cube) based on the input image resolution. One can compute the location of any octant inside the root octant given the dimension of the root octant, and use the known camera projection matrix to compute the size of the octant's projection image. The bounding circles of the octants generated at the same level change slightly in size due to their different locations inside the root octant.

From empirical experience, one often finds a typical octant subdivision outcome as follows. Starting from the root octant at level 0, eight or slightly fewer gray sub-octants are generated at level 1. The new gray octants generated are recursively subdivided. When the level is sufficiently large (e.g., level > 4) any gray octant will generally be subdivided into 4 gray, 2 black, and 2 white child octants. This pattern is called the 4–2–2 subdivision pattern hereafter.

Definition 1. The (object) octree memory space of the conventional method is determined by the space required to store the total number of black octants at all levels, \(\sum_{l=1}^{L_C} |B_{l,C}|\), and the total number of gray octants at the final level L_C, \(|GG_{L_C,C}|\). The object reconstruction time of the conventional method is defined as the total number of octants generated by the conventional method at all levels. (Namely, the total octant number determines the actual octree reconstruction time.)

It is difficult to derive the precise object memory size and object octree reconstruction time of the old and new reconstruction methods, but it is desirable to have a rule of thumb for

Fig. 7. Three real objects used in the experiments: (a) a face, (b) a dinosaur, and (c) a figure sculpture.


calculating these figures for performance evaluation. A basic assumption made here is the 4–2–2 subdivision pattern in the conventional method, as explained below.

When the subdivision level is sufficiently large, any gray–gray octant will have a small volume and the real object surface intersects the gray–gray octant in a random fashion; half of the octant lies inside and half lies outside the real object. Also, half of the child octants will be white or black and half of the child octants are gray–gray, as illustrated in Fig. 5(b). As a consequence, the numbers of gray–gray, white, and black child octants have a ratio of 4:2:2.

Table 2
The statistics of octants generated from the real dinosaur dataset by the two reconstruction methods with their specified system parameters.

Conventional method, L_C = 7 (XOR = 127,757) vs. new method, P = 42 (XOR = 103,790)
Level   Conv. |B| |GB| |GG| |GW| |W|        New |B| |GB| |GG| |GW| |W|
0       0 0 1 0 0                           0 0 1 0 0
1       0 0 4 0 4                           0 0 4 0 4
2       0 0 14 0 18                         0 0 14 0 18
3       0 0 59 0 53                         0 0 39 20 53
4       0 0 202 0 270                       0 0 130 71 111
5       21 0 800 0 795                      21 86 205 445 283
6       407 0 3107 0 2886                   45 1138 0 396 61
7       3266 0 12,392 0 9198                –

Conventional method, L_C = 8 (XOR = 66,228) vs. new method, P = 18 (XOR = 45,918)
Level   Conv. |B| |GB| |GG| |GW| |W|        New |B| |GB| |GG| |GW| |W|
0       0 0 1 0 0                           0 0 1 0 0
1       0 0 4 0 4                           0 0 4 0 4
2       0 0 14 0 18                         0 0 14 0 18
3       0 0 59 0 53                         0 0 49 10 53
4       0 0 202 0 270                       0 0 169 33 190
5       21 0 800 0 795                      21 27 540 226 538
6       407 0 3107 0 2886                   217 496 1131 1324 1152
7       3266 0 12,392 0 9198                433 4664 0 3344 607
8       18,934 0 48,086 0 32,116            –

Conventional method, L_C = 9 (XOR = 33,852) vs. new method, P = 12 (XOR = 32,961)
Level   Conv. |B| |GB| |GG| |GW| |W|        New |B| |GB| |GG| |GW| |W|
0       0 0 1 0 0                           0 0 1 0 0
1       0 0 4 0 4                           0 0 4 0 4
2       0 0 14 0 18                         0 0 14 0 18
3       0 0 59 0 53                         0 0 52 7 53
4       0 0 202 0 270                       0 0 185 17 214
5       21 0 800 0 795                      21 15 637 148 659
6       407 0 3107 0 2886                   295 307 1801 959 1734
7       3266 0 12,392 0 9198                1178 3529 1818 5496 2387
8       18,934 0 48,086 0 32,116            489 11,405 0 2448 202
9       88,349 0 182,434 0 113,905          –

Conventional method, L_C = 10 (XOR = 20,821) vs. new method, P = 7 (XOR = 20,405)
Level   Conv. |B| |GB| |GG| |GW| |W|        New |B| |GB| |GG| |GW| |W|
0       0 0 1 0 0                           0 0 1 0 0
1       0 0 4 0 4                           0 0 4 0 4
2       0 0 14 0 18                         0 0 14 0 18
3       0 0 59 0 53                         0 0 56 3 53
4       0 0 202 0 270                       0 0 193 9 246
5       21 0 801 0 794                      21 8 704 89 722
6       410 0 3109 0 2889                   347 153 2361 593 2178
7       3260 0 12,422 0 9190                2092 1931 6396 3758 4711
8       18,935 0 48,180 0 32,261            5733 16,991 2450 19,995 5999
9       88,425 0 182,990 0 114,025          1355 15,934 0 2156 155
10      397,159 0 600,509 0 466,252         –

Table 3
The diameter range [D_{min,l}, D_{max,l}] of the bounding circles of the projected octant images at all levels (unit: pixel).

Level   1             2            3           4           5          6         7         8         9       10
Range   [1590, 3176]  [698, 1398]  [350, 676]  [172, 336]  [90, 166]  [46, 80]  [22, 42]  [12, 22]  [6, 8]  [2, 8]

Table 4
The tabulation of the XOR_{C,L_C} errors, the averages (XOR_{C,L_C} + XOR_{C,L_C+1})/2, and the memory space and reconstruction time of the conventional method with level L_C = 6, ..., 10. The last row gives the error bound value P of the new method with an XOR model error comparable to that of the conventional method at the respective level.

L_C                                6        7        8       9       10
XOR_{C,L_C}                        233,831  127,757  66,228  33,852  20,821
(XOR_{C,L_C} + XOR_{C,L_C+1})/2    180,794  96,993   50,040  27,337  –
M_{C,L_C} (in log scale)           3.55     4.21     4.85    5.47    6.04
T_{C,L_C} (in log scale)           3.94     4.53     5.12    5.71    6.30
P (new method, comparable XOR)     –        42       18      12      7


And the area of the new XOR error due to the gray–gray sub-octant images is roughly reduced to about half of that of the parent gray–gray octant after the subdivision level is raised by one. Therefore, the following property, Property 1, has been established:

Property 1. When the subdivision level becomes sufficiently large, a gray–gray parent octant is split into gray–gray, white, and black child octants in a ratio of 4:2:2.

Property 2. At a sufficiently large final subdivision level of the conventional method, the octree memory space and reconstruction time both increase approximately by four times as the level increases by one.

Property 3. At a sufficiently large final subdivision level of the conventional method, the XOR projection error reduces roughly by half as the level increases by one.
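As a quick sanity check, the conventional method's XOR errors measured later in Table 4 do fall by roughly half per level:

\[
\frac{233{,}831}{127{,}757} \approx 1.83, \qquad
\frac{127{,}757}{66{,}228} \approx 1.93, \qquad
\frac{66{,}228}{33{,}852} \approx 1.96 .
\]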

The proofs of Properties 2 and 3 and the following Property 4 are provided in Appendix A.

In order to compare the system performance of the reconstruction methods, a notion of reconstruction model accuracy comparability between the different methods is defined below.

Definition 2. Let XOR_{L_C} be the projection error of the octree model (see Eq. (1)) obtained by the conventional method with a final level L_C. If the new method with a given XOR projection error bound P has an XOR projection error lying in the range from (XOR_{L_C−1} + XOR_{L_C})/2 to (XOR_{L_C} + XOR_{L_C+1})/2, then these two methods are said to have comparable reconstruction model accuracy (or silhouette inconsistency).

Definition 3. The object memory space of the new method is determined by the space to store the sets of black octants and gray–black octants at all levels generated by the method. The object reconstruction time of the new method is determined by the total number of all five kinds of octants generated at all levels by the method.

Property 4. The XOR model error of the new method is nearly a non-decreasing function of the error bound parameter P, except occasionally at a few ranges of P. In contrast, the XOR model error of the conventional method is a non-increasing function of the subdivision level.

Property 5. Under the comparable XOR model accuracy condition, the ratio of the object memory size of the conventional method to that of the new method, M_{C,L_C}/M_{N,P}, is found empirically to be in the interval [12, 80]. Similarly, the ratio of the object reconstruction time of the conventional method to that of the new method, T_{C,L_C}/T_{N,P}, is found empirically to be in the interval [10, 40].

From the above, the new method has better memory and time complexity than the conventional method under comparable XOR model accuracy. This is mainly due to the introduction of the new gray–white and gray–black octant types.

Fig. 9. The (dense) plot of the XOR projection error vs. the error bound parameter P. The horizontal dashed lines indicate the average values (XOR_{C,L_C} + XOR_{C,L_C+1})/2 (180,794; 96,993; 50,040; 27,337) and the vertical dashed lines indicate the corresponding error bound values P of the new method with an XOR model error comparable to that of the conventional method. The corresponding error bound values of P as indicated on the horizontal axis are given in Table 4.

Fig. 10. (a) Plots of the memory size and reconstruction time (in the logarithm scale) vs. parameter P of the new method. (b) Plots of memory and reconstruction time compression ratios between the two reconstruction methods under the comparable XOR model accuracy.

Table 5
Comparison between the XOR model errors of the three reconstruction methods under a similar triangle number.

              David (19 images)                  Dinosaur (36 images)                Dancer (20 images)
              Triangles  XOR error  Err(O, T)    Triangles  XOR error  Err(O, T)    Triangles  XOR error  Err(O, T)
Conv + MC     27,368     21,561     6.525%       62,616     147,671    7.139%       10,198     90,665     6.632%
Conv + ExMC   27,368     21,244     6.481%       62,616     120,586    5.830%       10,198     75,594     5.530%


However, there is a price to pay for the performance improvement. When the XOR projection bound P is greater than half the size of the most elongated part (i.e., the finest detail) of the object model, it is possible that the reconstructed object model given by the new method has a missing part. The missing part can be avoided by shifting the root octant or by using a smaller value of P in the new octree reconstruction method. In practice, the value of P can be decided according to the projected size of the finest part of the object model.

Table 6
Comparison between the numbers of triangles of the three reconstruction methods under a comparable accuracy.

              David                               Dinosaur                            Dancer
              Err(O, T)  XOR error  Triangles     Err(O, T)  XOR error  Triangles     Err(O, T)  XOR error  Triangles
Conv + MC     8.007%     26,787     6,658         7.139%     147,671    62,616        5.572%     76,178     42,200
Conv + ExMC   7.367%     24,100     6,658         6.677%     138,103    15,192        5.530%     75,594     10,198
Ours          7.180%     23,189     6,350         6.089%     125,946    16,734        5.406%     73,897     10,897

Table 7
Execution time (in milliseconds) comparison of the three reconstruction methods under a similar triangle number.*

              David   Dinosaur  Dancer
Triangles     27k     62k       40k
Conv + MC     1360    4078      1765
Conv + ExMC   1453    4204      1728
Ours          750     1390      624

* Running on an Intel Quad Core 2.83 GHz processor with 3 GB RAM.

Fig. 11. Reconstructed model representations for the David sequence. (a) Mesh representation, (b) 3D shaded representation, (c) texture mapping representation, (d) one of the original images.

Fig. 12. The rendering results with texture mapping of the reconstructed dinosaur image sequence. (a) Conv + MC, (b) Conv + ExMC, and (c) our method. The results are found under a similar number of triangles.


4. Experimental results

To evaluate the proposed reconstruction method, we design several experiments on a variety of synthetic and real objects of different geometric complexity. The first experiment deals with real objects, and the second experiment reconstructs 3D objects from synthesized images at different image resolutions. The third experiment shows the progressive reconstruction results of a real human captured by the multi-view capturing system. We also demonstrate an augmented reality (AR) application using the reconstructed 3D models.

Fig. 6 depicts our hardware system, including a turntable connected to a PC. The camera captures a sequence of images (10–36) of the real object resting on a rotating turntable under the control of a PC. The whole reconstruction program is written in VC++ under the Windows environment. Some typical real objects used are shown in Fig. 7, and new views generated from the reconstructed 3D models are given in Fig. 8.

4.1. Real object reconstructions

To analyze the reconstruction results, 10 images of a real dinosaur resting on a turntable are taken in the experiment. Each image has a resolution of 2700 × 1800 pixels. The main intermediate results of the conventional method and the new method are collected below to show their performance differences.

Table 2 is the tabulation of the numbers of all types of octants generated by the conventional method with a specified subdivision level (L_C = 7, 8, 9, and 10) and by the new reconstruction method specified with comparable XOR octant projection error bound values (P = 42, 18, 12, and 7). The other fields in the table include the "octant generation level", indicating the level at which the octants are generated, and the sizes of the five generated octant sets: |B|, |GB|, |GG|, |GW|, and |W|. From this table one can see the subdivision patterns of gray–gray octants into various sub-octants during the whole subdivision process in these two different reconstruction methods. One can check the important properties of these

Fig. 13. The rendering results with shading of the reconstructed dinosaur image sequence. (a) Conv + MC, (b) Conv + ExMC, and (c) our method.

Fig. 14. The rendering results with texture mapping of the reconstructed dancer image sequence. (a) One of the original images (b) Conv + MC, (c) Conv + ExMC, and (d) our method. The results are under a similar number of triangles.


two methods described in the previous section, including (i) the relative sizes of the various sets of octants of the different methods at each level, (ii) the final level numbers of the new method relative to that of the conventional method, and (iii) the XOR errors at all levels in these methods.

The root octant used in this experiment has a dimension of 50 cm³. The root octant is placed at the center of the turntable, where the rotation axis passes through. As mentioned previously, any octant at level 0, 1, 2, ..., L_C is approximated by a corresponding bounding sphere. The projection of this sphere onto the image plane of each camera can be found using the camera projection matrix. The size of the projected circle varies with the sphere location in space. The ranges of the minimum and maximum diameters [D_{min,l}, D_{max,l}] of the projected circles can be obtained in advance. They are listed in Table 3.
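The per-level diameter ranges of Table 3 can be predicted with a small sketch like the following; the normalization assumption on the projection matrix is ours:

```python
import numpy as np

def project(Pmat, X):
    x = Pmat @ np.append(X, 1.0)
    return x[:2] / x[2]

def bounding_circle(Pmat, center, radius):
    # Approximate centre and radius (pixels) of the bounding circle of an
    # octant's bounding sphere.  Assumes Pmat = K[R|t] scaled so that its
    # third row carries the unit viewing axis of the camera.
    c = project(Pmat, center)
    w = Pmat[2, :3]                          # camera viewing direction
    d = np.cross(w, np.array([0.0, 0.0, 1.0]))
    if np.linalg.norm(d) < 1e-9:             # viewing axis (nearly) vertical
        d = np.cross(w, np.array([0.0, 1.0, 0.0]))
    d /= np.linalg.norm(d)                   # direction parallel to image plane
    r = np.linalg.norm(project(Pmat, center + radius * d) - c)
    return c, r
```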

These minimum and maximum values indicate the minimum and maximum projection error of the final object model obtained at the level l by the conventional reconstruction method. They will

Fig. 15. The mesh rendering results of the reconstructed dancer image sequence. (a) Conv + MC, (b) Conv + ExMC, and (c) our method.

Fig. 16. Progressive reconstruction results of the octree model creation at the transmission instants corresponding to an XOR octant projection error bound P value equal to (a) P = 13, (b) P = 7, and (c) P = 4.


be used to predict the feasible XOR octant projection error bounds P used in the new reconstruction method.

The XOR model accuracies of the two reconstruction methods are given in Table 4 and Fig. 9, respectively. The former is a discrete function of the final subdivision level and the latter is a dense plot with respect to the error bound parameter P. In Table 4, the average values of the XOR model errors at any two consecutive levels are computed, as shown by the diamond marks along the vertical axis in Fig. 9. From these average values the selection of the corresponding XOR error bound values of P can be found, as indicated on the horizontal axis of Fig. 9. These correspondence data will be used below to compute the memory compression ratio and the time compression ratio between the conventional and new reconstruction methods for their performance evaluations. Recall that if the new method with an error bound P has an XOR model error lying in the range from (XOR_{L_C−1} + XOR_{L_C})/2 to (XOR_{L_C} + XOR_{L_C+1})/2, then the new method and the conventional method are said to have comparable reconstruction accuracy. The logarithm values of memory space and reconstruction time for the conventional method at levels L_C = 6, 7, ..., 10 are given in Table 4. In Fig. 10(a) the logarithm values of the memory space (in red) and the reconstruction time (in blue) of the new method are plotted over the range of P values; the dashed vertical lines indicate the P values given in Table 4. Based on the data shown in Fig. 10(a) one can compute the memory compression ratio and the time compression ratio between the conventional and new methods, as shown in Fig. 10(b). In this experiment the memory compression ratio falls in the range [12, 80] and the time compression ratio falls in the range [10, 40]. We also conducted the octree reconstruction using images at higher image

Fig. 18. (a) One of the original images; (b) the final texture-mapped visual hull mesh in Fig. 17(a) after applying the exact marching cubes technique as well as texture mapping; (c) the processing times of the system components and the total numbers of vertices and triangular faces, respectively.

Fig. 19. Free viewpoint visualization by making use of ARToolKit.


resolution, and observe that the ranges of the memory and time compression ratios almost do not change with the image resolution.

In the next experiment we acquire three image sequences from a public website: David (19 images), dinosaur (34 images), and dancer (20 images) [37]. We implement three methods for the reconstruction of the subjects in these image sequences: the "Conv + MC" method [29], the "Conv + ExMC" method (or simply ExMC) [35], and our generalized octree reconstruction method with the exact marching cubes intersections. Tables 5–7 give comparisons of the XOR model error, the number of triangles, and the execution time of the three methods under the associated comparability conditions, respectively. From these tables one can see that the performance of our method ranks first, Conv + ExMC second, and Conv + MC last. The merit gap widens as the level of detail of the subject increases.

Figs. 11–15 show the reconstruction results of the three methods for the three image sequences in various model representations. The dinosaur models in Figs. 12(c) and 13(c) are locally different from the others. As indicated in Table 5, our reconstructed dinosaur result has a smaller XOR error value than those of the two other methods; therefore, our method achieves a better reconstruction quality. Generally speaking, our method allows gray–black and gray–white octants so that our reconstructed visual hull approximates the true object from both inside and outside the object surface, while the other methods generally start with a bounding volume of the true object with more non-terminal octants prior to the marching cubes method. We can use different values of the control parameter P to generate thicker or thinner reconstruction results.

4.2. Progressive reconstruction from real human images

The new reconstruction method combined with the progressive transmission mechanism mentioned previously is used to model a real human in the multi-view capturing system. Fig. 16(a)–(c) show the progressive transmission of the reconstruction results using the best-first scheme, where the projection errors of the generated octants are ranked in decreasing order. Fig. 17 shows the progressive reconstruction results, and Fig. 18 shows the progressive reconstruction results with texture mapping. Our programs run on an Intel Quad Core PC with 3 GB RAM, without any code optimization.

4.3. Applications to augmented reality

The system can use the augmented reality software ARToolKit [34] to visualize the reconstructed 3D models in a more interesting way. The interactive AR application implemented on the host PC detects all possible patterns in the video frame and displays the correct motion sequences of baseball characters in accordance with the 2D markers captured by the camera. Thus, the user can hold and rotate the marker cards to enjoy free viewpoint visualization, as shown in Fig. 19.

Fig. 20 shows the designed interactive baseball game, in which different reconstructed baseball characters are associated with different marker cards. Each character has his own motion, such as pitching or batting. All the motions are pre-recorded and can be replayed according to their recording time line, as shown in Fig. 21.

5. Conclusion

This paper presents a multi-view capturing and processing system with a new generalized octree reconstruction method to generate an object visual hull. The conventional gray octants are refined into gray–gray, gray–black, and gray–white octants in the new method. The octree model produced by the new method approaches the real object from both the outer and inner object boundaries. The 3D model reconstructed provides a good object model and photorealistic rendering effects. With the progressive transmission, users can get a quick preview of the 3D object under reconstruction and obtain a refined 3D object as time goes on.

For interactive 3DTV applications, our experimental results show that it is possible to reconstruct the 3D object models in real time with a small number of PCs running in parallel. For the multi-view image transmission between the server and a client, only one or two images need to be transmitted for texture mapping at the client end.

Experimental results show that under the comparable XOR projection error constraint our generalized reconstruction method can reduce the memory space required by the conventional method by a factor of 12–80 and the octree reconstruction time required by the conventional method by a factor of 10–40. Also, the experimental reconstruction results obtained from three image sequences available on public websites indicate that our method can speed up the processing time by a factor of 2–3 when compared with the "Conv + MC" method and the "Conv + ExMC" method under a comparable silhouette inconsistency constraint.

Currently, our system is undergoing an optimization of its code to accelerate the processing speed to meet more stringent application needs. Also, multi-view stereo registration is being studied to do dense reconstruction using the acquired visual hull as an initial object model.

Acknowledgments

We would like to express our gratitude to the reviewers for their constructive comments, which led to the improvement of the paper. This work was supported by the Industrial Technology Research Institute of Taiwan (97C078, 98C052) and the National Science Council of Taiwan (NSC97-2221-E-009-145).

Appendix A. Proofs of Properties 2, 3, and 4

Property 2. At a sufficiently large final subdivision level of the conventional method the octree memory space and reconstruction time both increase approximately by four times as the level increases by one.


A.1. Proof

When the subdivision level is sufficiently large, under the 4–2–2 subdivision pattern assumption, the object memory size and reconstruction time at a given final level L_C are given by

\[ M_{L_C} = |GG_{L_C,C}| + \sum_{l=1}^{L_C} |B_{l,C}| \cong |GG_{L_C,C}| + |B_{L_C,C}| \cong 6\,|GG_{L_C-1,C}| \tag{3} \]

\[ T_{L_C} = \sum_{l=1}^{L_C} |GG_l| + \sum_{l=1}^{L_C} |B_l| + \sum_{l=1}^{L_C} |W_l| \cong |GG_{L_C}| + |B_{L_C}| + |W_{L_C}| = 8\,|GG_{L_C-1}| \tag{4} \]

Therefore, the octree memory space and reconstruction time change with the level as follows:

\[ M_{L_C+1}/M_{L_C} \cong 6\,|GG_{L_C,C}| \,/\, 6\,|GG_{L_C-1,C}| = 4, \quad \text{and} \quad T_{L_C+1}/T_{L_C} \cong 8\,|GG_{L_C}| \,/\, 8\,|GG_{L_C-1}| = 4. \;\square \]

Property 3. At a sufficiently large final subdivision level of the conventional method the XOR projection error reduces roughly by half as the level increases by one.

A.2. Proof

The XOR projection error of the final object octree model in each view is due to the 2D projection errors of the gray–gray octants along the object silhouette contour in that view. The 2D image of each gray–gray octant is approximated by a bounding circle. The sizes of the octant images at each level are roughly equal due to the small variation of the camera-to-octant distances. The XOR projection error of the conventional method is mainly caused by the gray–gray octant set at the highest level.

Furthermore, since half of a parent gray–gray octant lies outside the object, the area of the XOR error image caused by this outside part of the parent octant is roughly one half of the total area of the parent octant's projected image. Similarly, the 4:2:2 subdivision pattern of the parent gray–gray octant implies that the new area of the XOR error image associated with the four gray–gray sub-octants is also roughly one half of the area of the XOR error image of the parent gray–gray octant. That is, the XOR error reduces by half as the subdivision level increases by one. □

Property 4. The XOR model error of the new method is nearly a non-decreasing function of the error bound parameter P, except occasionally at a few short ranges of P values. In contrast, the XOR model error of the conventional method is a non-increasing function of the subdivision level.

A.3. Proof

In the new method the presence of the gray–white octants causes the octree geometry to shrink, while the presence of the gray–black octants causes the octree geometry to expand. For some particular P value, when it slightly increases, certain gray–black or gray–white octants may disappear. The absence of such an octant leads to its split into smaller gray–white and gray–black offspring. Some of these gray–white and gray–black octants have a projection image bounding size no greater than P, so they can be categorized into either type. Depending on the inside and outside portions of these octants, the two different type assignments will produce different values of the XOR projection error. Therefore, the resultant sum of the XOR projection errors of these offspring may or may not be smaller than that of their ancestor. This is why the XOR model error is only nearly a non-decreasing function of the error bound parameter P.

The above situation does not happen in the conventional method, since there are no gray–black and gray–white octants. All gray octants are viewed as black ones at the end of the subdivision process. Thus, the final XOR model error never increases as the subdivision level increases. □

References

[1] E. Stoykova, A. Alatan, P. Benzie, N. Grammalidis, S. Malassiotis, J. Ostermann, S. Piekh, V. Sainov, C. Theobalt, T. Thevar, X. Zabulis, 3D time-varying scene capture technologies – a survey, IEEE Transactions on Circuits and Systems for Video Technology 17 (11) (2007) 1568–1586.
[2] A. Alatan, Y. Yemez, U. Güdükbay, X. Zabulis, K. Müller, C. Erdem, C. Weigel, A. Smolic, Scene representation technologies for 3DTV – a survey, IEEE Transactions on Circuits and Systems for Video Technology 17 (11) (2007) 1587–1605.
[3] Y. Mori, N. Fukushima, T. Yendo, T. Fujii, M. Tanimoto, View generation with 3D warping using depth information for FTV, Signal Processing: Image Communication 24 (1) (2009) 65–72.
[4] M. Tanimoto, FTV (Free Viewpoint Television) for 3D scene reproduction and creation, in: Workshop, IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, 2006, p. 172.
[5] T. Fujii, T. Tanimoto, Free-viewpoint TV system based on ray-space representation, Proceedings of the SPIE 4864 (2002) 175–189.
[6] B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, M. Levoy, High performance imaging using large camera arrays, ACM Transactions on Graphics 24 (3) (2005) 765–776.
[7] C.L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, R. Szeliski, High-quality video view interpolation using a layered representation, ACM Transactions on Graphics 23 (3) (2004) 600–608.
[8] J.-G. Lou, H. Cai, J. Li, A real-time interactive multi-view video system, in: Proceedings, ACM Multimedia, Singapore, 2005, pp. 161–170.
[9] Y. Furukawa, J. Ponce, Accurate, dense, and robust multi-view stereopsis, in: Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, 2007, pp. 1–8.
[10] Y. Furukawa, J. Ponce, Accurate camera calibration from multi-view stereo and bundle adjustment, International Journal of Computer Vision 84 (3) (2009) 257–268.
[11] C. Hernandez, F. Schmitt, Silhouette and stereo fusion for 3D object modeling, Computer Vision and Image Understanding 96 (3) (2004) 367–392.
[12] C. Hernández, G. Vogiatzis, R. Cipolla, Multi-view photometric stereo, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (3) (2008) 548–554.
[13] M. Habbecke, L. Kobbelt, A surface-growing approach to multi-view stereo reconstruction, in: Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, 2007, pp. 1–8.
[14] M. Goesele, B. Curless, S.M. Seitz, Multi-view stereo revisited, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, 2006, pp. 17–22.
[15] S. Seitz, B. Curless, J. Diebel, D. Scharstein, R. Szeliski, A comparison and evaluation of multi-view stereo reconstruction algorithms, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, 2006, pp. 519–528.
[16] F. Farbiz, A.D. Cheok, L. Wei, Z. Zhou, K. Xu, S. Prince, M. Billinghurst, H. Kato, Live three-dimensional content for augmented reality, IEEE Transactions on Multimedia 7 (3) (2005) 514–523.
[17] T.H.D. Nguyen, T.C.T. Qui, K. Xu, A.D. Cheok, S.L. Teo, Z. Zhou, M. Asitha, S.P. Lee, W. Liu, H.S. Teo, L.N. Thang, Y. Li, Real time mixed reality 3D human capture system for interactive art and entertainment, IEEE Transactions on Visualization and Computer Graphics 11 (6) (2005) 706–721.
[18] T. Kanade, P. Rander, P.J. Narayanan, Virtualized reality: constructing virtual worlds from real scenes, IEEE Transactions on Multimedia 4 (1) (1997) 34–47.
[19] H. Kim, R. Sakamoto, I. Kitahara, N. Orman, T. Toriyama, K. Kogure, Compensated visual hull for defective segmentation and occlusion, in: Proceedings of the International Conference on Artificial Reality and Telexistence, Esbjerg, Denmark, 2007, pp. 210–217.
[20] H.L. Chou, Z. Chen, Fast octree reconstruction endowed with an error bound controlled subdivision scheme, Journal of Information Science and Engineering 22 (2006) 641–657.
[21] J.-S. Franco, E. Boyer, Exact polyhedral visual hulls, in: British Machine Vision Conference, Norwich, UK, 2003, pp. 329–338.
[22] J. Starck, A. Hilton, Surface capture for performance-based animation, IEEE Computer Graphics and Applications (2007) 21–31.
[23] D. Vlasic, I. Baran, W. Matusik, J. Popovic, Articulated mesh animation from multi-view silhouettes, ACM Transactions on Graphics 27 (3) (2008) 97:1–97:9.
[24] E. de Aguiar, C. Stoll, C. Theobalt, N. Ahmed, H.-P. Seidel, S. Thrun, Performance capture from sparse multi-view video, ACM Transactions on Graphics 27 (3) (2008) 98:1–98:10.
[25] A. Ladikos, S. Benhimane, N. Navab, Efficient visual hull computation for real-time 3D reconstruction using CUDA, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, AK, 2008.
[27] R. Szeliski, Rapid octree reconstruction from image sequences, Computer Vision, Graphics, and Image Processing: Image Understanding 58 (1) (1993) 23–32.
[28] J.Y. Bouguet, Camera calibration toolbox for Matlab. Available from: <http://www.vision.caltech.edu/bouguet/calib_doc/>.
[29] W.E. Lorensen, H.E. Cline, Marching cubes: a high resolution 3D surface construction algorithm, in: Proceedings of SIGGRAPH, 1987, pp. 163–169.
[30] R. Zhang, S. Zhang, S. Yu, Moving objects detection method based on brightness distortion and chromaticity distortion, IEEE Transactions on Consumer Electronics 53 (3) (2007) 1177–1185.
[31] T. Horprasert, D. Harwood, L.S. Davis, A statistical approach for real-time robust background subtraction and shadow detection, in: Proceedings of the IEEE International Conference on Computer Vision, Kerkyra, Greece, 1999, pp. 1–19.
[32] K.E. Hoff, J. Keyser, M. Lin, D. Manocha, T. Culver, Fast computation of generalized Voronoi diagrams using graphics hardware, in: Proceedings of the International Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, 1999, pp. 277–286.
[33] M. Kazhdan, M. Bolitho, H. Hoppe, Poisson surface reconstruction, in: Symposium on Geometry Processing, Sardinia, Italy, 2006, pp. 61–70.
[34] H. Kato, ARToolKit. Available from: <http://www.hitl.washington.edu/artoolkit/>.
[35] C. Liang, K.-Y.K. Wong, Exact visual hull from marching cubes, in: Proceedings, International Conference on Computer Vision Theory and Applications, 2008, pp. 597–604.
[36] A. Erol, G. Bebis, R.D. Boyle, M. Nicolescu, Visual hull reconstruction using adaptive sampling, in: Proceedings of the IEEE Workshops on Application of Computer Vision, 2005, pp. 234–241.
[37] Y. Liu, Q. Dai, W. Xu, Free viewpoint video data sets. Available from: <http://media.au.tsinghua.edu.cn/fvv.jsp>.
[38] Y. Yemez, F. Schmitt, Multilevel representation and transmission of real objects with progressive octree particles, IEEE Transactions on Visualization and Computer Graphics 9 (4) (2003) 551–569.
