Design, Implementation and Evaluation of an Active Vision-Based 3-d Scanner


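Master's thesis, Graduate Institute of the Department of Computer Science and Information Engineering, National University of Kaohsiung. Design, Implementation and Evaluation of an Active Vision-Based 3-d Scanner. Student: Hsiang-Jen Chien. Advisor: Dr. Chia-Yen Chen. July 2010.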

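Design, Implementation and Evaluation of an Active Vision-Based 3-d Scanner. Advisor: Dr. Chia-Yen Chen, Department of Computer Science and Information Engineering, National University of Kaohsiung. Student: Hsiang-Jen Chien, Master's Program, Department of Computer Science and Information Engineering, National University of Kaohsiung.

Chinese abstract

With the rapid development of 3-d computing technology and its applications, the demand for digitizing the shape of objects is growing steadily. In computer vision and measurement engineering, building low-cost, cost-effective scanning instruments has long been an important research topic. Active optical scanning is one of the mainstream approaches, and the structured light method is a widely used 3-d measurement technique. By using an image sensor to capture, process and analyze projected light patterns, structured light techniques can quickly build detailed surface models. The conventional structured light method still has many technical details that need improvement, such as the calibration procedure and the decoding of light patterns.

Based on the structured light method, this research builds a high-precision 3-d scanner and proposes several improvements. The improvements address two aspects: system calibration and the establishment of stereo correspondences. For system calibration, we designed a target-adaptive method that turns conventional structured light into a closed feedback loop, so that calibration patterns can be generated dynamically from the input images. The experimental results show that this method greatly reduces calibration error and avoids a large amount of pattern projection and computation. We also improve the establishment of stereo correspondences. To decode the projected light patterns more accurately, this research designs a binary pixel classifier that uses Bayesian inference and the kernel method to dynamically learn the distribution model of pixel intensity ratios. With these improvements, the measurement system achieves sub-millimeter accuracy in 3-d measurement.

Keywords: 3-d reconstruction, structured light, computer vision, range measurement, active scanning, optical engineering.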

Design, Implementation and Evaluation of an Active Vision-Based 3-d Scanner. Advisor: Dr. Chia-Yen Chen, Department of Computer Science and Information Engineering, National University of Kaohsiung. Student: Hsiang-Jen Chien, Department of Computer Science and Information Engineering, National University of Kaohsiung.

ABSTRACT

The structured light method is an active vision-based surface recovery approach that provides a cost-efficient solution for 3-dimensional reconstruction. The performance of a structured light based scanner depends on the accuracy of the system calibration as well as on the acquisition of stereo correspondences. The conventional calibration of a structured light system requires different targets to calibrate the camera and the light projector separately. In this work we propose a novel method that uses just one target to complete the calibration. To overcome the identification problem of light patterns, the proposed method generates calibration patterns in an adaptive manner. The experimental results show that the proposed method achieves accurate calibration.

The establishment of stereo correspondences is also key to precise measurement. To accelerate the acquisition of camera-projector pixel correspondences, a typical structured light scanner deploys binary-coded patterns. The extraction of binary light patterns can be problematic if the lighting conditions are not carefully taken into account. We have designed a probabilistic pixel classification mechanism, which is expected to identify projected light patterns more reliably. The outcome of this research is a low-cost scanner that is able to perform accurate 3-d measuring tasks.

Keywords: 3-d reconstruction, structured light, shape recovery, ranging, computer vision, active scanning, camera-projector system, precise measurement, optical engineering, object digitization, digital archive.

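Acknowledgment

I gratefully acknowledge Dr. Chia-Yen Chen, the advisor of this research, for her tireless support in teaching and guidance. She sets an excellent example not only in research but also in life. I would like to thank her family, Dr. Chi-Fa Chen, May Yang, and Ee Chen, for the hospitality they twice offered during my stays in New Zealand. I would also like to thank Professor Reinhard Klette for his help in arranging my short-term visit to the University of Auckland in 2008. To express my sincerest appreciation to people in Taiwan, the following acknowledgment was originally written in Chinese.

I sincerely thank everyone who helped bring this research to a successful completion. I thank my classmates, seniors and juniors in the master's program for accompanying me through my research years and for all the help they gave me. I am very grateful to all my teachers for their instruction and for the inspiration they gave to my research. I thank Professor 杜國洋 and Professor 殷堂凱 for taking the trouble to serve on my oral defense committee and for their valuable suggestions on the writing of this thesis. I also thank 淑真, the department office assistant, for her advice on many matters and for her assistance during the founding of the "Vision and Graphics Laboratory". Finally, I thank my family: your support, tolerance and encouragement were the greatest motivation for completing this thesis.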

Contents

Chinese abstract
Abstract
Acknowledgment
List of figures
List of tables
List of symbols
Chapter 1 Introduction
  1.1 Surface recovery techniques
    1.1.1 Shape from shading (SFS)
    1.1.2 Photometric stereo (PSM)
    1.1.3 Shape from contours (SFC)
    1.1.4 Shape/structure from motion (SFM)
    1.1.5 Binocular stereo vision
    1.1.6 Structured light
    1.1.7 Modulated light
    1.1.8 Laser rangefinder
  1.2 Structured light and its difficulties
  1.3 Motivation and expected results
  1.4 Thesis organization
Chapter 2 Structured Light
  2.1 Mathematical background
    2.1.1 Transformation from image coordinates to 3-d space
    2.1.2 Projective space and homogeneous coordinates
    2.1.3 Coordinate system transformations
    2.1.4 Reprojection and backprojection
  2.2 Surface recovery
    2.2.1 Triangulation case 1: partial correspondences
    2.2.2 Triangulation case 2: complete 2-d correspondences
    2.2.3 Direct linear triangulation
    2.2.4 Error estimate and optimal triangulation
  2.3 Calibration
    2.3.1 Camera calibration
    2.3.2 Projector calibration
    2.3.3 Self calibration and recalibration
  2.4 Establishing correspondences
    2.4.1 Line Patterns
    2.4.2 Spot Patterns
    2.4.3 Stripe Patterns
    2.4.4 Block Patterns
  2.5 Variations
  2.6 Summary
Chapter 3 Target-adapted Scanner Calibration
  3.1 Adaptive calibration strategy
  3.2 Tsai's single-view two-stage method
    3.2.1 Stage 1: recovery of rotation and part of translation
    3.2.2 Stage 2: recovery of focal length and translation
  3.3 Camera calibration
    3.3.1 Making of calibration board
    3.3.2 Feature extraction and clustering
    3.3.3 Rectification and auto-labeling
    3.3.4 Nonlinear minimization
  3.4 Projector calibration
    3.4.1 Pattern extraction and calibration target
    3.4.2 Correspondences from 2-d Gray pattern
    3.4.3 Feature pattern generation
    3.4.4 Feature extraction and correction
  3.5 Multiple view calibration
    3.5.1 Fast generation of calibration pattern
    3.5.2 Bundle adjustment
  3.6 Experimental results
    3.6.1 Linear calibration
    3.6.2 Single-view optimization
    3.6.3 Device-specific optimization
    3.6.4 System scope optimization
    3.6.5 Triangulation error estimate
    3.6.6 Planarity test
    3.6.7 Choice of markers
  3.7 Summary
Chapter 4 Improved Stereo Correspondences
  4.1 Correspondences from Gray-coded patterns
    4.1.1 Pattern identification
    4.1.2 Decoding
    4.1.3 Visualization
    4.1.4 Combination with phase shift patterns
  4.2 Enforcing monotonic mapping
  4.3 Adaptive pixel classification
    4.3.1 Binarization problem
    4.3.2 Probabilistic model
    4.3.3 Confidence indicator
    4.3.4 Kernel method
    4.3.5 Kernel-based online dichotomizer
    4.3.6 Experiments
  4.4 Confidence-based densification strategy
  4.5 Summary
Chapter 5 System Evaluation
  5.1 System setup
    5.1.1 Calibration
    5.1.2 Scanning procedure
    5.1.3 Post-processing
    5.1.4 Test objects
  5.2 Test object 1: Mozart statue
  5.3 Test object 2: Mr. White model
  5.4 Test object 3: Epson projector
Chapter 6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future work
    6.2.1 Precise performance evaluation
    6.2.2 Integration of photometric and geometric error indicators
    6.2.3 Automatic texture mapping
    6.2.4 Automatic calibration
    6.2.5 Trinocular structured light
    6.2.6 Model compression
  6.3 Possible applications
    6.3.1 Digital archive
    6.3.2 Reverse engineering
    6.3.3 Augmented Reality
Bibliography

List of figures

Fig. 1.1: Generalized model of structured light scanners
Fig. 1.2: Different ways of establishing camera-projector stereo correspondences
Fig. 2.1: Geometry of a typical structured light system
Fig. 2.2: Illustration of a light plane decided by a line on projector screen
Fig. 2.3: Different calibration targets
Fig. 2.4: Comparison of selected works for projector calibration
Fig. 3.1: Procedures of proposed calibration method
Fig. 3.2: Calibration board and extracted features
Fig. 3.3: Rectified control points
Fig. 3.4: Automatically labeled control points
Fig. 3.5: Pseudo code of implemented camera optimizer
Fig. 3.6: Checkerboard and contrast problem
Fig. 3.7: Extension of Gray-coded stripes to two-dimensional
Fig. 3.8: Pattern generation procedure
Fig. 3.9: Target-adapted pattern for projector calibration
Fig. 3.10: Jacobian matrix for multiple view optimization
Fig. 3.11: Convergence test: bundle optimization and initial parameters
Fig. 3.12: Convergence of reprojection error
Fig. 3.13: Viewpoints (upper part) and projection of generated calibration patterns (lower part) of system Classical
Fig. 3.14: Viewpoints (upper part) and projection of generated calibration patterns (lower part) of system Portable
Fig. 3.15: Linearly calibrated results of system (a) Classical and (b) Portable
Fig. 3.16: Single-view optimized results of system (a) Classical and (b) Portable
Fig. 3.17: Device-scope optimized results of system (a) Classical and (b) Portable
Fig. 3.18: System-scope optimized results of system (a) Classical and (b) Portable
Fig. 3.19: First two viewpoints with projection of circle markers
Fig. 3.20: Triangulated points referred to calibrated system Classical
Fig. 3.21: Triangulated points referred to calibrated system Portable
Fig. 4.1: Plain and Gray-coded binary patterns
Fig. 4.2: Projection of Gray patterns on an object
Fig. 4.3: Algorithm of Gray decoder
Fig. 4.4: Visualization of correspondences
Fig. 4.5: Gray-coded patterns plus sinusoidal phase shift patterns
Fig. 4.6: The strict ordering property and monotonic mapping
Fig. 4.7: Recovery of miss-decoded correspondences
Fig. 4.8: Experimental result of enforcing the monotonicity of stereo correspondences
Fig. 4.9: Increasing intensity of black stripes
Fig. 4.10: Visualized confidence maps
Fig. 4.11: Averaged confidence indicators of four scenes
Fig. 4.12: Kernel method used to estimate an unknown probability distribution
Fig. 4.13: Illustration of adaptive pixel classification
Fig. 4.14: Averaged confidence of pixel classification of tested scenes
Fig. 4.15: Objects (upper row) and their depth maps recovered using proposed pixel classifier (bottom row)
Fig. 4.16: Heterogeneous resolutions between camera and projector
Fig. 4.17: Sampling problem caused by heterogeneous resolution
Fig. 4.18: Significant saw-toothed artifacts are caused by unmatched resolutions
Fig. 4.19: Surfaces recovered from original and densified correspondences
Fig. 5.1: System schematics of implemented 3-d scanner
Fig. 5.2: Estimated triangulation error of each scan of Mozart
Fig. 5.3: Different views of Mozart
Fig. 5.4: Recovered depth maps of Mozart (mm)
Fig. 5.5: Estimated error maps of Mozart (mm)
Fig. 5.6: Textured rendering of reconstructed Mozart
Fig. 5.7: Estimated triangulation error of each scan of Mr. White
Fig. 5.8: Different views of Mr. White
Fig. 5.9: Recovered depth maps of Mr. White (mm)
Fig. 5.10: Estimated error maps of Mr. White (mm)
Fig. 5.11: Textured rendering of reconstructed Mr. White
Fig. 5.12: Estimated triangulation error of each scan of Epson
Fig. 5.13: Different views of Epson
Fig. 5.14: Recovered depth maps of Epson (mm)
Fig. 5.15: Estimated error maps of Epson (mm)
Fig. 5.16: Textured rendering of reconstructed Epson
Fig. 6.1: A structured light scanner used to reconstruct objects for digital archive

List of tables

Table 1.1: Different vision-based configurations and related techniques or applications
Table 2.1: Comparison between light patterns
Table 3.1: Non-linear optimizations regarding to proposed calibration method
Table 3.2: Equipments of experimental systems
Table 3.3: Test of linear calibration (pixel)
Table 3.4: Test of locally optimized calibration (pixel)
Table 3.5: Test of device-specific optimization (pixel)
Table 3.6: Test of system scope bundle adjustment (pixel)
Table 3.7: Estimated triangulation errors of system Classical (mm)
Table 3.8: Estimated triangulation errors of system Portable (mm)
Table 3.9: Planarity errors of system Classical (mm)
Table 3.10: Planarity errors of system Portable (mm)
Table 3.11: Reprojection errors of projector calibrated using different markers (pixel)
Table 5.1: Calibration result of experimental 3-d scanner (pixel)

List of symbols

Symbol: Meaning

Ow, Oc, Op: 3-d point represented in the world frame, in camera-centered space, and in projector-centered space respectively
Ic, Ip: projective 2-d points in the camera image and on the projector screen respectively
x, y, z: coordinates of a point in the world frame
u, v, w: coordinates of a point in camera-centered Euclidean space
p, q, r: coordinates of a point in projector-centered Euclidean space
(u, v): pixel coordinates on the ideal image plane of the camera
(p, q): pixel coordinates on the ideal screen of the projector
(u, v): pixel coordinates on the radially distorted image plane of the camera
(p, q): pixel coordinates on the radially distorted screen of the projector
(uc, vc): principal point of the image plane of the camera
(pc, qc): principal point of the screen of the projector
fc, fp: effective focal lengths of the camera and of the projector
sc, sp: horizontal pixel sampling factors of the camera and of the projector
1c, 2c: first two radial distortion coefficients of the camera lens
1p, 2p: first two radial distortion coefficients of the projector lens
Kc, Kp: 3-by-3 matrices of intrinsic parameters of the camera and of the projector
Tc = (tu, tv, tw): translation from the centre of the world frame to the projection centre of the camera
Tp = (tp, tq, tr): translation from the centre of the world frame to the projection centre of the projector
Rc, Rp: 3-by-3 rotation matrices that transform a 3-d point in the world frame to camera-centered and to projector-centered space respectively
Ac = (au, av, aw): 3-vector angle-axis parameterisation of the rotation matrix of the camera
Ap = (ap, aq, ar): 3-vector angle-axis parameterisation of the rotation matrix of the projector
Pc, Pp: 3-by-4 projection matrices that contain the intrinsic and extrinsic parameters of the camera and of the projector respectively
= (x, y, z): triangulation error vector
u, v: horizontal and vertical reprojection errors of the camera
p, q: horizontal and vertical reprojection errors of the projector
∥·∥: Euclidean norm of a vector
Ju, Jv: horizontal and vertical Jacobian matrices with respect to the parameters of the camera
Jp, Jq: horizontal and vertical Jacobian matrices with respect to the parameters of the projector
Nc, Np: normal matrices with respect to the parameters of the camera and of the projector respectively

Chapter 1. Introduction.

There is an increasing demand for 3-d applications in global industry. The growth of activities such as film production, e-heritage preservation, intelligent vehicles, and multimedia applications all depends on 3-d computing technology. There exist many computational approaches to creating digital 3-d content. Three-dimensional reconstruction is an automatic or semi-automatic sequence of procedures that constructs digital representations of real objects in shape, structure, and/or appearance. In the last decade, the development of surface recovery techniques has been highly active in the fields of engineering, metrology and computer science. Computer vision-based 3-d reconstruction is perhaps the most popular branch among these techniques, owing to rapid advances in vision computing technology.

Vision-based approaches are mostly inspired by the depth perception of the human visual system. The understanding of depth is a complicated cognitive process that involves the analysis of shading, texture, spatial disparity, temporal silhouettes, motion parallax, and non-visual cues. In mimicking the human visual system, computer vision techniques have been designed to use information extracted from 2-d images. The acquisition of depth from images is still a challenging problem. No single solution yet meets all the requirements of 3-d measuring tasks. Each shape recovery technique has its own advantages and drawbacks. For example, the binocular stereo method is capable of real-time distance measurement but cannot measure featureless surfaces. This disadvantage can be compensated for by an active technique, the structured light method, which projects light patterns onto the surface to synthesize features. Nevertheless, scanning systems based on structured light also face some difficulties in 3-d measuring. The motivation of this thesis is to design, implement and improve an accurate 3-d digitizer based on the structured light technique.

This chapter is organized as follows. Section 1.1 describes some well-known shape recovery methods in computer vision. Section 1.2 introduces the state of the art of the structured light technique as well as its difficulties in 3-d reconstruction. The motivation of our work, the expected results and our contributions are stated in Section 1.3. The organization of this thesis is outlined in Section 1.4.

1.1 Surface recovery techniques

The construction of 3-d representations of objects in the digital world has become possible through a variety of shape recovery techniques. These techniques approach the ranging problem from different points of view and under various constraints. In the taxonomy, the vision-based approaches are further categorized into passive vision and active vision. A passive vision approach derives range data from one or multiple images that are captured under "natural" conditions, whereas an active approach involves the emission of controlled radiation. A common source of radiation is visible light. The vision-based techniques are also characterized by their temporal and spatial configurations. For example, a classical binocular stereo vision system uses a pair of cameras to capture a depth image, while a similar approach uses one moving camera to achieve the same goal. Table 1.1 lists some related applications with respect to different camera configurations. In the following subsections some well-known vision-based approaches are reviewed.

Table 1.1: Different vision-based configurations and related techniques or applications
Single view, one sensor: shape from shading, single view reconstruction
Single view, many sensors: binocular/trinocular stereo, range data fusion
Multiple views, one sensor: shape from contour (or silhouette), photometric stereo, shape from rotation, shape/structure from motion, optical flow
Multiple views, many sensors: dynamic stereo, space-time stereo, real-time distancing, the .enpeda. project [1]

1.1.1. Shape from shading (SFS).
The attempt to derive depth information from a single photo can be traced back to the early development of computer vision. The problem of recovering depth values from a gray-scale image was first discussed in Horn's work in the early 1970s [2]. Although the statement of the problem is simple, finding a unique solution is difficult [3]. Shape from shading is itself an ill-posed problem, which means it can only be solved with further regularization. Some shading or geometrical assumptions have to be made to eliminate many of the possible solutions. An interesting topic closely related to SFS is the 3-d reconstruction of paintings (e.g. [4]).

1.1.2. Photometric stereo (PSM)

The photometric stereo method, which is also known as shape from multiple light sources, utilizes more than one light to overcome the ill-posed problem encountered by SFS. It was proposed by R.J. Woodham in the 1980s [5]. Assuming the surface obeys the Lambertian reflectance model, its gradient field can be numerically recovered from three light sources (and from two sources with an ambiguity). By integration techniques (e.g. [6][7]) the solved discrete vector field is converted to a depth map with an unknown constant, which denotes the absolute distance between the centre of the camera and the measured surface.

1.1.3. Shape from contours (SFC)

Shape from contours (or silhouettes) recovers a model of an object from its 2-d contours viewed from different directions [8]. It is perhaps the most straightforward method among passive vision approaches. A common implementation of SFC is to place the object on a computer-controlled turntable. For each viewing angle a frame is taken and further processed to extract the object's contours, which are then used by a carving algorithm to reconstruct a volumetric model of the object. Although the accuracy of reconstruction may be limited by cavities on the object, SFC is an efficient method to acquire a rough model for fast prototyping.

1.1.4. Shape/structure from motion (SFM)
The shape and structure of a rigid body are supposed to be consistent in photos taken from arbitrary views [9]. It is possible to identify such invariants from 2-d images even if the positional relationships between viewpoints are not known in advance. The reconstruction of a scene from images of a moving camera is known as structure from motion. The term "motion" is defined relatively: the problem is equivalent to computing the structure of a moving object with the camera remaining still. Finding shape from motion is closely related to stereo vision systems. The simplest configuration of SFM is to place one camera in two different locations. In that case it becomes a temporal stereo pair with unknown extrinsic parameters. The SFM and binocular methods present similar mathematical problems, which have been extensively studied in terms of multiple view geometry. Common techniques involved in computing structure from motion include correspondence analysis (described later), optical flow analysis, camera calibration, triangulation, and bundle adjustment.

1.1.5. Binocular stereo vision

Using a pair of cameras looking toward the same target to estimate distances is known as stereoscopic vision or binocular vision. Mimicking the human visual system, such a setup extracts depth information from two conjugate images. The stereo vision approach is one of the most classical passive vision methods. The accuracy of a stereoscopic scanner relies on a good solution to the correspondence problem, which aims to find correspondences between pixels in two images. Correspondence analysis relies on the extraction of view-invariant features and on pattern matching techniques. The existence of features on the analyzed surface is crucial for passive stereo vision systems.

1.1.6. Structured light

By replacing one camera in a stereo vision system with a light projector, the configuration becomes an active stereo pair that solves the correspondence problem in a straightforward way. Such a system is called a structured light system, and it is an active vision method that involves the use of light patterns.
Two commonly adopted light sources are laser diodes and video projectors [10]. Measurements based on structured light depend neither on surface features nor on particular reflectance models. Due to its excellent robustness, the structured light method has been widely used in applications such as digital archiving [11], biometric surveillance [12], reverse engineering [13], vision inspection [14], robot navigation and space science [15]. Many commercialized structured light scanners can be found in the metrology industry (e.g. InSpeck 3D Mega Capturor [16] and HDI 3D Scanners [17]). More details on structured light will be given in section 1.2.

1.1.7. Modulated light

Modulated light is a surface measuring method similar to structured light. In the literature it is occasionally considered a kind of structured light technique. Modulated light scanners have two major characteristics that differ from a typical structured light system: the use of rapidly modulated illumination and a high frame rate camera. The strength of the projected light is modulated over a short time according to a predefined function, which is in general a sinusoidal function. By analyzing the phase shift of the reflected light, the round-trip time is estimated. A scanner based on modulated light is usually capable of achieving real-time ranging due to its fast operational rate (e.g. [18]).

1.1.8. Laser rangefinder

Although depth recovery using laser rangefinders is strictly not a vision-based technique, it is frequently discussed in comparisons of 3-d reconstruction approaches. The simplest laser rangefinder emits a single laser beam toward the measured surface. The distance is calculated by analyzing the time-of-flight of the beam reflected back to a receptive sensor. The range data can be measured with an accuracy within one millimeter, but at a higher price. A scanner using a laser rangefinder can cost more than 30,000 US dollars. A laser rangefinder can be implemented in different forms and sizes. A laser scanner can be designed as small as a handheld scanner for digitizing a regular sized object, or as a sensor array that integrates multiple lasers to expand the scale of reconstruction and increase the scanning rate.
For instance, the Velodyne HDL-64E SE, a lidar (light radar), consists of 64 spinning laser rangefinders for real-time large field scanning. The solution provided by a laser rangefinder usually comes with no colour information. Therefore some computer vision techniques have to be integrated if texture data are desired.
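The time-of-flight principle mentioned above can be illustrated with a minimal sketch. It assumes an idealized sensor that reports the round-trip time of a single pulse travelling at the vacuum speed of light; the function names are hypothetical and are not taken from any particular device.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second (vacuum value, assumed here)

def tof_distance(round_trip_seconds):
    """Distance to the surface: the pulse travels there and back, hence the factor 1/2."""
    return 0.5 * SPEED_OF_LIGHT * round_trip_seconds

def timing_resolution_for(range_resolution_m):
    """Round-trip timing resolution needed to resolve a given range difference."""
    return 2.0 * range_resolution_m / SPEED_OF_LIGHT

# A range resolution of one millimetre corresponds to a few picoseconds of timing.
print(timing_resolution_for(0.001))
```

Under these assumptions, resolving one millimetre requires timing the echo to within roughly seven picoseconds.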

1.2 Structured light and its difficulties

A variety of surface recovery techniques have been proposed. However, only a few of them have been successfully commercialized for practical applications, and the structured light technique is one such example. The textbook [8] states that "the projection of light patterns into a scene is called structured lighting". Structured light originates from the passive stereo vision system, which is equipped with a pair of cameras. Replacing one of the two cameras by a light emitter forms an active stereo configuration. A number of structured light systems have been proposed in the literature. These systems can be generalized by the model shown in fig. 1.1.

Fig. 1.1: Generalized model of structured light scanners.

A main concern of structured light is the establishment of stereo correspondences. A pixel-wise correspondence relates two pixels corresponding to the same 3-d spot in the scene. By exploiting a dense mapping of camera-projector correspondences, the scanned surface can be recovered by the so-called triangulation technique. The simplest way to discover point correspondences is by triggering pixels on the projection screen one by one, as shown in fig. 1.2a. Obviously the procedure can be very time consuming. To accelerate the establishment of correspondences, a column or a row of pixels is triggered simultaneously (see fig. 1.2b). This "line sweeping" technique yields a 2D-to-1D mapping in a faster manner. To further accelerate the process, more complicated codification strategies have been proposed. Fig. 1.2c illustrates the projection of a colour-coded pattern.

Fig. 1.2: Different ways of establishing camera-projector stereo correspondences. The mapping of pixel-wise correspondences can be discovered by (a) spot-by-spot scanning, (b) line sweeping, or (c) pattern codification.

The use of encoded patterns also introduces new problems. For example, a pixel may appear brighter due to inter-reflection of dense light projection, causing incorrect identification and decoding of correspondences. Besides, the resolution of a video projector is usually much lower than that of an image sensor. Such a mismatch in resolution leads to under-sampled stereo correspondences that result in undesired artifacts on the recovered surface.

Calibrating a structured light scanner is another important issue. As shown in fig. 1.1, the accuracy of reconstruction is also dominated by the estimate of lens parameters. The calibration of an image sensor is well studied in computer vision; however, the calibration of a projector is still a challenging problem. Conventionally the video projector is modeled as an inverse camera. To know what a projector "sees", the correspondences have to be acquired. That way a camera calibration method can be applied to calibrate the projector once its view is reconstructed. The accuracy of such a calibration is affected by the established correspondences. Also, it may require many frames if one decides to calibrate the system from multiple viewpoints.

1.3 Motivation and expected results

Our work is motivated by existing drawbacks of structured light. The goal is to build an improved and accurate active vision scanner. According to the layout shown in fig. 1.1, to increase the overall performance of a structured light scanner one has to take into account two major procedures:

1. System calibration. The estimation of lens parameters of the camera and the projector directly impacts system accuracy.
2. Acquisition of stereo correspondences. The mapping is specific to the scene, which means it is also influenced by properties of the scanned surface.

First, a precise calibration process is necessary to achieve accurate measurement. Knowing the exact geometrical information of the optical devices sets a solid ground for a structured light scanner. A tradeoff between accuracy and speed exists in conventional camera-projector calibration. Calibration from a single viewpoint does not provide a robust estimate. To calibrate a system from multiple viewpoints, dense correspondences have to be established multiple times. It may require more than 100 frames to finish a 5-view calibration, and the result is not guaranteed to be accurate due to the error introduced during the acquisition of correspondences. From an analytical standpoint, we argue that the dependency between system calibration and the strategy used to obtain dense stereo correspondences should be minimized.
That way the calibration avoids errors caused by the acquisition of correspondences.

Obtaining pixel-wise correspondences also plays an important role. A modern structured light scanner implements a pattern codification strategy to obtain dense stereo correspondences within a few seconds. A sophisticated approach should be designed to accurately extract coded information through pattern sequences. To overcome the insufficient resolution of a video projector, estimates of sub-stripe correspondences are also required. Our work is expected to have the following achievements.

1. An algorithm that performs accurate calibration of a camera-projector system from multiple viewpoints and requires far fewer frames.
2. The separation of the calibration from the acquisition of dense correspondences.
3. A reliable pattern recognition technique to decode binary patterns from their projections into the scene.
4. Post-processing tasks that refine decoded correspondences to recover surfaces with fine details.

1.4 Thesis organization

This thesis is organized as follows. Chapter 2 states the technical details and background knowledge about structured light. In chapter 3 a novel calibration method for structured light is proposed to provide a solid foundation for accurate measurement. Based on the calibration, the scanning of surfaces is further improved by the approaches proposed in chapter 4. In chapter 5 some reconstruction results are shown to evaluate our scanner. Chapter 6 concludes our work with a summary and possible extensions.

Chapter 2 Structured Light

This chapter covers the principle, methodologies, and discussion of the structured light technique. The beginning of this chapter gives the geometry and symbol definitions of a typical structured light system, allowing the discussion of depth measuring algorithms in section 2.2. Section 2.3 covers existing camera-projector calibration methods for structured light. The acquisition of stereo correspondences, another important issue of structured light, will be discussed in section 2.4. Some variations on structured light are mentioned in section 2.5.

2.1 Mathematical background

The components of a structured light system are a camera, a light projector, and a data processor. Concerning the illumination of the light pattern, laser diodes and video projectors are two commonly adopted light sources. Due to its popularity and cost-efficiency, in this work we consider the use of a video projector. The video projector is typically modeled as an inverse camera, so that the configuration of structured light forms a two-view geometry. The stereo setup of structured light allows the shape of a surface to be measured using triangulation. The active stereo pair provides a straightforward way of solving the correspondence problem. A geometrical discipline developed for two-view vision systems is known as epipolar geometry. It provides a useful and elegant mathematical framework to analyze the geometrical relationship between projective spaces. However, it is based on theorems of projectivity that involve the use of homogeneous coordinates, which introduce one additional dimension. From another point of view, the structured light technique can be modeled using a relatively simple discipline, namely analytical geometry (e.g. [1][2]). In this chapter we use the two disciplines alternately to discuss the stereo configuration of a structured light scanner.

Fig. 2.1: Geometry of a typical structured light system.

Fig. 2.1 shows the geometrical configuration of a typical structured light system. Note that a right-handed coordinate system is chosen, and the negative w-axis of the camera and the negative r-axis of the projector point toward the scanned scene. The notations in the figure will be explained and referred to in the following subsections.

2.1.1. Transformation from image coordinates to 3-d space

First we consider one pixel in the image captured by a pinhole camera with a focal length of $f_c$ pixels. Assuming a pixel $(u_i, v_i)$ is expressed in image coordinates where the origin is at the upper left corner, it is transferred to the camera-centered Euclidean space by

\[
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= \lambda_c \begin{pmatrix} \hat{u}_i \\ \hat{v}_i \\ -f_c \end{pmatrix}
= \lambda_c \begin{pmatrix} 1 & 0 & -u_c \\ 0 & -1/s_c & v_c/s_c \\ 0 & 0 & -f_c \end{pmatrix}
\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix},
\tag{2.1}
\]

where $(u_c, v_c)$ is the image centre or principal point where the optical axis passes through the image plane, $s_c$ is a scaling parameter representing heterogeneous pixel sampling rates in the horizontal and vertical directions, and $\lambda_c$ denotes the metric scalar from pixels to the unit of Euclidean space. The value of $\lambda_c$ is not important, as the norm of the transferred image coordinates does not matter in projective space. The third coordinate $w$ obtained from (2.1) is always negative, as the camera is assumed to point toward the negative direction of its optical axis. Note that we use the hat notation ($\hat{\ }$) to distinguish coordinates in the image plane from those in three-dimensional space. Ignoring radial distortion and the skewness of the image axes, the planar coordinates are ideally assumed undistorted. A similar transform is defined for a pixel $(p_j, q_j)$ on the projector screen,

\[
\begin{pmatrix} p \\ q \\ r \end{pmatrix}
= \lambda_p \begin{pmatrix} \hat{p}_j \\ \hat{q}_j \\ -f_p \end{pmatrix}
= \lambda_p \begin{pmatrix} 1 & 0 & -p_c \\ 0 & -1/s_p & q_c/s_p \\ 0 & 0 & -f_p \end{pmatrix}
\begin{pmatrix} p_j \\ q_j \\ 1 \end{pmatrix},
\tag{2.2}
\]

with $(p_c, q_c)$ the center of the projector screen, $f_p$ the focal length of the projector lens, $s_p$ the aspect ratio of a pixel and $\lambda_p$ the metric scalar of the projector. The formulas defined in this subsection cast each pixel from the planar image space to the corresponding device-centered Euclidean space.

2.1.2. Projective space and homogeneous coordinates

A point lying in the image plane is more meaningful under perspective projection. In particular, such a point always corresponds to a unique line that contains the point itself and the centre of projection. Given a 3-d point $\lambda_c(\hat{u}_i, \hat{v}_i, -f_c)^T$ lying in the camera image plane, the path of projection can be expressed as

\[
\ell_i = \begin{pmatrix} u \\ v \\ w \end{pmatrix}
= m \begin{pmatrix} \hat{u}_i / f_c \\ \hat{v}_i / f_c \\ -1 \end{pmatrix}, \quad m > 0,
\tag{2.3}
\]

which is a ray with a direction vector proportional to $(\hat{u}_i, \hat{v}_i, -f_c)$. The set of all such rays through the origin of $\mathbb{R}^3$ forms the projective space $P^2$. Note that the direction of $\lambda_c(\hat{u}_i, \hat{v}_i, -f_c)$ is more significant than its norm. The metric scale $\lambda_c$ is canceled by division. Since $w = -m$ in (2.3), the equation can be simply rewritten as

\[
\ell_i = \begin{pmatrix} u \\ v \\ w \end{pmatrix}
= -w \begin{pmatrix} \hat{u}_i / f_c \\ \hat{v}_i / f_c \\ -1 \end{pmatrix}, \quad w < 0.
\tag{2.4}
\]

Multiplying both sides of (2.4) by the diagonal matrix $\mathrm{diag}(-f_c, -f_c, 1)$ results in

\[
\begin{pmatrix} -f_c & 0 & 0 \\ 0 & -f_c & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= w \begin{pmatrix} \hat{u}_i \\ \hat{v}_i \\ 1 \end{pmatrix}.
\tag{2.5}
\]

The terms on the right-hand side of the equation above can be expressed in image coordinates as defined in (2.1), therefore it is equivalent to

\[
\begin{pmatrix} -f_c & 0 & 0 \\ 0 & -f_c & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= w \begin{pmatrix} u_i - u_c \\ -(v_i - v_c)/s_c \\ 1 \end{pmatrix},
\tag{2.6}
\]

which is then rewritten in a cleaner form of matrix multiplication

\[
w \begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix}
= \begin{pmatrix} -f_c & 0 & u_c \\ 0 & s_c f_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
= K_c \begin{pmatrix} u \\ v \\ w \end{pmatrix}.
\tag{2.7}
\]

The 3-by-3 matrix $K_c$ is called the camera matrix, calibration matrix, or perspective projection matrix in computer vision. It represents the interior orientation, or intrinsic parameters, of a pinhole camera. Equation (2.7) uses homogeneous coordinates to linearly represent the mapping between a point in 3-d space and an imaged pixel under central projection. Another common, inhomogeneous form of the perspective projection can be obtained by expanding the right-hand side of (2.7) and rearranging the result. It results in a pair of equations

\[
u_i = -f_c \frac{u}{w} + u_c
\tag{2.8}
\]

and

\[
v_i = s_c f_c \frac{v}{w} + v_c.
\tag{2.9}
\]

Note that the third coordinate $w$ in the derived formulas is restricted to negative values according to the definition (2.4). The projection of the projector is modeled in a similar way, which leads to

\[
r \begin{pmatrix} p_i \\ q_i \\ 1 \end{pmatrix}
= \begin{pmatrix} -f_p & 0 & p_c \\ 0 & s_p f_p & q_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} p \\ q \\ r \end{pmatrix}
= K_p \begin{pmatrix} p \\ q \\ r \end{pmatrix}.
\tag{2.10}
\]

2.1.3. Coordinate system transformations

Since the positions and orientations of the camera, the projector, and the referenced world coordinate system are different, some transformations need to be introduced. A change of coordinate system can be done by a similarity transform. If the metrics of the systems are consistent, the transform can be further specialized to a Euclidean transform, which is equivalent to a rigid body transform. The Euclidean transform in 3-d space is described by a 3-by-3 rotation matrix $R$ and a translational 3-vector $T$. It has the form

\[
\begin{pmatrix} R_{3\times 3} & T_{3\times 1} \end{pmatrix}
= \begin{pmatrix} r_{11} & r_{12} & r_{13} & t_x \\ r_{21} & r_{22} & r_{23} & t_y \\ r_{31} & r_{32} & r_{33} & t_z \end{pmatrix}.
\tag{2.11}
\]

Suppose there is a point $(x, y, z)$ represented in world coordinates and the transform from the world to the camera coordinate system is determined by $(R_c \; T_c)$. The projection given in (2.7) then needs to be rewritten as

\[
w \begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix}
= \begin{pmatrix} -f_c & 0 & u_c \\ 0 & s_c f_c & v_c \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} r_{ux} & r_{uy} & r_{uz} & t_u \\ r_{vx} & r_{vy} & r_{vz} & t_v \\ r_{wx} & r_{wy} & r_{wz} & t_w \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= K_c \begin{pmatrix} R_c & T_c \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix},
\tag{2.12}
\]

or alternatively as paired equations

\[
u_i = -f_c \, \frac{r_{ux} x + r_{uy} y + r_{uz} z + t_u}{r_{wx} x + r_{wy} y + r_{wz} z + t_w} + u_c,
\tag{2.13}
\]

\[
v_i = s_c f_c \, \frac{r_{vx} x + r_{vy} y + r_{vz} z + t_v}{r_{wx} x + r_{wy} y + r_{wz} z + t_w} + v_c.
\tag{2.14}
\]

The matrix $(R_c \; T_c)$ holds the extrinsic parameters of the camera with respect to a referenced coordinate system, or the world frame. By analogy, let the extrinsic parameters of a projector be denoted by $(R_p \; T_p)$; the central projection (2.10) becomes

\[
r \begin{pmatrix} p_i \\ q_i \\ 1 \end{pmatrix}
= K_p \begin{pmatrix} R_p & T_p \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{2.15}
\]

The reconstruction of intrinsic and extrinsic parameters is known as camera calibration or camera resectioning. The more specific term geometrical calibration is sometimes used to differentiate the calibration of geometrical parameters from photometric calibration. The latter aims to estimate photometric properties such as the exposure duration and response curve of an image sensor. The reconstruction of the camera is a fundamental procedure for multiple view 3-d reconstruction techniques, and the structured light method is no exception. The calibration of a structured light scanner will be discussed in section 2.3.

2.1.4. Reprojection and backprojection

Two commonly used operators in the discussion of perspective projection are reprojection and backprojection. Reprojection applies the parameters of projection to a given set of 3-d points and yields a set of projected 2-d points. One of its uses is to evaluate the accuracy of calibration by calculating the residuals between observed and reprojected pixels. Backprojection, on the contrary, outputs a set of rays for given parameters and a set of 2-d points lying in the projection plane. The basic concept of triangulation is the backprojection of corresponding 2-d entities that lie in different planes. The error of backprojection can be used to estimate the accuracy of shape measurement.

2.2 Surface recovery

The structured light technique obtains depth by triangulation. Sometimes it is referred to as the active triangulation method to distinguish it from passive stereo vision [3][4]. Depending on the degree of availability of the correspondences, triangulation is done in different manners. If only partial correspondences are available (i.e. the camera-projector correspondences are represented as a 2D-to-1D mapping), the triangulation algorithm finds the intersection between a ray and a plane. If a complete 2D-to-2D mapping is available, a more accurate measurement is obtained by locating the closest point of two rays.

2.2.1. Triangulation case 1: partial correspondences

If the pixel-wise correspondences from the camera image plane to the projector screen are incomplete, that is, the mapping is either $(u_i, v_i) \to p_i$ or $(u_i, v_i) \to q_i$, then the surface is measured by locating the intersection of the backprojected ray $\ell_i$ and the light plane $\Pi_i$. The triangulation performed in this manner is also known as the ray-plane intersection method, which is commonly adopted by line-sweeping laser scanners. Note that some of the literature refers to the plane $\Pi_i$ as the projection plane, which should not be confused with the projective plane of the central projection model. The ray that starts at the origin of the camera coordinate system and passes through pixel $(u_i, v_i)$ is defined by (2.4).
Without loss of generality, the mapped coordinate is assumed to be $p_i$. It corresponds to a line $p = p_i$ on the projector screen. Through this line the backprojected plane $\Pi_i$ is defined in projector-centered space. To geometrically model this plane, an angle specific to $p_i$ is defined,

\[
\theta_i = \arctan\frac{\hat{p}_i}{-f_p} = \arctan\frac{p_i - p_c}{-f_p}.
\tag{2.16}
\]

Then the light plane is simply derived by rotating the Q-R plane of the projector coordinate system by $\theta_i$ around the Q-axis, which yields

\[
\Pi_i = \begin{pmatrix} \cos\theta_i & 0 & \sin\theta_i \\ 0 & 1 & 0 \\ -\sin\theta_i & 0 & \cos\theta_i \end{pmatrix}
\begin{pmatrix} 0 \\ q \\ r \end{pmatrix}
= \begin{pmatrix} r \sin\theta_i \\ q \\ r \cos\theta_i \end{pmatrix}, \quad q, r \in \mathbb{R}.
\tag{2.17}
\]

Fig. 2.2: Illustration of a light plane decided by a line on projector screen.

Fig. 2.2 depicts the rotated Q-R plane to show how $\Pi_i$ is obtained from $\theta_i$. To find the intersection of $\ell_i$ and $\Pi_i$ they must be represented in the same coordinate system. A good choice is to cast the ray to projector-centered space, due to the simplicity of the computation. Let the transform from the camera to the projector coordinate system be denoted by

\[
\begin{pmatrix} R & T \end{pmatrix}
= \begin{pmatrix} R_p R_c^{-1} & T_p - R_p R_c^{-1} T_c \end{pmatrix},
\tag{2.18}
\]

where $(R_c \; T_c)$ and $(R_p \; T_p)$ are respectively the extrinsic parameters of the camera and of the projector with respect to the world coordinate system. Applying the transform leads to a linear system

\[
\begin{pmatrix} r \sin\theta_i \\ q \\ r \cos\theta_i \end{pmatrix}
= -w R \begin{pmatrix} \hat{u}_i / f_c \\ \hat{v}_i / f_c \\ -1 \end{pmatrix} + T, \quad w, q, r \in \mathbb{R}.
\tag{2.19}
\]

Let $r_{xi}, r_{yi}, r_{zi}$ be the projections of $(\hat{u}_i / f_c, \hat{v}_i / f_c, -1)$ onto the three row vectors of $R$. The right-hand side of (2.19) can then be rewritten as

\[
\begin{pmatrix} r \sin\theta_i \\ q \\ r \cos\theta_i \end{pmatrix}
= \begin{pmatrix} -w r_{xi} + t_x \\ -w r_{yi} + t_y \\ -w r_{zi} + t_z \end{pmatrix}.
\tag{2.20}
\]

By separating variables the matrix form of the linear system is acquired as

\[
\begin{pmatrix} r_{xi} & 0 & \sin\theta_i \\ r_{yi} & 1 & 0 \\ r_{zi} & 0 & \cos\theta_i \end{pmatrix}
\begin{pmatrix} w \\ q \\ r \end{pmatrix}
= \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}.
\tag{2.21}
\]

If $w$, the distance from the centre of the camera to the intersection, is the only unknown to solve, a "shortcut" solution arises from (2.20) by dividing the first row by the third row, given by

\[
w = \frac{t_x - t_z \tan\theta_i}{r_{xi} - r_{zi} \tan\theta_i}.
\tag{2.22}
\]

The remaining unknowns are

\[
q = t_y - w\, r_{yi}
= \frac{(r_{xi} t_y - r_{yi} t_x) + (r_{yi} t_z - r_{zi} t_y) \tan\theta_i}{r_{xi} - r_{zi} \tan\theta_i}
\tag{2.23}
\]

and

\[
r = \frac{t_z - w\, r_{zi}}{\cos\theta_i}
= \frac{r_{xi} t_z - r_{zi} t_x}{r_{xi} \cos\theta_i - r_{zi} \sin\theta_i}.
\tag{2.24}
\]

Note that under certain conditions the depth $w$ can be calculated by a simplified formula which is a special case of the derived one. If the camera and projector coordinate systems share the same XZ plane and have overlapping X-axes, then $R$ becomes the identity matrix $I_3$, and $t_y$, $t_z$ are zero. This simplifies (2.22) to

\[
w = \frac{t_x}{\hat{u}_i / f_c + \tan\theta_i}
= \frac{t_x f_c}{\hat{u}_i + f_c \tan\theta_i}
= \frac{b f_c \tan\alpha_i}{\hat{u}_i \tan\alpha_i + f_c},
\tag{2.25}
\]

where $\alpha_i = 90^\circ - \theta_i$ and $b = t_x$. This formula, adopted by many simplified binocular stereo systems, can also be derived using trigonometry, as shown in [1].

The stability of the closed-form solution (2.22) strongly depends on the placement of the devices. When the backprojected ray and the light plane are nearly parallel, the solution of $w$ becomes ill-behaved because of a very small denominator. Moreover, the modeling of nonlinear radial distortion becomes much more complicated for projected light planes, as the recovery of distortion bends $\Pi_i$. Note that the 1-d correspondences may be upgraded to complete 2-d correspondences by enforcing epipolar constraints. In that case the triangulation algorithm solves the closest point problem for two lines.

2.2.2. Triangulation case 2: complete 2-d correspondences

When the complete mapping $(u_i, v_i) \to (p_i, q_i)$ is available, the triangulation can be carried out in a more accurate way. In the ideal scenario, the incident and back-projected rays are supposed to meet at a location in 3-d space, although this rarely happens in practice due to errors. With complete correspondences the depth is estimated by locating the closest point between two rays. For an image pixel $(u_i, v_i)$ its back-projected ray is defined by (2.4), and the corresponding incident ray of pixel $(p_i, q_i)$ on the projector screen is given by (2.10). The distance between these two rays can only be measured in the same space. To avoid introducing new symbols, the projector-centered space is chosen to perform triangulation. In that way the Euclidean distance is measured as the norm of the following vector:

\[
\delta_i(w, r) = \begin{pmatrix} \delta_{ix} \\ \delta_{iy} \\ \delta_{iz} \end{pmatrix}
= -w \begin{pmatrix} r_{ix} \\ r_{iy} \\ r_{iz} \end{pmatrix}
+ r \begin{pmatrix} \hat{p}_i / f_p \\ \hat{q}_i / f_p \\ -1 \end{pmatrix}
+ \begin{pmatrix} t_x \\ t_y \\ t_z \end{pmatrix}.
\tag{2.26}
\]

Here we are seeking a vector $(w, r)$ with negative components that minimizes $\|\delta_i\|$. This is equivalent to finding the minimum of the squared distance $\|\delta_i\|^2$, which occurs at only one position if the two rays are not parallel. By calculating the partial derivatives

\[
\frac{\partial \|\delta_i\|^2}{\partial w} = -2\,(\delta_{ix}, \delta_{iy}, \delta_{iz}) \cdot (r_{ix}, r_{iy}, r_{iz})
\tag{2.27}
\]

and

\[
\frac{\partial \|\delta_i\|^2}{\partial r} = 2\,(\delta_{ix}, \delta_{iy}, \delta_{iz}) \cdot \left(\frac{\hat{p}_i}{f_p}, \frac{\hat{q}_i}{f_p}, -1\right),
\tag{2.28}
\]

the $(w, r)$ that satisfies

\[
\frac{\partial \|\delta_i\|^2}{\partial w} = \frac{\partial \|\delta_i\|^2}{\partial r} = 0
\tag{2.29}
\]

is found to be the solution to a linear system

\[
\begin{pmatrix} \|L_{ci}\|^2 & -L_{ci} \cdot L_{pi} \\ L_{ci} \cdot L_{pi} & -\|L_{pi}\|^2 \end{pmatrix}
\begin{pmatrix} w \\ r \end{pmatrix}
= \begin{pmatrix} L_{ci} \cdot T \\ L_{pi} \cdot T \end{pmatrix},
\tag{2.30}
\]

where $L_{ci} = (r_{ix}, r_{iy}, r_{iz})$ and $L_{pi} = (\hat{p}_i / f_p, \hat{q}_i / f_p, -1)$. In particular, the depths measured from the camera centre and the projector centre are respectively given by

\[
w = \frac{\left[ (L_{ci} \cdot L_{pi})\, L_{pi} - \|L_{pi}\|^2 L_{ci} \right] \cdot T}{(L_{ci} \cdot L_{pi})^2 - \|L_{ci}\|^2 \|L_{pi}\|^2}
\tag{2.31}
\]

and

\[
r = \frac{\left[ \|L_{ci}\|^2 L_{pi} - (L_{ci} \cdot L_{pi})\, L_{ci} \right] \cdot T}{(L_{ci} \cdot L_{pi})^2 - \|L_{ci}\|^2 \|L_{pi}\|^2}.
\tag{2.32}
\]

The 3-d coordinates can be estimated as the point closest to both $L_{ci}$ and $L_{pi}$. This is known as the middle-point algorithm, as the estimate is actually the mid-point of the shortest line segment connecting the two back-projected rays.

2.2.3. Direct linear triangulation

The 3-d structure can equivalently be computed in homogeneous coordinates in a linear manner. Let $P_c$ and $P_p$ be the 3-by-4 projection matrices of the camera and of the projector respectively. A point $O_{wi} = (x_i, y_i, z_i, w_i)$ in 3-d space, along with its corresponding pixel $I_{ci} = (u_i, v_i, 1)$ on the camera image plane and $I_{pi} = (p_i, q_i, 1)$ on the projector screen, is supposed to satisfy $I_{ci} = P_c O_{wi}$ and $I_{pi} = P_p O_{wi}$ simultaneously. That is, the equations

\[
\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} = P_c \begin{pmatrix} x_i \\ y_i \\ z_i \\ w_i \end{pmatrix}
\tag{2.33}
\]

and

\[
\begin{pmatrix} p_i \\ q_i \\ 1 \end{pmatrix} = P_p \begin{pmatrix} x_i \\ y_i \\ z_i \\ w_i \end{pmatrix}
\tag{2.34}
\]

hold, ideally. Note that $O_{wi}$ has inhomogeneous coordinates $(x_i / w_i, y_i / w_i, z_i / w_i)$ in 3-space. A homogeneous linear system of four unknowns can be constructed from (2.33) and (2.34). This system can be solved up to a scale. The result of linear triangulation is this solution, with the arbitrary scale canceled by division. This method is sometimes referred to as direct linear transformation, which is also a well-known camera calibration method, as will be described later.

2.2.4. Error estimate and optimal triangulation

The function defined by (2.26) can be used not only to solve for the closest point of two lines but also to calculate the geometrical error of triangulation. However, an error defined directly on $\|\delta_i\|$ is biased, since the value of $\|\delta_i\|$ is magnified as the range of the triangulated 3-d spot increases. The paper [5] suggested a normalized error function, which is defined as

\[
\varepsilon_i(w, r) = \frac{\delta_i(w, r)}{\sigma_{ci}\, w + \sigma_{pi}\, r},
\tag{2.35}
\]

where $\sigma_{ci}$ and $\sigma_{pi}$ are defined in terms of the reprojection error in the camera image plane and on the projector screen respectively. From another point of view, an error defined by the Euclidean distance between 3-d points is inappropriate for affine and projective reconstruction, since Euclidean properties such as distance and angle are not preserved under projective transformation. It is suggested instead that the error be measured in the projective planes [6]. Let $I_{ci}$ be a point lying on the image plane of the camera, and $I_{pi}$ be its corresponding pixel on the projector screen. The epipolar constraint guarantees

\[
I_{pi}^T F I_{ci} = 0,
\tag{2.36}
\]

where $F$ is the fundamental matrix. This matrix is invariant to the scene and encapsulates the geometrical relationship between the two projective planes. An optimal triangulation is defined to minimize the reprojection errors subject to the epipolar constraint.
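The two triangulation cases of sections 2.2.1 and 2.2.2 can be sketched numerically as follows. This is a minimal illustration rather than the implementation used in this work: it assumes that the camera ray direction has already been rotated into the projector-centered frame, it ignores lens distortion, and the function and variable names are hypothetical.

```python
def dot(a, b):
    """Inner product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))

def ray_plane_depth(L_c, T, tan_theta):
    """Depth w along the camera ray from the shortcut solution (2.22).

    L_c = (r_xi, r_yi, r_zi) is the rotated camera ray direction,
    T = (t_x, t_y, t_z) the translation, tan_theta the tangent of the plane angle.
    """
    denom = L_c[0] - L_c[2] * tan_theta
    if abs(denom) < 1e-12:
        raise ValueError("ray and light plane are (nearly) parallel")
    return (T[0] - T[2] * tan_theta) / denom

def midpoint_triangulate(L_c, L_p, T):
    """Closest-point triangulation following (2.30) to (2.32).

    Camera ray points are -w * L_c + T, projector ray points are -r * L_p,
    both expressed in the projector-centered frame.
    Returns (w, r, mid_point, gap), where gap is the distance between the rays.
    """
    a, b, c = dot(L_c, L_c), dot(L_c, L_p), dot(L_p, L_p)
    det = b * b - a * c                      # common denominator of (2.31) and (2.32)
    if abs(det) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    dc, dp = dot(L_c, T), dot(L_p, T)
    w = (b * dp - c * dc) / det              # (2.31)
    r = (a * dp - b * dc) / det              # (2.32)
    cam_pt = tuple(-w * l + t for l, t in zip(L_c, T))
    proj_pt = tuple(-r * l for l in L_p)
    mid = tuple(0.5 * (x + y) for x, y in zip(cam_pt, proj_pt))
    gap = sum((x - y) ** 2 for x, y in zip(cam_pt, proj_pt)) ** 0.5
    return w, r, mid, gap

# Synthetic check: both rays pass through the point (0.2, -0.1, -2.0).
L_c = (-0.15, -0.05, -1.0)
L_p = (0.10, -0.05, -1.0)
T = (0.5, 0.0, 0.0)
print(ray_plane_depth(L_c, T, tan_theta=0.2 / -2.0))
print(midpoint_triangulate(L_c, L_p, T))
```

In this synthetic setup the rays intersect exactly, so both functions should agree on the depth and the reported gap should vanish up to rounding. With measured correspondences the gap is generally non-zero and can serve as a rough quality indicator for each reconstructed point.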

2.3 Calibration

The task of calibration in a structured light system is to estimate a set of intrinsic and extrinsic parameters of the camera and the projector in order to perform depth measuring. Since the calibration of a camera has been extensively studied in computer vision, this section will not give algorithm-specific descriptions of any well-known calibration method. Instead, we will focus on the calibration of the projector. The projector is typically treated as an inverse camera. A common unified approach, as adopted for example in [7], applies a conventional camera calibration method to the projector. In general the calibration processes are separated for the camera and the projector, although a few attempts in the literature deal with both of them together. An introduction to three surveyed classical techniques for camera calibration is given in the first part of this section. Some selected works on calibrating a projector are discussed in the second part. Practical issues such as the removal of radial distortion and non-linear minimisation will be discussed in chapter 3.

2.3.1. Camera calibration

To calibrate a well-manufactured camera that has neither image distortion nor problematic lens alignment, four intrinsic and six extrinsic parameters are concerned. The intrinsic parameters are the focal length $f_c$, the pixel aspect ratio $s_c$ and the principal point $(u_c, v_c)$, which is expressed in image coordinates, while the extrinsic parameters consist of a rotation matrix $R_c$ and a 3-element translation vector $T_c$. Note that the 3-by-3 rotation matrix is actually determined by three parameters instead of nine. The intrinsic and extrinsic parameters can be combined into a single 3-by-4 projection matrix $P_c$, as

\[
w \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
= K_c \begin{pmatrix} R_c & T_c \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}
= P_c \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\tag{2.37}
\]
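As a minimal sketch of how the ten parameters act together, the following functions apply the projection of (2.12) to (2.14) to a world point and invert the intrinsic part for a pixel as in (2.1) and (2.4). The sketch assumes an ideal pinhole model without distortion; the function names and the numeric values are hypothetical and chosen only for illustration.

```python
def project(point, f, s, center, R, T):
    """Reproject a world point (x, y, z) to pixel coordinates, following (2.13) and (2.14)."""
    x, y, z = point
    # world -> camera coordinates (u, v, w) using the extrinsic parameters (R T)
    u, v, w = (R[k][0] * x + R[k][1] * y + R[k][2] * z + T[k] for k in range(3))
    if w >= 0:
        raise ValueError("point is not in front of the camera (w must be negative)")
    u_c, v_c = center
    return (-f * u / w + u_c, s * f * v / w + v_c)

def backproject(pixel, f, s, center):
    """Direction of the ray through a pixel in camera coordinates, as used in (2.4)."""
    u_i, v_i = pixel
    u_c, v_c = center
    return ((u_i - u_c) / f, -(v_i - v_c) / (s * f), -1.0)

# Hypothetical parameters: camera frame aligned with the world frame.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pixel = project((0.1, 0.2, -2.0), f=1000.0, s=1.0, center=(320.0, 240.0),
                R=identity, T=(0.0, 0.0, 0.0))
direction = backproject(pixel, f=1000.0, s=1.0, center=(320.0, 240.0))
print(pixel, direction)
```

Scaling the backprojected direction by the positive factor -w recovers the original camera-space point, which is the round trip between reprojection and backprojection described in section 2.1.4.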

The overall strategy of camera calibration is to reference one or more images of an object with known geometry (e.g. a ball, a wall [8], or a pyramid [9]) or well-structured features (e.g. a checkerboard) in order to make explicit the mapping from world to image coordinates. That is, a set of 3-d points $O_{c_i}(x_i, y_i, z_i)$ and the corresponding image coordinates $I_{c_i}(u_i, v_i)$ are used to estimate the unknown parameters. The objects used for calibration are referred to as calibration targets, and the features on the targets are known as control points. In this work, planar calibration targets are considered because they are easy to make and the detection of planar features is straightforward.

Fig. 2.3: Different calibration targets. (a) Red circular spot pattern, (b) Tsai's calibration pattern, (c) a checkerboard, and (d) projector view of the checkerboard reconstructed from stereo correspondences.

Three well-known camera calibration strategies are the Direct Linear Transform (DLT), Tsai's method, and Zhang's method [10]. The DLT method treats $P_c$ as a “black box” and

solves it directly. It applies Singular Value Decomposition (SVD) to seek a least-squares solution of a homogeneous linear system stacked from at least six correspondences. The linear system is generally over-determined to reduce the effect of noise. The solution is valid up to scale, which means that multiplying the solution by a non-zero scale factor also yields a valid solution. Therefore, if one desires to know the exact value of $P_c$, the scale factor needs to be recovered, although this is not necessary for some applications. The intrinsic and extrinsic parameters can be further separated from $P_c$ by matrix decomposition. DLT is perhaps the most commonly used method. However, DLT is unable to find all parameters from a single image if the calibration points are coplanar. When a planar calibration target is used, at least two images are required (or three to determine the scale).

Tsai's proposal is the oldest of the three methods. It is known as a two-step strategy since the parameters are estimated separately in two stages. For a non-planar or planar target, the method first solves a part of $R_c$ and $T_c$ up to a scale, which is then determined by the orthonormality of the rotation matrix. The remaining parameters are then estimated from the parameters already calibrated. The raw result of Tsai's method is generally not very accurate because of the assumptions made on some elements in $P_c$, which is why a non-linear minimization procedure is recommended to optimize the omitted elements as well as the solved parameters.

Compared to the two methods described above, Zhang's method is more closely related to the nature of projective geometry. Its foundation is built on the image of the absolute conic (IAC), an algebraic entity frequently discussed in computer vision applications related to projective geometry. The projection matrix can be decomposed into a chain of matrix multiplications.
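As a concrete illustration of the DLT step described earlier in this subsection, the sketch below stacks two linear equations per correspondence and takes the right singular vector associated with the smallest singular value as the projection matrix. It is a minimal example in Python with NumPy on synthetic, noise-free, non-coplanar points. The function names and test values are hypothetical, and no coordinate normalization is applied, which a practical implementation would usually add.

```python
import numpy as np

def dlt_projection_matrix(world_pts, image_pts):
    """Estimate a 3-by-4 projection matrix, up to scale, from n >= 6 correspondences."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    if len(world_pts) < 6 or len(world_pts) != len(image_pts):
        raise ValueError("need at least six matched correspondences")
    rows = []
    zero = np.zeros(4)
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        X = np.array([x, y, z, 1.0])
        # From u = (p1 . X) / (p3 . X) and v = (p2 . X) / (p3 . X):
        rows.append(np.concatenate([X, zero, -u * X]))
        rows.append(np.concatenate([zero, X, -v * X]))
    A = np.vstack(rows)                 # 2n-by-12 homogeneous system A p = 0
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)         # least-squares solution with unit norm

def project(P, pts):
    """Project an n-by-3 array of world points to n-by-2 pixel coordinates."""
    h = np.hstack([pts, np.ones((len(pts), 1))]) @ P.T
    return h[:, :2] / h[:, 2:3]

# Synthetic ground truth (hypothetical values): camera 5 units from the origin.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), [[0.0], [0.0], [5.0]]])
rng = np.random.default_rng(0)
world = rng.uniform(-1.0, 1.0, size=(10, 3))   # non-coplanar control points
image = project(P_true, world)

P_est = dlt_projection_matrix(world, image)
# The scale is fixed here against the known ground truth, for comparison only.
P_est = P_est * (P_true[2, 3] / P_est[2, 3])
max_err = np.abs(project(P_est, world) - image).max()
print(max_err)
```

With coplanar control points the stacked system loses rank and the null vector is no longer unique, which is consistent with the remark above that a planar target requires more than one image.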
Zhang provides a closed-form solution for each of the sub-matrices.

2.3.2. Projector calibration

The geometrical calibration of projectors has recently attracted a lot of attention in the field of computer vision [11][12][13]. A conventional calibration of a 2-d projector is done in two stages, since the projector is in fact not able to “see” the target. In the first stage, dense correspondences from $I_c$, the pixels on the camera image plane, to $I_p$, the pixels on the projector screen, are established. The mapping is then used to reconstruct the view of the projector. In the second

stage, the parameters are estimated by applying the camera calibration procedure to the reconstructed view. The mapping can also be used directly to form a set of correspondences between $O_p$, the control points in world coordinates, and $I_p$, the points on the projector screen. The reprojection error of a calibrated projector is usually higher than that of a calibrated camera due to the accumulation of errors. The errors present in the acquired correspondences directly affect the accuracy of calibration. Moreover, the error caused by camera calibration is propagated non-linearly if the parameters of the camera are referenced to calibrate the projector [14].

In the recent work [15] the projector was accurately calibrated using a planar target with circular features pasted on it. The board, mounted on a high-precision moving mechanism, was shifted to several positions along the z-axis of the world frame during the calibration procedure. A combination of Gray-coded patterns and phase-shifting fringes was adopted to obtain world-projector correspondences for each position. Remarkably, the mean reprojection error of the calibrated projector was lower than 0.2 pixel. Since the calibration of the camera and the projector utilized the same control points, the calibration error of the camera did not affect the calibration of the projector. The use of circular features, however, requires a more sophisticated extraction algorithm because the projected centroid of a circle varies under perspective projection [16].

Work                                   BO09                              SONG09                       CUI08                  GAO08
Projector type                         LCD                               DLP                          DLP                    LCD
Resolution                             1024 × 768                        1024 × 768                   1024 × 768             1024 × 768
Establishment of correspondences       Gray pattern plus phase shifting  Checkerboard pattern         Coded stripe pattern   Line pattern
Feature of control points              Circular markers                  Checkerboard on LCD display  Circular markers       Circular markers
Reported average reprojection error    0.1133 px                         0.4001 px                    0.2243 px              0.4282 px
Number of referred views               9                                 8                            5                      9

Fig. 2.4: Comparison of selected works for projector calibration.

The work in [17] paid more attention to planar calibration targets. An easy way to make a planar target for calibration is to glue a printed pattern onto a flat board. The authors argued

that the planarity deviation of such an object is significantly high and can cause problems in precision ranging applications. They suggested using an LCD panel to calibrate a structured light system. It was reported that the reprojection error of a calibrated projector with a resolution of 1024 by 768 pixels averaged around 0.5 pixel. The performance was ten times better than that obtained with “homemade” targets. Their previous work [18] shows a similar concept, projecting a checkerboard pattern onto a target that also has a printed checkerboard pattern attached to it.

The work [8], which is part of an open-source project that contributes a camera-projector calibration toolbox for MATLAB, demonstrated the use of a large plane carrying both pasted and projected checkerboard patterns for calibration. The printed pattern was used not only to calibrate the camera but also to provide positional information about the board. Once the location and normal of the board are determined, the projector can be calibrated using a projected checkerboard. The error of the calibrated camera is propagated as a result of utilizing the camera's parameters to generate world-projector correspondences. The reprojection errors are not discussed in their work.

In some systems the projector may not be able to generate point projections, so the reconstruction of the projector's view is impossible. In such cases the projector can still be calibrated using projection-invariant properties, such as the cross ratio and epipolar constraints. The work [19] is an instance that uses the cross ratio property to calibrate the system. The mean reprojection error of the projector was around 0.42 pixel at a resolution of 1024 by 768 pixels.

2.3.3. Self calibration and recalibration

It is possible to calibrate a structured light system, or more generally a stereoscopic vision system, without an explicit calibration procedure.
Two subtopics related to this manner of calibration are self calibration and self recalibration. The former focuses on calibrating two unknown devices, whereas the latter begins with previously estimated parameters [20]. In the work [21], the photometric stereo method was used to alleviate the convergence problem encountered during parameter estimation for an uncalibrated structured light system. In another work [22], a trinocular structured light scanner with two cameras was designed to use vanishing lines computed from homography to self-calibrate all components.