An Efficient Approach for Dynamic Calibration of Multiple Cameras


once per second which is acceptable for welding process control although further development of hardware and algorithm optimization would significantly improve the speed. Hence, the proposed algorithm appears to have the potential to be used in online control of metal transfer process.

V. CONCLUSION

• The bilinear interpolation is effective for image enhancement toward better edge detection.

• The proposed brightness-based selection and edge-based separation algorithm can detect droplets in the image and attempts to extract adequate edge information from the interpolated images.

• The proposed model for droplet edge gives an effective method to estimate the size of the droplet robustly and accurately and the proposed model validation assures that the model used meets a minimal accuracy requirement.

• The speed of the image processing appears to meet the minimal requirement for real-time control.

It should be mentioned that certain algorithm parameters and equations are ad hoc for the well-defined and constrained problem under investigation. If welding conditions change, appropriate modifications may become necessary.

REFERENCES

[1] Y. M. Zhang and P. J. Li, “Modified active control of metal transfer and pulsed GMAW of titanium,” Welding J., vol. 80, pp. 54S–61S, 2001.

[2] L. A. Jones, T. W. Eagar, and J. H. Lang, “A dynamic model of drops detaching from a gas metal arc welding electrode,” J. Phys. D-Appl. Phys., vol. 31, pp. 107–123, 1998.

[3] G. Wang, P. G. Huang, and Y. M. Zhang, “Numerical analysis of metal transfer in gas metal arc welding under modified pulsed current conditions,” Metallurgical and Materials Trans. B-Process Metallurgy and Materials Processing Science, vol. 35, pp. 857–866, 2004.

[4] S. Chakraborty, “Analytical investigations on breakup of viscous liquid droplets on surface tension modulation during welding metal transfer,” Appl. Phys. Lett., vol. 86, pp. 1–3, 2005.

[5] C. S. Wu, M. A. Chen, and Y. F. Lu, “Effect of current waveforms on metal transfer in pulsed gas metal arc welding,” Measure. Sci. Technol., vol. 16, pp. 2459–2465, 2005.

[6] B. Y. B. Yudodibroto, M. J. M. Hermans, Y. Hirata et al., “Pendant droplet oscillation during GMAW,” Sci. Technol. Welding, vol. 11, pp. 308–314, 2006.

[7] S. Rhee and A. Kannatey, “Observation of metal transfer during gas metal arc-welding,” Welding J., vol. 71, pp. 381S–386S, 1992.

[8] K. H. Li, J. S. Chen, and Y. M. Zhang, “Double-electrode GMAW process and control,” Welding J., vol. 86, no. 8, pp. 231S–237S, 2007.

[9] Z. Wang, K. R. Rao, and J. Ben-Arie, “Optimal ramp edge detection using expansion matching,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 11, pp. 1092–1097, 1996.

[10] N. Saito and M. A. Cunningham, “Generalized e-filter and its application to edge detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 8, pp. 814–817, 1990.

[11] R. J. Qian and T. S. Huang, “Optimal edge detection in two-dimensional images,” IEEE Trans. Image Process., vol. 5, no. 7, pp. 1215–1220, 1996.

[12] M. Petrou and J. Kittler, “Optimal edge detector for ramp edges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 5, pp. 483–491, 1991.

[13] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall.

[14] F. Luo, S. J. Wu, L. C. Jiao, and L. R. Zhang, “Implementation of de-noise DWT chip based on adaptive soft-threshold,” in Proc. Int. Conf. Signal Process., 2000, vol. 1, pp. 614–618.

[15] M. Tommiska, M. Loukola, and T. Koskivirta, “An FPGA-based simulation and implementation of AAL type 2 receiver,” J. Commun. Netw., pp. 63–67, 1999.

An Efficient Approach for Dynamic Calibration of Multiple Cameras

I.-Hsien Chen and Sheng-Jyh Wang

Abstract—In this paper, we propose a new algorithm for dynamic calibration of multiple cameras. Based on the mapping between a horizontal plane in the 3-D space and the 2-D image plane on a panned and tilted camera, we utilize the displacement of feature points and the epipolar-plane constraint among multiple cameras to infer the changes of pan and tilt angles for each camera. This algorithm does not require a complicated correspondence of feature points. It can be applied to surveillance systems with wide-range coverage. It also allows the presence of moving objects in the captured scenes while performing dynamic calibration. The sensitivity analysis of our algorithm with respect to measurement errors and fluctuations in previous estimations is also discussed. The efficiency and feasibility of this approach have been demonstrated in experiments over real scenery.

Note to Practitioners—For a surveillance system with multiple cameras, the poses of cameras may be changed from time to time to acquire different views of the monitored scene. Whenever the poses of cameras are changed, the relative positioning and orientation among cameras need to be recalibrated. In this paper, we demonstrate a new and efficient approach to calibrate multiple cameras dynamically. The concept of our approach originated from the observation that people usually can identify the directions of the pan and tilt angles, and even make a rough estimate about the changes of pan and tilt angles, simply based on some clues revealed in the captured images.

In our approach, a set of cameras is first calibrated based on a static calibration method. As cameras begin to pan or tilt, the images of these cameras change accordingly. We keep extracting and tracking a few feature points from these images. Based on the displacement of these feature points in consecutive images, the pan and tilt angle changes of the cameras can be automatically estimated via the proposed approach. There is no need to place calibration patterns or landmarks while performing dynamic calibration. In addition, there is no need to perform the complicated correspondence of feature points among cameras. The proposed approach is practical for a wide-range surveillance system with multiple cameras and is applicable to complicated environments. In the future, we will combine the proposed approach with an object tracking system to develop an efficient active surveillance system with multiple cameras.

Index Terms—Dynamic camera calibration, multiple cameras.

I. INTRODUCTION

For a surveillance system with multiple cameras, cameras may pan or tilt from time to time to acquire different views of the monitored scene. However, when a camera pans or tilts, its extrinsic parameters change accordingly. For this type of surveillance system, how to accurately and efficiently recalibrate the extrinsic parameters of multiple cameras has become an important issue.

Up to now, various kinds of approaches have been developed to calibrate a static camera’s intrinsic and extrinsic parameters, such as the

Manuscript received April 23, 2007; revised April 23, 2007. First published June 10, 2008; current version published December 30, 2008. This paper was recommended for publication by Associate Editor Y. F. Li and Editor M. Wang upon evaluation of the reviewers’ comments. This work was supported in part by the Ministry of Economic Affairs of the Republic of China under Grant 96-EC-17-A-02-S1-032.

The authors are with the Department of Electronics Engineering, Institute of Electronics, National Chiao Tung University, Hsinchu 300, Taiwan, R.O.C. (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.


was estimated from a planar target. However, both [10] and [11] only demonstrated the dynamic calibration of a single camera, but not the calibration among multiple cameras. In [12], the authors utilized the marks and width of parallel lanes to calibrate PTZ cameras. Although this method is practical for traffic monitoring, it is not general enough for other types of surveillance systems. In [13], a dynamic camera calibration with narrow-range coverage was proposed. For a pair of cameras, this method performed the correspondence of feature points on the image pair and used coplanar geometry for camera calibration. In [14], the relative pose between a calibrated camera and a projector was determined via plane-based homography. This approach requires the correspondence of feature points. However, for surveillance systems with wide-range coverage, the matching of feature points is usually a difficult problem.

In this paper, we propose a new algorithm for dynamic calibration of multiple cameras. This algorithm does not require a complicated correspondence of feature points. Our algorithm also allows the presence of moving objects in the captured scenes while performing dynamic calibration. As cameras begin to pan or tilt, we keep extracting and tracking feature points based on the Kanade–Lucas–Tomasi (KLT) algorithm [15]. Next, we utilize the displacement of feature points and the epipolar-plane constraint among multiple cameras to infer the changes of pan and tilt angles for each camera. Compared with [13], we only need the correspondence of epipolar lines but not the exact matching of feature points. The use of epipolar lines greatly simplifies the correspondence process and makes our approach suitable for complicated surveillance environments.

This paper is organized as follows. First, in Section II, we explain how we utilize the displacement of feature points and the epipolar-plane constraint to infer the changes of pan angle and tilt angle. We also describe how to filter out undesired feature points when moving objects are present. The sensitivity analysis with respect to measurement errors and the fluctuations of previous estimations is addressed in Section III. Some experimental results over real data are demonstrated in Section IV. Finally, in Section V, the conclusion is drawn.

II. DYNAMIC CALIBRATION OF MULTIPLE CAMERAS

In this section, we explain how we perform the dynamic calibration process based on temporal and 3-D spatial information. We will first introduce how to calibrate a dynamic camera based on the displacement of feature points in the temporal domain. After that, we will apply the epipolar-plane constraint over each pair of cameras to obtain more robust calibration. Moreover, since people or moving objects may enter or leave the scene while cameras are capturing images, we need to filter out their interferences to avoid the degradation of calibration accuracy. In Fig. 1, we show an overall picture of the proposed dynamic calibration algorithm.

A. Dynamic Calibration of a Single Camera

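The basic camera model and related formulae can be found in our previous work [17]. When a camera with a rotation radius $r$ has a tilt angle $\phi$ and a pan angle $\theta$, we may deduce (1) to express the back projection function $B(p, \theta, \phi, h, \Omega)$ from the image coordinates $p = (x, y)$ onto a 3-D point $(X, Y, Z)$ lying on a horizontal plane with $Y = -h$ [17]:

$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
=
\begin{bmatrix}
\dfrac{\frac{x}{\alpha} C_\theta (r S_\phi - h) + \frac{y}{\beta} S_\theta (r - h S_\phi) - h C_\phi S_\theta}{\frac{y}{\beta} C_\phi - S_\phi} \\[2ex]
-h \\[1ex]
\dfrac{\frac{x}{\alpha} S_\theta (h - r S_\phi) + \frac{y}{\beta} C_\theta (r - h S_\phi) - h C_\phi C_\theta}{\frac{y}{\beta} C_\phi - S_\phi} - r
\end{bmatrix}
\equiv B(p, \theta, \phi, h, \Omega). \tag{1}
$$

Here, $C_\theta$, $S_\theta$, $C_\phi$, and $S_\phi$ represent $\cos(\theta)$, $\sin(\theta)$, $\cos(\phi)$, and $\sin(\phi)$, respectively. Additionally, $\Omega$ represents the set of intrinsic parameters of the camera (the scale factors $\alpha$ and $\beta$ in (1)).

Fig. 1. Flowchart of the proposed dynamic calibration algorithm.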
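As a rough illustration of (1), the following Python sketch pairs the back projection with a forward projection and checks that they invert each other. The forward model is an assumption of this sketch, inferred from the derivative expressions in (9); the function names and the numeric values are hypothetical, and `alpha` and `beta` stand for the intrinsic scale factors.

```python
import math

def project(P, theta, phi, r, alpha, beta):
    """Forward projection of a 3-D point P = (X, Y, Z) to image coordinates.
    Assumed model, inferred from the derivative expressions in (9)."""
    X, Y, Z = P
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    depth = X * cp * st - Y * sp + Z * cp * ct + r * (cp * ct - 1.0)
    x = alpha * (X * ct - Z * st - r * st) / depth
    y = beta * (X * sp * st + Y * cp + Z * sp * ct + r * sp * ct) / depth
    return x, y

def back_project(p, theta, phi, h, r, alpha, beta):
    """Back projection B of (1): pixel p = (x, y) onto the plane Y = -h."""
    xn, yn = p[0] / alpha, p[1] / beta  # normalized image coordinates
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    den = yn * cp - sp  # common denominator of the X and Z components
    X = (xn * ct * (r * sp - h) + yn * st * (r - h * sp) - h * cp * st) / den
    Z = (xn * st * (h - r * sp) + yn * ct * (r - h * sp) - h * cp * ct) / den - r
    return X, -h, Z
```

Projecting a point with $Y = -h$ and back projecting the resulting pixel with the same pose should return the original point, up to rounding.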

Assume we have a set of cameras. At the beginning, we calibrate the 3-D pose of each camera via the static calibration method proposed in [17]. As a camera starts to pan or tilt, its image content changes. To recalibrate the new pose of the camera, we check the temporal displacement of a few feature points in the image. Here, we use the KLT method [15] to extract and track feature points in consecutive images. We also assume all extracted feature points correspond to some unknown static points in the 3-D space.

Typically, we may assume the rotation radius $r$ is far smaller than the distances between these 3-D points and the camera. We also assume the changes of pan angle and tilt angle are very small during the capturing of two successive images. With these two assumptions, the projection center of the camera can be thought to be fixed with respect to the 3-D points while the camera is panning or tilting. In other words, the projection lines, which connect the projection center to each of these observed 3-D points, are fixed in the 3-D space, as long as these 3-D points stay static during the capture of images. By using these projection lines as a reference, we may recalibrate the new pose of the camera. Moreover, as illustrated in Fig. 2, if three 3-D points, $P_A$, $P_B$, and $P_C$, are replaced by another three points, $\hat{P}_A$, $\hat{P}_B$, and $\hat{P}_C$, on their projection lines, there is no influence on the projected points on the image plane. Hence, even if we do not actually know the real locations of these 3-D points, we may simply back project all feature points in the 2-D image onto a 3-D pseudo plane with a constant $Z$ coordinate, as shown in Fig. 2.

In our approach, based on a few feature points on a pair of successive images $I_{t-1}$ and $I_t$, we first back project these feature points in $I_{t-1}$ onto a 3-D pseudo plane with a constant $Z$. Then, we try to find a new pose of the camera that can map the corresponding feature points in $I_t$ onto the same 3-D pseudo points. That is, if we assume the camera has the pan angle $\theta_{t-1}$ and the tilt angle $\phi_{t-1}$ while capturing $I_{t-1}$, and has the pan angle $\theta_{t-1} + \Delta\theta_t$ and the tilt angle $\phi_{t-1} + \Delta\phi_t$ while capturing $I_t$, we try to find $\Delta\theta_t$ and $\Delta\phi_t$ that minimize the following formula:

$$
D = \sum_{k=1}^{K} \left\| \hat{B}(\hat{p}_k, \theta_{t-1} + \Delta\theta_t, \phi_{t-1} + \Delta\phi_t) - \hat{B}(p_k, \theta_{t-1}, \phi_{t-1}) \right\|^2. \tag{2}
$$

Fig. 2. Illustration of a pseudo plane $\Pi'$.

In (2), $\hat{B}$ represents the back projection function of an image feature point onto a pseudo 3-D plane $\Pi'$. Here, we use the “hat” to denote that the back projection is restricted to a vertical pseudo plane $\Pi'$. Besides, $p_k$ denotes a feature point in $I_{t-1}$ and $\hat{p}_k$ denotes the same feature point in $I_t$. $K$ is the total number of image feature points for calibration. Note that in (2), we ignore the altitude parameter $h$ of these back-projected points. This is because the altitude $h$ can be deduced from (1) once the $Z$ coordinate is fixed. We also ignore the intrinsic parameters since they are not changed when the camera pans and tilts.
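The minimization in (2) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the rotation radius is neglected ($r = 0$), the pseudo plane is taken as $Z = z_0$, the rotation convention follows the forward model implied by (9), and a plain Gauss-Newton iteration with a finite-difference Jacobian stands in for whatever optimizer is actually used. All function names are hypothetical.

```python
import math

def project(P, theta, phi, alpha, beta):
    """Assumed forward projection with the rotation radius neglected (r = 0)."""
    X, Y, Z = P
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    u, w = X * ct - Z * st, X * st + Z * ct
    return alpha * u / (cp * w - Y * sp), beta * (sp * w + Y * cp) / (cp * w - Y * sp)

def pseudo_plane(p, theta, phi, alpha, beta, z0=1.0):
    """Back project pixel p onto the pseudo plane Z = z0 (the role of B-hat)."""
    u, v = p[0] / alpha, p[1] / beta
    ct, st = math.cos(theta), math.sin(theta)
    cp, sp = math.cos(phi), math.sin(phi)
    w = sp * v + cp
    X, Y, Z = u * ct + w * st, cp * v - sp, -u * st + w * ct  # ray direction
    return z0 * X / Z, z0 * Y / Z

def residuals(d, prev_pts, curr_pts, theta, phi, alpha, beta):
    """Stacked differences whose squared norm is D in (2)."""
    res = []
    for p, q in zip(prev_pts, curr_pts):
        a = pseudo_plane(q, theta + d[0], phi + d[1], alpha, beta)
        b = pseudo_plane(p, theta, phi, alpha, beta)
        res += [a[0] - b[0], a[1] - b[1]]
    return res

def estimate_pan_tilt_change(prev_pts, curr_pts, theta, phi, alpha, beta, iters=20):
    """Gauss-Newton on (2), starting from zero pan/tilt change."""
    d, eps = [0.0, 0.0], 1e-6
    for _ in range(iters):
        r0 = residuals(d, prev_pts, curr_pts, theta, phi, alpha, beta)
        cols = []
        for j in range(2):  # finite-difference Jacobian, one column per angle
            dj = list(d)
            dj[j] += eps
            rj = residuals(dj, prev_pts, curr_pts, theta, phi, alpha, beta)
            cols.append([(a - b) / eps for a, b in zip(rj, r0)])
        a11 = sum(c * c for c in cols[0])
        a12 = sum(x * y for x, y in zip(cols[0], cols[1]))
        a22 = sum(c * c for c in cols[1])
        g1 = sum(c * r for c, r in zip(cols[0], r0))
        g2 = sum(c * r for c, r in zip(cols[1], r0))
        det = a11 * a22 - a12 * a12
        d[0] -= (a22 * g1 - a12 * g2) / det  # solve the 2x2 normal equations
        d[1] -= (a11 * g2 - a12 * g1) / det
    return d
```

With noise-free synthetic pixels generated by `project` at two poses, the estimator should recover the pan and tilt changes used to generate them.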

B. Dynamic Calibration of Multiple Cameras Based on Epipolar-Plane Constraint

In the previous section, we assume the projection center of a single camera is fixed during panning and tilting. The projection lines are then used as a reference to calibrate the new pose of that camera. To further increase the accuracy of calibration, we add on the 3-D spatial relationship among cameras.

In Fig. 3, we show the epipolar geometry for a pair of cameras [16]. For these two cameras, their projection centers, $O_{C1}$ and $O_{C2}$, together with a 3-D point $P_A$, determine an epipolar plane $\Pi$. This epipolar plane $\Pi$ intersects the image planes of the cameras to form two epipolar lines $l_1$ and $l_2$. If $p^1_A$ and $p^2_A$ are the projected points of $P_A$ on the image planes, they must lie on $l_1$ and $l_2$, respectively. This epipolar constraint implies that $O_{C1}$, $O_{C2}$, $p^1_A$, and $p^2_A$ are coplanar and the epipolar plane $\Pi$ can be expressed as

$$
\pi(O_{C1}, O_{C2}, p^1_A, \theta^1, \phi^1) \equiv \overrightarrow{O_{C2} O_{C1}} \times \overrightarrow{O_{C1} B(p^1_A, \theta^1, \phi^1)}
$$

or

$$
\pi(O_{C1}, O_{C2}, p^2_A, \theta^2, \phi^2) \equiv \overrightarrow{O_{C1} O_{C2}} \times \overrightarrow{O_{C2} B(p^2_A, \theta^2, \phi^2)}. \tag{3}
$$

In (3), we use the $B(\cdot)$ function defined in (1). Note that we ignore the altitude parameter $h$ because the formation of the epipolar plane is actually independent of $h$. That is, no matter what value $h$ is, the epipolar plane is still the same.
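The cross-product construction in (3) can be illustrated with a short Python sketch. It assumes the two projection centers and one back-projected point are already available as 3-D coordinates; representing the plane as a normal vector plus an offset, and the function names, are choices made for this illustration only.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product of two 3-D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def epipolar_plane(o_c1, o_c2, b_point):
    """Plane through both projection centers and a back-projected point.
    The normal is (O_C2 -> O_C1) x (O_C1 -> B), following the first form of (3)."""
    n = cross(sub(o_c1, o_c2), sub(b_point, o_c1))
    d = -dot(n, o_c1)  # offset so that n . X + d = 0 on the plane
    return n, d

def plane_residual(plane, point):
    """Value of n . X + d; zero when the point lies on the plane."""
    n, d = plane
    return dot(n, point) + d
```

A point that lies on the plane gives a zero residual, which is the condition a rotated camera is asked to preserve for its tracked features.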

As illustrated in Fig. 3, we assume a pair of cameras has initially been calibrated via some kind of calibration algorithm. We assume a few features, like $p^1_A$, $p^1_B$, $p^1_C$, $p^2_A$, $p^2_D$, and $p^2_E$, are located on a pair of corresponding epipolar lines. Without performing pointwise correspondence, we do not actually know where these feature points are projected from. However, we are still confident of the fact that these 3-D points must be “somewhere” on the epipolar plane. As long as these 3-D points remain static in the 3-D space, this epipolar plane is fixed. Hence, the epipolar planes that have been identified at the previous moment can be used as a reference for the calibration of cameras at the current moment.

Fig. 3. Illustration of epipolar-plane constraint.

In Fig. 3, we assume a pair of cameras has been calibrated at the time instant $t-1$ and an epipolar plane $\Pi$ has been identified. Assume at that time instant $t-1$, the pan and tilt angles of Camera-1 are $\theta^1_{t-1}$ and $\phi^1_{t-1}$, while the pan and tilt angles of Camera-2 are $\theta^2_{t-1}$ and $\phi^2_{t-1}$. Camera-1 captures the image $I^1_{t-1}$, while Camera-2 captures $I^2_{t-1}$. On the other hand, at the time instant $t$, Camera-1 rotates to a new pan angle $(\theta^1_{t-1} + \Delta\theta^1_t)$ and a new tilt angle $(\phi^1_{t-1} + \Delta\phi^1_t)$, while Camera-2 rotates to $(\theta^2_{t-1} + \Delta\theta^2_t)$ and $(\phi^2_{t-1} + \Delta\phi^2_t)$. Here, we only discuss the calibration of Camera-1. The calibration of Camera-2 can be implemented in a similar way.

For Camera-1, assume a prominent feature point $p^1_A$ has been extracted from $I^1_{t-1}$. This feature moves to $\hat{p}^1_A$ in $I^1_t$. Based on $p^1_A$, $\theta^1_{t-1}$, and $\phi^1_{t-1}$, we may form an epipolar plane $\Pi$. At the time instant $t$, we then seek to find the angles $(\theta^1_{t-1} + \Delta\theta^1_t)$ and $(\phi^1_{t-1} + \Delta\phi^1_t)$ such that $\hat{p}^1_A$ is still located on the same epipolar plane. That is, we seek to find $\Delta\theta^1_t$ and $\Delta\phi^1_t$ such that

$$
B(\hat{p}^1_A, \theta^1_{t-1} + \Delta\theta^1_t, \phi^1_{t-1} + \Delta\phi^1_t) \cdot \pi(O_{C1}, O_{C2}, p^1_A, \theta^1_{t-1}, \phi^1_{t-1}) = 0. \tag{4}
$$

Similarly, for $p^1_B$ and $p^1_C$ that share the same epipolar line with $p^1_A$, we have

$$
B(\hat{p}^1_B, \theta^1_{t-1} + \Delta\theta^1_t, \phi^1_{t-1} + \Delta\phi^1_t) \cdot \pi(O_{C1}, O_{C2}, p^1_A, \theta^1_{t-1}, \phi^1_{t-1}) = 0
$$

and

$$
B(\hat{p}^1_C, \theta^1_{t-1} + \Delta\theta^1_t, \phi^1_{t-1} + \Delta\phi^1_t) \cdot \pi(O_{C1}, O_{C2}, p^1_A, \theta^1_{t-1}, \phi^1_{t-1}) = 0. \tag{5}
$$

Note that in (4) and (5), the projection center $O_{C2}$ may have a slight movement when Camera-2 rotates. That movement can be taken into account to achieve more accurate calibration. Here, we simply ignore that part to simplify the formulation.

For Camera-1, assume we have extracted $m$ epipolar lines. Moreover, on the $j$th epipolar line, where $j = 1, 2, \ldots, m$, we have extracted $n_j$ feature points $\{p^1_{j,1}, p^1_{j,2}, \ldots, p^1_{j,n_j}\}$ on $I^1_{t-1}$. These $n_j$ feature points move to $\{\hat{p}^1_{j,1}, \hat{p}^1_{j,2}, \ldots, \hat{p}^1_{j,n_j}\}$ on $I^1_t$. Besides, we assume $p^1_j$ denotes one of the feature points in $\{p^1_{j,1}, p^1_{j,2}, \ldots, p^1_{j,n_j}\}$.


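$$
F^1_t = \sum_{j=1}^{m} \sum_{i=1}^{n_j} \left\| \hat{B}(\hat{p}^1_{j,i}, \theta^1_{t-1} + \Delta\theta^1_t, \phi^1_{t-1} + \Delta\phi^1_t) - \hat{B}(p^1_{j,i}, \theta^1_{t-1}, \phi^1_{t-1}) \right\|^2 + \lambda \sum_{j=1}^{m} \sum_{i=1}^{n_j} \left\| B(\hat{p}^1_{j,i}, \theta^1_{t-1} + \Delta\theta^1_t, \phi^1_{t-1} + \Delta\phi^1_t) \cdot \pi(O_{C1}, O_{C2}, p^1_j, \theta^1_{t-1}, \phi^1_{t-1}) \right\|^2. \tag{7}
$$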

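Similarly, the changes of pan angle and tilt angle of Camera-2 can be estimated by minimizing

$$
F^2_t = \sum_{j=1}^{m} \sum_{i=1}^{n_j} \left\| \hat{B}(\hat{p}^2_{j,i}, \theta^2_{t-1} + \Delta\theta^2_t, \phi^2_{t-1} + \Delta\phi^2_t) - \hat{B}(p^2_{j,i}, \theta^2_{t-1}, \phi^2_{t-1}) \right\|^2 + \lambda \sum_{j=1}^{m} \sum_{i=1}^{n_j} \left\| B(\hat{p}^2_{j,i}, \theta^2_{t-1} + \Delta\theta^2_t, \phi^2_{t-1} + \Delta\phi^2_t) \cdot \pi(O_{C1}, O_{C2}, p^2_j, \theta^2_{t-1}, \phi^2_{t-1}) \right\|^2. \tag{8}
$$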

Here, $\lambda$ is a parameter to weight the contributions of temporal clues and 3-D spatial clues. In our experiments, we simply set $\lambda = 1$. In theory, for each camera, one feature point is sufficient for the first term on the right-hand side of (7) or (8) to solve $\Delta\theta_t$ and $\Delta\phi_t$. Whenever a pair of epipolar lines can be determined, any feature point on the epipolar lines can be used for the second term on the right-hand side of (7) or (8) to make the estimation more robust. To deduce $\Delta\theta^1_t$, $\Delta\phi^1_t$, $\Delta\theta^2_t$, and $\Delta\phi^2_t$, we adopt the Levenberg–Marquardt (LM) algorithm. In our experiments, the initial guesses of pan/tilt angle changes are set to be 0. Note that for a pair of corresponding epipolar lines, Camera-1 and Camera-2 may have very different numbers of feature points. That is, the $n_j$ in (7) may be different from the $n_j$ in (8). This is because we do not actually seek to perform the correspondence of feature points. Instead, we seek a consistent matching of epipolar lines between $I_{t-1}$ and $I_t$. This strategy greatly simplifies the correspondence problem. Moreover, formulae (7) and (8) can also be merged together into a single formula in the optimization process.

In summary, for the proposed dynamic calibration algorithm, we perform the following steps.

Step 1) We perform static camera calibration based on the method proposed in [17]. After that, cameras are allowed to pan and tilt freely.

Step 2) On each image, a few feature points are extracted and tracked based on the KLT algorithm [15]. Feature points moving out of the image are removed, while new feature points entering the image are added.

Step 3) For each pair of cameras, based on the previous calibration results, we generate pairs of epipolar lines that pass through these extracted feature points. Actually, as long as a feature point is within a predefined distance from an epipolar line, we say that the feature point is passed through by the epipolar line. In our experiments, the predefined distance is set to be 3 pixels.

Fig. 4. Image pairs captured at two different time instants. Green lines indicate three pairs of corresponding epipolar lines.

Fig. 5. (a) Image captured by a camera with 55.1° tilt angle. (b) Image captured by a camera with 54.6° tilt angle. Red crosses represent feature points extracted by the KLT algorithm.

Step 4) Based on the extracted feature points and the information of epipolar lines, we calibrate the new pan angle and tilt angle for each pair of cameras by minimizing (7) and (8). After that, go back to Step 2.
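The distance test of Step 3 can be sketched as follows. This is a hedged illustration rather than the authors' implementation: epipolar lines are assumed to be given in the homogeneous form $(a, b, c)$ with $ax + by + c = 0$, each feature point is assigned to its nearest line only (a simplification chosen here), and the function names are hypothetical. The 3-pixel threshold is the value quoted in Step 3.

```python
import math

def point_line_distance(pt, line):
    """Perpendicular distance in pixels from pt = (x, y) to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / math.hypot(a, b)

def assign_to_epipolar_lines(points, lines, max_dist=3.0):
    """Group feature points by the nearest epipolar line within max_dist pixels.
    Points farther than max_dist from every line are left out."""
    groups = {j: [] for j in range(len(lines))}
    if not lines:
        return groups
    for pt in points:
        dists = [point_line_distance(pt, ln) for ln in lines]
        j = min(range(len(lines)), key=dists.__getitem__)
        if dists[j] <= max_dist:
            groups[j].append(pt)
    return groups
```

The resulting groups play the role of the $n_j$ feature points on the $j$th epipolar line in (7) and (8).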

The above procedure is repeated to acquire the new poses of all cameras. In Fig. 4(a) and (b), we show images captured by two different cameras at two different time instants, overlapped by three pairs of epipolar lines. Note that even though the feature points on these epipolar lines come from different 3-D points, we may still be able to achieve reliable dynamic calibration based on the matching of epipolar lines.

C. Dynamic Calibration With Presence Of Moving Objects

So far, we have assumed all the feature points used for calibration correspond to some fixed 3-D points in the scene. However, in real applications, such as object tracking or 3-D positioning, some moving objects may be present. To guarantee accurate calibration, we need to get rid of these feature points related to moving objects.

In Fig. 5, we show two successive image frames where the camera tilts up by 0.5°. For each feature point, we calculate its spatial displacement $(dx, dy)$. The distribution of $(dx, dy)$ is plotted in Fig. 6, where most displacements cluster around $(0, -4)$. These clustered displacements correspond to the movements of static feature points caused by camera rotation. On the other hand, there exist some outlier displacements which correspond to the movement of feature points lying on the moving person.

However, the displacement of feature points depends not only on the pose of the camera but also on the contents inside the 3-D scene. Theoretically, by taking the partial derivative of the image coordinates with respect to the pan angle, we have

$$
\begin{aligned}
\frac{\partial x}{\partial \theta} &= -\frac{\alpha C_\phi (X C_\theta - Z S_\theta - r S_\theta)^2}{\left(X C_\phi S_\theta - Y S_\phi + Z C_\phi C_\theta + r(C_\phi C_\theta - 1)\right)^2} - \frac{\alpha (X S_\theta + Z C_\theta + r C_\theta)}{X C_\phi S_\theta - Y S_\phi + Z C_\phi C_\theta + r(C_\phi C_\theta - 1)} \\
\frac{\partial y}{\partial \theta} &= -\frac{\beta (X C_\phi C_\theta - Z C_\phi S_\theta - r C_\phi S_\theta)(X S_\phi S_\theta + Y C_\phi + Z S_\phi C_\theta + r S_\phi C_\theta)}{\left(X C_\phi S_\theta - Y S_\phi + Z C_\phi C_\theta + r(C_\phi C_\theta - 1)\right)^2} + \frac{\beta (X S_\phi C_\theta - Z S_\phi S_\theta - r S_\phi S_\theta)}{X C_\phi S_\theta - Y S_\phi + Z C_\phi C_\theta + r(C_\phi C_\theta - 1)}
\end{aligned}
\tag{9}
$$

which indicates how the location of a feature point varies with respect to the change of pan angle. To simplify the formula, we assume $\phi = 0$ to ignore the influence of tilt angle. The simplified formula is expressed in (10). Similarly, by ignoring the effect of pan angle, (11) indicates how the location of a feature point varies with respect to the change of tilt angle. Both (10) and (11) indicate the crucial role of the 3-D location $(X, Y, Z)$ in the displacement of feature points. Hence, for different scenes, we expect different degrees of feature point displacements. On the other hand, if the same scene is observed by the same camera but with two different pan-angle changes, not only the displacement magnitudes but also the distributions of displacement are different. The distribution with a smaller pan angle change is more compact.

Fig. 6. The distribution of spatial displacement for the extracted feature points in Fig. 5.

Fig. 7. The x-component displacement of feature points with respect to the changes of pan angle for four different cameras, without the presence of moving objects. The relationships for Camera-1, Camera-2, Camera-3, and Camera-4 are plotted in red, blue, green, and magenta, respectively.

Fig. 8. (a) Standard deviation of $dx$ with respect to the median of $dx$ when cameras are under panning. (b) Standard deviation of $dy$ with respect to the median of $dx$ when cameras are under panning.

Since the distribution of the displacement highly depends on the observed scene and the magnitude of angle change, we obtain the characteristics of displacement via a learning process for each camera. In the learning stage, we intentionally pan and tilt each camera to capture a sequence of images, without the presence of moving objects. In our experiments, four cameras are used and Fig. 9 shows an example of images captured by these four cameras. In Fig. 7, we show the $x$-component displacement of feature points with respect to the change of pan angle for each of our four cameras. It can be observed that Camera-1 and Camera-3 have roughly the same statistical behaviors, while Camera-2 and Camera-4 have similar behaviors. In Fig. 8(a), we further plot the relationship between the standard deviation of $dx$ and the median of $dx$ when cameras are under panning. Again, Camera-1 and Camera-3 have roughly the same statistical behaviors, while Camera-2 and Camera-4 have similar behaviors. Even though different cameras may have very different statistical behaviors, the relationship between the standard deviation of $dx$ and the median of $dx$ is roughly linear for each camera. Similarly, Fig. 8(b) shows the statistical relationship between the standard deviation of $dy$ and the median of $dx$. On the other hand, for the tilting case, we also observed similar statistical behaviors between the standard deviation of $dx$ (or $dy$) and the median of $dy$. All these statistical relationships offer useful knowledge about the displacement of feature points when the 3-D scene is stationary.
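The roughly linear relationship learned above could be captured with an ordinary least-squares line and then used to screen displacements with a three-standard-deviation test around the median. The sketch below is an assumption-laden illustration: the fit is made against the magnitude of the median, the coefficients `a` and `b` and all function names are hypothetical, and the sample numbers are invented for the example.

```python
from statistics import median

def fit_line(xs, ys):
    """Least-squares fit ys ~ a * xs + b, e.g. std of dx against |median of dx|."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def screen_displacements(displacements, a, b, k=3.0):
    """Flag each displacement as static-like (True) or outlier (False).
    The standard deviation is predicted from the median via the learned line."""
    med = median(displacements)
    sigma = a * abs(med) + b  # assumption: the line was fitted against |median|
    return [abs(d - med) <= k * sigma for d in displacements]
```

In this sketch, a displacement far from the median (for instance one caused by a moving person) is flagged `False`, while displacements clustered around the median are kept.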

When moving objects are present, these feature points caused by the moving objects usually have very different statistical behaviors. Hence, in the dynamic calibration process, we may calculate the median of displacements for all feature points. Based on the median, we estimate the standard deviation of displacement according to these already learned statistical relationships. When the displacement of a feature point is away from the median by three standard deviations, that feature point

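$$
\begin{bmatrix} \dfrac{\partial x}{\partial \theta} \\[2ex] \dfrac{\partial y}{\partial \theta} \end{bmatrix}
=
\begin{bmatrix}
-\alpha \left( \dfrac{(X C_\theta - Z S_\theta - r S_\theta)^2}{(X S_\theta + Z C_\theta + r(C_\theta - 1))^2} + 1 + \dfrac{r}{X S_\theta + Z C_\theta + r(C_\theta - 1)} \right) \\[2ex]
-\beta \dfrac{Y (X C_\theta - Z S_\theta - r S_\theta)}{(X S_\theta + Z C_\theta + r(C_\theta - 1))^2}
\end{bmatrix}
\tag{10}
$$

$$
\begin{bmatrix} \dfrac{\partial x}{\partial \phi} \\[2ex] \dfrac{\partial y}{\partial \phi} \end{bmatrix}
=
\begin{bmatrix}
\alpha \dfrac{X (Y C_\phi + Z S_\phi + r S_\phi)}{(-Y S_\phi + Z C_\phi + r(C_\phi - 1))^2} \\[2ex]
\beta \left( \dfrac{(Y C_\phi + Z S_\phi + r S_\phi)^2}{(-Y S_\phi + Z C_\phi + r(C_\phi - 1))^2} + 1 + \dfrac{r}{-Y S_\phi + Z C_\phi + r(C_\phi - 1)} \right)
\end{bmatrix}
\tag{11}
$$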
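As a sanity check on the pan-angle sensitivity of the $x$ coordinate with the tilt angle set to zero, a closed-form derivative can be compared against a numerical one. The sketch below assumes the forward model $x = \alpha (X C_\theta - Z S_\theta - r S_\theta) / (X S_\theta + Z C_\theta + r(C_\theta - 1))$, which is the $\phi = 0$ case of the projection implied by (9); the function names and sample values are hypothetical.

```python
import math

def project_x_pan_only(X, Z, theta, r, alpha):
    """x image coordinate for tilt angle phi = 0 (assumed forward model)."""
    ct, st = math.cos(theta), math.sin(theta)
    depth = X * st + Z * ct + r * (ct - 1.0)
    return alpha * (X * ct - Z * st - r * st) / depth

def dx_dtheta(X, Z, theta, r, alpha):
    """Closed-form derivative of x with respect to the pan angle at phi = 0."""
    ct, st = math.cos(theta), math.sin(theta)
    u = X * ct - Z * st - r * st
    depth = X * st + Z * ct + r * (ct - 1.0)
    return -alpha * (u * u / (depth * depth) + 1.0 + r / depth)

def numeric_dx_dtheta(X, Z, theta, r, alpha, step=1e-6):
    """Central finite difference used to cross-check the closed form."""
    hi = project_x_pan_only(X, Z, theta + step, r, alpha)
    lo = project_x_pan_only(X, Z, theta - step, r, alpha)
    return (hi - lo) / (2.0 * step)
```

The dependence on $X$ and $Z$ in the closed form is what makes the displacement distribution scene dependent.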


tracking errors and the departure of feature points from the epipolar lines.

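Without loss of generality, we only discuss the sensitivity of our algorithm in the dynamic calibration of Camera-1. In theory, for the estimation of $\Delta\theta^1_t$ and $\Delta\phi^1_t$, the optimization of (7) conforms to $f_1 \equiv \partial F^1_t / \partial(\Delta\theta^1_t) = 0$ and $f_2 \equiv \partial F^1_t / \partial(\Delta\phi^1_t) = 0$. Note that in (7), the projection center $O_{C2}$ actually has a slight movement when Camera-2 rotates. This is because the rotation center is not exactly the same as the projection center. To simplify the formulation of (7), we intentionally ignored that part in Section II. However, in the implementation of our algorithm, we actually took this fact into account to achieve more accurate calibration. Hence, in the following analyses, $f_1$ and $f_2$ depend not only on $\theta^1_{t-1}$ and $\phi^1_{t-1}$, but also on $\theta^2_{t-1}$ and $\phi^2_{t-1}$. On the other hand, $f_1$ and $f_2$ also depend on the measurement errors of $\{\hat{p}^k_{j,1}, \hat{p}^k_{j,2}, \ldots, \hat{p}^k_{j,n_j}\}$, where $j = 1, 2, \ldots, m$ and $k = 1$ or $2$. Here, $m$ denotes the number of epipolar lines used for dynamic calibration. To find how $\Delta\theta^1_t$ and $\Delta\phi^1_t$ deviate with respect to the fluctuations of $\theta^k_{t-1}$, where $k = 1$ or $2$, we may apply the implicit function theorem over $f_1$ and $f_2$ to get

$$
\begin{bmatrix} \dfrac{\partial(\Delta\theta^1_t)}{\partial(\theta^k_{t-1})} \\[2ex] \dfrac{\partial(\Delta\phi^1_t)}{\partial(\theta^k_{t-1})} \end{bmatrix}
= -
\begin{bmatrix} \dfrac{\partial f_1}{\partial(\Delta\theta^1_t)} & \dfrac{\partial f_1}{\partial(\Delta\phi^1_t)} \\[2ex] \dfrac{\partial f_2}{\partial(\Delta\theta^1_t)} & \dfrac{\partial f_2}{\partial(\Delta\phi^1_t)} \end{bmatrix}^{-1}
\begin{bmatrix} \dfrac{\partial f_1}{\partial(\theta^k_{t-1})} \\[2ex] \dfrac{\partial f_2}{\partial(\theta^k_{t-1})} \end{bmatrix}.
\tag{12}
$$

Similarly, we can deduce the formulae for $\partial(\Delta\theta^1_t)/\partial(\phi^k_{t-1})$, $\partial(\Delta\phi^1_t)/\partial(\phi^k_{t-1})$, $\partial(\Delta\theta^1_t)/\partial(\hat{p}^1_{j,i})$, and $\partial(\Delta\phi^1_t)/\partial(\hat{p}^1_{j,i})$.

If we assume the total variations of $\Delta\theta^1_t$ and $\Delta\phi^1_t$ are the combination of individual variations with respect to the fluctuations in $\theta^k_{t-1}$ and $\phi^k_{t-1}$ and the measurement errors in $\{\hat{p}^1_{j,1}, \hat{p}^1_{j,2}, \ldots, \hat{p}^1_{j,n_j}\}$, we have

$$
\delta(\Delta\theta^1_t) \approx \sum_{k=1}^{2} \left[ \frac{\partial(\Delta\theta^1_t)}{\partial(\theta^k_{t-1})} \delta(\theta^k_{t-1}) + \frac{\partial(\Delta\theta^1_t)}{\partial(\phi^k_{t-1})} \delta(\phi^k_{t-1}) \right] + \sum_{j=1}^{m} \sum_{i=1}^{n_j} \frac{\partial(\Delta\theta^1_t)}{\partial(\hat{p}^1_{j,i})} \delta(\hat{p}^1_{j,i}) \tag{13}
$$

and

$$
\delta(\Delta\phi^1_t) \approx \sum_{k=1}^{2} \left[ \frac{\partial(\Delta\phi^1_t)}{\partial(\theta^k_{t-1})} \delta(\theta^k_{t-1}) + \frac{\partial(\Delta\phi^1_t)}{\partial(\phi^k_{t-1})} \delta(\phi^k_{t-1}) \right] + \sum_{j=1}^{m} \sum_{i=1}^{n_j} \frac{\partial(\Delta\phi^1_t)}{\partial(\hat{p}^1_{j,i})} \delta(\hat{p}^1_{j,i}). \tag{14}
$$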
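The implicit-function-theorem step and the first-order combination of variations can be sketched numerically. In the following Python sketch, the 2×2 Jacobian of $(f_1, f_2)$ with respect to the two angle changes and the partial derivatives with respect to one perturbed quantity are assumed to be available as plain numbers (for example from finite differences); the function names and the toy system in the usage note are hypothetical.

```python
def sensitivity(jacobian, df_dq):
    """Return -J^{-1} * df_dq for a 2x2 Jacobian J of (f1, f2) with respect to
    the two angle changes, and df_dq = (df1/dq, df2/dq) for one perturbed quantity q."""
    (a, b), (c, d) = jacobian
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular Jacobian: the implicit function theorem does not apply")
    g1, g2 = df_dq
    return (-(d * g1 - b * g2) / det, -(-c * g1 + a * g2) / det)

def first_order_variation(sensitivities, fluctuations):
    """Sum of sensitivity times fluctuation over all perturbed quantities."""
    return sum(s * dq for s, dq in zip(sensitivities, fluctuations))
```

For the toy system $f_1 = a - 2q$, $f_2 = b + q$, the solution is $a = 2q$, $b = -q$, so the sensitivities returned by `sensitivity(((1, 0), (0, 1)), (-2, 1))` are $(2, -1)$.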

To verify the above formulae, we perform the following simulations. Here, two cameras are assumed to have been accurately calibrated. Camera-1 is hung at a height of 2.06 m. If Camera-1 is translated by $-0.69$, $-0.13$, and 6.25 m along the $X$, $Y$, and $Z$ axes, respectively, and then rotated by $-143.64°$ about the $Y$ axis, Camera-1 will coincide with Camera-2. At first, Camera-1 has the pan angle $\theta^1_0 = 0°$ and tilt angle $\phi^1_0 = 20°$, while Camera-2 has the pan angle $\theta^2_0 = 0°$ and tilt angle $\phi^2_0 = 40°$. Moreover, based on the rectified world coordinate system of Camera-1, we assume there is an epipolar plane $\Pi$ with the homogeneous coordinates $\pi = [0.63, 0.77, 0.09, 0.01]$. Based on this plane $\Pi$, we deduce the corresponding epipolar lines on the image planes of these two cameras. On each of these two epipolar lines, we randomly choose three image points $\{p^k_{1,1}, p^k_{1,2}, p^k_{1,3}\}$ as the feature points, with $k = 1$ or $2$. After that, the tilt angle of Camera-1 is changed to 20.5° so that the feature points on the image plane of Camera-1 will move to the new positions $\{\hat{p}^1_{1,1}, \hat{p}^1_{1,2}, \hat{p}^1_{1,3}\}$. Besides, the intrinsic parameters $\{\alpha^1, \beta^1, \alpha^2, \beta^2\}$ are set to be $\{392, 388, 392.3, 385\}$.

TABLE II
VARIATIONS OF ESTIMATION RESULTS WITH RESPECT TO DISTANCE FLUCTUATIONS IN EPIPOLAR LINES

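In the simulation, we change individually the initial pan and tilt angles $\{\theta^1_0, \phi^1_0, \theta^2_0, \phi^2_0\}$ of Camera-1 and Camera-2 to see how the estimated values of $\Delta\theta^1_1$ and $\Delta\phi^1_1$ vary. Moreover, we also change the measurement $\hat{p}^1_{1,1}$, whose coordinates are defined as $(\hat{x}^1_{1,1}, \hat{y}^1_{1,1})$, to see how $\Delta\theta^1_1$ and $\Delta\phi^1_1$ vary. Here, the LM algorithm is applied to (7) for the estimation of $\Delta\theta^1_1$ and $\Delta\phi^1_1$. The variations of these estimation results, together with the variations deduced by (13) and (14), are listed in Table I. Besides, we also show in Table II how $\Delta\theta^1_1$ and $\Delta\phi^1_1$ vary with respect to the distance fluctuation $d$ in epipolar lines. In our simulation, we change the measurement $\hat{p}^1_{1,1}$ to be away from its epipolar line. The deduced variations can be expressed as

$$
\delta(\Delta\theta^1_t) \approx \frac{\partial(\Delta\theta^1_t)}{\partial(\hat{x}^1_{1,1})} \frac{\partial(\hat{x}^1_{1,1})}{\partial(d)} \delta(d) + \frac{\partial(\Delta\theta^1_t)}{\partial(\hat{y}^1_{1,1})} \frac{\partial(\hat{y}^1_{1,1})}{\partial(d)} \delta(d) \tag{15}
$$

and

$$
\delta(\Delta\phi^1_t) \approx \frac{\partial(\Delta\phi^1_t)}{\partial(\hat{x}^1_{1,1})} \frac{\partial(\hat{x}^1_{1,1})}{\partial(d)} \delta(d) + \frac{\partial(\Delta\phi^1_t)}{\partial(\hat{y}^1_{1,1})} \frac{\partial(\hat{y}^1_{1,1})}{\partial(d)} \delta(d). \tag{16}
$$

It can be seen that all the deduced variations in Tables I and II well approximate the simulation results.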

Additionally, when the number of epipolar line pairs doubles, the errors of estimated $\Delta\theta^1_1$ and $\Delta\phi^1_1$ caused by the fluctuations of the feature points are roughly halved. On the other hand, the errors of estimated $\Delta\theta^1_1$ and $\Delta\phi^1_1$ caused by the fluctuations of $\{\theta^1_0, \phi^1_0, \theta^2_0, \phi^2_0\}$ have no


Fig. 9. Test images with the presence of landmarks. The images captured by Camera-1, Camera-2, Camera-3, and Camera-4 are arranged in the left-to-right, top-to-bottom order.

the variations of estimated $\Delta\theta^1_1$ and $\Delta\phi^1_1$ caused by the fluctuations of $\{\theta^1_0, \phi^1_0, \theta^2_0, \phi^2_0\}$ and $\{\hat{p}^1_{1,1}, \hat{p}^1_{1,2}, \hat{p}^1_{1,3}\}$ still conform to those in Tables I and II. Finally, we also change the tilt angle $\phi^1_0$ from 20° to 80° with a 20° step, and repeat the simulation. The variations of the simulation results also conform to those in Tables I and II. In practice, the initial static calibration is usually accurate enough so that the fluctuations of $\{\theta^1_0, \phi^1_0, \theta^2_0, \phi^2_0\}$ are usually less than 0.5°. Moreover, the measurement errors of $\{\hat{p}^1_{1,1}, \hat{p}^1_{1,2}, \hat{p}^1_{1,3}\}$ are likely to be less than 2 pixels. Hence, the estimation errors of $\Delta\theta^1_1$ and $\Delta\phi^1_1$ are expected to be acceptable in real cases.

IV. EXPERIMENTS

To verify the effectiveness of our dynamic calibration algorithm, we performed the following experiments over real scenes. In the first experiment, test images were captured by four cameras mounted on the ceiling. These four cameras kept panning and tilting while capturing images. In total, each camera captured 1000 test images at a resolution of 320 × 240 pixels. In addition, to evaluate the calibration results, we placed test landmarks in the scene at 100-frame intervals. That is, we captured 100 image frames; stopped and placed some landmarks in the scene; captured an image with the landmarks present; stopped and removed the landmarks; and then resumed image capturing for another 100 frames. This procedure was repeated until we had captured all 1000 images for every camera. Fig. 9 shows an example of images captured by these four cameras with the landmarks present.

At the beginning of the experiment, the static calibration proposed in [17] was applied to calibrate the initial setup of these four cameras. The static calibration results are listed in Table III. The left part of Table III lists for each camera the estimated tilt angle and its altitude above the brown table in the scene. The right part of Table III lists for each camera the estimated position and orientation with respect to Camera-2. In addition, we also calculated the 3-D coordinates of the test landmarks and used them as a ground truth for the evaluation of our dynamic calibration algorithm.

As cameras began to pan and tilt, we extracted 50 prominent feature points from each of these four initial images and tracked these feature points by the KLT method. In our experiment, we fixed one of the four cameras. Based on (7) and (8), we performed dynamic calibration for every image pair.

Fig. 10. (a) Differences of the pan angles and (b) differences of the tilt angles between the dynamic calibration results and the static calibration results, with one of the cameras being fixed all of the time.

Fig. 11. Evaluations of dynamical calibration at the 1000th frame.

To evaluate the results of dynamic calibration, we performed static calibration every 100 frames, based on the images captured with the landmarks present. The result was verified by projecting the aforementioned 3-D landmarks onto the image plane of each camera. Fig. 10 shows the differences in the estimated pan and tilt angles between the dynamic calibration results and the static calibration results. Note that the static calibration is performed based on the 3-D landmarks that were well calibrated at the

Fig. 13. Evaluated corresponding relationship of the 1000th frame in the test sequence with a moving person.

TABLE III

RESULTS OF THE STATIC CALIBRATION

beginning of the experiment. Fig. 10 shows that the deviation at the 1000th frame is still acceptable and lies within the range of ±1.5°. Moreover, the differences do not gradually increase over time. Based on the results of dynamic calibration, we may also directly pick a few landmark points in the image captured by Camera-2 and project them onto the other three images, as shown in Fig. 11.
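The evaluation described above, comparing dynamically estimated angles against a static reference and checking landmark projections, can be sketched as follows. This is only a rough illustration under assumed conventions: a generic pinhole camera located at the origin, pan as a rotation about the y-axis followed by tilt as a rotation about the x-axis, and a hypothetical focal length of 400 pixels with the principal point at the center of a 320 × 240 image. It is not the camera model of [17], and all names are illustrative.

```python
import math


def rotate_pan_tilt(point, pan_deg, tilt_deg):
    """Rotate a 3-D point by pan (about the y-axis), then tilt (about the x-axis)."""
    x, y, z = point
    p, t = math.radians(pan_deg), math.radians(tilt_deg)
    x1 = math.cos(p) * x - math.sin(p) * z
    z1 = math.sin(p) * x + math.cos(p) * z
    y2 = math.cos(t) * y - math.sin(t) * z1
    z2 = math.sin(t) * y + math.cos(t) * z1
    return x1, y2, z2


def project(point, pan_deg, tilt_deg, focal_px=400.0, center=(160.0, 120.0)):
    """Pinhole projection of a 3-D point for a camera placed at the origin."""
    xc, yc, zc = rotate_pan_tilt(point, pan_deg, tilt_deg)
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return focal_px * xc / zc + center[0], focal_px * yc / zc + center[1]


def reprojection_error(landmarks, reference_angles, estimated_angles):
    """Mean pixel distance between landmark projections under two (pan, tilt) pairs."""
    total = 0.0
    for landmark in landmarks:
        u0, v0 = project(landmark, *reference_angles)
        u1, v1 = project(landmark, *estimated_angles)
        total += math.hypot(u1 - u0, v1 - v0)
    return total / len(landmarks)


def within_tolerance(reference_angles, estimated_angles, tol_deg=1.5):
    """True if every angle differs from its reference by at most tol_deg degrees."""
    return all(abs(e - r) <= tol_deg
               for r, e in zip(reference_angles, estimated_angles))


if __name__ == "__main__":
    # Hypothetical landmark coordinates and angle estimates, for illustration only.
    landmarks = [(0.0, 0.0, 5.0), (0.5, 0.2, 4.0)]
    static_angles = (0.0, 0.0)     # (pan, tilt) from a static reference
    dynamic_angles = (1.0, -0.5)   # (pan, tilt) from a dynamic estimate
    print(within_tolerance(static_angles, dynamic_angles))
    print(reprojection_error(landmarks, static_angles, dynamic_angles))
```

Reporting both the angle difference and the pixel-level reprojection error is a design choice of this sketch: the former corresponds to the kind of comparison plotted in Fig. 10, while the latter relates more directly to how well projected landmarks line up in the images.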

We also tested the situation in which a moving object is present during the dynamic calibration process. Limited by our camera control system, we could not simultaneously control four cameras in real time. Hence, we allowed only two cameras to pan and tilt in this experiment. Again, we captured 1000 frames for each camera, and Fig. 12 shows a sample of the captured sequence. In Fig. 13, we show the corresponding relationship of the 1000th frame based on our dynamic calibration result. This reasonable correspondence demonstrates the effectiveness and feasibility of our dynamic calibration algorithm.

V. CONCLUSION

In this paper, a dynamic calibration method is proposed that can be applied to a wide-range surveillance system with multiple cameras. This method does not require specific calibration patterns or complicated correspondence of feature points. It also allows the presence of

[2] M. Agrawal and L. S. Davis, “Camera calibration using spheres: A semi-definite programming approach,” in Proc. 9th IEEE Int. Conf. Comput. Vision, Oct. 2003, vol. 2, pp. 782–789.

[3] Z. Zhang, “Flexible camera calibration by viewing a plane from unknown orientations,” in Proc. 7th IEEE Int. Conf. Comput. Vision, Sep. 1999, vol. 1, pp. 666–673.

[4] G.-Q. Wei and S. D. Ma, “A complete two-plane camera calibration method and experimental comparisons,” in Proc. 4th Int. Conf. Comput. Vision, May 1993, pp. 439–446.

[5] P. F. Sturm and S. J. Maybank, “On plane-based camera calibration: a general algorithm, singularities, applications,” in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recogn., Jun. 1999, vol. 1, pp. 432–437.

[6] Z. Zhang, “Camera calibration with one-dimensional objects,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 7, pp. 892–899, Jul. 2004.

[7] P. Sturm, “Algorithms for plane-based pose estimation,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Jun. 2000, vol. 1, pp. 706–711.

[8] T. Ueshiba and F. Tomita, “Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices,” in Proc. 9th IEEE Int. Conf. Comput. Vision, Oct. 2003, vol. 2, pp. 966–973.

[9] C. Wiles and A. Davison, “Calibrating a multi-camera system for 3D modelling,” in Proc. IEEE Workshop on Multi-View Modeling Anal. Visual Scenes, Jun. 1999, pp. 29–36.

[10] A. Jain, D. Kopell, K. Kakligian, and Y.-F. Wang, “Using stationary-dynamic camera assemblies for wide-area video surveillance and selective attention,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Jun. 2006, vol. 1, pp. 537–544.

[11] G. Schweighofer and A. Pinz, “Robust pose estimation from a planar target,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2024–2030, Dec. 2006.

[12] K.-T. Song and J.-C. Tai, “Dynamic calibration of pan-tilt-zoom cameras for traffic monitoring,” IEEE Trans. Syst., Man, Cybern., Part B, vol. 36, no. 5, pp. 1091–1103, Oct. 2006.

[13] C. T. Huang and O. R. Mitchell, “Dynamic camera calibration,” in Proc. Int. Symp. Comput. Vision, Nov. 1995, pp. 169–174.

[14] B. Zhang, Y. F. Li, and Y. H. Wu, “Self-recalibration of a structured light system via plane-based homography,” Pattern Recogn., vol. 40, no. 4, pp. 1368–1377, Apr. 2007.

[15] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conf. Comput. Vision Pattern Recogn., Jun. 1994, pp. 593–600.

[16] D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach. Englewood Cliffs, NJ: Prentice-Hall, 2003, pp. 216–217.

[17] I.-H. Chen and S.-J. Wang, “An efficient approach for the calibration of multiple PTZ cameras,” IEEE Trans. Autom. Sci. Eng., vol. 4, no. 2, pp. 286–293, Apr. 2007.