

4.1 Dynamic Calibration of Multiple Cameras

4.1.2 Dynamic Calibration of a Single Camera Based on Temporal Information

Assume we have a set of PTZ cameras. At the beginning, we calibrate the 3-D pose of each camera via the static calibration method introduced in Chapter 3. After that, we allow each PTZ camera to pan and tilt freely.

As a camera starts to pan or tilt, its image content changes. To recalibrate the new pose of the camera, we check the temporal displacement of a few feature points in the image content. Here, we use the KLT method [45] to extract and track feature points in consecutive images. We also assume all extracted feature points correspond to some unknown static points in the 3-D space.

Typically, we may assume the rotation radius r is far smaller than the distances between these 3-D points and the camera. We also assume the changes of the pan angle and the tilt angle are very small during the capture of two successive images. With these two assumptions, the projection center of the camera can be regarded as fixed with respect to the 3-D points while the camera is panning or tilting. In other words, the projection lines, which connect the projection center to each of these observed 3-D points, are fixed in the 3-D space, as long as these 3-D points stay static during the capture of images. By using these projection lines as a reference, we may recalibrate the new pose of the camera. Moreover, as illustrated in Fig. 4.2, if three 3-D points, PA, PB and PC, are replaced by another three points, P̂A, P̂B and P̂C, on their projection lines, there is no influence on the projected points on the image plane.

Hence, even if we do not actually know the real locations of these 3-D points, we may simply back project all feature points in the image onto a 3-D pseudo plane with a constant Z coordinate, as shown in Fig. 4.2.
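
To make the idea of the pseudo plane concrete, the following sketch back-projects an image feature point onto a plane with constant Z. It assumes a simple pinhole model with focal lengths (alpha, beta) and principal point (u0, v0), and a rotation built from the pan and tilt angles; the function names and the rotation convention are illustrative assumptions, not the exact formulation of (4.3).

```python
import numpy as np

def rotation_pan_tilt(theta, phi):
    """Rotation from camera frame to world frame for pan angle theta (about the
    vertical axis) and tilt angle phi (about the camera X-axis).
    This convention is an assumption for illustration."""
    ct, st = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    R_pan = np.array([[ ct, 0.0,  st],
                      [0.0, 1.0, 0.0],
                      [-st, 0.0,  ct]])
    R_tilt = np.array([[1.0, 0.0, 0.0],
                       [0.0,  cp, -sp],
                       [0.0,  sp,  cp]])
    return R_pan @ R_tilt

def back_project_to_pseudo_plane(p, theta, phi, center, intrinsics, Z0):
    """Back-project pixel p = (u, v) onto the pseudo plane Z = Z0.

    center     : 3-D projection center of the camera (assumed fixed).
    intrinsics : (alpha, beta, u0, v0) pinhole parameters.
    Returns the 3-D intersection point of the viewing ray with the plane Z = Z0.
    """
    alpha, beta, u0, v0 = intrinsics
    # Viewing direction in the camera frame, then rotated into the world frame.
    d_cam = np.array([(p[0] - u0) / alpha, (p[1] - v0) / beta, 1.0])
    d_world = rotation_pan_tilt(theta, phi) @ d_cam
    # Intersect the ray center + s * d_world with the plane Z = Z0.
    s = (Z0 - center[2]) / d_world[2]
    return center + s * d_world
```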

In our approach, based on a few feature points in a pair of successive images It-1 and It, we first back project these feature points in It-1 onto a 3-D pseudo plane with a constant Z. Then, we try to find a new pose of the camera that can map the corresponding feature points in It onto the same 3-D pseudo points. That is, if we assume the camera has the pan angle θt-1 and the tilt angle φt-1 while capturing It-1, we estimate the changes of pan angle and tilt angle that minimize the mismatch between the back-projected pseudo points of It-1 and It, as expressed in (4.4).

In (4.4), B̂ represents the back-projection function of an image feature point onto a pseudo 3-D plane Π'. Here, we especially use the "hat" to denote that the back-projection is restricted to a vertical pseudo plane Π'. Besides, pk denotes a feature point in It-1 and p̂k denotes the corresponding feature point in It. K is the total number of image feature points used for calibration. Note that in (4.4), we ignore the altitude parameter h of these back-projected points. This is because the altitude h can be obtained from (4.3) once the Z coordinate is fixed. We also ignore the intrinsic parameters Ω since they do not change when the camera pans and tilts.
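
A minimal sketch of this single-camera update, in the spirit of (4.4): the feature points of It-1 are back-projected onto the pseudo plane with the previously calibrated pose, and the pan/tilt change is then found by least squares so that the corresponding points of It back-project onto the same pseudo points. It reuses the hypothetical back_project_to_pseudo_plane helper sketched above and scipy's Levenberg-Marquardt solver; the exact objective of (4.4) may differ.

```python
import numpy as np
from scipy.optimize import least_squares

def estimate_pan_tilt_change(pts_prev, pts_curr, theta_prev, phi_prev,
                             center, intrinsics, Z0=10.0):
    """Estimate (d_theta, d_phi) so that features of I_t back-project onto the
    same pseudo points as their matches in I_{t-1} (temporal term only).
    Uses back_project_to_pseudo_plane from the previous sketch."""
    # Pseudo points obtained with the previous, already calibrated pose.
    targets = np.array([back_project_to_pseudo_plane(p, theta_prev, phi_prev,
                                                     center, intrinsics, Z0)
                        for p in pts_prev])

    def residuals(x):
        d_theta, d_phi = x
        res = []
        for p_hat, target in zip(pts_curr, targets):
            q = back_project_to_pseudo_plane(p_hat, theta_prev + d_theta,
                                             phi_prev + d_phi,
                                             center, intrinsics, Z0)
            # Only X and Y constrain the pose; Z is fixed by construction.
            res.extend([q[0] - target[0], q[1] - target[1]])
        return np.asarray(res)

    # Initial guess: no rotation change.
    sol = least_squares(residuals, x0=[0.0, 0.0], method='lm')
    return sol.x  # (d_theta, d_phi) in radians
```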

Fig. 4.2. Illustration of a pseudo plane Π’.

4.1.3 Dynamic Calibration of Multiple Cameras Based on Epipolar-Plane Constraint

In the previous section, we assumed the projection center of a single camera is fixed during panning and tilting. The projection lines were then used as a reference to calibrate the new pose of that camera. To further increase the accuracy of calibration, we additionally exploit the 3-D spatial relationship among the cameras.

In Fig. 4.3, we show the epipolar geometry for a pair of cameras [3, pp. 216-219].

For these two cameras, their projection centers, OC1 and OC2, together with a 3-D point PA, determine an epipolar plane ∏. This epipolar plane ∏ intersects the image planes of the cameras to form two epipolar lines l1 and l2. If pA1 and pA2 are the projected points of PA on the two image planes, their back-projected points must lie on the epipolar plane ∏, as formulated in (4.5).

In (4.5), we use the B(.) function defined in (4.4). Note that we ignore the altitude parameter h because the formation of the epipolar plane is actually independent of h. That is, no matter what value h takes, the epipolar plane remains the same.
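
A small sketch of how an epipolar plane can be formed from the two projection centers and one back-projected point, in the spirit of (4.5). The plane is written in homogeneous coordinates π = [nx, ny, nz, d] so that π·[X, Y, Z, 1] = 0, and, as noted above, the altitude h plays no role. The helper names are assumptions.

```python
import numpy as np

def epipolar_plane(o_c1, o_c2, p_back):
    """Homogeneous coordinates pi = [nx, ny, nz, d] of the plane through the
    projection centers o_c1, o_c2 and one back-projected 3-D point p_back."""
    n = np.cross(o_c2 - o_c1, p_back - o_c1)
    n = n / np.linalg.norm(n)          # unit normal of the plane
    return np.append(n, -n.dot(o_c1))  # d chosen so the plane passes through o_c1

def point_plane_distance(pi, X):
    """Signed distance of a 3-D point X from the plane pi (unit normal assumed)."""
    return pi[:3].dot(X) + pi[3]
```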

Fig. 4.3 Illustration of epipolar-plane constraint.

On the other hand, if some other points lie on the same pair of epipolar lines, like pB1 and pC1 on l1 and pD2 and pE2 on l2, the back-projected points of these points also have to lie on the same epipolar plane ∏. Traditionally, when we deal with the calibration of this camera pair, we try to figure out the pair-wise correspondence between {pA1, pB1, pC1} and {pA2, pD2, pE2}. If we can place some pre-defined calibration patterns or landmarks in the 3-D scene, the correspondence of feature points can be easily achieved. However, in real cases, especially when cameras are panning and tilting all the time, calibration patterns or landmarks may get occluded or move out of the cameras' fields of view.

If we do not have calibration patterns or landmarks at hand, one possible way to achieve dynamic calibration is to automatically extract new feature points from the image contents and use them as pseudo landmarks. However, this kind of approach requires point-wise correspondence between each image pair, and such point-wise correspondence has long been recognized as a cumbersome problem in computer vision, especially when many feature points are involved. Moreover, for a wide-range video surveillance system, the image contents of different cameras may be very different. In this case, the correspondence of image feature points among different cameras is even more difficult.

In this section, we adopt a different approach to avoid the troublesome point-wise correspondence. As illustrated in Fig. 4.3, we assume a pair of cameras has initially been calibrated via some calibration algorithm. We assume a few feature points, like pA1, pB1, pC1, pA2, pD2, and pE2, are located on a pair of corresponding epipolar lines.

Without performing point-wise correspondence, we do not actually know where these feature points are projected from. However, we are still confident of the fact that these 3-D points must be "somewhere on the epipolar plane". As long as these 3-D points remain static in the 3-D space, this epipolar plane is fixed. Hence, the epipolar planes that have been identified at the previous moment can be used as a reference for the calibration of cameras at the current moment.

In Fig. 4.3, we assume a pair of cameras has been calibrated at the time instant t-1 and an epipolar plane ∏ has been identified. Assume that at the time instant t-1, the pan and tilt angles of Camera-1 are θ1t-1 and φ1t-1, while the pan and tilt angles of Camera-2 are θ2t-1 and φ2t-1. Camera-1 captures the image I1t-1, while Camera-2 captures I2t-1. On the other hand, at the time instant t, Camera-1 rotates to a new pan angle (θ1t-1+∆θ1t) and a new tilt angle (φ1t-1+∆φ1t), while Camera-2 rotates to (θ2t-1+∆θ2t) and (φ2t-1+∆φ2t). Here, we only discuss the calibration of Camera-1. The calibration of Camera-2 can be implemented in a similar way.

For Camera-1, assume a prominent feature point p1A has been extracted from I1t-1. This feature moves to p̂1A in I1t. Based on p1A, θ1t-1, and φ1t-1, we may form an epipolar plane together with the projection centers OC1 and OC2, as expressed in (4.6) and (4.7).

Note that in (4.6) and (4.7), the projection center OC2 may have a slight movement when Camera-2 rotates. That movement can be taken into account to achieve more accurate calibration. Here, we simply ignore that part to simplify the formulation.

For Camera-1, assume we have extracted m epipolar lines. Moreover, on the jth epipolar line, where j = 1, 2, …, m, we have extracted nj feature points {p1j,1, p1j,2, …, p1j,nj} on I1t-1. These nj feature points move to {p̂1j,1, p̂1j,2, …, p̂1j,nj} on I1t. Besides, we assume p1j denotes one of the feature points in {p1j,1, p1j,2, …, p1j,nj}. Based on the epipolar-plane constraint, we can estimate the optimal ∆θ1t and ∆φ1t that minimize (4.8).

Furthermore, by integrating (4.4) and (4.8), the changes of pan angle and tilt angle of Camera-1 can be estimated by minimizing the combined cost in (4.9).


Similarly, the changes of pan angle and tilt angle of Camera-2 can be estimated by minimizing the corresponding cost in (4.10).

Here, λ is a parameter to weight the contributions of the temporal clues and the 3-D spatial clues. In our experiments, we simply set λ = 1. In theory, for each camera, one feature point is sufficient for the first term on the right-hand side of (4.9) or (4.10) to solve ∆θt and ∆φt. That term assumes the [X, Y, Z] coordinates of a back-projected point are fixed while the camera is panning or tilting. Since each back-projected point has a fixed Z coordinate, a feature point offers two constraints over the X and Y coordinates, and these two constraints can be used to solve ∆θt and ∆φt. On the other hand, whenever a pair of epipolar lines can be determined, any feature point on the epipolar lines can be used for the second term on the right-hand side of (4.9) or (4.10) to make the estimation more robust.

To deduce ∆θ1t, ∆φ1t, ∆θ2t, and ∆φ2t, we adopt the Levenberg-Marquardt (LM) algorithm. In our experiments, the initial guesses of the pan/tilt angle changes are set to zero degrees. Note that for a pair of corresponding epipolar lines, Camera-1 and Camera-2 may have very different numbers of feature points. That is, the nj in (4.9) may differ from the nj in (4.10). This is because we do not actually seek to establish the correspondence of feature points. Instead, we seek a consistent matching of epipolar lines between It-1 and It. This strategy greatly simplifies the correspondence problem. Moreover, Formulae (4.9) and (4.10) can also be merged into a single formula during the optimization process.
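
The sketch below outlines how such a merged optimization can be set up with scipy's Levenberg-Marquardt solver: for each camera, the residual vector stacks a temporal term (pseudo-point displacement) and an epipolar term (distance of back-projected points to the previously identified epipolar planes), weighted by sqrt(λ). It reuses the hypothetical back_project_to_pseudo_plane and point_plane_distance helpers sketched earlier; the exact residuals of (4.9) and (4.10) may differ.

```python
import numpy as np
from scipy.optimize import least_squares

def calibrate_camera_pair(cam1, cam2, lam=1.0):
    """Jointly estimate (d_theta1, d_phi1, d_theta2, d_phi2) for two PTZ cameras.

    Each cam dict is assumed to hold: 'theta', 'phi' (previous pose), 'center',
    'intrinsics', 'Z0', 'temporal' = list of (p_prev, p_curr) pixel pairs, and
    'epipolar' = list of (plane, pts_curr), where plane is an epipolar plane
    identified at t-1 and pts_curr are feature points of I_t near the
    corresponding epipolar line."""

    def residuals(x):
        res = []
        for cam, (d_t, d_p) in zip((cam1, cam2), (x[0:2], x[2:4])):
            th, ph = cam['theta'] + d_t, cam['phi'] + d_p
            # Temporal term: features should back-project onto the same pseudo points.
            for p_prev, p_curr in cam['temporal']:
                target = back_project_to_pseudo_plane(p_prev, cam['theta'], cam['phi'],
                                                      cam['center'], cam['intrinsics'], cam['Z0'])
                q = back_project_to_pseudo_plane(p_curr, th, ph,
                                                 cam['center'], cam['intrinsics'], cam['Z0'])
                res.extend([q[0] - target[0], q[1] - target[1]])
            # Epipolar term: back-projected points should stay on the old epipolar plane.
            for plane, pts_curr in cam['epipolar']:
                for p_curr in pts_curr:
                    q = back_project_to_pseudo_plane(p_curr, th, ph,
                                                     cam['center'], cam['intrinsics'], cam['Z0'])
                    res.append(np.sqrt(lam) * point_plane_distance(plane, q))
        return np.asarray(res)

    # Initial guesses of the pan/tilt changes are set to zero, as in the text.
    sol = least_squares(residuals, x0=np.zeros(4), method='lm')
    return sol.x
```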

(a)

(b)

Fig. 4.4 Image pairs captured at two different time instants. Green lines indicate three pairs of corresponding epipolar lines.

In summary, for the proposed dynamic calibration algorithm, we perform the following steps.

Step 1. After the setup of multiple cameras, we perform static camera calibration based on the method introduced in Chapter 3. After that, cameras are allowed to pan and tilt freely.

Step 2. On each image, a few feature points are extracted and tracked based on the KLT algorithm [45], as sketched after these steps. Feature points moving out of the image are removed, while new feature points entering the image are added.

Step 3. For each pair of cameras, based on the previous calibration results, we generate pairs of epipolar lines that pass through these extracted feature points. In practice, as long as a feature point lies within a predefined distance from an epipolar line, we regard that feature point as being passed through by the epipolar line.

Step 4. Based on the extracted feature points and the information of epipolar lines, we calibrate the new pan angle and tilt angle for each pair of cameras by optimizing (4.9) and (4.10). After that, go back to Step 2.
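
As a concrete illustration of Step 2, the following sketch tracks KLT features between consecutive frames with OpenCV's pyramidal Lucas-Kanade tracker, drops features whose tracking fails or which leave the image, and replenishes the set with newly detected corners. This is only an assumed implementation of the step; the parameter values are illustrative.

```python
import cv2
import numpy as np

def track_features(prev_gray, curr_gray, prev_pts, max_corners=200):
    """Track KLT features from the previous frame to the current frame.

    Returns (pairs, all_curr): matched (previous, current) point pairs for
    calibration, and the full current point set to track at the next frame."""
    if prev_pts is None or len(prev_pts) == 0:
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, max_corners, 0.01, 10)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   prev_pts, None,
                                                   winSize=(21, 21), maxLevel=3)
    ok = status.reshape(-1).astype(bool)
    prev_kept, curr_kept = prev_pts[ok], curr_pts[ok]

    # Remove features that moved out of the image.
    h, w = curr_gray.shape
    inside = ((curr_kept[:, 0, 0] >= 0) & (curr_kept[:, 0, 0] < w) &
              (curr_kept[:, 0, 1] >= 0) & (curr_kept[:, 0, 1] < h))
    prev_kept, curr_kept = prev_kept[inside], curr_kept[inside]
    pairs = list(zip(prev_kept.reshape(-1, 2), curr_kept.reshape(-1, 2)))

    # Add new features when too many have been lost; they have no previous match yet.
    all_curr = curr_kept
    if len(curr_kept) < max_corners // 2:
        new_pts = cv2.goodFeaturesToTrack(curr_gray, max_corners - len(curr_kept), 0.01, 10)
        if new_pts is not None:
            all_curr = np.vstack([curr_kept, new_pts.astype(np.float32)])
    return pairs, all_curr
```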

The above procedure is repeated to acquire the new poses of all cameras. In Fig. 4.4(a) and (b), we show images captured by two different cameras at two different time instants, overlaid with three pairs of epipolar lines. Note that even though the feature points on these epipolar lines come from different 3-D points, we may still be able to achieve reliable dynamic calibration based on the matching of epipolar lines.

4.2 Dynamic Calibration in the Presence of Moving Objects

So far, we have assumed that all the feature points used for calibration correspond to some fixed 3-D points in the scene. However, in real applications, such as object tracking or 3-D positioning, people or other moving objects may enter or leave the scene while the cameras are capturing images. To guarantee accurate calibration, we need to discard the feature points related to moving objects.

In Fig. 4.5, we show two successive image frames where the camera tilts up by 0.5 degrees. For each feature point, we calculate its spatial displacement (dx, dy). The distribution of (dx, dy) is plotted in Fig. 4.6, where most displacements cluster around (0, -4). These clustered displacements correspond to the movements of static feature points caused by the camera rotation. On the other hand, there exist some outlier displacements, which correspond to the movements of feature points lying on the moving person.
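
For illustration, the displacement distribution of Fig. 4.6 can be computed directly from the tracked feature pairs; the median is what the later outlier test is built on. The variable names are assumptions.

```python
import numpy as np

def displacement_stats(pairs):
    """pairs: list of ((x_prev, y_prev), (x_curr, y_curr)) tracked feature points.
    Returns the per-feature displacements and their medians."""
    d = np.array([[xc - xp, yc - yp] for (xp, yp), (xc, yc) in pairs])
    dx, dy = d[:, 0], d[:, 1]
    return dx, dy, np.median(dx), np.median(dy)
```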

(a)

(b)

Fig. 4.5 (a) Image captured by a camera with 55.1° tilt angle. (b) Image captured by a camera with 54.6° tilt angle. Red crosses represent feature points extracted by the KLT algorithm.

Fig. 4.6 The distribution of spatial displacement for the extracted feature points in Fig. 4.5.

However, the displacement of feature points depends not only on the pose of the camera but also on the content of the 3-D scene. Theoretically, by taking the partial derivative of (4.2) with respect to the pan angle θ, we have (4.11), which indicates how the location of a feature point varies with respect to the change of pan angle. To simplify the formula, we assume φ = 0 to ignore the influence of the tilt angle. The simplified formula is expressed in (4.12). Similarly, by ignoring the effect of the pan angle, (4.13) indicates how the location of a feature point varies with respect to the change of tilt angle. Both (4.12) and (4.13) indicate the crucial role of the 3-D location (X, Y, Z) in the displacement of feature points. Hence, for different scenes, we expect different degrees of feature-point displacement.


Furthermore, we illustrate the terms Xcosθ-Zsinθ-rsinθ and Xsinθ+Zcosθ+r(cosθ-1) of (4.12) in Fig. 4.7. Assume there is a 3-D point P with the world coordinates [X, Y, Z]T. In Fig. 4.7, when the camera rotates with a pan angle θ, its projection center Oc moves to Oc' and the world coordinates (X, Y, Z) change to (X', Y', Z'). The term Xcosθ-Zsinθ-rsinθ represents the distance between P and the Z' axis, while the term Xsinθ+Zcosθ+r(cosθ-1) represents the distance between P and the X' axis.

In other words, from the view of a camera, (4.12) depends on the relative positions between the observed objects and the projection center. Formula (4.12) can also be expressed in a simplified form in which r is neglected, since in our setup the rotation radius r is at the centimeter level (about 3.5 centimeters), while most of the observed scene points are meters away from the cameras. The situation for the tilt angle is similar to that for the pan angle. Hence, we may simply neglect r here.
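
To make the argument about r concrete, the following sketch projects a 3-D point before and after a small pan change, once with the rotation radius r (projection center displaced on a circle of radius r) and once with r = 0, and compares the resulting pixel displacements. The numbers (r of about 3.5 cm, a scene point several meters away) follow the discussion above, while the projection convention and parameter values are assumptions for illustration.

```python
import numpy as np

def project(P, theta, r, alpha=392.0, beta=388.0, u0=320.0, v0=240.0):
    """Project world point P for a camera that pans by theta about a vertical
    axis; the projection center sits at distance r from the rotation axis
    (simplified model with zero tilt, looking along +Z at theta = 0)."""
    ct, st = np.cos(theta), np.sin(theta)
    R = np.array([[ ct, 0.0, -st],
                  [0.0, 1.0, 0.0],
                  [ st, 0.0,  ct]])                    # world-to-camera rotation
    center = np.array([r * st, 0.0, r * (ct - 1.0)])   # displaced projection center
    Xc = R @ (P - center)                              # point in camera coordinates
    return np.array([alpha * Xc[0] / Xc[2] + u0, beta * Xc[1] / Xc[2] + v0])

P = np.array([1.0, -0.5, 6.0])       # a scene point a few meters away
d_theta = np.deg2rad(1.0)            # a 1-degree pan-angle change
for r in (0.035, 0.0):               # r = 3.5 cm versus r neglected
    disp = project(P, d_theta, r) - project(P, 0.0, r)
    print(f"r = {r:5.3f} m  ->  pixel displacement = {disp}")
```

With these assumed values, the two displacements differ by only a small fraction of a pixel, which is consistent with neglecting r.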

Fig. 4.7 Illustration of the coordinate system when the camera is panning. If r is far smaller than Z', we may simply neglect r.

(a)

(b)

Fig. 4.8 (a) The displacements of feature points observed by two different cameras. Both cameras are under a 1-degree pan-angle change, while their tilt angles are fixed at 34.8°. (b) The displacements of feature points observed by the same camera but with different pan-angle changes. (Blue: 0.6-degree pan-angle change. Red: 1-degree pan-angle change.)

Figure 4.8 shows two simulation results that demonstrate the effects of the 3-D scene and the camera pose on the displacement values. In Figure 4.8(a), we plot the displacements of feature points observed by cameras at two different locations. Both cameras are under a 1-degree change of pan angle, while their tilt angles are fixed at 34.8°. Due to the different observed scenes, the displacements of feature points are different. On the other hand, Figure 4.8(b) shows the displacements observed by the same camera but with two different pan-angle changes. It can be observed that not only are the displacement magnitudes different, but the distributions of displacement are also different. The distribution with a smaller pan-angle change is more compact.

Since the distribution of the displacement highly depends on the observed scene and the magnitude of angle change, we obtain the characteristics of displacement via a learning process for each camera. In the training stage, we intentionally pan and tilt each camera to capture a sequence of images, without the presence of moving objects.

In our experiments, four cameras are used and Fig. 4.11(a) shows an example of the images captured by these four cameras. In Fig. 4.9, we show the x-component displacement of feature points with respect to the change of pan angle for each of our four cameras. It can be observed that Camera-1 and Camera-3 have roughly the same statistical behaviors, while Camera-2 and Camera-4 have similar behaviors. In Fig. 4.10(a), we further plot the relationship between the standard deviation of dx and the median of dx when the cameras are panning. Again, Camera-1 and Camera-3 have roughly the same statistical behaviors, while Camera-2 and Camera-4 have similar behaviors. Even though different cameras may have very different statistical behaviors, the relationship between the standard deviation of dx and the median of dx is roughly linear for each camera. Similarly, Figure 4.10(b) shows the statistical relationship between the standard deviation of dy and the median of dx. On the other hand, for the tilting case, we also observe similar statistical relationships between the standard deviation of dx (or dy) and the median of dy. All these statistical relationships offer useful knowledge about the displacement of feature points when the 3-D scene is stationary.
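
Under the roughly linear relationship stated above, the statistics can be learned from the moving-object-free training sequence with a simple least-squares line fit, as in the sketch below. The array names are assumptions.

```python
import numpy as np

def fit_std_vs_median(medians_dx, stds_dx):
    """Fit std(dx) ~ a * median(dx) + b for one camera.

    medians_dx[i], stds_dx[i]: median and standard deviation of the x-displacement
    measured for the i-th training frame pair (no moving objects present)."""
    a, b = np.polyfit(medians_dx, stds_dx, deg=1)
    return a, b
```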

When moving objects are present, the feature points caused by the moving objects usually have very different statistical behaviors. Hence, in the dynamic calibration process, we first calculate the median of the displacements of all feature points. Based on the median, we estimate the standard deviation of the displacement according to the learned statistical relationships. When the displacement of a feature point deviates from the median by more than three standard deviations, that feature point is treated as an undesired feature point and is discarded in the dynamic calibration process.
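
A minimal sketch of this rejection rule: the median displacement of the current frame pair predicts a standard deviation through the learned linear relation, and feature points whose displacement deviates from the median by more than three of those standard deviations are discarded. The coefficient pairs coeff_x and coeff_y stand for the hypothetical fitted values from the training stage.

```python
import numpy as np

def reject_moving_features(dx, dy, coeff_x, coeff_y, k=3.0):
    """Keep only feature points whose displacement is consistent with a static scene.

    dx, dy  : per-feature displacements of the current frame pair.
    coeff_x : (a, b) such that std(dx) is predicted as a * median(dx) + b.
    coeff_y : (a, b) predicting std(dy) from median(dx) (panning case).
    Returns a boolean mask of the feature points to keep."""
    med_x, med_y = np.median(dx), np.median(dy)
    sigma_x = coeff_x[0] * med_x + coeff_x[1]
    sigma_y = coeff_y[0] * med_x + coeff_y[1]
    keep = (np.abs(dx - med_x) <= k * sigma_x) & (np.abs(dy - med_y) <= k * sigma_y)
    return keep
```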

(a) (b)

Fig. 4.9 The x-component displacement of feature points with respect to the changes of pan angle for four different cameras, without the presence of moving objects. The statistical relationships for Camera-1, Camera-2, Camera-3, and Camera-4 are plotted in red, blue, green, and magenta, respectively.

(a)

(b)

Fig. 4.10 (a) Standard deviation of dx with respect to the median of dx when cameras are under panning. (b) Standard deviation of dy with respect to the median of dx when cameras are under panning.

4.3 Sensitivity Analysis

Based on (4.9) and (4.10), we can dynamically estimate the changes of pan angle and tilt angle while a camera is rotating. In this section, we will analyze how sensitive our algorithm is with respect to the calibration errors at the previous time instant and the measurement errors at the current time instant. Here, we assume there could be some errors in the calibration results at the previous time instant t-1. Moreover, there could be some errors in the extraction of feature points, including tracking errors and the departure of feature points from the epipolar lines.

Without loss of generality, we only discuss the sensitivity of our algorithm in the dynamic calibration of Camera-1. In theory, for the estimation of ∆θ1t and ∆φ1t, the optimization of (4.9) conforms to the stationarity conditions f1 = 0 and f2 = 0, where f1 and f2 denote the partial derivatives of the cost in (4.9) with respect to ∆θ1t and ∆φ1t, respectively.

As mentioned in Section 4.1, the projection center OC2 in (4.9) actually has a slight movement when Camera-2 rotates. This is because the rotation center is not exactly the same as the projection center. To simplify the formulation of (4.9), we intentionally ignored that movement in Section 4.1. However, in the implementation of our algorithm, we have taken this fact into account to achieve more accurate calibration. Hence, in the following analyses, f1 and f2 depend not only on θ1t-1 and φ1t-1, but also on θ2t-1 and φ2t-1. On this basis, we can deduce the formulae that relate errors in these parameters to the errors of the estimated ∆θ1t. Similarly, we can deduce the formulae for the estimated ∆φ1t.
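
The sensitivity can also be probed numerically: perturb one of the previous calibration parameters, re-run the pose estimation, and observe the change in the estimated angle updates. The finite-difference check sketched below is only an assumed verification procedure that mirrors the simulations described next; solve stands for any pose-update routine, such as the one sketched earlier in this chapter.

```python
import numpy as np

def numerical_sensitivity(solve, params, key, delta=np.deg2rad(0.1)):
    """Finite-difference sensitivity of the estimated (d_theta, d_phi) with
    respect to one previous-calibration parameter.

    solve(params) : runs the dynamic calibration and returns (d_theta, d_phi).
    params        : dict of previous calibration values (e.g. 'theta1', 'phi1', ...).
    key           : the parameter to perturb by delta radians."""
    base = np.asarray(solve(params))
    perturbed = dict(params)
    perturbed[key] = params[key] + delta
    shifted = np.asarray(solve(perturbed))
    return (shifted - base) / delta   # d(estimate) / d(parameter)
```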

To verify the above formulae, we perform the following simulations. Here, two cameras are assumed to have been accurately calibrated. Camera-1 is hung at a height of 2.06 meters. If Camera-1 is translated by -0.69, -0.13, and 6.25 meters along the X-, Y-, and Z-axes, respectively, and then rotated by -143.64 degrees about the Y-axis, Camera-1 will coincide with Camera-2. At first, Camera-1 has the pan angle θ10 = 0 and the tilt angle φ10 = 20 degrees, while Camera-2 has the pan angle θ20 = 0 and the tilt angle φ20 = 40 degrees. Moreover, based on the rectified world coordinate system of Camera-1, we assume there is an epipolar plane Π with the homogeneous coordinates π = [0.63, 0.77, 0.09, 0.01].

Based on this plane Π, we deduce the corresponding epipolar lines on the image planes of these two cameras. On each of these two epipolar lines, we randomly choose three image points {pk1,1, pk1,2, pk1,3} as the feature points, with k = 1 or 2. After that, the tilt angle of Camera-1 is changed to 20.5 degrees so that the feature points on the image plane of Camera-1 move to the new positions {p̂11,1, p̂11,2, p̂11,3}. Besides, the intrinsic parameters {α1, β1, α2, β2} are set to be {392, 388, 392.3, 385}.

In the simulation, we change individually the initial pan and tilt angles {θ10, φ10, θ20, φ20} of Camera-1 and Camera-2 to see how the estimated values of ∆θ11 and ∆φ11 are affected by these perturbations.
