PTZ Camera Network Reconfiguration
3.4 Simulations and a Real-World Experiment
To compare the performance of the different approaches, we conducted three computer simulations and one real-world experiment. Throughout the four experiments, the simple quality function (3.14) is adopted. When the lens is fully zoomed out, $z_{ij,h} = 0$; conversely, $z_{ij,h} = 1$ indicates that the lens is fully zoomed in. Therefore, $\lambda z_{ij,h} \le 0.01$. The quality function defined in (3.14) indicates that our goal is to control the PTZ cameras so that they capture as many as possible of the targets appearing in the area under surveillance, while favoring camera settings with a high zoom level.
This goal matches real-world applications. For example, the most common application is to record the trajectories of all targets in the scene. Then, when an event occurs, the recorded trajectories can be used to trace a target (for example, a criminal) and to reconstruct the escape route. For simplicity, we assume that the image resolution of any static camera included in the surveillance network is so low that the image quality function always returns zero. Hence, the optimal solution is a control strategy that uses the PTZ cameras to capture the highest number of targets.
In the first two experiments, we simulated a surveillance system containing three virtual PTZ cameras, a virtual static camera covering the entire area under surveillance, and several virtual pedestrians with different paths and movement speeds. In total, we produced five pedestrian sets, each containing 15 moving objects. Figure 3.2 shows the simulation environment, in which the total coverage of each PTZ camera is depicted as a fan-shaped region and a moving target is represented as a trajectory with time ordering.
We compared the performance of the proposed non-linear objective function LPG (NOF-LPG) method with that of the following four methods: the linear sum method, i.e., the linear objective function LPG (LOF-LPG), which maximizes Equation (3.5); Song et al.’s method (SONG) [43]; Lim et al.’s method (LIM) [40]; and the exhaustive search (ES) method. To find an optimal solution, the ES algorithm tests all $\prod_{i=1}^{n} w_i$ possible combinations of the feasible FOVs (refer to Equation (3.1)) of all the cameras. The small scale of the tests enables us to compute an ES solution as a reference against which the different approaches can be compared.
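The exhaustive search described above can be illustrated with a minimal sketch. It is written in Python for brevity (the system in this chapter was implemented in C++), and it assumes a simplified objective that only counts distinct observed targets and ignores the zoom term of the quality function. The name `exhaustive_search` and the input layout `fov_coverage` are hypothetical and chosen for illustration only.

```python
from itertools import product


def exhaustive_search(fov_coverage):
    """Try every combination of one feasible FOV per camera.

    fov_coverage[i][k] is the set of target ids seen by camera i when it
    uses its k-th feasible FOV (hypothetical input layout).
    Returns the best FOV index per camera and the set of observed targets.
    """
    best_choice, best_seen = None, set()
    index_ranges = (range(len(fovs)) for fovs in fov_coverage)
    for choice in product(*index_ranges):
        # Union of the targets covered by the selected FOV of each camera.
        seen = set().union(*(fov_coverage[i][k] for i, k in enumerate(choice)))
        if best_choice is None or len(seen) > len(best_seen):
            best_choice, best_seen = choice, seen
    return best_choice, best_seen


# Two cameras with two feasible FOVs each: 2 * 2 = 4 combinations are tested.
coverage = [
    [{1, 2}, {3}],
    [{1, 2}, {4, 5}],
]
choice, seen = exhaustive_search(coverage)
```

The loop visits exactly the product of the per-camera FOV counts, which is why this approach is only practical as a reference for small networks.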
The simulated system was implemented in C++ on a PC with an Intel Core 2 Duo E8400 3.00 GHz CPU.
Figure 3.2: The simulated surveillance environment with three PTZ cameras and fifteen moving targets in Pedestrian set 1.
The first simulation compares the efficiency and accuracy of the five algorithms on the five pedestrian sets. The average computation times of NOF-LPG, LOF-LPG, SONG, LIM, and ES are shown in Table 3.5. The computational load of NOF-LPG grows exponentially with the maximum number of targets in an FOV, i.e., $T_{ji}$ (refer to Section 3.3), but the number of targets in an FOV is normally small. In the simulation, the maximum number of targets in an FOV is five, so the computation time of the proposed method is acceptable.
Figure 3.3(a) shows the true number of moving objects in pedestrian set 1 and the number observed by ES. Note that the cameras may not be able to observe all the moving objects simultaneously because of the limitations of their FOVs. Therefore, the numbers of observed targets derived by the ES algorithm are used as a benchmark. In the simulation, the positions of all pedestrians are provided by the global-view static camera.
Figure 3.3(b) shows the performance of the four compared methods on pedestrian set 1, where all data are normalized by the corresponding reference values (the result of ES), and Table 3.6 details their performance on the five pedestrian sets. The NOF-LPG and LOF-LPG methods outperform the other two methods. The results show that the number of targets observed using NOF-LPG is equal to that of the ES algorithm. Furthermore, the LOF-LPG method outperforms the SONG and LIM methods because it utilizes a centralized optimization technique. The proposed NOF-LPG method outperforms the other three methods because it adopts a better objective function.
Table 3.5: The average computation time required for one iteration by the NOF-LPG, LOF-LPG, SONG, LIM, and ES methods
Method     Average computation time     Average computation time
           in the first simulation      in the third simulation
NOF-LPG    334.14 ms                    1115.72 ms
LOF-LPG    247.76 ms                    1112.89 ms
LIM        236.33 ms                    1073.44 ms
SONG       245.01 ms                    1086.99 ms
ES         2330.37 ms                   $O(45^{30})$
Table 3.6: The percentages of targets observed by the compared methods
Method     Pedestrian set 1   Pedestrian set 2   Pedestrian set 3   Pedestrian set 4   Pedestrian set 5
NOF-LPG    100.0%             100.0%             100.0%             100.0%             100.0%
LOF-LPG    96.4%              93.7%              93.9%              93.6%              93.2%
LIM        75.6%              71.7%              84.6%              77.9%              77.0%
SONG       73.4%              76.0%              66.3%              72.4%              68.4%
As mentioned in Section 3.2.2, the linear summation function (3.5) encourages all cameras to pursue targets with high video quality, but it has no mechanism to prevent too many cameras from focusing on the same group of targets. Therefore, the LOF-LPG method may lose track of some targets. Figure 3.4 shows an example of this effect, found in the first simulation at time instance 21. The figure shows that target 4 is ignored by the LOF-LPG algorithm because the sum of the quality values of targets 2 and 6 is greater than the quality value of target 4. Conversely, NOF-LPG appropriately controls the cameras’ FOVs to observe more targets.
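The effect can be reproduced with a toy example. The sketch below is only an illustration under stated assumptions: `linear_sum` adds the quality values of every camera's FOV, so a target seen by two cameras is counted twice, while `coverage_score` is a hypothetical stand-in that credits each target once (with its best quality) and is not the non-linear objective function defined earlier in this chapter. The quality values 0.6 and 1.0 are invented for the example.

```python
from itertools import product


def linear_sum(choice):
    """Sum of quality values over all selected FOVs (targets may be double counted)."""
    return sum(q for fov in choice for q in fov.values())


def coverage_score(choice):
    """Hypothetical stand-in objective: each target is credited once, at its best quality."""
    best = {}
    for fov in choice:
        for target, q in fov.items():
            best[target] = max(best.get(target, 0.0), q)
    return sum(best.values())


def best_choice(cameras, score):
    """Pick one FOV per camera so that the given score is maximal."""
    return max(product(*cameras), key=score)


def observed(choice):
    """Set of distinct targets covered by the selected FOVs."""
    return {target for fov in choice for target in fov}


# Each FOV maps target id -> quality value (invented numbers).
cam_a = [{2: 0.6, 6: 0.6}]              # camera A can only watch targets 2 and 6
cam_b = [{2: 0.6, 6: 0.6}, {4: 1.0}]    # camera B may watch targets 2 and 6, or target 4

seen_linear = observed(best_choice([cam_a, cam_b], linear_sum))
seen_coverage = observed(best_choice([cam_a, cam_b], coverage_score))
```

With these numbers the linear sum prefers pointing both cameras at targets 2 and 6 (0.6 + 0.6 exceeds 1.0), leaving target 4 unobserved, whereas the once-per-target score sends camera B to target 4.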
Figure 3.3: (a) The true number of moving objects in Pedestrian set 1 and the number observed by the exhaustive search (ES) method; (b) the normalized numbers of targets observed by the compared methods (Pedestrian set 1).
In the second simulation, we assess the effects of two types of noise that are inevitable in a real environment. The first type is caused by target detection errors, which yield an incorrect number of targets; the second type is target location error caused by target tracking and/or prediction errors. The missed-detection rate is set at 10% in this simulation, and the target position is perturbed by zero-mean white Gaussian noise with a standard deviation of 0.5 meters. In addition, a 20% gross location error is generated by a random walk whose step size and orientation angle are uniformly distributed within
Figure 3.4: The snapshot of NOF-LPG and LOF-LPG at time instance 21.
Table 3.7: The percentages of targets observed by the compared methods in noisy pedestrian sets
Method     Pedestrian set 1   Pedestrian set 2   Pedestrian set 3   Pedestrian set 4   Pedestrian set 5
NOF-LPG    92.5%              88.8%              91.0%              87.0%              89.9%
LOF-LPG    91.0%              87.1%              85.8%              86.1%              85.4%
LIM        72.9%              64.0%              75.1%              69.0%              71.5%
SONG       68.5%              66.5%              67.1%              61.3%              65.1%
used to compute the reference values are the same as those in the first simulation. The results of the second simulation, shown in Table 3.7, demonstrate that incorrect target information does degrade the decision quality. Even so, NOF-LPG still outperforms the other methods.
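The noise settings of the second simulation can be sketched as follows. This is an illustrative Python sketch, not the simulator used in the experiments. It assumes that the 20% gross location error means that a target receives one extra random-walk step with probability 0.2, and the bound `max_step` on the step size is a hypothetical placeholder, since the range is not restated here. The function name `perturb_targets` is also an assumption.

```python
import math
import random


def perturb_targets(positions, rng, miss_rate=0.10, sigma=0.5,
                    gross_rate=0.20, max_step=3.0):
    """Return noisy target positions (in meters) for one iteration.

    miss_rate  : probability that a target is not detected at all
    sigma      : standard deviation of the zero-mean Gaussian position noise
    gross_rate : probability of an extra gross error (assumed interpretation)
    max_step   : hypothetical upper bound on the random-walk step size
    """
    noisy = []
    for x, y in positions:
        if rng.random() < miss_rate:
            continue  # missed detection: the target is dropped
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        if rng.random() < gross_rate:
            # Gross error: one random-walk step with uniform size and angle.
            step = rng.uniform(0.0, max_step)
            angle = rng.uniform(0.0, 2.0 * math.pi)
            x += step * math.cos(angle)
            y += step * math.sin(angle)
        noisy.append((x, y))
    return noisy


rng = random.Random(0)
truth = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
noisy = perturb_targets(truth, rng)
```

Feeding `noisy` instead of `truth` to a camera-selection method reproduces the two error types discussed above: the list can be shorter than the true target list, and the remaining positions are displaced.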
In the third experiment, we simulated a large-scale surveillance system with 30 virtual PTZ cameras and 45 virtual pedestrians. Figure 3.5 shows the simulation environment. We tested the performance of the different approaches under two conditions: 1) the positions of the pedestrians are provided by a global-view static camera, and 2) the positions of the pedestrians are estimated by the PTZ cameras. In the latter case, the positions of pedestrians located outside the FOVs of all the cameras are not available. Therefore, the control strategy is determined with insufficient information, and a degradation of tracking accuracy should be expected. Since an exhaustive search is impractical in this large-scale simulation because of its high computational complexity, $O(45^{30})$, the number of tracked targets computed by NOF-LPG with a global-view static camera replaces the ES result as the benchmark. Table 3.8 shows the results. In this large-scale surveillance simulation, NOF-LPG achieved the best performance. Furthermore, when a global-view static camera is not available, a drop of about 10 percentage points or more in tracking performance is observed for every method tested in this experiment. Moreover, since the SONG method adjusts one camera at a time, each camera is adjusted only once every 30 iterations in this simulation. Hence, its performance is further degraded. The average computation time of the third simulation is shown in Table 3.5. The computation times of the methods are of approximately the same order. In this simulation, the maximum number of targets appearing in any camera FOV is only five. Therefore, the computation time of the proposed NOF-LPG method is only slightly longer than that of the other methods.
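To give a sense of why the exhaustive search is dropped at this scale, the following short Python sketch evaluates the product of the per-camera FOV counts. It takes the stated complexity $O(45^{30})$ at face value and assumes, purely for illustration, 30 cameras with 45 candidate settings each; the helper name `search_space_size` is hypothetical.

```python
import math


def search_space_size(fov_counts):
    """Number of FOV combinations an exhaustive search must test."""
    return math.prod(fov_counts)


# Assumed reading of O(45^30): 30 cameras, 45 candidate settings per camera.
large_scale = search_space_size([45] * 30)
digits = len(str(large_scale))
```

The result is a 50-digit number, so enumerating every combination is out of reach regardless of how fast a single combination can be evaluated.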
Figure 3.5: The scene with 30 virtual PTZ cameras used in the third simulation.
Table 3.8: The percentages of targets observed by the compared methods in a large-scale scene. Both columns are normalized by the result of NOF-LPG with a global fixed camera.
Method     Performance with a global fixed camera   Performance without any fixed camera
NOF-LPG    100.0%                                   90.0%
LOF-LPG    98.1%                                    87.5%
LIM        96.8%                                    73.0%
SONG       45.5%                                    35.6%
We also conducted an experiment using real-world data to evaluate the performance of the four methods. A camera network comprising two PTZ cameras (AXIS AX-215) and one fixed camera (D-Link DCS-3220G) was deployed in an outdoor environment. There are quite a few automatic methods for estimating image homographies, such as [58] and [59]. However, since estimating a homography requires only four pairs of corresponding points, it is not difficult to construct the correspondences manually. Therefore, in the experiment, all cameras were calibrated by selecting landmarks manually, and the homography maps were computed using functions in OpenCV [60]. The video resolution of each FOV was 352 × 240. At the system setup stage, 24 predetermined FOVs were assigned to each PTZ camera. To calibrate the homographies of the predetermined FOVs, the images corresponding to the 24 FOVs were acquired and used to build a panoramic image of the PTZ camera with Autostitch [61]. Consequently, the homography between any of the 24 FOVs and the corresponding panoramic image is known. At present, the 24 FOVs contain only two zoom levels because, if the FOV is too narrow, the image features would be insufficient (feature points < 4) to estimate the image homography reliably. The panoramic images of the two PTZ cameras and the images of the fixed camera are related by registering them to a top-view aerial image. Figure 3.6(a) shows the set of 24 acquired images and the registered panoramic image, and Figure 3.6(b) shows the visual coverage regions of the three cameras overlaid on a top-view image. Therefore, the homography between any two cameras is known.
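The claim that four point pairs suffice can be made concrete with a small sketch. The experiment used OpenCV functions for this step; the Python code below is not that implementation. It instead solves the standard eight-equation linear system directly, with the last homography entry fixed to 1, so that the example has no external dependencies. The function names and the sample coordinates are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(src, dst):
    """Return the 3x3 homography (last entry 1) mapping four src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, eight unknowns in total.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map an image point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Four manually selected landmarks (invented coordinates).
image_pts = [(0, 0), (1, 0), (1, 1), (0, 1)]
ground_pts = [(10, 20), (30, 22), (28, 45), (8, 40)]
H = homography_from_4_points(image_pts, ground_pts)
```

Once every view has a homography to the common top-view image, a point in one camera can be mapped to another camera by going through the top view, which is the sense in which the homography between any two cameras is known.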
Figure 3.6: (a) A panoramic image derived by integrating 24 images; (b) the coverage regions of the three cameras overlaid on a top-view image and the panoramic image of the three cameras.
The video from the fixed camera is used for target detection and prediction, and Yoo and Park’s difference-based approach [62] is utilized to detect moving targets. Although the approach in [62] may fail to detect semi-stationary targets, it is efficient and robust to lighting variations; thus, it is well suited to an outdoor real-time system. A simple constant-velocity motion model is used to predict the locations of targets. Figure 3.7 shows a snapshot of the targets detected in the video stream of the fixed camera.
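A constant-velocity prediction of this kind can be sketched in a few lines. The Python code below is an illustration under assumptions, not the estimator used in the experiment: it fits a start position and a velocity to a short history of observed positions by least squares, independently for each axis, and extrapolates to the next time instant. The function names are hypothetical.

```python
def fit_constant_velocity(times, points):
    """Least-squares fit of p(t) = p0 + v * t for each coordinate axis.

    Returns [(x0, vx), (y0, vy)]. Needs at least two distinct time stamps.
    """
    n = len(times)
    t_mean = sum(times) / n
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need at least two distinct time stamps")
    params = []
    for axis in (0, 1):
        p_mean = sum(p[axis] for p in points) / n
        v = sum((t - t_mean) * (p[axis] - p_mean)
                for t, p in zip(times, points)) / denom
        params.append((p_mean - v * t_mean, v))
    return params


def predict(params, t):
    """Predicted position at time t under the fitted constant-velocity model."""
    return tuple(p0 + v * t for p0, v in params)


# A target moving with velocity (2, -1) per time unit from position (1, 3).
times = [0, 1, 2, 3, 4]
track = [(1 + 2 * t, 3 - t) for t in times]
model = fit_constant_velocity(times, track)
next_position = predict(model, 5)
```

The predicted position is what a camera-selection method would use when deciding which FOV to assign, since the camera needs time to move before it can observe the target.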
Figure 3.7: Detected targets (a single person, a group of people, a car, and noise)
Since it is impossible to control the PTZ cameras with the four compared methods simultaneously, we tested them one by one. For comparison, each algorithm was tested for 20 minutes (about 5400 frames) on a cloudy and windy day. At each iteration, the system required about 0.8 seconds to estimate the parameters of the constant-velocity model of each moving target in order to predict its next position. Then, new parameters were determined and sent to the PTZ cameras. Usually, the cameras took 0.6 seconds to move to the assigned FOV. Finally, the targets observed in the FOVs were counted to evaluate the performance. Figure 3.8 shows the NOF-LPG results at three time instances. In video frame 1470, several groups of people are detected, and the two PTZ cameras are instructed to observe different groups of targets in order to maximize the number of observed targets. Video frames 3358 and 3444 exemplify a target hand-over between cameras, where the group observed by camera 2 in frame 3358 is passed to camera 3 in frame 3444.
As in the simulations, the exhaustive search method is applied to obtain reference data. However, since it is impractical to test all combinations of the cameras’ FOVs when determining the optimal solution in a real environment, the exhaustive search is based on the video recorded by the fixed camera. Each feasible FOV of the PTZ cameras is mapped to a region in the image of the fixed camera. Then, all combinations of the mapped regions are searched for the maximum number of observed targets, which serves as reference data. Figure 3.9 shows the corresponding reference data of the four video sequences used to test the NOF-LPG, LOF-LPG, LIM, and SONG methods, and Table 3.9 details the performance of the four algorithms in the real-world experiment.
Although the performance of each algorithm is affected by shadows, wind, calibration errors, target detection errors, and prediction errors, the results demonstrate that NOF-LPG and LOF-LPG still outperform LIM and SONG.
Figure 3.8: The tracking results of frames 1470, 3358, and 3444 using NOF-LPG.
Table 3.9: The percentages of targets observed by the compared methods in a real environment
Method     Performance (normalized by the result of exhaustive search in the fixed camera)
NOF-LPG    71.0%
LOF-LPG    67.3%
LIM        60.2%
SONG       62.3%
Figure 3.9: The average number of targets identified by the ES algorithm in the video sequences used to test the NOF-LPG (video sequence 1), LOF-LPG (video sequence 2), LIM (video sequence 3) and SONG (video sequence 4) methods.
3.5 Concluding Remarks
In this chapter, we have considered the process used to assess the quality of a set of FOVs and proposed a non-linear objective function to reduce the number of unattended targets in a surveillance region. When the quality of each FOV is evaluated individually, it is easy to derive the objective functions of different PTZ network problems. The proposed approach provides an optimal solution for the PTZ network coordination problem. We have also shown that the non-linear optimization problem can be transformed into a linear production game problem that is guaranteed to yield an optimal solution. The optimal solution of LPG can be computed in polynomial time, so our approach is efficient. The branch-and-cut method is adopted to solve the PTZ parameter selection problem. Computer simulations and a real-world experiment were performed to evaluate the proposed method in a multi-target surveillance environment. The results show that the method achieved the highest tracking rate among the compared methods.