Chapter 4. Experiments
4.2 Feature Selection
To choose a suitable feature descriptor for multi-camera vehicle identification, the HSTunnel_NO_MISS dataset is used to evaluate the distinctiveness of each feature described in Section 3.3. Every vehicle in this dataset appears in all cameras; that is, no miss detection occurs. First, we choose two of the five cameras in HSTunnel_NO_MISS and calculate the 124*124 visual distance matrix. Next, for each row of the distance matrix, we obtain the rank of the vehicle (row). Rank 𝑖 means that the corresponding vehicle in the next camera (column) has the 𝑖th smallest distance value in the row. For example, rank 1 means that the corresponding vehicle in the next camera has the smallest distance value in the row, which is the best result. Rank 2 means that the correct vehicle has the second smallest distance value, and so on. Since HSTunnel_NO_MISS contains five cameras, each feature yields ten execution results by exhaustively choosing two of the five cameras.
For the Haar-like feature [11] used in feature extraction, we manually collect 900 grayscale images of vehicles from eight tunnel videos as the positive set, which is separate from the HSTunnel dataset, and 1800 negative samples randomly sampled from 20 background images of tunnels without vehicles. The training images have a resolution of 24*24 pixels, whereas our video has a resolution of 352*240 pixels. The cascade has 10 stages, and the training algorithm produces 143 weak classifiers in total. The trained detector is used only in the feature selection experiments.
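The rank-based evaluation used in this section can be sketched as follows. This is a minimal illustration, not the evaluation code of this work: it assumes that row 𝑖 and column 𝑖 of the distance matrix refer to the same vehicle, and it takes CMC(k) to be the percentage of vehicles whose rank is at most k, which is the usual definition of the cumulative match characteristic. The function names and the toy matrix are hypothetical.

```python
def ranks_from_distance_matrix(dist):
    """Return the rank of the correct match for each row.

    Assumption: row i and column i refer to the same vehicle, so the
    correct match for row i is dist[i][i]. Rank 1 is the best result.
    """
    ranks = []
    for i, row in enumerate(dist):
        # Count how many candidates are strictly closer than the true match.
        closer = sum(1 for d in row if d < row[i])
        ranks.append(closer + 1)
    return ranks


def cmc(ranks, k):
    """Percentage of rows whose correct match is within the top k ranks."""
    hits = sum(1 for r in ranks if r <= k)
    return 100.0 * hits / len(ranks)


# Toy 4x4 distance matrix (made-up values, not taken from the dataset).
dist = [
    [0.2, 0.9, 0.8, 0.7],  # true match has the smallest distance: rank 1
    [0.3, 0.5, 0.9, 0.8],  # one candidate is closer: rank 2
    [0.4, 0.3, 0.6, 0.9],  # two candidates are closer: rank 3
    [0.9, 0.8, 0.7, 0.1],  # rank 1
]
ranks = ranks_from_distance_matrix(dist)
print(ranks)          # [1, 2, 3, 1]
print(cmc(ranks, 1))  # 50.0
print(cmc(ranks, 3))  # 100.0
```

For the 124*124 matrices of this experiment, the same two functions would be applied to each of the ten camera pairs and the CMC values averaged.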
Figure 4-3. Average performance on different feature descriptors.
Table 4-2. Average performance on CMC values of different feature descriptors.
Feature Descriptor    CMC(1)    CMC(3)    CMC(5)    CMC(10)
Image Intensity       32        44        50        60

Table 4-2 shows the CMC values at selected ranks. The selected features are image intensity, the Haar-like feature vector, color histograms, and keypoint descriptors. For color histograms, the RGB, hue, and opponent color spaces are used in the experiments. For keypoint descriptors, SIFT, ORB, and OpponentSIFT are tested.
We can clearly observe in Figure 4-3 that OpponentSIFT outperforms the other feature descriptors. This result agrees with the recommendation in [24]. We therefore choose OpponentSIFT as our feature descriptor.
For tunnel surveillance, OpponentSIFT outperforms SIFT because OpponentSIFT takes color information into account, whereas SIFT does not. As described in the previous section, the color and structure of a vehicle are powerful visual cues that should both be considered.
All color histogram descriptors perform poorly in the experiments. Color is a strong feature of vehicles. However, many vehicles share the same color, especially in our experiments, where 124 vehicles are compared at the same time. Therefore, color information alone is not sufficient for multi-camera identification.
Finally, the Haar-feature vector used in [9] does not achieve good CMC values in the experiment. It is designed for a vehicle classifier, which means that it captures the characteristics that all vehicles share. The Haar-feature vector we trained contains only 148 binary dimensions, which is not sufficient to distinguish individual vehicles since most of the values are the same across vehicles.
4.3 Multi-Camera Vehicle Groups Matching
This section presents the experiments on the multi-camera vehicle groups matching algorithms. With OpponentSIFT as the feature descriptor, the assignment algorithms can be applied to match vehicles between two cameras.
Assume that we have 𝑁1𝐶 vehicles in the first camera and 𝑁2𝐶 vehicles in the second camera, with 𝑁2𝐶 ≤ 𝑁1𝐶. In the following experiments, we set 𝑁1𝐶 to 50 and vary 𝑁2𝐶 from 1 to 50. Given the order constraint in tunnels, the larger 𝑁2𝐶 is, the higher the accuracy that can be achieved. Each vehicle in the second camera is either assigned a vehicle in the first camera or marked as “no-match” if no candidate is suitable. The accuracy is the percentage of correct assignments over all 𝑁2𝐶 vehicles in the second camera.
Figure 4-4 shows an example of an assignment result. The first two rows are the IDs of the detected vehicles, written in capital letters, and the third row is the output of some algorithm. For every vehicle in C2, the algorithm chooses the best-matching vehicle from C1. The first four results are correct. In particular, the 3rd result is correct because vehicle C is not in camera C1 and the algorithm assigns a no-match to it. The last two results are examples of incorrect assignments: vehicle F in C2 is matched to vehicle E, and vehicle G in C2 is reported as a no-match although vehicle G does exist in C1. Therefore, the accuracy of this example is 4/6 = 67%. The experiments on these two cameras are denoted as (C1, C2), where C1 is the first camera and C2 is the second camera.
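The accuracy measure of this example can be sketched in a few lines. The figure itself is not reproduced here, so the contents of the two queues below are an assumed reconstruction that is merely consistent with the description (vehicle C missing from C1, F matched to E, G wrongly reported as no-match); the function name and the `NO_MATCH` marker are hypothetical.

```python
NO_MATCH = None  # marker for a "no-match" assignment


def assignment_accuracy(c1_ids, c2_ids, assigned):
    """Fraction of C2 vehicles that received the correct assignment.

    c1_ids   -- IDs detected in the first camera (the candidates)
    c2_ids   -- IDs detected in the second camera, in order
    assigned -- for each C2 vehicle, the chosen C1 ID or NO_MATCH
    """
    candidates = set(c1_ids)
    correct = 0
    for true_id, got in zip(c2_ids, assigned):
        # The expected answer is the same ID if it exists in C1,
        # otherwise a no-match.
        expected = true_id if true_id in candidates else NO_MATCH
        if got == expected:
            correct += 1
    return correct / len(c2_ids)


# Assumed queues, consistent with the description of Figure 4-4.
c1 = ["A", "B", "D", "E", "F", "G"]
c2 = ["A", "B", "C", "D", "F", "G"]
result = ["A", "B", NO_MATCH, "D", "E", NO_MATCH]
print(f"{assignment_accuracy(c1, c2, result):.0%}")  # 67%
```

Note that a no-match counts as correct only when the vehicle is truly absent from the first camera, which is why the assignment for C is correct and the one for G is not.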
Figure 4-4. Example of an assignment result on (C1, C2). The first row is the candidate queue containing the detections in camera C1, the second row contains the detections in the second camera C2, and the third row is an example of an execution result.
The experiments proceed as follows. First, we randomly choose a starting index in the two cameras; the following 𝑁1𝐶 vehicles in the first camera and 𝑁2𝐶 vehicles in the second camera are used in the experiment. This is necessary because the order of vehicles in our dataset cannot be changed and the number of vehicles is greater than 𝑁1𝐶. We randomly select 15 starting indices, run the experiment independently for each of them, and take the average accuracy as the final result. We then increase 𝑁2𝐶 by one and again execute 15 random runs. In total, 𝑁2𝐶 is tested from 5 to 50 in increments of one.
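The protocol above can be outlined as a short driver loop. This is only a sketch under simplifying assumptions: a single starting index is shared by both cameras, the dataset size of 195 is taken from the vehicle index range mentioned for HSTunnel, and `match_fn` is a hypothetical callback standing in for one run of a matching algorithm.

```python
import random


def average_accuracy(total_vehicles, n1, n2, match_fn, runs=15, seed=0):
    """Average accuracy of match_fn over `runs` random starting indices.

    match_fn(start, n1, n2) is a hypothetical callback that runs one
    matching experiment on the window beginning at `start` and returns
    its accuracy in [0, 1].
    """
    rng = random.Random(seed)
    # The window of n1 consecutive vehicles must fit inside the dataset,
    # because the vehicle order cannot be shuffled.
    last_start = total_vehicles - n1
    accuracies = []
    for _ in range(runs):
        start = rng.randint(0, last_start)
        accuracies.append(match_fn(start, n1, n2))
    return sum(accuracies) / runs


def dummy_matcher(start, n1, n2):
    # Placeholder that always reports 80% accuracy.
    return 0.8


# Sweep a few values of N2; the experiments in the text sweep 5..50.
for n2 in range(5, 8):
    print(n2, f"{average_accuracy(195, 50, n2, dummy_matcher):.2f}")
```

Fixing the seed makes the 15 starting indices reproducible, so that different methods can be compared on the same windows.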
Three methods are evaluated in the following experiments: S2DP, S2DP without the NS2 penalty, and the Hungarian algorithm. To evaluate the effectiveness of the NS2 penalty, we include the result obtained with the NS2 penalty 𝜆 set to zero. As a common solution to assignment problems, the Hungarian algorithm is selected as the baseline for performance comparison. The miss-match penalty 𝜖 in S2DP is set to 450 in all experiments; the choice of 𝜖 is discussed in Section 4.5.1.
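To make the baseline concrete, the sketch below computes the minimum-cost one-to-one assignment that the Hungarian algorithm is designed to find. It is not the Hungarian algorithm itself: for brevity it enumerates all assignments by brute force, which gives the same optimum but is only feasible for tiny matrices. The row/column convention (rows for second-camera vehicles, columns for first-camera candidates) and the toy distances are assumptions made for illustration.

```python
from itertools import permutations


def min_cost_assignment(dist):
    """Brute-force minimum-cost one-to-one assignment.

    dist[r][c] is the distance between vehicle r of the second camera
    and candidate c of the first camera (rows <= columns). Returns the
    chosen column for each row and the total cost. The Hungarian
    algorithm reaches the same optimum in polynomial time.
    """
    n_rows, n_cols = len(dist), len(dist[0])
    best_cost, best_cols = float("inf"), None
    for cols in permutations(range(n_cols), n_rows):
        cost = sum(dist[r][c] for r, c in enumerate(cols))
        if cost < best_cost:
            best_cost, best_cols = cost, cols
    return list(best_cols), best_cost


# Toy distances (made-up values): 3 vehicles, 4 candidates.
dist = [
    [5, 1, 9, 9],
    [2, 6, 9, 9],
    [9, 9, 3, 8],
]
cols, cost = min_cost_assignment(dist)
print(cols, cost)  # [1, 0, 2] 6
```

In this toy case the optimum swaps the order of the first two vehicles. A purely distance-based assignment has no notion of the order constraint in tunnels, so nothing prevents such crossings when the visual distances are misleading.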
Both HSTunnel and HSTunnel_NO_MISS are tested in the experiments.
Figure 4-5 and Figure 4-6 show the results for every camera setting in HSTunnel and HSTunnel_NO_MISS, respectively. The x-axis is the value of 𝑁2𝐶 and the y-axis is the corresponding accuracy. In HSTunnel, all methods perform poorly on the camera pairs (C3, C5) and (C4, C5) when 𝑁2𝐶 is small. Table 1 shows that camera C5 contains 173 vehicles, whereas camera C3 contains only 148 vehicles. In other words, a total of 25 vehicles in the second camera (C5) cannot find corresponding candidates in the first camera (C3) because of miss detections. Therefore, the performance on (C3, C5) may decrease if the number of vehicles in the second camera is too small, and the same holds for (C4, C5).
The camera pair (C2, C3) does not have this problem even though 47 vehicles are missed in C3, since miss-detected vehicles in C3 are not candidates. The algorithms are not executed on miss-detected vehicles in the second camera, as illustrated in Figure 4-4; thus the performance on (C2, C3) does not suffer the significant degradation seen on (C3, C5).
Figure 4-6 on HSTunnel_NO_MISS does not have this effect since there is no miss detection in HSTunnel_NO_MISS.
Figure 4-7 shows the results on HSTunnel and HSTunnel_NO_MISS. The x-axis is the value of 𝑁2𝐶 and the y-axis is the accuracy. All methods achieve higher accuracy on HSTunnel_NO_MISS than on HSTunnel since there is no miss detection in HSTunnel_NO_MISS. The Hungarian algorithm does not work well because the visual feature alone is not robust enough to provide sufficient information. With the NS2 penalty, the S2DP algorithm achieves higher accuracy than without it when 𝑁2𝐶 is below 45 in HSTunnel_NO_MISS and below 39 in HSTunnel, and achieves similar performance when 𝑁2𝐶 is near 𝑁1𝐶. The S2DP algorithm reaches 90% and 80% accuracy when 𝑁2𝐶 is greater than 6 and 30 in HSTunnel, respectively. Note that in HSTunnel the performance of S2DP drops when 𝑁2𝐶 is greater than 45, because more order-changed vehicles are included. As described in Section 3.5, S2DP cannot correctly identify order-changed vehicles.
Figure 4-5. Experimental results of the vehicle groups matching algorithms on HSTunnel. The x-axis is the number of vehicles in C2 assigned, and the y-axis is the accuracy.
Figure 4-6. Experimental results of the vehicle groups matching algorithms on HSTunnel_NO_MISS. The x-axis is the number of vehicles in C2 assigned, and the y-axis is the accuracy.
Figure 4-7. Average accuracy of multi-camera vehicle groups matching algorithms.
4.4 Real-Time and Offline Vehicle Identification
We evaluate the proposed real-time and offline identification algorithms in this section. In the first step, we collect a set of vehicles in the second camera and run the S2DP algorithm to solve the initialization problem. Next, we apply the proposed real-time assignment or offline refinement algorithm.
4.4.1 Real-Time Identification
Similar to the experiments in Section 4.3, we randomly select a starting point and apply the real-time (RT) algorithm described in Section 3.5.1. The final result is the average accuracy over 15 rounds of execution with random starting points.
Table 4-3 summarizes the experimental settings and Figure 4-8 illustrates an example of the experiments on HSTunnel. In Figure 4-8, the solid lines represent the vehicles used in the S2DP and the dotted lines those used in the RT. Assume that camera C1 starts at vehicle index 𝑖 and C2 at 𝑗, where each vehicle is given an index from 0 to 194 in HSTunnel (see Table 4-1). The S2DP algorithm identifies the 𝑗th to (𝑗 + 29)th vehicles in C2, and the RT identifies the (𝑗 + 27)th to (𝑗 + 59)th vehicles. The (𝑗 + 27)th to (𝑗 + 29)th vehicles are re-identified in the RT. In total, 60 vehicles are identified.
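The index arithmetic of this example can be checked with a small helper. The function and parameter names are hypothetical; the default values (30 vehicles for the S2DP, an overlap of 3, 60 vehicles in total) are the ones stated for HSTunnel in the example above.

```python
def rt_windows(j, s2dp_count=30, overlap=3, total=60):
    """Inclusive index ranges in C2 handled by the S2DP and by the RT."""
    s2dp = (j, j + s2dp_count - 1)
    # The RT re-identifies the last `overlap` vehicles of the S2DP window.
    rt = (j + s2dp_count - overlap, j + total - 1)
    reidentified = (rt[0], s2dp[1])
    return s2dp, rt, reidentified


s2dp, rt, redo = rt_windows(j=0)
print(s2dp)  # (0, 29)
print(rt)    # (27, 59)
print(redo)  # (27, 29)
print(rt[1] - s2dp[0] + 1)  # 60
```

The S2DP covers 30 vehicles and the RT covers 33, but the three re-identified vehicles are counted once, which gives 30 + 33 - 3 = 60.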
Table 4-3. Experimental settings of the real-time methods.
Dataset S2DP RT Total
Figure 4-8. Example of the real-time experiments on HSTunnel. The solid lines represent the vehicles used in the S2DP and the dotted lines those used in the RT. Assume that camera C1 starts at vehicle index 𝑖 and C2 at 𝑗.
We use different settings for HSTunnel and HSTunnel_NO_MISS in the experiments, since S2DP performs differently on the two datasets. For HSTunnel, the S2DP assigns the first 30 vehicles and the RT starts on the 28th one. For candidates in the first camera, the RT starts with the one that is assigned to the 28th vehicle of the second camera, and the following 35 candidates are searched in the RT. In total, 60 vehicles are assigned by the S2DP and the RT, and all 60 vehicles are taken into account when computing the accuracy. For HSTunnel_NO_MISS, the S2DP assigns 20 vehicles from 50 candidates, and the RT assigns the 18th to 45th vehicles from 30 candidates. The accuracy is computed over all 45 vehicles.
To demonstrate the effect of the different properties of the RT algorithm, some parts are removed from the RT in the experiments. The RT has three properties: the NS2 penalty 𝜆, re-assignment of the last three results from the S2DP, and multiple assignments to one candidate. We therefore introduce three variants of the RT algorithm. The first one, RT-w1, sets the NS2 penalty 𝜆 to zero so that the effect of this penalty is discarded.
Next, RT-w2 further discards the multiple-assignments property from RT-w1, so that one candidate can be assigned only once. Finally, RT-w3 discards all three properties.
Table 4-4 and Table 4-5 show the performance of the different methods on HSTunnel and HSTunnel_NO_MISS, respectively. Each value in the table is the average accuracy of vehicle identification for a pair of cameras (column) using a given method (row). Note that the Hungarian algorithm, which is the baseline method, is an offline method. In both HSTunnel and HSTunnel_NO_MISS, all RT methods outperform the Hungarian algorithm, and the full RT outperforms the RT-w1, RT-w2, and RT-w3 variants. Note that the camera settings (C3, C5) and (C4, C5) perform poorly in HSTunnel. The reason is the same as mentioned in Section 4.3: a number of miss detections exist among the candidates of C3 and C4. In particular, for the RT on (C3, C5), most desired candidates are not in the search window, and only 16% accuracy is obtained.
Table 4-4. Average accuracy of real-time methods on HSTunnel.
Method HSTunnel
Table 4-5. Average accuracy of real-time methods on HSTunnel_NO_MISS.
Method HSTunnel_NO_MISS
Our offline algorithm OR is compared with the Hungarian algorithm, S2DP, and the state-of-the-art algorithm for multi-camera vehicle identification in tunnels, Hungarian Voting (HV) [9].
The Hungarian Voting algorithm [9] for multi-camera identification runs as follows. In the beginning, it pushes 𝑁𝐻 detected vehicles into respective queues for two cameras, and obtains the 𝑁𝐻× 𝑁𝐻 distance matrix. Next the Hungarian
In addition, the Hungarian Voting algorithm cannot solve the initialization problem described in Section 3.5. HV performs poorly on HSTunnel, which requires solving this problem first. Therefore, we apply the S2DP algorithm in the first step, followed by HV for the offline assignments. This method is called S2DP+HV and is used on the HSTunnel dataset. S2DP+HV is not included in the experiments on HSTunnel_NO_MISS since this dataset does not require solving the initialization problem, and its performance would be the same as that of HV.
Similar to the experiments in Section 4.3, we keep 𝑁1𝐶 candidates in the first camera, to which 𝑁2𝐶 vehicles in the second camera are assigned. Here 𝑁1𝐶 is set to 65 and 𝑁2𝐶 is set to 60. Table 4-6 shows the experimental results on HSTunnel and HSTunnel_NO_MISS. The tested methods are S2DP, the Hungarian algorithm, OR, Hungarian Voting from related work, and S2DP+HV.
As shown in Table 4-6, OR outperforms S2DP and Hungarian Voting. As stated before, the performance of Hungarian Voting drops dramatically on HSTunnel, where it is even worse than the baseline Hungarian algorithm. Hungarian Voting does not address the initialization problem, so the proper candidates are not present in the search window throughout the whole process. The result of the refined S2DP+HV method is shown in the last column of Table 4-6; our proposed method still outperforms S2DP+HV.
Table 4-6. Comparison of average accuracy using different offline methods.
Dataset Hungarian
The parameter settings of our proposed algorithms are discussed in this section.
In addition, two more datasets are included to further verify our methods.
4.5.1 Miss-Match Penalty
The miss-match penalty 𝜖 plays an important role in our algorithm. Section 3.4.2 describes the semantic meaning of 𝜖: its value is around half of the matching (assignment) distance threshold. However, it is still difficult to determine the maximum distance in applications.
Consider the example distance matrix in Figure 4-9. Ideally, the algorithm assigns vehicles at (1, 3), (2, 4), and (3, 5), since these three distance values are smaller than the others. In fact, most values in the distance matrix are greater than those at the matched points. This inspires us to use the average value of the distance matrix as the matching threshold, which equals 2𝜖. The details of the miss-match penalty are discussed in Section 3.4.1.
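The rule of thumb above, with the average of the distance matrix serving as the matching threshold 2𝜖, can be written down directly. The matrix of Figure 4-9 is not reproduced in the text, so the values below are made up to mimic its structure, with small distances only at (1, 3), (2, 4), and (3, 5) in 1-based indexing; the function name is hypothetical.

```python
def miss_match_penalty(dist):
    """Estimate epsilon as half of the mean of the distance matrix."""
    values = [d for row in dist for d in row]
    threshold = sum(values) / len(values)  # matching threshold = 2 * epsilon
    return threshold / 2


# Made-up 3x5 distance matrix; the matched points are the entries of 300.
dist = [
    [900, 950, 300, 1000, 850],
    [1000, 900, 950, 300, 850],
    [950, 1000, 900, 850, 300],
]
eps = miss_match_penalty(dist)
print(eps)      # 400.0
print(2 * eps)  # 800.0, the matching threshold
```

In this toy matrix every matched distance (300) lies below the threshold of 800 and every other distance lies above it, which is the behavior the average-based threshold relies on.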