Feature Selection - Multi-Camera Vehicle Identification for Tunnel Surveillance Systems

Chapter 4. Experiments

4.2 Feature Selection

To choose a suitable feature descriptor for multi-camera vehicle identification, we use the HSTunnel_NO_MISS dataset to evaluate the distinctiveness of each feature described in Section 3.3. Every vehicle in this dataset appears in all cameras; that is, the dataset contains no missed detections. First, we choose two of the five cameras in HSTunnel_NO_MISS and calculate the 124×124 visual distance matrix. Next, for each row of the distance matrix, we obtain the rank of the vehicle (row). Rank 𝑖 means that the corresponding vehicle in the next camera (column) has the 𝑖th smallest distance value in the row. For example, rank 1 means the corresponding vehicle in the next camera has the smallest distance value in the row, which is the best result. Rank 2 means the correct vehicle has the second smallest distance value, and so on. Since HSTunnel_NO_MISS contains five cameras, each feature yields ten execution results by exhaustively choosing two of the five cameras.
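The rank computation described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes that row 𝑖 and column 𝑖 of the distance matrix refer to the same vehicle, and it uses the standard definition of CMC(k) as the percentage of vehicles whose correct match has rank k or better. The function names and the small 3×3 matrix are hypothetical.

```python
def match_ranks(dist):
    """Rank of the correct match in each row.

    Assumption: vehicle i in the first camera corresponds to column i
    in the second camera (ground truth lies on the diagonal).
    """
    ranks = []
    for i, row in enumerate(dist):
        # Rank = 1 + number of candidates with a strictly smaller distance.
        ranks.append(1 + sum(1 for d in row if d < row[i]))
    return ranks


def cmc(ranks, k):
    """Percentage of vehicles whose correct match is ranked k or better."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical 3x3 distance matrix (made-up values for illustration).
dist = [
    [0.2, 0.9, 0.5],
    [0.3, 0.6, 0.4],
    [0.8, 0.7, 0.1],
]
ranks = match_ranks(dist)
print(ranks)          # [1, 3, 1]
print(cmc(ranks, 3))  # 100.0
```

Averaging the CMC values over the ten camera pairs would then give one curve per feature descriptor.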

For the Haar-like feature [11] used in feature extraction, we manually collect 900 grayscale vehicle images from eight tunnel videos, which are not part of the HSTunnel dataset, as the positive set, and 1800 negative samples randomly sampled from 20 background images of tunnels without vehicles. The training images are 24×24 pixels, whereas our video resolution is 352×240 pixels. The cascade has 10 stages, and the training algorithm yields 143 weak classifiers in total. The trained detector is used only in the feature selection experiments.

Figure 4-3. Average performance on different feature descriptors.

Table 4-2. Average performance on CMC values of different feature descriptors.

Feature Descriptor    CMC(1)    CMC(3)    CMC(5)    CMC(10)
Image Intensity       32        44        50        60

Table 4-2 shows the CMC values at selected ranks. The selected features are image intensity, the Haar-like feature vector, color histograms, and keypoint descriptors. For color histograms, the RGB, hue, and opponent color spaces are used in the experiments. For keypoint descriptors, SIFT, ORB, and OpponentSIFT are tested.

Figure 4-3 clearly shows that OpponentSIFT outperforms the other feature descriptors. This result agrees with the recommendation in [24]. We therefore choose OpponentSIFT as our feature descriptor.

For tunnel surveillance, OpponentSIFT outperforms SIFT since OpponentSIFT considers color information but SIFT does not. As described in the previous section, the color and structure information of a vehicle are powerful visual information that should be considered.

All color histogram descriptors perform poorly in the experiments. Color is a strong feature of vehicles; however, many vehicles share nearly the same color, especially in our experiments, where 124 vehicles are compared at the same time. Therefore, color information alone is not sufficient for multi-camera identification.

Finally, the Haar-feature vector used in [9] does not achieve good CMC values in this experiment. It comes from a vehicle classifier, so it captures characteristics that all vehicles share. The Haar-feature vector we trained contains only 148 binary-valued dimensions, which is not sufficient to distinguish individual vehicles, since most of the values are the same across vehicles.

4.3 Multi-Camera Vehicle Groups Matching

This section presents the experiments on the multi-camera vehicle groups matching algorithms. With OpponentSIFT as the feature descriptor, the assignment algorithms can be applied to match vehicles between two cameras.

Assume that we have 𝑁₁^𝐶 vehicles in the first camera and 𝑁₂^𝐶 vehicles in the second camera, with 𝑁₂^𝐶 ≤ 𝑁₁^𝐶. In the following experiments, we set 𝑁₁^𝐶 to 50 and vary 𝑁₂^𝐶 from 1 to 50. Considering the order constraint in tunnels, the larger 𝑁₂^𝐶 is, the higher the accuracy that can be achieved. For each vehicle in the second camera, the algorithm either assigns a vehicle in the first camera to it or marks it as “no-match” if no candidate is suitable. The accuracy is the percentage of correct assignments over all 𝑁₂^𝐶 vehicles in the second camera.

Figure 4-4 shows an example of an assignment result. The first two rows are the IDs of the detected vehicles, in capital letters, and the third row is the output of an algorithm. For every vehicle in C₂, the algorithm chooses the best-matching vehicle from C₁. The first four results are correct. The third result is correct because vehicle C is not in camera C₁ and the algorithm assigns no-match to it. The last two results are examples of incorrect assignments: vehicle F in C₂ is matched to vehicle E, and vehicle G in C₂ is reported as no-match although vehicle G does exist in C₁. Therefore, the accuracy in this example is 4/6 = 67%. Experiments on these two cameras are denoted (C₁, C₂), where C₁ is the first camera and C₂ is the second camera.
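The accuracy measure can be sketched in a few lines. The vehicle lists below are a hypothetical reconstruction that is only consistent with the textual description of Figure 4-4 (six vehicles in C₂, vehicle C absent from C₁); the actual figure contents may differ. The function name and the use of `None` for no-match are assumptions made for illustration.

```python
NO_MATCH = None  # assumed encoding of a "no-match" output


def assignment_accuracy(c1_ids, c2_ids, assigned):
    """Fraction of vehicles in the second camera that are assigned correctly.

    An output is correct if it equals the vehicle's own ID when that ID is
    present in the first camera, or NO_MATCH when it is absent.
    """
    c1 = set(c1_ids)
    correct = 0
    for true_id, out in zip(c2_ids, assigned):
        expected = true_id if true_id in c1 else NO_MATCH
        if out == expected:
            correct += 1
    return correct / len(c2_ids)


# Hypothetical data consistent with the description of Figure 4-4.
c1_ids = ["A", "B", "D", "E", "F", "G"]          # vehicle C is missing in C1
c2_ids = ["A", "B", "C", "D", "F", "G"]
assigned = ["A", "B", NO_MATCH, "D", "E", NO_MATCH]

accuracy = assignment_accuracy(c1_ids, c2_ids, assigned)
print(round(accuracy, 2))  # 0.67
```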

Figure 4-4. Example of an assignment result on (C₁, C₂). The first row is the candidate queue containing the detections in camera C₁, the second row contains the detections in the second camera C₂, and the third row is an example of an execution result.

The experiments proceed as follows. First, we randomly choose a starting index in the two cameras; the following 𝑁₁^𝐶 vehicles in the first camera and 𝑁₂^𝐶 vehicles in the second camera are used in the experiment. This is necessary because the order of vehicles in our dataset cannot change and the number of vehicles is greater than 𝑁₁^𝐶. We randomly select 15 starting indices, run the experiment 15 times independently, and take the average accuracy as the final result. We then increase 𝑁₂^𝐶 by one and again run 15 randomly started experiments. In total, 𝑁₂^𝐶 is tested from 5 to 50 in increments of one.

Three methods are evaluated in the following experiments: S²DP, S²DP without the NS² penalty, and the Hungarian algorithm. To evaluate the effectiveness of the NS² penalty, we include the result with the NS² penalty 𝜆 set to zero. As a common solution to assignment problems, the Hungarian algorithm is selected as the baseline for performance comparison. The miss-match penalty 𝜖 in S²DP is set to 450 in all experiments; the choice of 𝜖 is discussed in Section 4.5.1.
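The evaluation protocol can be outlined as below. This is a sketch under stated assumptions rather than the actual experiment code: both cameras are assumed to share one ground-truth index space and one starting index, the dataset size of 124 is taken from the distance matrix in Section 4.2, and `match_fn` is a hypothetical callback that runs a matching algorithm on the two windows and returns its accuracy. The `identity_matcher` is a placeholder, not one of the evaluated methods.

```python
import random


def average_accuracy(match_fn, n_total, n1, n2, runs=15, seed=0):
    """Average accuracy of match_fn over `runs` random starting indices."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        # The window of n1 consecutive vehicles must fit inside the dataset.
        start = rng.randint(0, n_total - n1)
        first = list(range(start, start + n1))    # candidates, first camera
        second = list(range(start, start + n2))   # vehicles to assign
        scores.append(match_fn(first, second))
    return sum(scores) / runs


def identity_matcher(first, second):
    """Placeholder matcher: counts vehicles that have a candidate at all."""
    candidates = set(first)
    return sum(1 for v in second if v in candidates) / len(second)


# Sweep the number of vehicles in the second camera from 5 to 50.
results = {
    n2: average_accuracy(identity_matcher, n_total=124, n1=50, n2=n2)
    for n2 in range(5, 51)
}
print(len(results))  # 46
```

Replacing `identity_matcher` with a real matcher would yield one accuracy value per window size, which is the quantity plotted against the number of assigned vehicles.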

Both HSTunnel and HSTunnel_NO_MISS are tested in the experiments.

Figure 4-5 and Figure 4-6 show the results on every camera setting in HSTunnel and HSTunnel_NO_MISS, respectively. The x-axis is the value of 𝑁₂^𝐶 and the y-axis is the corresponding accuracy. In HSTunnel, all methods perform poorly on camera pairs (C₃, C₅) and (C₄, C₅) when 𝑁₂^𝐶 is small. Table 4-1 shows that camera C₅ contains 173 vehicles, whereas camera C₃ contains only 148 vehicles. In other words, 25 vehicles in the second camera (C₅) have no corresponding candidate in the first camera (C₃) because of missed detections. Therefore, the performance on (C₃, C₅) may decrease when the number of vehicles in the second camera is small, and the same holds for (C₄, C₅).

Camera pair (C₂, C₃) does not have this problem, even though 47 vehicles are missed in C₃, since vehicles missed in C₃ are not candidates. The algorithms are not executed on vehicles missed in the second camera, as illustrated in Figure 4-4; thus the performance on (C₂, C₃) does not show the drop observed on (C₃, C₅).

Figure 4-6 on HSTunnel_NO_MISS does not have this effect since there is no miss detection in HSTunnel_NO_MISS.

Figure 4-7 shows the results on HSTunnel and HSTunnel_NO_MISS. The x-axis is the value of 𝑁₂^𝐶 and the y-axis is the accuracy. All methods achieve higher accuracy on HSTunnel_NO_MISS than on HSTunnel, since there are no missed detections in HSTunnel_NO_MISS. The Hungarian algorithm does not work well because the visual feature alone is not robust enough to provide sufficient information. With the NS² penalty, the S²DP algorithm achieves higher accuracy when 𝑁₂^𝐶 is below 45 in HSTunnel_NO_MISS and below 39 in HSTunnel, and achieves similar performance when 𝑁₂^𝐶 is close to 𝑁₁^𝐶. The S²DP algorithm can reach 90% and 80% accuracy when 𝑁₂^𝐶 is greater than 6 and 30 in HSTunnel, respectively. Note that in HSTunnel the performance of S²DP drops when 𝑁₂^𝐶 is greater than 45, because more order-changed vehicles are included. As described in Section 3.5, S²DP cannot correctly identify order-changed vehicles.

Figure 4-5. Experimental results of the vehicle groups matching algorithms on HSTunnel. The x-axis is the number of vehicles in C₂ assigned, and the y-axis is the accuracy.

Figure 4-6. Experimental results of the vehicle groups matching algorithms on HSTunnel_NO_MISS. The x-axis is the number of vehicles in C₂ assigned, and the y-axis is the accuracy.

Figure 4-7. Average accuracy of multi-camera vehicle groups matching algorithms.

4.4 Real-Time and Offline Vehicle Identification

We evaluate the proposed real-time and offline identification algorithms in this section. In the first step we collect a set of vehicles in the second camera and run the S²DP algorithm to solve the initialization problem. Next we can apply the proposed real-time assignment or offline refinement algorithms.

4.4.1 Real-Time Identification

Similar to the experiments in Section 4.3, we randomly select one starting point and apply the real-time RT algorithm, as described in Section 3.5.1. The final result is the average accuracy of the 15 rounds of execution with random starting points.

Table 4-3 summarizes the experimental settings, and Figure 4-8 illustrates an example of the experiments on HSTunnel. In Figure 4-8, the solid lines represent the vehicles used in the S²DP, and the dotted lines represent those used in the RT. Assume that camera C₁ starts at vehicle index 𝑖 and C₂ at 𝑗, where each vehicle is given an index from 0 to 194 in HSTunnel (see Table 4-1). The S²DP algorithm identifies the 𝑗^th to (𝑗 + 29)^th vehicles in C₂, and the RT identifies the (𝑗 + 27)^th to (𝑗 + 59)^th vehicles. The (𝑗 + 27)^th to (𝑗 + 29)^th vehicles are re-identified in the RT. In total, 60 vehicles are identified.
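The index arithmetic of this HSTunnel example can be checked with a short sketch. The function name and its parameters are hypothetical; the counts (30 vehicles for S²DP, the last three re-identified, 60 in total) come from the paragraph above, and the starting index 𝑗 = 10 is an arbitrary illustrative value.

```python
def identification_ranges(j, s2dp_count=30, overlap=3, total=60):
    """Index ranges in the second camera handled by S2DP and by RT."""
    s2dp = range(j, j + s2dp_count)                  # j .. j + 29
    rt = range(j + s2dp_count - overlap, j + total)  # j + 27 .. j + 59
    return s2dp, rt


s2dp, rt = identification_ranges(j=10)
reidentified = sorted(set(s2dp) & set(rt))  # handled by both stages
identified = sorted(set(s2dp) | set(rt))    # handled by at least one stage

print(reidentified)     # [37, 38, 39]
print(len(identified))  # 60
```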

Table 4-3. Experimental settings of the real-time methods.

Dataset S²DP RT Total

Figure 4-8. Example of real-time experiments on HSTunnel. The solid lines represent the vehicles in the S²DP and the dotted lines those in the RT. Assume camera C₁ starts at vehicle index 𝑖 and C₂ at 𝑗.

We use different settings on HSTunnel and HSTunnel_NO_MISS in the experiments, since the performance of S²DP differs between the two datasets in

52 vehicles and the RT starts on the 28^th one. For candidates in the first camera, the RT starts with the one assigned to the 28^th vehicle of the second camera, and the following 35 candidates are searched in the RT. In total, 60 vehicles are assigned by S²DP and RT, and all 60 vehicles are taken into account when computing the accuracy. For HSTunnel_NO_MISS, the S²DP assigns 20 vehicles from 50 candidates, and the RT assigns the 18^th to 45^th vehicles from 30 candidates. The accuracy is computed over all 45 vehicles.

To demonstrate the effect of the individual properties of the RT algorithm, some parts are removed from RT in the experiments. RT has three properties: the NS² penalty 𝜆, re-assignment of the last three results from S²DP, and multiple assignments to one candidate. We therefore introduce three variants of the RT algorithm. The first, RT-w1, sets the NS² penalty 𝜆 to zero, so the effect of this penalty is discarded. Next, RT-w2 further discards the multiple-assignments property from RT-w1, so that one candidate can be assigned only once. Finally, RT-w3 discards all three properties.

Table 4-4 and Table 4-5 show the performance of the different methods on HSTunnel and HSTunnel_NO_MISS, respectively. Each value in the tables is the average accuracy of vehicle identification between two cameras (column) using a given method (row). Note that the Hungarian algorithm, which is the baseline method, is an offline method. In both HSTunnel and HSTunnel_NO_MISS, all RT methods outperform the Hungarian algorithm, and the full RT outperforms the RT-w1, RT-w2, and RT-w3 variants. Note that all methods perform poorly on camera settings (C₃, C₅) and (C₄, C₅) in HSTunnel. The reason is the same as in Section 4.3: a number of missed detections exist among the candidates in C₃ and C₄. In particular, for the RT on (C₃, C₅), most of the desired candidates are not in the search window, so only 16% accuracy is obtained.

Table 4-4. Average accuracy of real-time methods on HSTunnel.

Method HSTunnel

Table 4-5. Average accuracy of real-time methods on HSTunnel_NO_MISS.

Method HSTunnel_NO_MISS

Our offline algorithm OR is compared with Hungarian algorithm, S²DP, and the state-of-the-art algorithm, Hungarian Voting (HV) [9], for multi-camera vehicle identification in tunnels.

The Hungarian Voting algorithm [9] for multi-camera identification runs as follows. In the beginning, it pushes 𝑁^𝐻 detected vehicles into respective queues for two cameras, and obtains the 𝑁^𝐻× 𝑁^𝐻 distance matrix. Next the Hungarian

In addition, the Hungarian Voting algorithm cannot solve the initialization problem described in Section 3.5. HV performs poorly on HSTunnel, which requires solving that problem first. Therefore, we apply the S²DP algorithm in the first step, followed by HV for the offline assignments. This method is called S²DP+HV and is used on the HSTunnel dataset. S²DP+HV is not included in the experiments on HSTunnel_NO_MISS, since this dataset does not require solving the initialization problem and the performance is the same as that of HV.

Similar to the experiments in Section 4.3, we keep 𝑁₁^𝐶 candidates in the first camera, which are assigned to 𝑁₂^𝐶 vehicles in the second camera. Here 𝑁₁^𝐶 is set to 65 and 𝑁₂^𝐶 is set to 60. Table 4-6 shows the experimental results on HSTunnel and HSTunnel_NO_MISS. The tested methods are S²DP, the Hungarian algorithm, OR, Hungarian Voting from related work, and S²DP+HV.

As shown in Table 4-6, OR outperforms S²DP and Hungarian Voting. As stated before, the performance of Hungarian Voting drops dramatically on HSTunnel, where it is even worse than the baseline Hungarian algorithm. Hungarian Voting does not address the initialization problem, so the proper candidates may never appear in the search window during processing. The result of the refined S²DP+HV method is shown in the last column of Table 4-6; our proposed method still outperforms S²DP+HV.

Table 4-6. Comparison of average accuracy using different offline methods.

Dataset Hungarian

The parameter settings of our proposed algorithms are discussed in this section. In addition, two more datasets are included to further verify our methods.

4.5.1 Miss-Match Penalty

The miss-match penalty 𝜖 plays an important role in our algorithm. Section 3.4.2 describes the semantic meaning of 𝜖: its value is around half of the matching (assignment) distance threshold. However, it is still difficult to determine the maximum distance in applications.

Consider the example distance matrix in Figure 4-9. Ideally, the algorithm assigns the vehicles at (1, 3), (2, 4), and (3, 5), since these three distance values are smaller than the others. In fact, most values in the distance matrix are greater than those at the matched points. This inspires us to use the average value of the distance matrix as the matching threshold, which equals 2𝜖. The details of the miss-match penalty are discussed in Section 3.4.1.
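The heuristic of taking the average of the distance matrix as the matching threshold 2𝜖 amounts to a very small computation, sketched below. The function name and the 2×2 matrix are hypothetical and the values are made up; they are not those of Figure 4-9.

```python
def miss_match_penalty(dist):
    """Estimate the miss-match penalty as half of the mean distance.

    The mean of all entries is used as the matching threshold (2 * epsilon),
    so epsilon is half of that mean.
    """
    values = [d for row in dist for d in row]
    threshold = sum(values) / len(values)
    return threshold / 2.0


# Hypothetical distance matrix with two small (matched) entries.
dist = [
    [100.0, 900.0],
    [1100.0, 300.0],
]
epsilon = miss_match_penalty(dist)
print(epsilon)  # 300.0
```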
