The Object Detection Algorithm - A Novel Framework for Enhancing Motion Vectors with Texture and Spatial Information in the Compressed Domain

Input: P-frames of a video clip

Output: object set {Obj_1, Obj_2, …, Obj_N}, where N is the total number of regions in the P-frame and Obj_N denotes the Nth object of the P-frame. The size of each object is measured in macroblocks.

1. Cluster motion vectors of similar magnitude and direction into the same group using a region-growing approach.

1.1 Set the search window (W) size to 3×3 macroblocks.

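1.2 Search all macroblocks (MB) within W, and compute the differences $diffMag_k$ and $diffAng_k$ in motion vector (MV) magnitude ($|MV|$) and direction ($\theta$) between the center and each of its eight neighboring motion vectors within W:

$$diffMag_k = \bigl|\,|MV_k| - |MV_{center}|\,\bigr|, \qquad diffAng_k = |\theta_k - \theta_{center}|, \qquad k = 1, \dots, 8$$

If $diffMag_k \le T_{mag}$ and $diffAng_k \le T_{ang}$, where $T_{mag}$ is the predefined threshold for MV magnitude and $T_{ang}$ is the threshold for MV direction, set the flags $F_k$ and $F_{center}$ to 1.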

Otherwise, set all flags within W to 0.

1.3 Go to step 1.2 until all macroblocks are processed.

1.4 Group macroblocks that are marked as 1 into the same cluster.

1.5 Compute each object center and record its associated macroblocks.
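The clustering steps above can be sketched in Python as follows. This is only an illustrative sketch, not the original implementation: the function name `cluster_motion_vectors`, the threshold values `t_mag` and `t_ang`, the `min_mag` cutoff that skips near-zero motion vectors, and the use of a breadth-first flood fill to grow regions are all assumptions introduced here.

```python
import math
from collections import deque


def cluster_motion_vectors(mv, t_mag=1.0, t_ang=math.radians(20), min_mag=0.5):
    """Group macroblocks with similar motion vectors by region growing.

    mv is a 2-D list (rows x cols) of (dx, dy) motion vectors, one per macroblock.
    t_mag, t_ang and min_mag are hypothetical values chosen for illustration.
    Returns a list of objects, each with its macroblocks and its center.
    """
    rows, cols = len(mv), len(mv[0])
    mag = [[math.hypot(dx, dy) for dx, dy in row] for row in mv]
    ang = [[math.atan2(dy, dx) for dx, dy in row] for row in mv]
    label = [[-1] * cols for _ in range(rows)]  # -1 means "not assigned yet"
    objects = []

    def similar(r0, c0, r1, c1):
        diff_mag = abs(mag[r0][c0] - mag[r1][c1])
        diff_ang = abs(ang[r0][c0] - ang[r1][c1])
        diff_ang = min(diff_ang, 2 * math.pi - diff_ang)  # handle angle wrap-around
        return diff_mag <= t_mag and diff_ang <= t_ang

    for r in range(rows):
        for c in range(cols):
            if label[r][c] != -1 or mag[r][c] < min_mag:
                continue
            # Start a new region at this macroblock and grow it.
            label[r][c] = len(objects)
            blocks, queue = [], deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                blocks.append((cr, cc))
                # Visit the 3x3 search window W centered on (cr, cc).
                for nr in range(max(cr - 1, 0), min(cr + 2, rows)):
                    for nc in range(max(cc - 1, 0), min(cc + 2, cols)):
                        if (label[nr][nc] == -1 and mag[nr][nc] >= min_mag
                                and similar(cr, cc, nr, nc)):
                            label[nr][nc] = len(objects)
                            queue.append((nr, nc))
            center = (sum(b[0] for b in blocks) / len(blocks),
                      sum(b[1] for b in blocks) / len(blocks))
            objects.append({"blocks": blocks, "center": center})
    return objects
```

Each returned object lists its macroblocks as (row, column) pairs, so the object size in macroblocks is simply `len(obj["blocks"])`.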

Figure 14: Object Detection using Gaussian filter

Figure 15: Object detection without filter.

Figure 16: Object detection using Median filter

Figure 17: Object detection using Mean filter

Chapter 7 Experimental Results and Discussion

We designed an experiment to verify the performance of the proposed scheme and to test the noise model assumptions. The scheme is tested on three video clips, which are in MPEG format and are part of the MPEG-7 testing dataset. We compare four systems: Group A uses a texture filter only [35]; Group B uses a spatial filter only, mainly a Gaussian filter [33]; Group C uses texture and spatial filters with equal weight [25]; and Group D is our system. Detection without any post-processing serves as the baseline for comparing the results of these four systems. The frame size is 320×240, which implies 20×15 macroblocks in each P-frame. Our testing dataset shows walking persons in different positions and at different speeds, with slight variations in object size.

The following settings were used. For the Gaussian filter, σ was set to 1.2, the kernel size to 3×3, and the number of iterations to 5. For the median and mean filters, the kernel size was also set to 3×3, since its computational burden is lower and it follows the recommendation of other researchers, who consider a 3×3 kernel convenient and sometimes optimal. Moreover, it matches our object detection search window, whose size is also 3×3. In addition, all kernel cells were set to 1, so that every element in the kernel takes part in the neighborhood at the convolution step. Finally, the number of iterations was set to the optimal value of 5.
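To make these settings concrete, the following Python sketch builds a normalized 3×3 Gaussian kernel with σ = 1.2 and applies a filter repeatedly to one component of the motion-vector field. It is a sketch under stated assumptions rather than the code used in the experiments: the helper names (`gaussian_kernel`, `convolve`, `median_filter`, `smooth`) are hypothetical, and border macroblocks are handled by replicating the nearest edge value, which the text does not specify.

```python
import math
from statistics import median


def gaussian_kernel(size=3, sigma=1.2):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-half, half + 1)]
         for y in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def _window(field, r, c, half):
    """Values around (r, c); borders replicate the nearest edge (an assumption)."""
    rows, cols = len(field), len(field[0])
    return [[field[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
             for dc in range(-half, half + 1)]
            for dr in range(-half, half + 1)]


def convolve(field, kernel):
    """Apply a square kernel to a 2-D field of scalars (e.g. the MV x-component)."""
    n, half = len(kernel), len(kernel) // 2
    out = []
    for r in range(len(field)):
        row = []
        for c in range(len(field[0])):
            w = _window(field, r, c, half)
            row.append(sum(kernel[i][j] * w[i][j]
                           for i in range(n) for j in range(n)))
        out.append(row)
    return out


def median_filter(field, size=3):
    """Replace each value with the median of its size x size neighborhood."""
    half = size // 2
    return [[median(v for row in _window(field, r, c, half) for v in row)
             for c in range(len(field[0]))]
            for r in range(len(field))]


def smooth(field, filt, iterations=5):
    """Apply the filter function filt to the field the given number of times."""
    for _ in range(iterations):
        field = filt(field)
    return field
```

For example, `smooth(mv_x, lambda f: convolve(f, gaussian_kernel()))` applies the Gaussian filter five times, and a mean filter with all kernel cells equal corresponds to `convolve(f, [[1 / 9] * 3 for _ in range(3)])`.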

We chose the recall and precision metrics because they are the most commonly used to evaluate the performance of object detection systems [15,17,30,31]. Eq. (3) and Eq. (4) define recall and precision. In each frame, the number of hits is the number of macroblocks that contain an object and are correctly detected. The number of false alarms is the number of macroblocks that contain no object yet are falsely identified as containing one. The number of misses is the number of macroblocks that contain an object that the detection algorithm failed to detect. We use the macroblock as the unit of measurement because object detection is performed in the compressed domain.
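As an illustration of how hits, false alarms, and misses combine into the two metrics, the sketch below uses the standard definitions, recall = hits / (hits + misses) and precision = hits / (hits + false alarms), which we assume correspond to Eq. (3) and Eq. (4). The function name `recall_precision` and the representation of macroblocks as (row, column) pairs are assumptions made for this example.

```python
def recall_precision(detected, truth):
    """Compute per-frame recall and precision at macroblock granularity.

    detected and truth are sets of (row, col) macroblock coordinates:
    the macroblocks reported by the detector and those in the ground truth.
    """
    hits = len(detected & truth)          # object present and detected
    false_alarms = len(detected - truth)  # detected, but no object present
    misses = len(truth - detected)        # object present, but not detected
    recall = hits / (hits + misses) if hits + misses else 0.0
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    return recall, precision
```

For instance, with four ground-truth macroblocks of which two are detected, plus one falsely detected macroblock, the function yields a recall of 2/4 and a precision of 2/3.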

Figures 14-17 show the object detection results for the second video clip of the MPEG testing dataset. We report the precision and recall of our object detection scheme for this clip both with and without filtering; the ground truth for the clip was constructed manually. Figures 18 and 19 show the precision and recall values, respectively, for each frame in the video clip.

We note that the performance of our system is consistently superior to that of the other schemes. Figures 20 and 21 show the average precision and average recall for the whole clip; again, our system outperforms all the others. In our experiments we noticed a weakness in the performance of the single Gaussian filter when the object is located at the frame border. This can be explained by the lack of neighborhood information near the border. We can infer that, in the single Gaussian filter case, performance degrades when the person in the video clip is just entering or leaving the frame.

In summary, the proposed system improves detection performance while keeping computational complexity low. Both the Gaussian and median filters are readily available as components in hardware and software. In addition, the DCT coefficients and AC components are readily available in the MPEG stream. Because filtering refines the motion vectors into vectors that are easier to process, the execution time of the object detection algorithm after filtering is reduced significantly compared with that without filtering. Although we add another block for filtering, the execution time of the entire object detection process is almost the same or even better.

Figure 18: Precision for object detection in P frames

Figure 19: Recall for object detection in P frames (legend: Without filter, Group A, Group B, Group C, Group D)

Figure 20: Average precision of object detection for the 2nd video clip

Figure 21: Average recall of object detection for the 2nd video clip (bars: Without filter, Group A, Group B, Group C, Group D; vertical axis: Recall)

Chapter 8
