
A Novel Approach for Improving the Quality of Service for Wireless Video Transcoding


8.1 Objective Evaluation

8.1.1 Testing dataset configurations

The frame size is 320×240 pixels, which gives 20×15 macroblocks (16×16 pixels each) in each P-frame. Our testing dataset presents objects that vary in position, speed, and size.
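The macroblock count follows directly from the frame size. The sketch below assumes the standard 16×16-pixel MPEG macroblock; the helper name `macroblock_grid` is hypothetical and used only for illustration.

```python
MB_SIZE = 16  # assumed macroblock edge length in pixels (standard MPEG value)


def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macroblocks needed to cover a frame."""
    # Ceiling division, so a partially covered block at the edge still counts.
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols, rows


cols, rows = macroblock_grid(320, 240)
print(cols, rows, cols * rows)  # 20 15 300
```

Each P-frame in the experiments therefore contributes 300 macroblock-level decisions to the measurements.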

Objects appear in scattered form, such as the full-court view of a football game, and in centralized form, such as the anchor person in the news. The temporal variation ranges from high, such as the riders in bicycle racing, through medium, such as the pitching view of a baseball game, to low, such as the anchor person in the news. In addition, typical videoconference sequences (Mother Daughter, Akiyo, Claire) and sequences containing objects with straightforward motion (Hall Monitor, Container Ship) have been tested.

The metrics used in the experiments are precision and recall, which together measure the accuracy of the object detection system. We choose the recall and precision metrics as defined in Eq. (8.1) and Eq. (8.2) because they are commonly used in performance evaluation [36,28,41,60].

In each frame, the number of hits is the number of macroblocks that contain an object that is correctly detected. The number of false alarms is the number of macroblocks that contain no object yet are falsely identified as containing one. The number of misses is the number of macroblocks that contain an object that the detection algorithm fails to detect. We use the macroblock as the unit of measurement because object detection is performed in the compressed domain. The aim of the experiment is to evaluate the objective performance of the proposed system.
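The counting rules above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard definitions, precision = hits / (hits + false alarms) and recall = hits / (hits + misses), and it represents each frame as a grid of 0/1 macroblock flags. The function names and the sample masks are hypothetical.

```python
def count_outcomes(detected, truth):
    """Count hits, false alarms, and misses between two macroblock masks."""
    hits = false_alarms = misses = 0
    for det_row, gt_row in zip(detected, truth):
        for det, gt in zip(det_row, gt_row):
            if det and gt:
                hits += 1          # object present and detected
            elif det and not gt:
                false_alarms += 1  # no object, but flagged as one
            elif gt and not det:
                misses += 1        # object present, but not detected
    return hits, false_alarms, misses


def precision_recall(hits, false_alarms, misses):
    """Assumed standard definitions; returns 0.0 when a denominator is zero."""
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    return precision, recall


# Toy 3x4 macroblock masks (illustrative data, not taken from the experiments).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
detected = [[0, 1, 1, 1],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]

counts = count_outcomes(detected, truth)
print(counts)                     # (3, 1, 1)
print(precision_recall(*counts))  # (0.75, 0.75)
```

Reporting both values matters because a detector can raise recall simply by flagging more macroblocks, at the cost of more false alarms and therefore lower precision.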

Table 8-1: Video sequence clip database

Category | Content | Source
 | Daily TV news program | Spanish TV, RTVE
 | Weekly TV news program | Spanish TV, RTVE
Drama/Movie | "Art" movie: Hallo | Christoph Rodatz, GMD
 | Movie: "La sombra de un cipres es alargada" | Spanish TV, RTVE
 | TV drama series: "Pepa y Pepe" | Spanish TV, RTVE
 | Sitcom (1 and 2) | Portuguese TV, RTP & SIC
Documentary | "Science Eye": Bridge construction | NHK
 | 5 clips of scientific documentaries | SFRS
 | Documentary about buildings | Lancaster Television
 | Basic Ophthalmic Exam | Univ. of Tennessee
 | Documentary about a village: "Santillana del Mar" | Spanish TV, RTVE
Sport | 3 sport clips: Soccer, Cycling, Basketball | Spanish TV, RTVE
 | 2 sport clips: Basketball, Golf | Korean Broadcasting Station
 | Soccer sequence | Samsung
Commercial | 14 items of commercials in Korean | Samsung
Music video and games | Korea's pop singers' live music show | Korean Broadcasting Station
 | TV quiz program: "Saber y ganar" | Spanish TV, RTVE
 | Music program: "Musica si" | Spanish TV, RTVE
 | Variety show, first 30 minutes of complete program |

Figures 8-1 through 8-4 show the results of object extraction on the anchor person video clip from the MPEG testing dataset. We report the precision and recall of our object extraction scheme for this clip both with and without filtering; the ground truth for the clip was constructed manually.

Fig. 8-1 and Fig. 8-2 show the precision and recall for each frame in the video clip. The performance of object extraction using the Gaussian filter is consistently superior to that using the other filters or no filter. The average precision and average recall for the whole clip are shown in Fig. 8-3 and Fig. 8-4.

Again, the Gaussian filter gives the best overall results for both precision and recall. The mean filter combines high recall with low precision because it creates unrealistic motion vectors, which raises the recall value but also produces many false alarms.

In addition, Figures 8-1 through 8-4 suggest that the median filter, by its nature, does not adjust motion vector values. It merely rearranges the motion vectors without changing their values and without eliminating the noise within them.

Hence, the precision of the median filter is almost the same as that without a filter, and only the recall increases slightly. In summary, the Gaussian filter boosts the object detection performance. In addition, its computational complexity is low, as discussed previously.
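The different behavior of the mean and median filters can be illustrated on a single 3×3 neighborhood. This is a sketch with made-up values, not data from the experiments: the window holds the horizontal motion-vector components of a static background macroblock (center) next to a moving object, and the filters are applied to that one window only.

```python
from statistics import mean, median

# Horizontal motion-vector components (pixels) in a 3x3 macroblock window,
# listed row by row. The center macroblock (index 4) is static background;
# the right-hand side belongs to a moving object. Illustrative values only.
window = [0, 0, 6,
          0, 0, 6,
          0, 6, 6]

mean_value = mean(window)      # a new value that no macroblock actually had
median_value = median(window)  # always one of the existing values (odd window)

print(mean_value in window)    # False
print(median_value in window)  # True
print(median_value)            # 0
```

The mean assigns a nonzero vector to the static center macroblock, which is the kind of unrealistic motion that turns into a false alarm, while the median can only return a value that is already present in the window.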

The Gaussian filter is readily available as an implemented component in both hardware and software, which demonstrates the flexibility and extensibility of the proposed scheme. Testing compares four configurations: Group A, using the Gaussian filter only [37]; Group B, using the median filter [28]; Group C, using the cascade filter; and Group D, our system. A final baseline uses no post-processing of any kind.

Fig. 8-1: Precision for Object extraction in P-frames of Anchor person

Fig. 8-2: Recall for Object extraction in P-frames of Anchor person

Fig. 8-3: Average precision for object detection for anchor person video

Fig. 8-4: Average recall for object detection for anchor person video

Figures 8-5 through 8-8 show the results of object extraction on the second video clip of the MPEG-7 walking person testing dataset. We report the precision and recall of our object extraction scheme for this clip both with and without filtering; the ground truth for the clip was constructed manually. Fig. 8-5 and Fig. 8-6 show the precision and recall for each frame in the video clip.

The performance of our system is consistently superior to that of the other schemes. The average precision and average recall for the whole clip are shown in Fig. 8-7 and Fig. 8-8, where our system again gives the best results. During the experiment, we observed a weakness of the single Gaussian filter when the object is located at the frame border, which is due to the lack of neighborhood information near the border.
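The border weakness can be made concrete by measuring how much of a filter kernel falls on real macroblocks. The sketch below assumes a 3×3 binomial approximation of the Gaussian kernel on the 20×15 macroblock grid; the source does not state which kernel was used, and `kernel_coverage` is a hypothetical helper.

```python
# Assumed 3x3 Gaussian kernel (binomial approximation); weights sum to 16.
KERNEL = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]
KERNEL_SUM = 16


def kernel_coverage(row, col, rows=15, cols=20):
    """Fraction of the kernel weight that lands on macroblocks inside the frame."""
    covered = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols:
                covered += KERNEL[dr + 1][dc + 1]
    return covered / KERNEL_SUM


print(kernel_coverage(7, 10))  # interior macroblock: 1.0
print(kernel_coverage(0, 10))  # top-edge macroblock: 0.75
print(kernel_coverage(0, 0))   # corner macroblock: 0.5625
```

Under these assumptions a corner macroblock is smoothed from little more than half of the kernel weight, so the filtered vector there depends on far fewer neighbors than in the frame interior.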

In summary, the proposed system boosts the detection performance while keeping the computational complexity low. Both the Gaussian and median filters are readily available as implemented components in both hardware and software. In addition, the motion vectors, DCT coefficients, and AC components are readily available in the MPEG stream. Because the refined motion vectors are stable, the execution time of the object extraction algorithm after filtering will be reduced significantly compared with that without any post-processing.

Fig. 8-5: Precision for object extraction in P-frames of the 2nd sequence in the walking person testing dataset

Fig. 8-6: Recall for object extraction in P-frames of the 2nd sequence in the walking person testing dataset

Fig. 8-7: Average precision for the 2nd sequence in the walking person testing dataset

Fig. 8-8: Average recall for the 2nd sequence in the walking person testing dataset

We then computed the average numbers of missing and false MBs. These numbers are obtained simply by comparing a segmented object mask with its corresponding ground truth. Fig. 8-9 presents the results of the object detection approach on various video test sequences.

Fig. 8-9: Recall and precision for the following video clips: (a) Interview (b) TV drama (c) Anchor person (d) Spanish news (e) Miss America (f) Spanish news (g) Hall monitor (h) Locally captured video (i) Documentary about buildings (j) Bicycle racing (k) Sport scenes for golf courts (l) Commercial in Korean (m) Three outdoor scenes (n) Akiyo (o) Football game

8.1.2 Discussion

The recall rate drops in the speedways sequence because vehicles are very small when they are far away. “Hall Monitor” is a surveillance-type video containing small moving objects and a complex background. “Miss America” and “Akiyo” are typical head-and-shoulder videos in QCIF and CIF format, respectively. For sequences in which objects are temporarily still from the beginning, such as “Miss America” or “Akiyo,” our proposed approach still gives sufficiently good detection results.

From the document: A Study of New Motion Vector Refinement Methods Applied to Video Processing and Video Transcoding (pages 105-113)