A Novel Approach for Improving the Quality of Service for Wireless Video Transcoding
8.1 Objective Evaluation
8.1.1 Testing dataset configurations
The frame size is 320×240 pixels, which gives 20×15 macroblocks in each P-frame. Our testing dataset presents objects that differ in position, speed, and size.
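As a quick check of the macroblock count, the following sketch assumes the usual 16×16-pixel macroblock, which is what the 20×15 figure implies; the function name is hypothetical.

```python
# Assumption: 16x16-pixel macroblocks (implied by 320x240 -> 20x15).
MB_SIZE = 16

def macroblock_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macroblocks needed to cover a frame."""
    # Ceiling division, so a partially covered edge still counts as a block.
    cols = -(-width // mb_size)
    rows = -(-height // mb_size)
    return cols, rows

cols, rows = macroblock_grid(320, 240)
print(cols, rows, cols * rows)  # 20 15 300
```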
The objects appear both in scattered form, such as the full-court view of the football game, and in centralized form, such as the anchor person in the news. The temporal variation ranges from high, such as the players in bicycle racing, through medium, such as the pitching view of the baseball game, to low, such as the anchor person in the news. In addition, typical videoconference sequences (Mother Daughter, Akiyo, Claire) and sequences containing objects with straightforward motion (Hall Monitor, Container Ship) have been tested.
The metrics used in the experiments are precision and recall, which together measure the accuracy of the object detection system. We choose the recall and precision metrics as defined in Eq. (8.1) and Eq. (8.2) because they are commonly used in performance evaluation [36,28,41,60].
In each frame, the number of hits is the number of macroblocks that contain an object that is correctly detected. The number of false alarms is the number of macroblocks that contain no object yet are falsely identified as containing one. The number of misses is the number of macroblocks that contain an object that the detection algorithm fails to detect. We use the macroblock as the unit of measurement because object detection is performed in the compressed domain. The aim of the experiment is to evaluate the objective performance of the proposed system.
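The hit, false-alarm, and miss counts above can be turned into per-frame metrics as sketched below. The sketch assumes the standard definitions, recall = hits / (hits + misses) and precision = hits / (hits + false alarms). It also assumes that each frame is given as a grid of 0/1 macroblock flags. The function names and the convention of returning 0.0 for an empty denominator are illustrative choices, not taken from the text.

```python
def count_macroblocks(detected, truth):
    """Count hits, false alarms, and misses over two 0/1 macroblock grids."""
    hits = false_alarms = misses = 0
    for det_row, gt_row in zip(detected, truth):
        for det, gt in zip(det_row, gt_row):
            if det and gt:
                hits += 1            # object present and detected
            elif det and not gt:
                false_alarms += 1    # no object, but flagged as object
            elif gt and not det:
                misses += 1          # object present, but not detected
    return hits, false_alarms, misses

def precision_recall(detected, truth):
    """Per-frame precision and recall (0.0 when a denominator is zero: an assumption)."""
    hits, false_alarms, misses = count_macroblocks(detected, truth)
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    return precision, recall

# Toy 3x4 macroblock grids (hypothetical data, for illustration only).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
detected = [[0, 1, 1, 1],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]
print(precision_recall(detected, truth))  # (0.75, 0.75)
```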
Table 8-1: Video sequence clip database

Category               Content                                             Source
                       Daily TV news program                               Spanish TV, RTVE
                       Weekly TV news program                              Spanish TV, RTVE
Drama/Movie            "Art" movie: Hallo                                  Christoph Rodatz, GMD
                       Movie: "La sombra de un cipres es alargada"         Spanish TV, RTVE
                       TV Drama series: "Pepa y Pepe"                      Spanish TV, RTVE
                       Sitcom (1 and 2)                                    Portuguese TV, RTP & SIC
Documentary            "Science Eye": Bridge construction                  NHK
                       5 clips of scientific documentaries                 SFRS
                       Documentary about buildings                         Lancaster Television
                       Basic Ophthalmic Exam                               Univ. of Tennessee
                       Documentary about a village: "Santillana del Mar"   Spanish TV, RTVE
Sport                  3 Sport Clips: Soccer, Cycling, Basketball          Spanish TV, RTVE
                       2 Sport clips: Basketball, Golf                     Korean Broadcasting Station
                       Soccer sequence                                     Samsung
Commercial             14 items of commercials in Korean                   Samsung
Music video and games  Korea's pop singers' live music show                Korean Broadcasting Station
                       TV quiz program: "Saber y ganar"                    Spanish TV, RTVE
                       Music program: "Musica si"                          Spanish TV, RTVE
                       Variety Show. First 30 minutes of complete program
Figures 8-1 through 8-4 show the results of object extraction on the anchor person video clip from the MPEG testing dataset. We report the precision and recall of our object extraction scheme for this clip both with and without filtering; the ground truth for the clip was constructed manually.
Figs. 8-1 and 8-2 show the precision and recall values for each frame in the video clip. We note that the performance of object extraction using the Gaussian filter is consistently superior to that using other filters or no filter. The average precision and average recall for the whole clip are shown in Fig. 8-3 and Fig. 8-4.
Again, the Gaussian filter gives the best results for both precision and recall. The mean filter combines high recall with low precision because it creates unrealistic motion vectors, which raise the recall value but also produce many false alarms.
Furthermore, Figures 8-1 through 8-4 suggest that the median filter, by its nature, does not adjust motion vector values: it only rearranges the motion vectors, without changing their values or eliminating the noise in them.
Hence, its precision is almost the same as without a filter, and only the recall increases slightly. In summary, the Gaussian filter boosts object detection performance. In addition, its computational complexity is low, as discussed previously.
The Gaussian filter is available as a readily implemented component in both hardware and software, which demonstrates the flexibility and extensibility of the proposed scheme. Testing compares four configurations: Group A, using the Gaussian filter only [37]; Group B, using the median filter [28]; Group C, using the cascade filter; and Group D, our system. A final baseline uses no post-processing of any kind.
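The qualitative difference between the mean, median, and Gaussian filters discussed above can be illustrated on a toy motion-vector field. The following is a minimal sketch under stated assumptions, not the implementation evaluated here. It assumes a 3×3 window, a 1-2-1 binomial approximation of the Gaussian, edge replication at the frame border, and filtering of a single motion-vector component per macroblock. All names and the sample data are hypothetical.

```python
from statistics import median

# Assumption: 3x3 kernels; GAUSS is a 1-2-1 binomial approximation of a Gaussian.
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]   # weights sum to 16
MEAN = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]    # weights sum to 9

def window(field, r, c):
    """3x3 neighbourhood of (r, c); border values are replicated (an assumption)."""
    rows, cols = len(field), len(field[0])
    return [[field[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
             for dc in (-1, 0, 1)]
            for dr in (-1, 0, 1)]

def weighted_filter(field, kernel):
    """Normalised weighted average over each 3x3 window (mean or Gaussian)."""
    total = sum(sum(row) for row in kernel)
    out = []
    for r in range(len(field)):
        out_row = []
        for c in range(len(field[0])):
            win = window(field, r, c)
            acc = sum(kernel[i][j] * win[i][j] for i in range(3) for j in range(3))
            out_row.append(acc / total)
        out.append(out_row)
    return out

def median_filter(field):
    """Median of each 3x3 window; the output is always one of the input values."""
    return [[median(v for row in window(field, r, c) for v in row)
             for c in range(len(field[0]))]
            for r in range(len(field))]

# Horizontal motion-vector component per macroblock: a 3x3 moving object
# on a static background (hypothetical data).
mvx = [
    [0, 0, 0, 0, 0],
    [0, 6, 6, 6, 0],
    [0, 6, 6, 6, 0],
    [0, 6, 6, 6, 0],
    [0, 0, 0, 0, 0],
]
mean_out = weighted_filter(mvx, MEAN)
gauss_out = weighted_filter(mvx, GAUSS)
median_out = median_filter(mvx)
print(mean_out[0][0], gauss_out[0][0], median_out[0][0])
```

In this toy field the mean filter leaks more motion into the static corner block than the Gaussian does, while the median output contains only values that already occur in the input. Both effects are in line with the observations above, but the example is only an illustration on synthetic data.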
Fig. 8-1: Precision for Object extraction in P-frames of Anchor person
Fig. 8-2: Recall for Object extraction in P-frames of Anchor person
Fig. 8-3: Average precision for object detection for anchor person video
Fig. 8-4: Average recall for object detection for anchor person video
Figures 8-5 through 8-8 show the performance of object extraction on the second video clip of the MPEG-7 walking person testing dataset. We report the precision and recall of our object extraction scheme for this clip both with and without filtering; the ground truth for the clip was constructed manually. Figs. 8-5 and 8-6 show the precision and recall values for each frame in the video clip.
We note that the performance of our system is consistently superior to that of the other schemes. The average precision and average recall for the whole clip are shown in Figs. 8-7 and 8-8; again, our system performs best. During the experiment, we noticed a weakness of the single Gaussian filter when the object is located at the frame border. This weakness is caused by the lack of neighborhood information near the border.
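One way to see the border weakness noted above is to measure how much of the filter window actually falls inside the frame. The sketch below assumes a 3×3 window with 1-2-1 binomial weights on the 20×15 macroblock grid. The window size, the weights, and the function name are illustrative assumptions rather than details taken from the text.

```python
# Assumption: 3x3 binomial approximation of the Gaussian kernel.
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def kernel_support(row, col, rows=15, cols=20, kernel=GAUSS):
    """Fraction of the kernel weight that lands on macroblocks inside the frame."""
    total = sum(sum(k_row) for k_row in kernel)
    inside = 0
    for i, dr in enumerate((-1, 0, 1)):
        for j, dc in enumerate((-1, 0, 1)):
            if 0 <= row + dr < rows and 0 <= col + dc < cols:
                inside += kernel[i][j]
    return inside / total

print(kernel_support(7, 10))  # interior macroblock: 1.0
print(kernel_support(0, 10))  # top-edge macroblock: 0.75
print(kernel_support(0, 0))   # corner macroblock: 0.5625
```

Under these assumptions, a corner macroblock is smoothed using little more than half of the neighborhood information available to an interior macroblock.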
In summary, the proposed system boosts performance while keeping the computational complexity low. Both the Gaussian and median filters are available as readily implemented components in both hardware and software. In addition, the motion vectors, DCT coefficients, and AC components are readily available in the MPEG stream. Because refining the motion vectors yields stable vectors, the execution time of the object extraction algorithm after filtering is reduced significantly compared with that without any post-processing.
Fig. 8-5: Precision for object extraction in P-frames of the 2nd sequence in the walking person testing dataset
Fig. 8-6: Recall for object extraction in P-frames of the 2nd sequence in the walking person testing dataset
Fig. 8-7: Average precision for the 2nd sequence in the walking person testing dataset
Fig. 8-8: Average recall for the 2nd sequence in the walking person testing dataset
We then computed the average numbers of missing and false MBs. These numbers are obtained simply by comparing each segmented object mask with its ground truth. Fig. 8-9 presents the results of the object detection approach on various video test sequences.
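The averaging step can be sketched as follows, assuming each segmented mask and its ground truth are grids of 0/1 macroblock flags of equal size. The function names and the sample frames are hypothetical.

```python
def missing_and_false(mask, truth):
    """Count missing MBs (in truth, not in mask) and false MBs (in mask, not in truth)."""
    pairs = [(m, t) for m_row, t_row in zip(mask, truth)
             for m, t in zip(m_row, t_row)]
    missing = sum(1 for m, t in pairs if t and not m)
    false = sum(1 for m, t in pairs if m and not t)
    return missing, false

def average_errors(masks, truths):
    """Average number of missing and false MBs per frame over a sequence."""
    counts = [missing_and_false(m, t) for m, t in zip(masks, truths)]
    if not counts:
        raise ValueError("at least one frame is required")
    n = len(counts)
    return sum(c[0] for c in counts) / n, sum(c[1] for c in counts) / n

# Two toy 2x2 frames (hypothetical data, for illustration only).
truths = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
masks = [[[1, 0], [1, 0]], [[1, 1], [0, 1]]]
print(average_errors(masks, truths))  # (0.5, 1.0)
```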
Fig. 8-9: Recall and precision for the following video clips: (a) Interview, (b) TV drama, (c) Anchor person, (d) Spanish news, (e) Miss America, (f) Spanish news, (g) Hall Monitor, (h) Locally captured video, (i) Documentary about buildings, (j) Bicycle racing, (k) Sport scenes for golf courts, (l) Commercial in Korean, (m) Three outdoor scenes, (n) Akiyo, (o) Football game
8.1.2 Discussion
The recall rate drops in the speedways sequence because vehicles are very small when they are far away. “Hall Monitor” is a surveillance-type video containing small moving objects and a complex background. “Miss America” and “Akiyo” are typical head-and-shoulder videos in QCIF and CIF format, respectively. For sequences in which objects are temporarily still from the beginning, such as “Miss America” or “Akiyo,” our proposed approach still achieves sufficiently good detection results.