Chapter 4 Experimental Results
4.1 Performance & Results
4.1.1 Optimized Performance Evaluation
The mode decision of LALVC plays an important role in choosing the better of two modes, the difference (Diff) mode and the raw data mode. When the Diff mode is applied, the zero-motion residuals are fed into the line-based encoder.
To evaluate the best achievable performance of the zero-motion architecture, we examine a near-optimum case. Each line is first encoded in both the difference mode and the raw data mode, and the better mode is then used for the real encoding of the current line. We call this method Opt_1_line. Because of the statistical properties of the context model, Opt_1_line does not reach near-optimum performance: each mode decision alters the statistics, which changes the coding states of the following lines. For some sequences, even the forced zero-motion method achieves a better compression ratio.
[Figure: compression ratio (vertical axis, 1 to 2.8) against frame index (horizontal axis, 0 to 300); curves: forced zero motion, Intra_Coding, Opt_1_line]
Figure 26. Compression ratios for each frame of the sequence “Foreman”
In Figure 26, “Opt_1_line” means that each line of the Foreman sequence is encoded three times: two pre-encoding passes select the better of the difference and raw data modes, and a third pass performs the real encoding. Figure 26 shows the resulting compression ratio for each frame.
Opt_1_line cannot maintain the highest performance in comparison with intra-frame coding and forced zero-motion prediction coding. Figure 27 shows the compression ratio line by line within the 86th frame to examine the behavior of line-based compression.
In Figure 27, Opt_1_line performs better than forced zero-motion at the beginning, but its curve shows a decreasing ratio. To test the Opt_1_line algorithm, we force the earlier mode selections to forced zero-motion and let Opt_1_line operate on the later lines. The result is the curve labeled “Test” in Figure 27. Figure 27 shows that Opt_1_line cannot find the globally best mode selection. The following extends Opt_1_line to find a near-upper-bound performance under the low-complexity architecture.
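The three encoding passes per line can be illustrated with a minimal Python sketch. It is not the LALVC encoder: `ToyLineCoder` is a hypothetical stand-in with a single adaptive context and Rice-style bit costs, the raw data mode simply codes pixel values, and sign bits and mode tags are ignored. The sketch only shows how each greedy decision changes the context seen by later lines.

```python
import copy


class ToyLineCoder:
    """Hypothetical adaptive coder: one running context, Rice-style bit costs."""

    def __init__(self):
        self.total = 4   # running sum of magnitudes (arbitrary non-zero start)
        self.count = 1   # number of samples seen so far

    def encode_line(self, values):
        """Return the bit cost of one line and update the context in place."""
        bits = 0
        for v in values:
            k = 0
            while (self.count << k) < self.total:   # pick the Rice parameter
                k += 1
            bits += (abs(v) >> k) + 1 + k           # unary part, stop bit, k low bits
            self.total += abs(v)
            self.count += 1
        return bits


def opt_1_line(cur_frame, prev_frame):
    """Greedy per-line choice between raw data mode and difference mode."""
    coder = ToyLineCoder()
    modes, total_bits = [], 0
    for cur, prev in zip(cur_frame, prev_frame):
        diff = [c - p for c, p in zip(cur, prev)]   # zero-motion residuals
        # Passes 1 and 2: trial encodings on copies, the live context is untouched.
        raw_bits = copy.deepcopy(coder).encode_line(cur)
        diff_bits = copy.deepcopy(coder).encode_line(diff)
        use_diff = diff_bits <= raw_bits
        # Pass 3: the real encoding with the chosen mode updates the live context.
        total_bits += coder.encode_line(diff if use_diff else cur)
        modes.append("diff" if use_diff else "raw")
    return modes, total_bits
```

Because the choice for a line is made only from the context left by earlier choices, the result is a local optimum per line and need not be the best mode sequence for the frame.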
[Figure: panel title “Frame 86”; compressed ratio (vertical axis, 1.3 to 2.5) against line_index (horizontal axis, 0 to 250); curves: Opt_1_line, forced zero-motion, Test]
Figure 27. Line compression condition detail in 86th frame of “Foreman”
We now extend the selection window of the line mode decision. Opt_1_line, introduced in the previous section, is not aware of the optimum case for the whole image. Context model statistics accumulate during coding. Under the structure of LALVC, the context is empty at the beginning, so each decision changes the overall coding performance. Opt_1_line can be viewed as a pre-coding test with a one-line window. Now, we extend the window to more than one line. When the window is extended to the whole frame, the performance is the best. For a CIF sequence (352*288), the number of lines is 288. In the worst case, all lines are used for the mode selection, and each line is encoded up to 2^288 times within the frame window.
To simplify the derivation of the optimal case, we set the window size to 16 lines, so each line is encoded at most 2^16 times. On a desktop platform (Pentium 4 2.0 GHz, Windows XP), encoding each frame takes 20 minutes, which does not meet practical requirements, so some complexity reduction is needed.
The following results are derived from simulations with a window of 16 lines. The compression ratios are shown in Table 14. For the slow-motion sequence Akiyo, the Opt_16_line method gives the same result as the forced zero-motion method.
Table 14 lists the compression results of the three methods. Overall, Opt_16_line achieves the best performance on average. In addition to Table 14, we analyze frame-level behavior to evaluate the overall performance.
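The exhaustive search inside a window can be sketched as follows. This is an illustration under stated assumptions, not the simulation code: `encode_line` is a hypothetical toy cost model with one adaptive context, the windows are assumed not to overlap, and the raw data mode simply codes pixel values. A window of W lines has 2**W mode patterns, which gives 65536 trial encodings of a window for W = 16.

```python
from itertools import product


def encode_line(values, state):
    """Toy adaptive coder (hypothetical): returns (bits, new_state)."""
    total, count = state
    bits = 0
    for v in values:
        k = 0
        while (count << k) < total:     # Rice parameter from the running context
            k += 1
        bits += (abs(v) >> k) + 1 + k
        total, count = total + abs(v), count + 1
    return bits, (total, count)


def best_modes_in_window(cur_lines, prev_lines, state):
    """Try all 2**W mode patterns for a window of W lines, keep the cheapest."""
    best = None
    for pattern in product((0, 1), repeat=len(cur_lines)):   # 0 = raw, 1 = difference
        bits, s = 0, state
        for mode, cur, prev in zip(pattern, cur_lines, prev_lines):
            data = [c - p for c, p in zip(cur, prev)] if mode else cur
            cost, s = encode_line(data, s)
            bits += cost
        if best is None or bits < best[0]:
            best = (bits, pattern, s)
    return best


def encode_frame(cur, prev, window=16):
    """Encode a frame window by window, carrying the context across windows."""
    state, total, modes = (4, 1), 0, []
    for start in range(0, len(cur), window):
        bits, pattern, state = best_modes_in_window(
            cur[start:start + window], prev[start:start + window], state)
        total += bits
        modes.extend(pattern)
    return total, modes
```

With `window` equal to the number of lines in the frame, the search covers every mode pattern of the frame, which is the 2^288 case for CIF and is not computable in practice.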
Figure 28. The diagram of optimum performance evaluation.
Table 14. The list of compression ratios
Sequence                  Intra_coding  Zero_motion  Opt_16_line
Akiyo.Y(300-frames)       2.70          6.76         6.76
Bus.Y(150-frames)         1.63          1.44         1.64
Football.Y(250-frames)    1.99          1.75         2.03
Foreman.Y(300-frames)     1.97          1.84         2.03
Mobile.Y(300-frames)      1.43          1.36         1.43
Silent.Y(300-frames)      1.86          2.61         2.64
(1). Akiyo is a typical slow-motion sequence. In Figure 29, OPT_16 is identical to the forced zero-motion method. Zero-motion residuals are better for removing coding redundancy.
(2). Foreman is a sequence with more motion than Akiyo and Silent. In Figure 30, Opt_16_line lies at the top of the three curves, which indicates that a window size of 16 lines can capture the near-optimum case for the Foreman sequence.
(3). Mobile is a sequence with a moving train and zooming in and out. In Figure 31, Opt_16_line is not at the top for several frames, which means that a larger window size would be needed to capture the best case.
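The statement that Opt_16_line is best on average in Table 14 can be checked with a short Python sketch. The values are copied from Table 14. The plain arithmetic mean over sequences is an assumption about what "average" means here, since the text does not define it.

```python
METHODS = ("Intra_coding", "Zero_motion", "Opt_16_line")

# Compression ratios copied from Table 14 (Y component).
TABLE_14 = {
    "Akiyo":    (2.70, 6.76, 6.76),
    "Bus":      (1.63, 1.44, 1.64),
    "Football": (1.99, 1.75, 2.03),
    "Foreman":  (1.97, 1.84, 2.03),
    "Mobile":   (1.43, 1.36, 1.43),
    "Silent":   (1.86, 2.61, 2.64),
}


def mean_ratios(table):
    """Arithmetic mean of the per-sequence ratios for each method."""
    n = len(table)
    return {m: sum(row[i] for row in table.values()) / n
            for i, m in enumerate(METHODS)}


def winners(table):
    """For each sequence, the methods that reach the highest ratio (ties kept)."""
    return {seq: [m for m, r in zip(METHODS, row) if r == max(row)]
            for seq, row in table.items()}


if __name__ == "__main__":
    for method, mean in mean_ratios(TABLE_14).items():
        print(f"{method}: {mean:.3f}")
    for seq, best in winners(TABLE_14).items():
        print(seq, best)
```

Opt_16_line ties with Zero_motion on Akiyo and with Intra_coding on Mobile, and is the single best method on the other four sequences.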
[Figure: compression ratio (vertical axis, 2 to 9) against frame index (horizontal axis, 0 to 300); curves: Intra Ratio, ZeroMotion Ratio, OPT_16]
Figure 29. Compression ratio for each frame of the sequence “Akiyo”
[Figure: compression ratio (vertical axis, 1 to 2.8) against frame index (horizontal axis, 0 to 300); curves: Intra Ratio, ZeroMotion Ratio, OPT_16]
Figure 30. Compression ratio for each frame of the sequence “Foreman”
[Figure: compression ratio (vertical axis, 1.25 to 1.55) against frame index (horizontal axis, 0 to 300); curves: Intra Ratio, ZeroMotion Ratio, OPT_16]
Figure 31. Compression ratio for each frame of the sequence “Mobile”
In the frame-level statistics (Figure 26), intra-frame coding is the best method for the 86th frame. In Figure 27, Opt_1_line shows the best gain at the beginning. After the 10th line, intra-frame coding becomes better, and it ultimately achieves the better performance. The coding of each pixel alters the context model, which affects the coding of subsequent pixels. To study this issue, we examine two factors and observe the performance variation.
1. Context Model Interference:
The energy of the residuals generated from the difference mode is smaller, so the context model statistics for the difference mode should be distinguished from those of the raw data mode. Hence, the following simulation compares the performance of one model against two separate models.
2. Context Model Continuity:
Context models are derived from the statistical properties of the content being processed. If the model predicts very well, the probability of zero residuals increases, which indicates that the OGDS (one-sided geometric distribution) becomes steeper.
Originally, the reset interval is set to one frame. If successive frames are somewhat stationary, enlarging the reset interval may benefit the prediction performance. The remaining issue is deciding when to reset.
For the first issue, context model interference, we investigate the use of one or two context models.
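The link between a steeper OGDS and cheaper coding can be made concrete with a small Python sketch. It assumes the usual one-sided geometric form P(k) = (1 - theta) * theta**k for residual magnitudes k = 0, 1, 2, ...; the parameter name `theta` is introduced here for illustration and does not appear in the text.

```python
import math


def ogds_pmf(k, theta):
    """One-sided geometric probability of magnitude k, with 0 < theta < 1."""
    return (1.0 - theta) * theta ** k


def ogds_entropy_bits(theta):
    """Closed-form entropy of the distribution, in bits per sample."""
    return -((1.0 - theta) * math.log2(1.0 - theta)
             + theta * math.log2(theta)) / (1.0 - theta)


if __name__ == "__main__":
    for theta in (0.7, 0.5, 0.3):
        print(f"theta={theta}: P(0)={ogds_pmf(0, theta):.2f}, "
              f"entropy={ogds_entropy_bits(theta):.3f} bits")
```

A smaller `theta` gives a steeper distribution: the probability of a zero residual rises and the entropy falls, so a well-adapted context model needs fewer bits per sample.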
Table 15. 2-Model and 1-Model performance comparison
Size (Kilo-Bytes) Compression ratios Sequences Opt_16_line
In Table 15, separate context models do not give a significant performance improvement. When the mode decision chooses the same modes as Opt_16_line, the 1-Model approach has the same compression ratios as the 2-Model approach. For low complexity, 1-Model is used instead of 2-Model.
Table 16. Performance comparison for the factor of state number of Context Model
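The difference between the 1-Model and 2-Model configurations can be sketched in Python as a choice of where the adaptive statistics are stored. `ContextBank` and its Rice-style cost are hypothetical stand-ins, not the LALVC context model, which also depends on gradients; the sketch only shows the bookkeeping of one shared state against one state per mode.

```python
class ContextBank:
    """Adaptive statistics kept either in one shared state or one state per mode."""

    def __init__(self, separate_models):
        self.separate_models = separate_models
        self.states = {}                      # key -> (sum of magnitudes, sample count)

    def encode_line(self, values, mode):
        """Return the toy bit cost of a line coded in `mode` ("raw" or "diff")."""
        key = mode if self.separate_models else "shared"
        total, count = self.states.get(key, (4, 1))
        bits = 0
        for v in values:
            k = 0
            while (count << k) < total:       # Rice parameter from this state only
                k += 1
            bits += (abs(v) >> k) + 1 + k
            total, count = total + abs(v), count + 1
        self.states[key] = (total, count)
        return bits


def total_bits(lines, separate_models):
    """Encode (mode, values) pairs in order; return the bit total and the bank."""
    bank = ContextBank(separate_models)
    return sum(bank.encode_line(values, mode) for mode, values in lines), bank
```

With `separate_models=True`, small difference-mode residuals no longer pull down the statistics used for raw lines, at the cost of storing two sets of states. Table 15 reports that this separation gives no significant gain for the tested sequences.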
            Size (Kilo-Bytes)                 Compression ratios
Sequence    2M3G    2M2G    1M3G    1M2G      2M3G     2M2G     1M3G     1M2G
Akiyo       4456    4671    4457    4675      6.665    6.358    6.664    6.353
bus         9093    9085    9091    9085      1.633    1.635    1.633    1.635
Football    12222   12423   12249   12438     2.025    1.992    2.021    1.990
Foreman     14918   15098   14939   15152     1.991    1.967    1.988    1.960
Mobile      20900   21070   20958   21168     1.421    1.410    1.417    1.403
Silent      11330   11740   11330   11748     2.621    2.530    2.621    2.528
OutR        495     477     492     476       20.000   20.755   20.122   20.798
OutG        460     447     460     447       21.522   22.148   21.522   22.148
OutB        417     406     416     407       23.741   24.384   23.798   24.324
Total       74291   75417   74392   75596     2.532    2.494    2.528    2.488
*Bold numbers represent the best value in each row of the table
In Table 16, “2M3G” means that two models and three gradients are applied in the algorithm; the other labels are read the same way. The best overall performance in Table 16 is obtained by 2M3G. Table 15 shows that 1-Model performs close to 2-Model, with slightly lower coding efficiency due to the mismatch between the mode decision and Opt_16_line.
The Y component of the D1 resolution sequences has 720*480 pixels. The four testing sequences are full of motion. We divide the original sequence into four small sequences by cutting the spatial area. For the high-resolution sequences, the performance improvement is not significant. We also divide each sequence into sub-resolution sequences with reduced width to evaluate the effect of line length. In Table 18, the performance difference between the sub-resolution configurations is minor. Cutting D1 into 360*480 sequences improves the total ratio slightly, by 0.006. The side information for tags increases as more segments are used: at a resolution of 240*480, the D1 sequence is cut into three smaller ones, and the average performance may not be better than at 360*480.
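The sizes and ratios in Table 16 are related by a simple formula, sketched below in Python. The relation is inferred from the numbers rather than stated in the text: it assumes 8-bit luma samples, CIF frames of 352*288, the frame counts listed in Table 14, and 1 Kilo-Byte = 1024 bytes. Under these assumptions the computed ratios match the 2M3G column for the CIF sequences.

```python
def compression_ratio(width, height, frames, compressed_kb, bytes_per_sample=1):
    """Raw size divided by compressed size, both in Kilo-Bytes (1024 bytes)."""
    raw_kb = width * height * frames * bytes_per_sample / 1024
    return raw_kb / compressed_kb


if __name__ == "__main__":
    # (frames, 2M3G size in Kilo-Bytes); frame counts taken from Table 14.
    cif = {"Akiyo": (300, 4456), "bus": (150, 9093), "Football": (250, 12222)}
    for name, (frames, size_kb) in cif.items():
        print(name, round(compression_ratio(352, 288, frames, size_kb), 3))
```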
Table 17. Simulation of D1 sequence
Compression ratios (720*480)
Sequence               WinRAR   WinZip   LALVC(1M3G)   Intra   ZeroMotion
crew(300)              2.047    1.456    2.347         2.348   1.980
harbour(300)           1.494    1.162    1.991         1.991   1.869
night(230)             1.858    1.396    2.220         2.194   2.023
pour_water(1017)       2.279    1.583    2.626         2.608   2.298
rolling_tomato(222)    2.549    1.705    2.883         2.845   2.616
sailormen(300)         1.806    1.310    1.938         1.949   1.683
Total                  2.024    1.452    2.366         2.357   2.094
Table 18. Simulation of sub-resolution sequences cut from D1
Sequence               720*480   (360*480)*2   (240*480)*3
crew(300)              2.347     2.345         2.342
harbour(300)           1.991     1.992         1.990
night(230)             2.220     2.220         2.217
pour_water(1017)       2.626     2.644         2.646
rolling_tomato(222)    2.883     2.891         2.894
sailormen(300)         1.938     1.932         1.924
Total                  2.366     2.372         2.370