Figure 3.14: The Extended Wrong Reference Problem When Multiple Reference Frames Are Used
Figure 3.15: The Visual Illustration after Each Refinement Step during the VET Transcoding
Sec 3.5. Simulation Results
Table 3.6: The Encoder Parameters for the Experiments
Parameter                    Value
Frame size                   QCIF (176x144), SD (720x480)
Frame rate                   30 frames/s
GOP structure                IPPP...P
Total frames                 100
Intra period                 15
Number of reference frames   1
Motion estimation range      16 for QCIF, 64 for SD
Quantization step size       17, 21, 25, 29, 33, 37
ods, test sequences and picture insertion scenarios. For a fair comparison, all the transcoding methods have been implemented on the H.264/AVC reference software, version JM9.4.
In addition, all the transcoders are built using the Visual .NET compiler on a desktop with Windows XP, an Intel P4 3.2 GHz processor and 2 GB of DRAM. To further speed up the H.264/AVC based transcoding, the source code of the reference CAVLD module is optimized using a table lookup technique [34]. In the simulations, the test sequences are pre-encoded under the test conditions shown in Table 3.6. The notation for each new transcoded bitstream is `background_foreground_x_y', where x and y are the coordinates of the foreground picture. The values of x and y must lie on MB boundaries within the background picture. To evaluate the picture quality of each reconstructed sequence, the two original source sequences are combined to form the reference video source for the peak signal-to-noise ratio (PSNR) computation.
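The reference construction described above can be sketched in a few lines. This is only an illustration, not the code used in the experiments: it assumes 8-bit luma frames stored as lists of rows, and it assumes that x and y in the bitstream names are 1-based macroblock indices (an interpretation, not something the text states). The function names `embed` and `psnr` are hypothetical.

```python
import math

MB_SIZE = 16  # H.264/AVC macroblock size in luma samples


def embed(background, foreground, mb_x, mb_y):
    """Paste `foreground` onto a copy of `background`.

    mb_x, mb_y are ASSUMED to be 1-based macroblock coordinates of the
    foreground's top-left corner, so the paste is always MB-aligned.
    """
    left = (mb_x - 1) * MB_SIZE
    top = (mb_y - 1) * MB_SIZE
    fg_h, fg_w = len(foreground), len(foreground[0])
    if (left < 0 or top < 0
            or top + fg_h > len(background)
            or left + fg_w > len(background[0])):
        raise ValueError("foreground does not fit inside the background")
    out = [row[:] for row in background]  # leave the input untouched
    for r in range(fg_h):
        out[top + r][left:left + fg_w] = foreground[r]
    return out


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized luma frames (8-bit peak assumed)."""
    squared_error = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    if squared_error == 0:
        return math.inf  # identical frames
    mse = squared_error / count
    return 10 * math.log10(peak * peak / mse)
```

In this sketch the composite returned by `embed` for the two original sources plays the role of the reference, and each decoded frame of a transcoded bitstream is passed as `test`.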
The percentage of each MB type and each 4x4 block type is shown in Figure 3.16. In general, the p-MBs account for 30% to 80% of the MBs, and the w-MBs account for less than 15%. In addition, the w-blocks account for only 5% of the 4x4 blocks. Bypassing all the p-blocks that are 95% of p-blocks accelerates the transcoding process as shown in Table 3.7. On average, compared to the CPDT, the MW-VET achieves a 25-fold speedup with improved picture quality.
Figure 3.16: The Percentage of the Macroblock Types and the Block Types during the VET Transcoding

Table 3.7: The Improvement of Execution Time and Quality as Compared to the CPDT(1)
VET combination                                                    Speed-up ratio   PSNR gain of luma component
BG(2)    FG(3) and location
Stefan   Mobile_1_1                                                25               +1.72 dB
Table    Carphone_1_1                                              28               +1.56 dB
Stefan   Mobile_1_1, Foreman_33_1, News_1_20, Coastguard_33_20     28               +1.18 dB
Table    MD_1_1, Stefan_33_1, Carphone_1_20, News_33_20            25               +1.15 dB
(1) Intel P4 3.2 GHz, 2 GB SDRAM, Windows XP and Visual .NET compiler.
(2) All are in SD (720x480) resolution.
(3) All are in QCIF (176x144) resolution.

Table 3.8: The Effectiveness of Error Correction (EC) for Different Kinds of p-blocks
Methods                                               PSNR (dB)
Golden                                                43.73
CPDT                                                  42.02
RFMT w/o EC                                           41.18
RFMT with EC for the p-blocks in intra-coded w-MBs    43.16
RFMT with EC for all intra-coded p-blocks             43.33
RFMT with EC for all inter-coded p-blocks             43.14

Table 3.8 lists the PSNR comparison to show the effectiveness of error correction for different kinds of blocks. The Golden method is not a transcoding scheme. Its R-D curves are obtained by encoding the original picture-in-picture source sequences, and they are included to indicate the upper bound on transcoder performance. The error correction of p-blocks in the intra-coded w-MBs yields a significant gain in picture quality. However, the error correction for other p-blocks yields almost no quality improvement while the complexity increases dramatically. Therefore, the results verify our derivations in Section 3.4.
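As a quick check on this reading of Table 3.8, the PSNR differences can be computed directly from the tabulated values. The snippet below is only arithmetic on those numbers. The dictionary keys are shortened labels introduced here for illustration.

```python
# PSNR values in dB, copied from Table 3.8 (keys are shortened labels).
psnr_db = {
    "Golden": 43.73,
    "CPDT": 42.02,
    "RFMT w/o EC": 41.18,
    "EC: p-blocks in intra-coded w-MBs": 43.16,
    "EC: all intra-coded p-blocks": 43.33,
    "EC: all inter-coded p-blocks": 43.14,
}

baseline = psnr_db["RFMT w/o EC"]

# Gain of every method over RFMT without error correction, in dB.
gain_over_no_ec = {
    name: round(value - baseline, 2) for name, value in psnr_db.items()
}

# Remaining gap between the first EC variant and the Golden upper bound.
gap_to_golden = round(
    psnr_db["Golden"] - psnr_db["EC: p-blocks in intra-coded w-MBs"], 2
)
```

Correcting only the p-blocks in intra-coded w-MBs recovers 1.98 dB over RFMT without EC and leaves a 0.57 dB gap to the Golden method. The other two EC variants differ from that result by at most 0.17 dB.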
The R-D performance of the different approaches is compared at various bit rates and in different VET scenarios. In Figure 3.17 and Figure 3.18, one foreground picture is embedded into one background picture at different positions. The performance of RFMT is better than that of CPDT. At medium and high bit rates, the RFMT can offer up to 1.5 dB improvement in PSNR. Even though the modes and motion vectors obtained by our IMS and MVR are not always optimal, the simulation results show that our IMS and MVR approaches provide a solution close to the optimal case. For comparison, we have plotted R-D curves named RFMT_RDO to show the optimal R-D performance when the partial re-encoding is performed with RDO mode decision and motion vector re-estimation. The R-D performance of RFMT with IMS and MVR is very close to that of RFMT_RDO.
Figure 3.17: The Rate-Distortion Performance of the Luminance Component When One Foreground Carphone_QCIF is Embedded in Table_SD: (a) Table_SD_Carphone_QCIF_1_1. (b)
Figure 3.18: The Rate-Distortion Performance of the Luminance Component When One Foreground Foreman_QCIF is Embedded in Mobile_SD: (a) Mobile_SD_Foreman_QCIF_1_1. (b) Mobile_SD_Foreman_QCIF_33_20.

Figure 3.19 shows the R-D curves of transcoded bitstreams that embed four foreground pictures onto one background picture at the same time. Compared with the one-foreground VET scenarios, the performance degrades slightly because the ratio of w-blocks and p-blocks increases. Figure 3.20 shows the performance of multi-generation transcoding, which embeds one foreground picture into the background picture in every generation. Our MW-VET retains the R-D performance, while the CPDT degrades with every generation. Thus, the proposed MW-VET is robust for multi-generation transcoding.
3.6 Summary
In this chapter, we have proposed a low-complexity algorithm for an H.264/AVC multiple-window video embedding transcoder (MW-VET) that embeds multiple foreground videos into one background video. The pictures are inserted at MB-aligned positions to retain high flexibility.
When the prediction is applied to the slice-aligned data partitions within the original bitstreams, the SGT parses and merges the bitstreams directly. When the prediction is applied to the region-aligned data partitions, the MBs with a wrong prediction reference are processed with the RFMT, which partially re-encodes the blocks to minimize the number of refined blocks. To handle inter-coded and intra-coded blocks that suffer from the wrong reference problem, the RFMT employs motion vector remapping (MVR) and intra mode switching (IMS), respectively. The unaffected MBs are handled by syntax level bypassing (SLB) to preserve transcoding throughput and picture quality. Apart from a fully functional decoder, the proposed algorithm requires only minor extra complexity for foreground insertion.
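The choice among these tools can be summarized as a small decision function. This is a minimal sketch at macroblock granularity, not the transcoder's actual control flow. The three boolean inputs and the names `Tool` and `select_tool` are hypothetical, and a real implementation would derive the flags while parsing the bitstreams.

```python
from enum import Enum


class Tool(Enum):
    SGT = "parse and merge the bitstreams directly"
    SLB = "syntax level bypassing"
    RFMT_MVR = "RFMT with motion vector remapping"
    RFMT_IMS = "RFMT with intra mode switching"


def select_tool(slice_aligned, wrong_reference, intra_coded):
    """Pick the processing path for one macroblock.

    slice_aligned:   prediction is confined to slice-aligned partitions
    wrong_reference: the MB predicts from samples replaced by a foreground
    intra_coded:     the MB is intra-coded (otherwise inter-coded)
    All three flags are hypothetical inputs for this illustration.
    """
    if slice_aligned:
        # No prediction crosses the inserted region: merge directly.
        return Tool.SGT
    if not wrong_reference:
        # Region-aligned case, but this MB's references are intact.
        return Tool.SLB
    # Wrong-reference MB: refine with the tool matching its coding type.
    return Tool.RFMT_IMS if intra_coded else Tool.RFMT_MVR
```

For example, `select_tool(False, True, False)` selects the MVR path for an inter-coded MB whose reference has been overwritten, while any MB without a wrong reference falls through to SLB.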
Figure 3.19: The Rate-Distortion Performance of the Luminance Component by Four Foregrounds Embedding with the Single-Generation Transcoding: (a) Table_SD_MD_QCIF_1_1_Stefan_QCIF_33_1_Carphone_QCIF_1_20_News_QCIF_33_20. (b) Mobile_
Figure 3.20: The Rate-Distortion Performance of the Luminance Component by Four Foregrounds Embedding with the Multi-Generation Transcoding: (a) Table_SD_MD_QCIF_1_1

To improve coding efficiency and alleviate drifting errors, every w-block, whose (inter or intra) reference samples are covered entirely or partially by the foreground images, is fine-tuned such that the updated inter or intra reference samples are derived completely from the background region. Further, the residues of the p-blocks within intra-coded macroblocks, which are subsequently predicted from the w-blocks, are refined to stop error propagation in the spatial domain. In addition, unnecessary computations are skipped by detecting unaffected blocks.
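For inter-coded blocks, the w-block condition stated in this summary (reference samples covered entirely or partially by a foreground image) reduces to a rectangle-overlap test. The sketch below is a simplification under stated assumptions: motion vectors are taken in whole luma samples, so the extra samples needed for H.264/AVC sub-sample interpolation are ignored, and intra prediction neighbors are not modeled. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (left, top, width, height), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_inter_w_block(block, mv, windows):
    """Flag an inter-coded background block whose reference area touches
    any foreground window.

    block:   (left, top, width, height) of the block in luma samples
    mv:      (dx, dy) motion vector in whole samples (a simplification)
    windows: list of foreground rectangles (left, top, width, height)
    """
    x, y, w, h = block
    reference_area = (x + mv[0], y + mv[1], w, h)
    return any(overlaps(reference_area, window) for window in windows)
```

With a QCIF window at `(16, 16, 176, 144)`, a 4x4 block at the origin is unaffected for a motion vector of `(12, 12)` but becomes a w-block for `(13, 13)`, because its reference area then reaches one sample into the window.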
Our results show that, compared to the cascaded pixel domain transcoder (CPDT), the RFMT reduces the processing complexity by a factor of 25 with similar or higher R-D performance. In addition, the RFMT can achieve up to 1.5 dB quality improvement in PSNR. Based on the RFMT, the quality improvement over the CPDT is significant for multi-generation transcoding.