Chapter 4 Multi-Layer Transcoding Approach
4.2. Multi-Layer Transcoding with R-D Optimization
An R-D model is constructed to solve the multi-layer transcoding problem under limited channel bandwidth. To perform drift-free transcoding, an additional error layer containing the coefficients of the incoherent errors is transmitted to the transcoder as side information. Because channel bandwidth is limited, allocating the available bit rate so as to achieve the best transcoding performance and decoded video quality becomes very important.
To model the resource allocation problem under limited channel bandwidth, eqn. (28) gives the relationship between the original enhancement layer (EL1) and the error layer (EL2). Suppose the bit rate available for the two enhancement-layer bitstreams is R. The goal is to find the inter-layer ratio α that provides the best transcoding R-D performance. The definition is given in eqn. (28), in which RE is the bit rate of EL1 and Rε is the bit rate of EL2.
Since FGS enables progressive transmission, both the EL and the error layer can be truncated to any desired bit rate according to the inter-layer ratio (α) and the given bit rate (R). The problem is then how to find the best α for a given bit rate (R), as shown in eqn. (29).
where D(.) is the distortion function.
One way to solve eqn. (29) is to search exhaustively through all possible values of α in the range [0, 1] for the one with the minimum distortion. Such a method requires too much computation, however, and is not preferred.
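The exhaustive search described above can be sketched as follows. This is only an illustrative sketch: the function `exhaustive_alpha_search` and the stand-in `toy_distortion` are hypothetical names, and the toy distortion is not the D(.) of the thesis. In the real setting, each evaluation of D(.) would require a full transcoding run.

```python
from typing import Callable, Tuple


def exhaustive_alpha_search(
    distortion: Callable[[float, float], float],
    total_rate_kbps: float,
    step: float = 0.05,
) -> Tuple[float, float]:
    """Return (alpha, D) minimizing distortion(alpha, R) on a uniform grid over [0, 1]."""
    n_steps = round(1.0 / step)
    best_alpha, best_d = 0.0, float("inf")
    for i in range(n_steps + 1):
        alpha = i / n_steps  # index-based grid avoids accumulated rounding error
        d = distortion(alpha, total_rate_kbps)
        if d < best_d:
            best_alpha, best_d = alpha, d
    return best_alpha, best_d


# Hypothetical stand-in for D(.): a convex bowl whose minimum lies at alpha = 0.4.
def toy_distortion(alpha: float, rate_kbps: float) -> float:
    return (alpha - 0.4) ** 2 + 1000.0 / (rate_kbps + 1.0)


alpha_opt, d_min = exhaustive_alpha_search(toy_distortion, 1024.0)
print(alpha_opt, d_min)
```

With a step size of 0.05 the grid has 21 candidate values of α, so the cost is 21 distortion evaluations per bit rate, which is what motivates replacing the search with a model.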
A more efficient yet still effective approach is to build an R-D model that predicts the α giving the best transcoded R-D performance.
To construct the relationship between R and αopt, we use a statistical approach that observes various sequences and bit rates. We simulated the MSDDT with various combinations of R and α, where R ranges from 0 to 2560 Kbps in steps of 256 Kbps and α ranges from 0 to 1 in steps of 0.05. To limit the influence of the encoder loop in the transcoder, constant quantization is used for re-encoding. The bit rate of the BL is set to 256, 512, 1024, and 2048 Kbps with TM5 rate control. Four sequences in CIF format (Akiyo, Foreman, Mobile, and Stefan) are used for testing, with GOP structure N = 15, M = 1 (i.e., IPPP…). Fig. 15 to Fig. 18 show the resultant rate-distortion curves for various α, where the horizontal axis is the available bit rate for all ELs (R) and the vertical axis is the distortion of the transcoded video (D(.)) measured as mean square error (MSE). The dotted lines represent the interpolated rate-distortion data for different values of α, and the bold lines indicate the rate-distortion optimized inter-layer ratio αopt, at which the distortion is minimized for the given bit rate. From the results in Fig. 15 to Fig. 18, we obtain the relationships between R and αopt for different sequences and BL bit rates, as shown in Fig. 19 to Fig. 22.
In Fig. 19 to Fig. 22, these relationships exhibit similar properties, such as increasing monotonically or saturating at high input bit rates. This observation makes it easier to construct a single model that predicts all of them. Based on this idea, we present a new model to describe the relationship between R and αopt. Four common models are assessed: linear, power-law, quadratic, and exponential. Among them, the power-law and quadratic models are the most promising candidates for modeling the actual relationships, since their general shapes, shown in Fig. 14, resemble the results in Fig. 19 to Fig. 22.
Table 6 shows the approximation error of the four models; the power-law model gives the best approximation. The new model is therefore formulated as eqn. (30).
\[ \alpha_{opt} = a R^{b} + c \tag{30} \]
where (a, b, c) is the set of model parameters.
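As a small illustration, the power-law model of eqn. (30) can be evaluated with the single parameter set reported in Section 5.2, (a, b, c) = (0.3476, 0.18573, -0.77644). The function name `alpha_power_law` is hypothetical. Treating R as a value in Kbps and clamping the result to [0, 1] are assumptions made here so that the output is always a valid ratio; the thesis does not state how out-of-range values are handled.

```python
def alpha_power_law(rate_kbps: float,
                    a: float = 0.3476,
                    b: float = 0.18573,
                    c: float = -0.77644) -> float:
    """Evaluate alpha_opt = a * R**b + c (eqn. 30).

    Assumption: R is in Kbps, and the result is clamped to [0, 1].
    """
    if rate_kbps <= 0:
        return 0.0
    alpha = a * rate_kbps ** b + c
    return min(1.0, max(0.0, alpha))


# Evaluate the model at the EL bit rates used in the experiments.
for rate in range(256, 2048 + 1, 256):
    print(rate, round(alpha_power_law(rate), 3))
```

Because b is positive and smaller than 1, the predicted ratio grows with R but flattens at high bit rates, matching the saturating behavior described for Fig. 19 to Fig. 22.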
Fig. 14. General curve behaviors for different models (linear, exponential, power-law, and quadratic)
Table 6. RMSE of the estimation of the (R, αopt) relationship using different models
RMSE       Linear     Power-Law   Quadratic   Exponential
Akiyo      0.0411     0.01178     0.01929     0.0423
Foreman    0.09279    0.01757     0.06005     0.1001
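The RMSE figure of merit used in Table 6 can be computed as in the sketch below. The sample values are hypothetical and serve only to exercise the function; they are not data from the thesis.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical alpha_opt samples and model predictions, for illustration only.
alpha_observed = [0.20, 0.35, 0.50, 0.65]
alpha_predicted = [0.22, 0.33, 0.51, 0.64]
error = rmse(alpha_observed, alpha_predicted)
print(error)
```

A lower RMSE means the candidate model follows the measured (R, αopt) points more closely, which is the criterion by which the power-law model is selected.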
Fig. 23 shows the power-law curve fitted to the actual (R, αopt) data, which is an average over all the experimental cases. Note that this model captures statistical behavior only; the actual relationship between R and αopt may vary with the video content and the BL bit rate. For example, for slow-motion sequences such as Akiyo, the amount of incoherent error in the video stream is small, so αopt tends to saturate faster. For a high BL bit rate, such as Mobile at 2048 Kbps, the contribution of the EL is insignificant because the BL already has very high video quality, so αopt is biased toward the error layer when bandwidth is limited. The experimental results in the next chapter show that the proposed power-law model with a single parameter set can accommodate the variation in video characteristics and provides satisfactory transcoding performance compared with the optimized approach.
Fig. 15. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 256-Kbps BL bit rate
Fig. 16. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 512-Kbps BL bit rate
Fig. 17. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 1024-Kbps BL bit rate
Fig. 18. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 2048-Kbps BL bit rate
Fig. 19. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 256-Kbps BL bit rate
Fig. 20. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 512-Kbps BL bit rate
Fig. 21. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 1024-Kbps BL bit rate
Fig. 22. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 2048-Kbps BL bit rate
Fig. 23. The fitting curve for (R, αopt) using the power-law model (horizontal axis: bit rate in Kbps; vertical axis: αopt; curves: actual and power-law)
Chapter 5 Experimental Results
This chapter demonstrates the experimental results of the proposed multi-layer to single-layer transcoder. R-D performance and complexity comparisons are provided to show that the proposed transcoder can provide good transcoding qualities.
5.1. Test Conditions
The test conditions for the FGS multi-layer to MPEG-1/2/4 single-layer transcoding experiments are listed below. The source video sequences are first encoded and archived as three FGS bitstreams: the BL bitstream, the EL bitstream, and the error-layer bitstream. The BL bitstream samples the source video sequence at 30 Hz.
- Video source format — CIF 30 fps
- Test video sequences — Akiyo, Foreman, Mobile, Stefan, etc.
- Video GOP structure — N = 15, M = 1 (i.e., IPPP…).
- Video bit rate for FGS base-layer bitstream — 256 Kbps, 512 Kbps, 1024 Kbps, 2048 Kbps with TM5 rate control.
- Video coding tools — no advanced coding tools in MPEG-4 FGS Profile such as frequency weighting or selective enhancement are used in the FGS EL.
Five transcoding architectures are used for transcoding performance comparison.
- Cascaded Pixel-Domain Transcoder (CPDT)
- Cascaded DCT-Domain Transcoder (CDDT)
- Simplified DCT-Domain Transcoder (SDDT)
- Modified Simplified DCT-Domain Transcoder where the inter-layer ratio α is determined using the exhaustive search with a step size of 0.05 (MSDDT_Opt)
- Modified Simplified DCT-Domain Transcoder where the inter-layer ratio α is determined using the proposed power-law model (MSDDT_Pow)
To simulate possible channel bandwidth variation, the enhancement-layer bitstreams are truncated to total bit rates ranging from 256 to 2048 Kbps in steps of 256 Kbps. The truncation of the EL bitstreams is implemented in the streaming server through a simple frame-level bit allocation that divides the given bandwidth evenly among frames. In the re-encoding process, constant quantization step sizes (QPs) are employed, and the set of QPs is chosen such that the output transcoded bit rate approaches the total input bit rate (BL + ELs).
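The frame-level bit allocation described above can be sketched as below. The function names are hypothetical, and the sketch assumes 1 Kbps = 1000 bits per second, a uniform split across frames, and byte-aligned truncation; the thesis does not specify these details.

```python
def frame_budget_bits(el_rate_kbps: float, frame_rate_hz: float = 30.0) -> int:
    """Bits available to each frame when the EL bandwidth is spread evenly.

    Assumption: 1 Kbps = 1000 bits per second.
    """
    if frame_rate_hz <= 0:
        raise ValueError("frame rate must be positive")
    return int(el_rate_kbps * 1000 / frame_rate_hz)


def truncate_frame(el_frame: bytes, budget_bits: int) -> bytes:
    """Keep only the leading part of an embedded EL frame that fits the budget.

    Assumption: truncation is byte-aligned.
    """
    return el_frame[: budget_bits // 8]


# Per-frame budgets for the simulated EL bit rates (256 to 2048 Kbps).
for rate in range(256, 2048 + 1, 256):
    print(rate, frame_budget_bits(rate))
```

Cutting an FGS enhancement-layer frame at an arbitrary point is possible because the bit-plane coded stream is embedded, which is the progressive-transmission property noted in Section 4.2.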
5.2. Rate-Distortion Performance
5.2.1. MPEG-4 FGS to MPEG-1
Fig. 24 to Fig. 27 show the rate-distortion performance of four transcoding architectures: CPDT, CDDT, SDDT, and MSDDT with the proposed power-law model (MSDDT_Pow). A single parameter set (a, b, c) = (0.3476, 0.18573, -0.77644) is used for MSDDT_Pow at all BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-1 bitstream. Fig. 24 to Fig. 27 show that SDDT suffers from serious quality degradation due to incoherent errors in heterogeneous transcoding. At a 256-Kbps base-layer bit rate, the proposed MSDDT_Pow provides PSNR gains of up to 4.7 dB, 2.6 dB, and 3.8 dB over SDDT for the Foreman, Mobile, and Stefan sequences, respectively. Compared with the CPDT architecture, which is usually treated as the golden reference for transcoders, the proposed MSDDT_Pow architecture at a 256-Kbps base-layer bit rate loses only 0.3–0.6 dB, 0.3–0.4 dB, and 0.3–1.1 dB in PSNR across the tested bit rates for the Foreman, Mobile, and Stefan sequences, respectively. Table 7 summarizes the comparison of the four transcoder architectures (CPDT, CDDT, SDDT, and the proposed MSDDT_Pow) at about 1200 Kbps, 1200 Kbps, and 1300 Kbps for Foreman, Mobile, and Stefan, respectively.
Fig. 24. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 256 Kbps. (a) Foreman (b) Mobile (c) Stefan
Fig. 25. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 512 Kbps. (a) Foreman (b) Mobile (c) Stefan
Fig. 26. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps. (a) Foreman (b) Mobile (c) Stefan
Fig. 27. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps. (a) Foreman (b) Mobile (c) Stefan
Table 7. Rate-distortion comparison for FGS-to-MPEG-1 transcoding
BL bit rate   PSNR (dB)   CPDT   CDDT   MSDDT_Pow   SDDT
256 Kbps      Foreman     -      +0.2   -0.6        -5.3
256 Kbps      Mobile      -      +0.3   -0.3        -2.9
256 Kbps      Stefan      -      +0.2   -1.1        -4.9
512 Kbps      Foreman     -      0      -0.3        -6.5
512 Kbps      Mobile      -      +0.1   -0.5        -3.4
512 Kbps      Stefan      -      +0.2   -1          -5.7
1024 Kbps     Foreman     -      0      -0.4        -7.2
1024 Kbps     Mobile      -      0      -0.8        -4.3
1024 Kbps     Stefan      -      +0.2   -0.8        -6.2
2048 Kbps     Foreman     -      0      -0.4        -7.4
2048 Kbps     Mobile      -      0      -0.9        -4.5
2048 Kbps     Stefan      -      +0.2   -1          -6.8
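As a quick arithmetic check, the gains of MSDDT_Pow over SDDT quoted in the text for the 256-Kbps base layer can be recovered from the Table 7 entries by subtraction. The sketch below assumes the table entries are PSNR differences relative to CPDT; the dictionary layout and the helper name `gain_over` are illustrative only.

```python
# PSNR differences relative to CPDT (dB), copied from the 256-Kbps rows of Table 7.
table7_256 = {
    "Foreman": {"CDDT": 0.2, "MSDDT_Pow": -0.6, "SDDT": -5.3},
    "Mobile":  {"CDDT": 0.3, "MSDDT_Pow": -0.3, "SDDT": -2.9},
    "Stefan":  {"CDDT": 0.2, "MSDDT_Pow": -1.1, "SDDT": -4.9},
}


def gain_over(rows: dict, better: str, worse: str) -> dict:
    """PSNR gain (dB) of one architecture over another, per sequence."""
    return {seq: round(v[better] - v[worse], 1) for seq, v in rows.items()}


gains = gain_over(table7_256, "MSDDT_Pow", "SDDT")
print(gains)
```

The resulting differences are 4.7 dB, 2.6 dB, and 3.8 dB for Foreman, Mobile, and Stefan, which agree with the gains stated in Section 5.2.1.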
5.2.2. MPEG-4 FGS to MPEG-2
Fig. 28 to Fig. 31 show the rate-distortion performance of five transcoding architectures: CPDT, CDDT, SDDT, MSDDT with the optimized approach (MSDDT_Opt), and MSDDT with the proposed power-law model (MSDDT_Pow). A single parameter set (a, b, c) = (0.3476, 0.18573, -0.77644) is used for MSDDT_Pow at all BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-2 Main Profile bitstream. Fig. 28 to Fig. 31 show that SDDT suffers from considerable quality degradation due to incoherent errors in heterogeneous transcoding. At a 256-Kbps base-layer bit rate, the proposed MSDDT_Pow provides PSNR gains of up to 2.4 dB, 5.9 dB, 3.4 dB, and 5.4 dB over SDDT for the Akiyo, Foreman, Mobile, and Stefan sequences, respectively. Compared with the CDDT architecture, the proposed MSDDT_Pow architecture at a 256-Kbps base-layer bit rate loses 0.4–0.6 dB, 0.4–0.8 dB, and 0.4–1.4 dB in PSNR across the tested bit rates for the Foreman, Mobile, and Stefan sequences, respectively. For the Akiyo sequence, MSDDT_Pow at a 256-Kbps base-layer bit rate achieves almost the same transcoding performance as the CDDT architecture, with a PSNR difference within 0.1 dB. Another comparison is between the MSDDT using the optimized approach and the MSDDT using the proposed model. Fig. 28 shows that, at a 256-Kbps base-layer bit rate, the MSDDT using the power-law model has PSNR values almost identical to those of the MSDDT based on the exhaustive search, with a maximum difference of 0.3 dB. Table 8 summarizes the comparison of the five transcoding architectures (CPDT, CDDT, SDDT, MSDDT_Opt, and the proposed MSDDT_Pow) at about 650 Kbps, 2100 Kbps, 2000 Kbps, and 2200 Kbps for Akiyo, Foreman, Mobile, and Stefan, respectively.
Table 8. Rate-distortion comparison for FGS-to-MPEG-2@MP transcoding
BL bit rate   PSNR (dB)   CPDT   CDDT   MSDDT_Opt   MSDDT_Pow   SDDT
256 Kbps      Akiyo       -      -1.2   -1.2        -1.2        -3.6
256 Kbps      Foreman     -      0      -0.6        -0.6        -6.5
256 Kbps      Mobile      -      +0.2   -0.6        -0.6        -4
256 Kbps      Stefan      -      +0.2   -1.2        -1.2        -6.6
512 Kbps      Akiyo       -      -1.1   -1.1        -1.1        -3.4
512 Kbps      Foreman     -      -0.2   -0.8        -0.8        -8.2
512 Kbps      Mobile      -      +0.1   -0.8        -0.8        -4.6
512 Kbps      Stefan      -      +0.2   -1.2        -1.2        -7.3
1024 Kbps     Akiyo       -      -1.2   -1.2        -1.2        -3.3
1024 Kbps     Foreman     -      -0.3   -1          -1          -9.2
1024 Kbps     Mobile      -      +0.1   -0.4        -0.7        -5
1024 Kbps     Stefan      -      +0.1   -0.9        -1.1        -8.2
2048 Kbps     Akiyo       -      -1.6   -1.6        -1.6        -3.5
2048 Kbps     Foreman     -      -0.4   -0.8        -1          -9.6
2048 Kbps     Mobile      -      0      -0.2        -1.2        -5.8
2048 Kbps     Stefan      -      +0.1   -0.5        -1.5        -8.7
Fig. 28. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 256 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 29. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 512 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 30. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 31. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
5.2.3. MPEG-4 FGS to MPEG-4 SP
Fig. 32 to Fig. 35 show the rate-distortion performance of three transcoding architectures (CPDT, the FGS-to-MPEG-4@SP transcoder proposed in [24], and MSDDT) for different BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-4 Simple Profile bitstream, a case that is free of incoherent error. As Fig. 32 to Fig. 35 show, the three architectures have similar rate-distortion performance. Table 9 summarizes the comparison of the three transcoding architectures (CPDT, the work in [24], and the proposed MSDDT) at about 550 Kbps, 1600 Kbps, 2200 Kbps, and 2300 Kbps for Akiyo, Foreman, Mobile, and Stefan, respectively.
Fig. 32. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 256 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 33. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 512 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 34. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Fig. 35. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan
Table 9. Rate-distortion comparison for FGS-to-MPEG-4@SP transcoding
PSNR (dB)   CPDT   MSDDT_Pow   FGS-to-SP [24]
Akiyo       -      0           +0.1
5.3. Complexity Analysis
5.3.1. Module-wise Comparison
Table 10 shows the module-wise complexity comparison of the six transcoding architectures. Type I, referred to as DEC-ENC, cascades a full decoder with a full encoder and requires the most computation of the six. Type II, referred to as CPDT, saves 1 ME by reusing the decoded MVs. Type III, referred to as CDDT, saves another 4 DCT/IDCT operations by transcoding in the DCT domain. Type IV, referred to as SDDT, performs MC on the residue differences only, which reduces the number of frame buffers from two to one. Type V is a simplified pixel-domain transcoder proposed in [24]; it is similar in form to Type IV but requires 2 extra DCT/IDCT operations so that it can operate in the pixel domain. Type VI is our proposed multi-layer to single-layer transcoder, which uses the same transcoding architecture as Type IV together with the proposed multi-layer technique for handling the incoherent error problem. Type VI and Type IV both require only 1 MC and 1 frame buffer. As Table 10 shows, the proposed framework has the lowest computational complexity, equal to that of Type IV. Compared to Type I, it saves 1 ME, 1 frame buffer, 4 DCT/IDCT, and 1 MC. Compared to Type II, it saves 1 frame buffer, 4 DCT/IDCT, and 1 MC. Compared to Type III, it saves 1 frame buffer and 1 MC.
Table 10. Module-wise complexity comparison of six transcoding architectures
Type   Transcoding Architecture   ME   Frame Buffer   DCT/IDCT   MC (Spatial)   MC (Transform)
I      DEC-ENC                    1    2              4          2              0
II     CPDT                       0    2              4          2              0
III    CDDT                       0    2              0          0              2
IV     SDDT                       0    1              0          0              1
V      Work [24]                  0    1              2          1              0
VI     Proposed                   0    1              0          0              1
5.3.2. Arithmetic Operations Comparison
To provide a more specific complexity analysis, the arithmetic instructions are counted to estimate the workload percentage of each of the six transcoding architectures. Type I, the most computationally intensive architecture, serves as the reference, and the complexity of each architecture is expressed as a percentage of that of Type I.
A. Arithmetic Instructions for Each Module
Table 11 shows the instruction counts for the modules in Table 10. The DCT and IDCT modules, which perform the 8×8 forward and inverse DCT, take 672 and 912 adder/shifter instructions [25], respectively. The MC-DCT module, which performs MC in the DCT domain instead of the spatial domain, takes at most 810 adder/subtractor instructions and 256 data-movement instructions [9]. The total instruction count (IC) for each module equals the product of its instruction count and the corresponding cycles per instruction (CPI). Here, we assume that ALU and data-movement instructions each take one clock cycle.
Table 11. Instructions required per block for each module
Module        Add/sub (Iadd/sub)   Data movement (Idata_mov)   Multi/div (Imul/div)   Total instructions
DCT [25]      672                  64                          0                      736
IDCT [25]     912                  64                          0                      976
MC (pixel)    0                    64                          0                      64
MC-DCT [9]    ≤ 810                ≤ 256                       0                      1066
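The totals in Table 11, and the per-architecture operation counts that appear later in Table 12, follow from simple sums. The sketch below reproduces them under the stated assumption of one cycle per instruction; the dictionary and function names are illustrative, and the MC-DCT figures are the upper bounds from the table.

```python
# Per-block instruction counts from Table 11: (add/sub, data movement, mul/div).
MODULES = {
    "DCT": (672, 64, 0),
    "IDCT": (912, 64, 0),
    "MC": (0, 64, 0),          # pixel-domain MC
    "MC-DCT": (810, 256, 0),   # upper bounds
}
CPI = 1  # one clock cycle per instruction, as assumed in the text


def module_total(module: str) -> int:
    """Total instruction count (IC) of one module invocation."""
    return sum(MODULES[module]) * CPI


def architecture_ops(counts: dict) -> int:
    """Total IC for a mix of modules, given how many times each one runs."""
    return sum(n * module_total(m) for m, n in counts.items())


# Type I uses 1 DCT and 3 IDCT, plus 2 pixel-domain MC.
print(architecture_ops({"DCT": 1, "IDCT": 3}))  # DCT/IDCT operations
print(architecture_ops({"MC": 2}))              # MC operations
```

One DCT plus three IDCT gives 736 + 3 × 976 = 3664 operations and two pixel-domain MC give 128, which are the Type I entries used as the reference in the workload analysis.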
B. Workload Analysis
Fig. 36-a shows the module-wise workload distribution for Type I, measured using Foreman as the test sequence. In this pie chart, ME takes 54.4% (ΦME), bit-plane VLD for FGS takes 30.2% (ΦFGS_VLD), DCT/IDCT take 13.6% (ΦDCT/IDCT), MC takes 0.3% (ΦMC), Q/IQ take 0.1% (ΦQ/IQ), VLC/VLD for the base layer take 1.0% (ΦBASE_VLC/VLD), and the other modules take the remainder (Φothers).
To convert the arithmetic instruction cycles into workload percentages, the following relationship in eqn. (31) is used to build Table 12, where ΦTypeN is the fraction of the computation time for each module in Type N. Table 12 shows the complexity ratio (CR) for the six architectures compared with Type I.
\[ \Phi_{TypeN} = \frac{IC_{TypeN}}{IC_{TypeI}} \cdot \Phi_{TypeI} \tag{31} \]
For illustration, Type VI, which is proposed in this thesis, takes only 34.17% of the computation of Type I. From Table 10, Type VI saves 1 ME (54.35%), 3 IDCT plus 1 DCT (13.61%), and 2 spatial-domain MC (0.29%), but needs one extra MC-DCT. According to eqn. (31), the revised fraction for MC amounts to 0.29% × 1066 / 128 = 2.42% of the overall complexity, where ICTypeVI = 1066 (1 MC-DCT) and ICTypeI = 2 × 64 (2 pixel-domain MC). Since no instructions are needed for ME and DCT/IDCT, the new workload percentage for these modules is 0%. Therefore, the complexity ratio of Type VI is 0% (ΦME) + 0% (ΦDCT/IDCT) + 2.42% (ΦMC) + 31.75% (ΦFGS_VLD + ΦQ/IQ + ΦBASE_VLC/VLD + Φothers) = 34.17%. The derivations for Type II to Type V are similar and are shown in Table 12. The workload reduction is also shown as pie charts in Fig. 36, which gives the estimated complexity of the six architectures at the arithmetic-operation level.
Table 12. Arithmetic complexity ratio for the six transcoding architectures compared to the DEC-ENC architecture.
Architecture   Type I (DEC-ENC)            Type II (CPDT)              Type III (CDDT)
               workload(%)   operations    workload(%)   operations    workload(%)   operations
ME             54.35         —             0             0             0             0
DCT/IDCT       13.61         3664          13.61         3664          0             0
MC             0.29          128           0.29          128           4.83          2132
Others         31.75         —             31.75         —             31.75         —
Total          100                         45.65                       36.58

Architecture   Type IV (SDDT)              Type V (Work [24])          Type VI (Proposed)
               workload(%)   operations    workload(%)   operations    workload(%)   operations
ME             0             0             0             0             0             0
DCT/IDCT       0             0             6.36          1712          0             0
MC             2.42          1066          0.15          64            2.42          1066
Others         31.75         —             31.75         —             31.75         —
Total          34.17                       38.26                       34.17
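The totals in Table 12 can be reproduced by applying eqn. (31) module by module, as in the sketch below. The data layout and the function name `complexity_ratio` are illustrative. The sketch sums unrounded module shares, so its results agree with the table only to within rounding (the table adds shares already rounded to two decimals).

```python
# Type I workload shares (%) and operation counts, from Table 12.
PHI_TYPE1 = {"ME": 54.35, "DCT/IDCT": 13.61, "MC": 0.29, "Others": 31.75}
IC_TYPE1 = {"DCT/IDCT": 3664, "MC": 128}

# Operation counts of the other architectures, from Table 12.
IC_TYPE_N = {
    "II":  {"DCT/IDCT": 3664, "MC": 128},
    "III": {"DCT/IDCT": 0,    "MC": 2132},
    "IV":  {"DCT/IDCT": 0,    "MC": 1066},
    "V":   {"DCT/IDCT": 1712, "MC": 64},
    "VI":  {"DCT/IDCT": 0,    "MC": 1066},
}


def complexity_ratio(ic_n: dict) -> float:
    """Complexity of Type N as a percentage of Type I, via eqn. (31).

    ME contributes 0% for Types II to VI; the "Others" share is unchanged.
    """
    ratio = PHI_TYPE1["Others"]
    for module, ic_1 in IC_TYPE1.items():
        ratio += PHI_TYPE1[module] * ic_n[module] / ic_1
    return ratio


ratios = {name: complexity_ratio(ic) for name, ic in IC_TYPE_N.items()}
for name, value in ratios.items():
    print(name, round(value, 2))
```

For Type VI, the only scaled term is MC, 0.29 × 1066 / 128 ≈ 2.42, which added to the 31.75% of the unchanged modules gives approximately 34.17%.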
Fig. 36. Estimated operational complexity comparison of the six transcoding architectures for Foreman
Chapter 6 Conclusion
In this thesis, we proposed an FGS multi-layer to MPEG-1/2/4 single-layer transcoding framework that uses multi-layer transcoding techniques with R-D optimization. The proposed framework is built on the SDDT architecture, which is considered one of the most computationally efficient transcoding architectures. To resolve the drift propagation problem that the SDDT architecture exhibits in heterogeneous transcoding, two techniques are developed: multi-layer transcoding and a rate-distortion optimized universal model. The multi-layer transcoding technique compensates for heterogeneous drift error by transmitting an additional enhancement layer. The rate-distortion optimized universal model is used to balance coding efficiency against transmission bit rate under limited channel bandwidth. The proposed framework can efficiently transcode FGS bitstreams to MPEG-1/2/4 bitstreams in a shared architecture and achieves a better balance between transcoding complexity and transcoding quality than conventional architectures.
The experimental results showed the proposed MSDDT architecture can provide a