Chapter 4 Multi-Layer Transcoding Approach

4.2. Multi-Layer Transcoding with R-D Optimization

An R-D model is constructed to solve the multi-layer transcoding problem under limited channel bandwidth. To perform drift-free transcoding, an additional error layer containing the coefficients of the incoherent errors is transmitted to the transcoder as side information. Because the channel bandwidth is limited, allocating it between the layers to achieve the best transcoding performance and decoded video quality becomes very important.

To model the resource allocation problem under limited channel bandwidth, eqn. (28) relates the original enhancement layer (EL1) to the error layer (EL2). Suppose the bit rate available for the two enhancement-layer bitstreams is R. The goal is to find the inter-layer ratio α that gives the best transcoding R-D performance. The definition is given in eqn. (28), in which RE is the bit rate of EL1 and Rε is the bit rate of EL2.

Since FGS enables progressive transmission, both the EL and the error layer can be truncated to any desired bit rate according to the inter-layer ratio (α) and the given bit rate (R). The problem is then to find the best α for a given bit rate R, as shown in eqn. (29).

where D(.) is the distortion function.

One way to solve eqn. (29) is to search exhaustively through all candidate values of α in the range [0, 1] for the one with the minimum distortion. However, such a method requires too much computation and is not preferred.
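The exhaustive search can be sketched as follows. This is a minimal illustration, not the thesis implementation: `distortion` is a hypothetical callback standing in for one full MSDDT transcoding run that returns the MSE for a given (α, R), the toy distortion function is made up, and the 0.05 step is borrowed from the simulation settings used later in this section.

```python
from typing import Callable


def exhaustive_alpha_search(
    distortion: Callable[[float, float], float],
    total_rate_kbps: float,
    step: float = 0.05,
) -> float:
    """Return the alpha in [0, 1] with the smallest distortion at the given rate."""
    n_steps = round(1.0 / step)
    candidates = [i * step for i in range(n_steps + 1)]  # 0.0, 0.05, ..., 1.0
    # Each evaluation stands for a full transcoding run, which is why the
    # search is expensive in practice.
    return min(candidates, key=lambda alpha: distortion(alpha, total_rate_kbps))


# Toy distortion with a known minimum at alpha = 0.6 (made up for illustration).
def toy_distortion(alpha: float, rate_kbps: float) -> float:
    return (alpha - 0.6) ** 2 + 1.0 / (1.0 + rate_kbps)


best = exhaustive_alpha_search(toy_distortion, total_rate_kbps=1024)
```

With a step of 0.05 the search needs 21 distortion evaluations per bit rate, each of which would be a complete transcoding pass.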

A more efficient yet still effective approach is to build an R-D model that predicts the α giving the best transcoded R-D performance.

To construct the relationship between R and αopt, we use a statistical approach that observes various sequences and bit rates. We simulated the MSDDT with various combinations of R and α, where R ranges from 0 to 2560 Kbps with an interval of 256 Kbps and α ranges from 0 to 1 with a step size of 0.05. To limit the influence of the encoder loop in the transcoder, constant quantization is used for re-encoding. The bit rate of the BL is set to 256, 512, 1024, and 2048 Kbps with TM5 rate control. Four sequences in CIF format (Akiyo, Foreman, Mobile, and Stefan) are used for testing with GOP structure N = 15, M = 1 (i.e., IPPP…). Fig. 15 to Fig. 18 show the resulting rate-distortion curves for various α, where the horizontal axis is the available bit rate for all ELs (R) and the vertical axis is the distortion of the transcoded video (D(.)) measured as mean square error (MSE). The dotted lines represent the interpolated rate-distortion data for different values of α, and the bold lines indicate the rate-distortion optimized inter-layer ratio αopt, for which the distortion is minimized at the given bit rate. From the results in Fig. 15 to Fig. 18, we obtain the relationships between R and αopt for different sequences and BL bit rates, as shown in Fig. 19 to Fig. 22.

In Fig. 19 to Fig. 22, these relationships exhibit similar properties, such as increasing monotonically and saturating at high input bit rates. This observation makes it feasible to construct a single model that predicts all of them. Based on this idea, we present a new model to describe the relationship between R and αopt. Four common models are assessed: linear, power-law, quadratic, and exponential. Among them, the power-law and quadratic models are the most promising candidates, since their general shapes, shown in Fig. 14, resemble the results in Fig. 19 to Fig. 22.

Table 6 shows the approximation error of the four models; the power-law model provides the best approximation. The new model is therefore formulated as eqn. (30).

$$\alpha_{opt} = aR^{b} + c \qquad (30)$$

where (a, b, c) is the set of model parameters.
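A small sketch of how eqn. (30) might be evaluated is given below. The default parameters are the single set (a, b, c) reported in Chapter 5. The function name and the clipping to [0, 1] are assumptions added here for illustration: the raw power law goes negative at very low rates, and a ratio outside [0, 1] has no meaning.

```python
def alpha_opt_power_law(
    rate_kbps: float,
    a: float = 0.3476,
    b: float = 0.18573,
    c: float = -0.77644,
) -> float:
    """Evaluate eqn. (30), alpha_opt = a * R**b + c, clipped to [0, 1]."""
    if rate_kbps < 0:
        raise ValueError("bit rate must be non-negative")
    raw = a * rate_kbps ** b + c
    # Clipping is an assumption of this sketch, not part of eqn. (30).
    return min(1.0, max(0.0, raw))


for rate in (256, 1024, 2560):
    print(rate, round(alpha_opt_power_law(rate), 3))
```

One evaluation of this closed form replaces the 21 transcoding runs that an exhaustive search with a 0.05 step would need for each bit rate.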

Fig. 14. General curve behaviors for different models (curves: linear, exponential, power-law, quadratic)

Table 6. RMSE of the estimation of the (R, αopt) relationship using different models

RMSE      Linear    Power-Law  Quadratic  Exponential
Akiyo     0.0411    0.01178    0.01929    0.0423
Foreman   0.09279   0.01757    0.06005    0.1001
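For readers who want to reproduce the model selection, the sketch below shows the RMSE measure and picks the model with the lowest error per sequence. The RMSE values are copied from Table 6; the `rmse` helper and the dictionary layout are illustrative choices, not the thesis code.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


# RMSE values copied from Table 6.
table6 = {
    "Akiyo": {"Linear": 0.0411, "Power-Law": 0.01178,
              "Quadratic": 0.01929, "Exponential": 0.0423},
    "Foreman": {"Linear": 0.09279, "Power-Law": 0.01757,
                "Quadratic": 0.06005, "Exponential": 0.1001},
}

# Model with the smallest RMSE for each sequence.
best_model = {seq: min(scores, key=scores.get) for seq, scores in table6.items()}
```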

Fig. 23 shows the curve fitted to the actual (R, αopt) data, averaged over all the tested cases. Note that this model captures statistical behavior only; the actual relationship between R and αopt may vary with the video content and the BL bit rate. For example, for slow-motion sequences such as Akiyo, the amount of incoherent error in the video streams is small, so αopt tends to saturate faster. For a high BL bit rate, such as Mobile at 2048 Kbps, the contribution of the EL is insignificant because the BL already has very high video quality, so αopt is biased toward the error layer when the bandwidth is limited. The experimental results in Chapter 5 show that the proposed power-law model with a single parameter set can accommodate the variation in video characteristics and provides satisfactory transcoding performance compared to the optimized approach.

Fig. 15. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 256-Kbps BL bit rate

Fig. 16. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 512-Kbps BL bit rate

Fig. 17. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 1024-Kbps BL bit rate

Fig. 18. MSE vs. bit rate when running MSDDT with various α and R combinations for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 2048-Kbps BL bit rate

Fig. 19. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 256-Kbps BL bit rate

Fig. 20. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 512-Kbps BL bit rate

Fig. 21. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 1024-Kbps BL bit rate

Fig. 22. αopt vs. bit rate for Akiyo (upper left), Foreman (upper right), Mobile (lower left), and Stefan (lower right) at 2048-Kbps BL bit rate

Fig. 23. The fitting curve for (R, αopt) using the power-law model (horizontal axis: bit rate in Kbps; vertical axis: αopt; curves: actual data and power-law fit)

Chapter 5 Experimental Results

This chapter presents the experimental results of the proposed multi-layer to single-layer transcoder. R-D performance and complexity comparisons are provided to show that the proposed transcoder can provide good transcoding quality.

5.1. Test Conditions

The test conditions for the experiments on transcoding FGS multi-layer bitstreams to MPEG-1/2/4 single-layer bitstreams are listed below. The source video sequences are first encoded and archived as three FGS bitstreams: the BL bitstream, the EL bitstream, and the error-layer bitstream. The BL bitstream samples the source video sequence at 30 Hz.

- Video source format: CIF, 30 fps

- Test video sequences: Foreman, Akiyo, Mobile, Stefan, etc.

- Video GOP structure: N = 15, M = 1 (i.e., IPPP…).

- Video bit rate for FGS base-layer bitstream: 256 Kbps, 512 Kbps, 1024 Kbps, and 2048 Kbps with TM5 rate control.

- Video coding tools: no advanced coding tools of the MPEG-4 FGS Profile, such as frequency weighting or selective enhancement, are used in the FGS EL.

Five transcoding architectures are used for transcoding performance comparison.

- Cascaded Pixel-Domain Transcoder (CPDT)

- Cascaded DCT-Domain Transcoder (CDDT)

- Simplified DCT-Domain Transcoder (SDDT)

- Modified Simplified DCT-Domain Transcoder where the inter-layer ratio α is determined using the exhaustive search with a step size of 0.05 (MSDDT_Opt)

- Modified Simplified DCT-Domain Transcoder where the inter-layer ratio α is determined using the proposed power-law model (MSDDT_Pow)

To simulate possible channel bandwidth variation, the enhancement-layer bitstreams are truncated to a total bit rate ranging from 256 to 2048 Kbps in steps of 256 Kbps. The truncation of the EL bitstreams is implemented in the streaming server through a simple frame-level bit allocation that distributes the given bandwidth evenly across frames. In the re-encoding process, constant quantization step sizes (QPs) are used, and the set of QPs is chosen so that the output transcoded bit rate approaches the total input bit rate (BL + ELs).
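The frame-level truncation budget can be illustrated with the sketch below. It rests on several assumptions that are not stated in the text: 1 Kbps is taken as 1000 bit/s, α is read as the share of the budget given to EL1 with the remainder going to the error layer (EL2), and the function name `frame_budgets` is hypothetical.

```python
def frame_budgets(total_el_rate_kbps: float, alpha: float, fps: float = 30.0):
    """Split the per-frame enhancement budget between EL1 and the error layer.

    Assumptions (not from the thesis): 1 Kbps = 1000 bit/s, and alpha is the
    share given to EL1, with the remainder going to the error layer (EL2).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    bits_per_frame = total_el_rate_kbps * 1000.0 / fps  # same budget for every frame
    el1_bits = int(alpha * bits_per_frame)
    el2_bits = int(bits_per_frame) - el1_bits
    return el1_bits, el2_bits


el1, el2 = frame_budgets(768, alpha=0.5)
```

Because FGS bitstreams are embedded, each layer of each frame can simply be cut at its budget.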

5.2. Rate-Distortion Performance

5.2.1. MPEG-4 FGS to MPEG-1

Fig. 24 to Fig. 27 show the rate-distortion performance of four transcoding architectures: CPDT, CDDT, SDDT, and MSDDT with the proposed power-law model (MSDDT_Pow). We use a single parameter set (a, b, c) = (0.3476, 0.18573, -0.77644) for MSDDT_Pow across all BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-1 bitstream. Fig. 24 to Fig. 27 show that SDDT suffers from serious quality degradation due to incoherent errors in heterogeneous transcoding. The proposed MSDDT_Pow running at a 256-Kbps base-layer bit rate provides up to 4.7 dB, 2.6 dB, and 3.8 dB gain in PSNR over SDDT for the Foreman, Mobile, and Stefan sequences, respectively. Compared with the CPDT architecture, which is usually treated as the golden reference for transcoders, MSDDT_Pow at a 256-Kbps base-layer bit rate loses only 0.3–0.6 dB, 0.3–0.4 dB, and 0.3–1.1 dB in PSNR over the tested bit rates for the Foreman, Mobile, and Stefan sequences, respectively. Table 7 summarizes the comparison for the four transcoder architectures (CPDT, CDDT, SDDT, and the proposed MSDDT_Pow) at about 1200 Kbps, 1200 Kbps, and 1300 Kbps for Foreman, Mobile, and Stefan, respectively.

Fig. 24. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 256 Kbps. (a) Foreman (b) Mobile (c) Stefan

Fig. 25. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 512 Kbps. (a) Foreman (b) Mobile (c) Stefan

Fig. 26. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps. (a) Foreman (b) Mobile (c) Stefan

Fig. 27. FGS-to-MPEG-1 transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps. (a) Foreman (b) Mobile (c) Stefan

Table 7. Rate-distortion comparison for FGS-to-MPEG-1 transcoding (PSNR in dB)

BL bit rate  Sequence  CPDT  CDDT  MSDDT_Pow  SDDT
256 Kbps     Foreman   -     +0.2  -0.6       -5.3
256 Kbps     Mobile    -     +0.3  -0.3       -2.9
256 Kbps     Stefan    -     +0.2  -1.1       -4.9
512 Kbps     Foreman   -     0     -0.3       -6.5
512 Kbps     Mobile    -     +0.1  -0.5       -3.4
512 Kbps     Stefan    -     +0.2  -1         -5.7
1024 Kbps    Foreman   -     0     -0.4       -7.2
1024 Kbps    Mobile    -     0     -0.8       -4.3
1024 Kbps    Stefan    -     +0.2  -0.8       -6.2
2048 Kbps    Foreman   -     0     -0.4       -7.4
2048 Kbps    Mobile    -     0     -0.9       -4.5
2048 Kbps    Stefan    -     +0.2  -1         -6.8
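As a cross-check, the gains over SDDT quoted in the text can be recomputed from the 256-Kbps rows of Table 7. The sketch reads the table entries as PSNR differences relative to CPDT; that reading is an assumption, suggested by the "-" in the CPDT column and by the fact that the recomputed gains match the text.

```python
# PSNR entries (dB) for the 256-Kbps BL rows, copied from Table 7.
# Assumed to be differences relative to CPDT.
table7_256 = {
    "Foreman": {"CDDT": 0.2, "MSDDT_Pow": -0.6, "SDDT": -5.3},
    "Mobile": {"CDDT": 0.3, "MSDDT_Pow": -0.3, "SDDT": -2.9},
    "Stefan": {"CDDT": 0.2, "MSDDT_Pow": -1.1, "SDDT": -4.9},
}


def gain_over_sddt(row: dict) -> float:
    """PSNR gain of MSDDT_Pow over SDDT, rounded to the table's precision."""
    return round(row["MSDDT_Pow"] - row["SDDT"], 1)


gains = {seq: gain_over_sddt(row) for seq, row in table7_256.items()}
```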

5.2.2. MPEG-4 FGS to MPEG-2

Fig. 28 to Fig. 31 show the rate-distortion performance of five transcoding architectures: CPDT, CDDT, SDDT, MSDDT with the optimized approach (MSDDT_Opt), and MSDDT with the proposed power-law model (MSDDT_Pow). We use a single parameter set (a, b, c) = (0.3476, 0.18573, -0.77644) for MSDDT_Pow across all BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-2 Main Profile bitstream. Fig. 28 to Fig. 31 show that SDDT suffers from considerable quality degradation due to incoherent errors in heterogeneous transcoding. The proposed MSDDT_Pow running at a 256-Kbps base-layer bit rate provides up to 2.4 dB, 5.9 dB, 3.4 dB, and 5.4 dB gain in PSNR over SDDT for the Akiyo, Foreman, Mobile, and Stefan sequences, respectively. Compared with the CDDT architecture, MSDDT_Pow at a 256-Kbps base-layer bit rate loses 0.4–0.6 dB, 0.4–0.8 dB, and 0.4–1.4 dB in PSNR over the tested bit rates for the Foreman, Mobile, and Stefan sequences, respectively. For the Akiyo sequence, MSDDT_Pow at a 256-Kbps base-layer bit rate achieves almost the same transcoding performance as the CDDT architecture, with a PSNR difference within 0.1 dB. Another comparison is between MSDDT using the optimized approach and MSDDT using the proposed model. Fig. 28 shows that, at a 256-Kbps base-layer bit rate, the MSDDT using the power-law model has almost identical PSNR to the MSDDT based on the exhaustive search, with a difference of at most 0.3 dB. Table 8 summarizes the comparison for the five transcoding architectures (CPDT, CDDT, SDDT, MSDDT_Opt, and the proposed MSDDT_Pow) at about 650 Kbps, 2100 Kbps, 2000 Kbps, and 2200 Kbps for Akiyo, Foreman, Mobile, and Stefan, respectively.

Table 8. Rate-distortion comparison for FGS-to-MPEG-2@MP transcoding (PSNR in dB)

BL bit rate  Sequence  CPDT  CDDT  MSDDT_Opt  MSDDT_Pow  SDDT
256 Kbps     Akiyo     -     -1.2  -1.2       -1.2       -3.6
256 Kbps     Foreman   -     0     -0.6       -0.6       -6.5
256 Kbps     Mobile    -     +0.2  -0.6       -0.6       -4
256 Kbps     Stefan    -     +0.2  -1.2       -1.2       -6.6
512 Kbps     Akiyo     -     -1.1  -1.1       -1.1       -3.4
512 Kbps     Foreman   -     -0.2  -0.8       -0.8       -8.2
512 Kbps     Mobile    -     +0.1  -0.8       -0.8       -4.6
512 Kbps     Stefan    -     +0.2  -1.2       -1.2       -7.3
1024 Kbps    Akiyo     -     -1.2  -1.2       -1.2       -3.3
1024 Kbps    Foreman   -     -0.3  -1         -1         -9.2
1024 Kbps    Mobile    -     +0.1  -0.4       -0.7       -5
1024 Kbps    Stefan    -     +0.1  -0.9       -1.1       -8.2
2048 Kbps    Akiyo     -     -1.6  -1.6       -1.6       -3.5
2048 Kbps    Foreman   -     -0.4  -0.8       -1         -9.6
2048 Kbps    Mobile    -     0     -0.2       -1.2       -5.8
2048 Kbps    Stefan    -     +0.1  -0.5       -1.5       -8.7
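The gap between the exhaustive search and the power-law model at the sampled operating points of Table 8 can be tabulated with a few lines of code. The values are copied from the MSDDT_Opt and MSDDT_Pow columns; the grouping by BL bit rate and the variable names are illustrative choices.

```python
# (MSDDT_Opt, MSDDT_Pow) PSNR entries in dB, copied from Table 8.
table8 = {
    256: {"Akiyo": (-1.2, -1.2), "Foreman": (-0.6, -0.6),
          "Mobile": (-0.6, -0.6), "Stefan": (-1.2, -1.2)},
    512: {"Akiyo": (-1.1, -1.1), "Foreman": (-0.8, -0.8),
          "Mobile": (-0.8, -0.8), "Stefan": (-1.2, -1.2)},
    1024: {"Akiyo": (-1.2, -1.2), "Foreman": (-1.0, -1.0),
           "Mobile": (-0.4, -0.7), "Stefan": (-0.9, -1.1)},
    2048: {"Akiyo": (-1.6, -1.6), "Foreman": (-0.8, -1.0),
           "Mobile": (-0.2, -1.2), "Stefan": (-0.5, -1.5)},
}

# Largest PSNR loss of the model relative to the exhaustive search, per BL rate.
max_gap = {
    bl: max(round(opt - pow_, 1) for opt, pow_ in rows.values())
    for bl, rows in table8.items()
}
```

At these operating points the two approaches coincide for the 256-Kbps and 512-Kbps base layers, and the gap grows with the BL bit rate.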

Fig. 28. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 256 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, CDDT, MSDDT_Opt, MSDDT_Pow, SDDT)

Fig. 29. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 512 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, CDDT, MSDDT_Opt, MSDDT_Pow, SDDT)

Fig. 30. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, CDDT, MSDDT_Opt, MSDDT_Pow, SDDT)

Fig. 31. FGS-to-MPEG-2@MP transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan

5.2.3. MPEG-4 FGS to MPEG-4 SP

Fig. 32 to Fig. 35 show the rate-distortion performance of three transcoding architectures (CPDT, the FGS-to-MPEG-4@SP transcoder proposed in [24], and MSDDT) for different BL bit rates. The target scenario is to transcode an MPEG-4 FGS bitstream into an MPEG-4 Simple Profile bitstream, a case that is free of incoherent error. As Fig. 32 to Fig. 35 show, the three architectures have similar rate-distortion performance. Table 9 summarizes the comparison for the three transcoding architectures (CPDT, the work in [24], and the proposed MSDDT) at about 550 Kbps, 1600 Kbps, 2200 Kbps, and 2300 Kbps for Akiyo, Foreman, Mobile, and Stefan, respectively.

Fig. 32. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 256 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, MSDDT_Pow, FGS-to-SP)

Fig. 33. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 512 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, MSDDT_Pow, FGS-to-SP)

Fig. 34. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 1024 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, MSDDT_Pow, FGS-to-SP)

Fig. 35. FGS-to-MPEG-4@SP transcoding performance comparison under FGS base-layer bit rate of 2048 Kbps (a) Akiyo (b) Foreman (c) Mobile (d) Stefan (axes: bit rate in Kbps vs. PSNR in dB; curves: CPDT, MSDDT_Pow, FGS-to-SP)

Table 9. Rate-distortion comparison for FGS-to-MPEG-4@SP transcoding

PSNR (dB)  CPDT  MSDDT_Pow  FGS-to-SP [24]
Akiyo      -     0          +0.1

5.3. Complexity Analysis

5.3.1. Module-wise Comparison

Table 10 shows the module-wise complexity comparison of the six transcoding architectures. Type I, referred to as DEC-ENC, cascades a full decoder with a full encoder and requires the most computation of the six. Type II, referred to as CPDT, saves 1 ME by reusing the decoded MVs. Type III, referred to as CDDT, saves another 4 DCT/IDCT operations by transcoding in the DCT domain. Type IV, referred to as SDDT, performs MC on the residue differences only, which reduces the number of frame buffers from two to one. Type V is a simplified pixel-domain transcoder proposed in [24]; it is similar in form to Type IV but requires 2 extra DCT/IDCT operations so that it can operate in the pixel domain. Type VI is our proposed multi-layer to single-layer transcoder, which uses the same transcoding architecture as Type IV together with the proposed multi-layer technique for handling the incoherent error problem. Type VI and Type IV both require only 1 MC and 1 frame buffer. As Table 10 shows, the proposed transcoding framework has the lowest computational complexity, matching Type IV. Compared to Type I, the proposed framework saves 1 ME, 1 frame buffer, 4 DCT/IDCT, and 1 MC. Compared to Type II, it saves 1 frame buffer, 4 DCT/IDCT, and 1 MC. Compared to Type III, it saves 1 frame buffer and 1 MC.

Table 10. Module-wise complexity comparison of six transcoding architectures

Type  Transcoding Architecture  ME  Frame Buffer  DCT/IDCT  MC (Spatial)  MC (Transform)
I     DEC-ENC                   1   2             4         2             0
II    CPDT                      0   2             4         2             0
III   CDDT                      0   2             0         0             2
IV    SDDT                      0   1             0         0             1
V     Work [24]                 0   1             2         1             0
VI    Proposed                  0   1             0         0             1
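The savings quoted in the text follow directly from the rows of Table 10. The sketch below encodes the table and subtracts module counts; the `Modules` type and the `savings` helper are hypothetical names used only for illustration, and MC is counted regardless of the domain in which it runs.

```python
from typing import NamedTuple


class Modules(NamedTuple):
    me: int
    frame_buffer: int
    dct_idct: int
    mc_spatial: int
    mc_transform: int


# Rows copied from Table 10.
TABLE10 = {
    "I": Modules(1, 2, 4, 2, 0),
    "II": Modules(0, 2, 4, 2, 0),
    "III": Modules(0, 2, 0, 0, 2),
    "IV": Modules(0, 1, 0, 0, 1),
    "V": Modules(0, 1, 2, 1, 0),
    "VI": Modules(0, 1, 0, 0, 1),
}


def savings(reference: str, candidate: str) -> dict:
    """Modules saved by `candidate` relative to `reference`."""
    ref, cand = TABLE10[reference], TABLE10[candidate]
    return {
        "ME": ref.me - cand.me,
        "frame buffer": ref.frame_buffer - cand.frame_buffer,
        "DCT/IDCT": ref.dct_idct - cand.dct_idct,
        # MC is counted in either domain (spatial or transform).
        "MC": (ref.mc_spatial + ref.mc_transform)
        - (cand.mc_spatial + cand.mc_transform),
    }


print(savings("I", "VI"))
```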

5.3.2. Arithmetic Operations Comparison

To provide a more specific complexity analysis, we count the arithmetic instructions and derive a workload percentage for each of the six transcoding architectures. Type I, the most computationally intensive, is used as the reference, and the complexity of each architecture is expressed as a percentage of Type I.

A. Arithmetic Instructions for Each Module

Table 11 shows the instruction counts for the modules in Table 10. The DCT and IDCT modules, which perform the 8×8 forward and inverse DCT, take 672 and 912 adder/shifter instructions [25], respectively. The MC-DCT module, which performs MC in the DCT domain instead of the spatial domain, takes at most 810 adder/subtractor instructions and 256 data-movement instructions [9]. The total instruction count (IC) of each module equals the product of its instruction counts and the corresponding cycles per instruction (CPI). Here, we assume that ALU and data-movement instructions take one clock cycle per instruction.

Table 11. Instructions required per block for each module

Module      Add/sub (Iadd/sub)  Data movement (Idata_mov)  Multi/div (Imul/div)  Total instructions
DCT [25]    672                 64                         0                     736
IDCT [25]   912                 64                         0                     976
MC (pixel)  0                   64                         0                     64
MC-DCT [9]  ≤ 810               ≤ 256                      0                     1066
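The totals in Table 11 and the DCT/IDCT operation count used later for Type I can be reproduced as follows. This is an illustrative sketch: the dictionary layout and helper name are assumptions, the MC-DCT entries are the upper bounds from the table, and the combination of 3 IDCT plus 1 DCT is the one named in the workload analysis of Section 5.3.2.

```python
# (add/sub, data movement, mul/div) instruction counts per 8x8 block, from Table 11.
# The MC-DCT figures are the upper bounds quoted in the table.
TABLE11 = {
    "DCT": (672, 64, 0),
    "IDCT": (912, 64, 0),
    "MC": (0, 64, 0),
    "MC-DCT": (810, 256, 0),
}
CPI = 1  # one clock cycle per instruction, as assumed in the text


def total_ic(module: str) -> int:
    """Total instruction count of a module: sum of its instructions times CPI."""
    return sum(TABLE11[module]) * CPI


# DCT/IDCT cost of 3 IDCT plus 1 DCT, the combination attributed to Type I.
type_i_dct_idct = 3 * total_ic("IDCT") + 1 * total_ic("DCT")
```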

B. Workload Analysis

Fig. 36-a shows the module-wise workload distribution for Type I, measured using Foreman as the test sequence. From this pie chart, ME takes 54.4% (ΦME), bit-plane VLD for FGS takes 30.2% (ΦFGS_VLD), DCT/IDCT take 13.6% (ΦDCT/IDCT), MC takes 0.3% (ΦMC), Q/IQ take 0.1% (ΦQ/IQ), VLC/VLD for the base layer take 1.0% (ΦBASE_VLC/VLD), and the other modules take the remainder (Φothers).

To convert the arithmetic instruction cycles into workload percentages, the following relationship in eqn. (31) is used to build Table 12, where ΦTypeN is the fraction of the computation time for each module in Type N. Table 12 shows the complexity ratio (CR) for the six architectures compared with Type I.

$$\Phi_{TypeN} = \frac{IC_{TypeN}}{IC_{TypeI}} \cdot \Phi_{TypeI} \qquad (31)$$

For illustration, Type VI, which is proposed in this thesis, takes only 34.17% of the computation of Type I. From Table 10, Type VI saves 1 ME (54.35%), 3 IDCT plus 1 DCT (13.61%), and 2 spatial-domain MC (0.29%), but needs 1 extra MC-DCT. According to eqn. (31), the revised fraction for MC amounts to 0.29% × 1066 / 128 = 2.42% of the overall complexity, where ICTypeVI = 1066 (1 MC-DCT) and ICTypeI = 2 × 64 (2 pixel-domain MC). Since no instructions are needed for ME and DCT/IDCT, the new workload percentage for these modules is 0%. Therefore, the complexity ratio of Type VI is 0% (ΦME) + 0% (ΦDCT/IDCT) + 2.42% (ΦMC) + 31.75% (ΦFGS_VLD + ΦQ/IQ + ΦBASE_VLC/VLD + Φothers) = 34.17%. The derivations for Type II to Type V are similar and are shown in Table 12. The workload reduction is also shown as pie charts in Fig. 36, which gives the estimated complexity of the six architectures at the arithmetic-operation level.

Table 12. Arithmetic complexity ratio for the six transcoding architectures compared to the DEC-ENC architecture.

          Type I (DEC-ENC)         Type II (CPDT)           Type III (CDDT)
Module    workload(%)  operations  workload(%)  operations  workload(%)  operations
ME        54.35        -           0            0           0            0
DCT/IDCT  13.61        3664        13.61        3664        0            0
MC        0.29         128         0.29         128         4.83         2132
Others    31.75        -           31.75        -           31.75        -
Total     100                      45.65                    36.58

          Type IV (SDDT)           Type V (Work [24])       Type VI (Proposed)
Module    workload(%)  operations  workload(%)  operations  workload(%)  operations
ME        0            0           0            0           0            0
DCT/IDCT  0            0           6.36         1712        0            0
MC        2.42         1066        0.15         64          2.42         1066
Others    31.75        -           31.75        -           31.75        -
Total     34.17                    38.26                    34.17
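The totals of Table 12 can be recomputed by applying eqn. (31) to each module. The following is a sketch under the table's own figures: workload shares and operation counts are copied from Table 12, while the dictionary layout and the function name `complexity_ratio` are illustrative.

```python
# Type I workload shares (%) and per-module operation counts, from Table 12.
TYPE_I_WORKLOAD = {"ME": 54.35, "DCT/IDCT": 13.61, "MC": 0.29, "Others": 31.75}
TYPE_I_OPS = {"DCT/IDCT": 3664, "MC": 128}

# Operation counts of the other architectures, from Table 12 (ME is 0 for all).
OPS = {
    "II": {"DCT/IDCT": 3664, "MC": 128},
    "III": {"DCT/IDCT": 0, "MC": 2132},
    "IV": {"DCT/IDCT": 0, "MC": 1066},
    "V": {"DCT/IDCT": 1712, "MC": 64},
    "VI": {"DCT/IDCT": 0, "MC": 1066},
}


def complexity_ratio(arch: str) -> float:
    """Apply eqn. (31) to DCT/IDCT and MC, then add the unchanged 'Others' share."""
    scaled = sum(
        OPS[arch][m] / TYPE_I_OPS[m] * TYPE_I_WORKLOAD[m] for m in TYPE_I_OPS
    )
    return scaled + TYPE_I_WORKLOAD["Others"]


ratios = {arch: complexity_ratio(arch) for arch in OPS}
```

The unrounded result for Type V is about 38.25; Table 12 reports 38.26 because it sums the per-module values after rounding each to two decimals.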

Fig. 36. Estimated operational complexity comparison of the six transcoding architectures for Foreman

Chapter 6 Conclusion

In this thesis, we proposed an FGS multi-layer to MPEG-1/2/4 single-layer transcoding framework using multi-layer transcoding techniques with R-D optimization.

The proposed framework is built on the SDDT architecture, which is considered one of the most computationally efficient transcoding architectures. To resolve the drift propagation problem that the SDDT architecture exhibits in heterogeneous transcoding, two techniques are developed: multi-layer transcoding and a rate-distortion optimized universal model. The multi-layer transcoding technique compensates for heterogeneous drift error by transmitting an additional enhancement layer. The rate-distortion optimized universal model balances coding efficiency against transmission bit rate under limited channel bandwidth. The proposed framework can efficiently transcode FGS to MPEG-1/2/4 bitstreams in a shared architecture and achieves a better balance between transcoding complexity and transcoding quality than conventional architectures.

The experimental results showed the proposed MSDDT architecture can provide a
