III. FEC-DISTORTION OPTIMIZATION ALGORITHMS
3.4 DYNAMIC FEC-DISTORTION OPTIMIZATION ALGORITHM
3.4.1 Classifying Pictures to Clusters
Figure 3-8 shows an example of picture classification, in which pictures are classified by their temporal level. As mentioned in Section 2.2, when hierarchical B prediction is applied, each video layer can be decomposed into a number of temporal levels, and every picture is predicted from the reconstructions of two other pictures at lower temporal levels. Pictures at lower temporal levels are therefore more important than those at higher temporal levels.
In the implementation, a cluster may still have unused space after all of its pictures have been placed in it. When this happens, we append pictures of the next cluster to the current one until no free space remains.
Fig. 3-8: An example of picture classification
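As an illustration of the classification above, the following Python sketch assigns each picture of a GOP to a cluster by temporal level. It assumes a dyadic hierarchical-B structure in which the GOP size is a power of two and the last picture of the GOP is the key picture. The function names and the picture indexing are assumptions made for this example, not part of the original implementation.

```python
def temporal_level(k, gop_size):
    """Temporal level of picture k (1..gop_size) in a dyadic hierarchical-B GOP.

    Assumption: gop_size is a power of two and picture gop_size is the key
    picture (level 0). Odd-indexed pictures get the highest level.
    """
    if gop_size <= 0 or gop_size & (gop_size - 1) != 0:
        raise ValueError("gop_size must be a power of two")
    if not 1 <= k <= gop_size:
        raise ValueError("k must be in 1..gop_size")
    num_levels = gop_size.bit_length() - 1        # log2(gop_size)
    trailing_zeros = (k & -k).bit_length() - 1    # largest power of two dividing k
    return num_levels - trailing_zeros


def classify_pictures(gop_size):
    """Return clusters of picture indices, ordered from most to least important."""
    clusters = {}
    for k in range(1, gop_size + 1):
        clusters.setdefault(temporal_level(k, gop_size), []).append(k)
    return [clusters[level] for level in sorted(clusters)]


if __name__ == "__main__":
    print(classify_pictures(8))
```

For a GOP of 8 pictures this yields the clusters [[8], [4], [2, 6], [1, 3, 5, 7]], so cluster 0 holds the key picture and the last cluster holds the pictures that no other picture references.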
3.4.2 Protecting Scheme
Figure 3-9 shows an example of the DFDO protecting scheme. The blue parts are the clusters of video data, and the green parts are the FEC data that protect the source clusters enclosed within the brackets.
Ki is the number of packets in cluster i, and Mi denotes the number of protection packets in protecting cluster (0,1,…,i). The sum of all Mi must equal the number of protection packets allocated to this layer by the Flat FEC-Distortion Optimization algorithm.
Fig 3-9: An example of DFDO protecting scheme
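To make the roles of Ki and Mi concrete, consider the simplest case of a single cluster that is protected on its own. Assuming an ideal MDS erasure code (any K of the K + M transmitted packets suffice for decoding) and independent packet losses with rate p, the cluster is decodable with the binomial tail probability computed below. The nested protection used by DFDO is more involved; this sketch covers only the single-cluster case, and the function name is hypothetical.

```python
from math import comb


def decode_probability(k, m, loss_rate):
    """Probability that at least k of k + m packets arrive.

    Assumptions: ideal MDS code, i.i.d. packet loss with rate loss_rate.
    """
    n = k + m
    arrive = 1.0 - loss_rate
    return sum(
        comb(n, r) * arrive ** r * loss_rate ** (n - r)
        for r in range(k, n + 1)
    )


if __name__ == "__main__":
    # One source packet with and without one protecting packet, 50% loss.
    print(decode_probability(1, 0, 0.5), decode_probability(1, 1, 0.5))
```

Adding protecting packets never lowers this probability, which is why the allocation of the Mi across clusters determines which pictures are most likely to survive.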
3.4.3 Algorithm
The main idea of the Dynamic FEC-Distortion Optimization algorithm is to find the allocation pattern of protecting symbols that minimizes the distortion of the whole video layer (equivalently, maximizes its PSNR), as expressed in equation (12). An allocation pattern is a vector in which each element gives the number of protecting symbols for the corresponding cluster. M* is the set of all possible patterns, and PSNR*(s,p) is the PSNR obtained when allocation pattern s is applied to a video layer with average packet loss rate p. The PSNR of the whole video layer can be estimated as the sum of the PSNR contributed by each cluster, as in equation (13), where p*i is the probability that exactly the first i clusters are decodable and D*i is the corresponding PSNR contribution.
\[ S = \arg\max_{s \in M^{*}} \mathrm{PSNR}^{*}(s,p) \tag{12} \]
\[ \mathrm{PSNR}^{*}(s,p) = \sum_{i=0}^{C} p^{*}_{i} \cdot D^{*}_{i} \tag{13} \]
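The search in equation (12) can be sketched as an exhaustive enumeration of all allocation patterns, each scored with the weighted sum of equation (13). In the Python sketch below, `decodable_probs` stands for a caller-supplied function that returns the probabilities p*0..p*C for a pattern, and `contributions` stands for the values D*0..D*C. Both are placeholders assumed for the example, and brute-force enumeration is only one possible way to carry out the maximization.

```python
from itertools import combinations


def allocation_patterns(total, clusters):
    """Yield every vector of `clusters` non-negative integers summing to `total`."""
    # Stars and bars: choose positions of the clusters - 1 separators.
    slots = total + clusters - 1
    for bars in combinations(range(slots), clusters - 1):
        pattern, prev = [], -1
        for bar in bars + (slots,):
            pattern.append(bar - prev - 1)
            prev = bar
        yield tuple(pattern)


def estimated_psnr(probs, contributions):
    """Equation (13): sum of p*_i times D*_i."""
    return sum(p * d for p, d in zip(probs, contributions))


def best_pattern(total, clusters, loss_rate, decodable_probs, contributions):
    """Equation (12): the pattern with the highest estimated PSNR."""
    return max(
        allocation_patterns(total, clusters),
        key=lambda s: estimated_psnr(decodable_probs(s, loss_rate), contributions),
    )


if __name__ == "__main__":
    print(sorted(allocation_patterns(2, 2)))
```

The number of patterns grows combinatorially with the number of clusters and protecting symbols, so a practical implementation may need to prune the search space.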
Figure 3-10 shows a binary tree for 3-layer video. Each path from the root to a leaf forms a decoding path. A Y-edge denotes that the corresponding source cluster is decodable, and an N-edge denotes that it is undecodable. For example, path 0Y1N2Y denotes the event in which cluster 0 is decodable, cluster 1 is undecodable, and cluster 2 is decodable. Since protecting cluster (0,1,…,i) protects source clusters 0 to i at the same time, all source clusters can be decoded at the end of path 0Y1N2Y.
The probability p*i that exactly the first i clusters are decodable can be derived from such binary trees. In figure 3-10, p*3 is the sum of the probabilities of decoding paths 0Y1Y2Y, 0Y1N2Y, 0N1Y2Y, and 0N1N2Y, and p*2 is the sum of the probabilities of decoding paths 0Y1Y2N and 0N1Y2N. The remaining p*i are obtained in the same way.
Fig 3-10: A binary tree for 3-layer video
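The grouping of decoding paths described above can be reproduced with a short Python sketch. It enumerates every Y/N path of the binary tree and groups the paths by the number of leading clusters that end up decodable, using the rule that the last Y-edge at cluster j makes clusters 0 to j decodable. The function name is an assumption made for this example; the path probabilities themselves are not computed here.

```python
from itertools import product


def decoding_paths(num_clusters):
    """Map i -> list of paths after which exactly the first i clusters are decodable."""
    groups = {i: [] for i in range(num_clusters + 1)}
    for edges in product("YN", repeat=num_clusters):
        label = "".join(f"{cluster}{edge}" for cluster, edge in enumerate(edges))
        # The last Y-edge decides how many leading clusters are decodable.
        last_y = max(
            (cluster for cluster, edge in enumerate(edges) if edge == "Y"),
            default=-1,
        )
        groups[last_y + 1].append(label)
    return groups


if __name__ == "__main__":
    for count, paths in decoding_paths(3).items():
        print(count, paths)
```

For three clusters, the group for i = 3 contains 0Y1Y2Y, 0Y1N2Y, 0N1Y2Y and 0N1N2Y, and the group for i = 2 contains 0Y1Y2N and 0N1Y2N, matching the paths listed for p*3 and p*2.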
We summarize p*i in equation form in (14) to (17). Equation (15) gives the decodable probability of the paths that start from ID-Y when pvK packets have been received from source clusters 0 to ID-1. Equation (16) gives the undecodable probability of the paths that start from ID-N when pvK packets have been received from source clusters 0 to ID-1. Equation (17) calculates the minimal number of packets required to decode source clusters 0 to ID. C is the number of source clusters in the layer, p is the average packet loss rate, Ki is the number of source symbols in cluster i, and Mi is the number of protecting symbols in protecting cluster (0,1,…,i).
\[ p^{*}_{i} = Y_{i}(0, \ldots, 0) + N_{i}(0, \ldots, 0) \tag{14} \]
The distortion measuring function D*i, which corresponds to p*i, can be estimated by equation (18), where PSNRj is the sum of the PSNR contributed by the pictures in cluster j.
β is a leaky factor that reflects the quality degradation caused by referring to an unavailable, error-concealed picture. The exponential form of β is due to the hierarchical B prediction.
The value of β is chosen experimentally. We protect the video sequences “ice” and “city” with α = 0.15 and different values of β, and select the value that yields the highest PSNR. Each protected stream is passed through 100 lossy channels that have the same packet loss rate, 0.285714, but use different random seeds to generate the loss patterns. The encoding parameters are listed in table 3-5, and figures 3-11a and 3-11b show the simulation results of sequences “ice” and “city”, respectively. From figures 3-11a and 3-11b, we can observe that DFDO performs best, and stably so, when β is between 0.6 and 0.95. Thus, we choose β = 0.75 in further simulations.
Table 3-5: Encoding parameters and PSNRs
Layer ID   Resolution   Bit-rate (kbit/sec)   PSNR ice (dB)   PSNR city (dB)
0          QCIF         200                   30.1402         25.0721
1          QCIF         400                   30.5005         25.2716
2          CIF          600                   32.2488         26.7457
3          CIF          800                   32.6895         27.2123
4          4CIF         1000                  36.3041         30.9114
Fig. 3-11a: Beta-PSNR curve for video sequence “ice”
Fig. 3-11b: Beta-PSNR curve for video sequence “city”
IV. Simulation Results
4.1 Simulations for Flat FEC-Distortion Optimization
4.1.1 Environment
We prepare two 300-frame video sequences, mobile and crew, which are encoded with the same parameters, listed in table 4-1. For each video sequence, we simulate four packet loss rates: 0.12, 0.24, 0.36 and 0.48. The packet loss patterns are generated as independent and identically distributed losses. The bandwidth is given as shown in figure 4-1, where odd-ID GOPs are given more bandwidth and even-ID GOPs are given less. The α factor used in FFDO is 0.15, as described in Section 3.3.2. Every reported value is the average of 100 simulations with different loss patterns. The protection results for both videos at the four packet loss rates are listed in table 4-2, and the simulation results are plotted in figures 4-2 and 4-3.
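The averaging procedure described above can be sketched in Python as follows. Each run draws an independent and identically distributed loss pattern from its own random seed, and a caller-supplied metric is averaged over the runs. The function names, the use of the run index as the seed, and the example metric are assumptions made for illustration; in the actual simulations the metric would be the PSNR of the decoded video.

```python
import random


def loss_pattern(num_packets, loss_rate, seed):
    """Return a list of booleans; True marks a lost packet (i.i.d. losses)."""
    rng = random.Random(seed)
    return [rng.random() < loss_rate for _ in range(num_packets)]


def average_over_channels(metric, num_packets, loss_rate, runs=100):
    """Average metric(pattern) over `runs` channels with different seeds."""
    total = 0.0
    for seed in range(runs):
        total += metric(loss_pattern(num_packets, loss_rate, seed))
    return total / runs


if __name__ == "__main__":
    # Example metric: the fraction of packets lost in one run.
    fraction_lost = lambda pattern: sum(pattern) / len(pattern)
    print(average_over_channels(fraction_lost, 1000, 0.24))
```

Using a fixed seed per run makes every experiment repeatable, so two protection algorithms can be compared on exactly the same set of loss patterns.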
Table 4-1: Encoding parameters and PSNRs
Layer ID   Resolution   Bit-rate (kbit/sec)   PSNR mobile (dB)   PSNR crew (dB)
0          QCIF         200                   21.5101            29.5959
1          QCIF         400                   21.9109            30.3443
2          CIF          600                   28.0416            31.5937
3          CIF          800                   29.6475            32.1536
4          4CIF         1000                  30.7404            33.4996
Fig 4-1: Given bandwidth
4.1.2 Simulation Results
Table 4-2: Protection results (each entry is the number of protecting symbols)
               mobile, packet loss rate     crew, packet loss rate
GOP   Layer    0.12   0.24   0.36   0.48    0.12   0.24   0.36   0.48
17    0        8      16     18     28      9      16     20     35
      0        8      15     17     28      9      17     20     34
      1        7      14     13     13      6      14     13     25
      2        7      13     12     0       6      13     10     X
      3        5      X      X      X       5      X      X      X
18    0        5      16     22     40      7      15     24     39
      1        4      11     4      X       3      10     0      X
      2        3      X      X      X       0      X      X      X
19    0        9      17     19     32      7      14     22     39
      1        7      15     16     27      5      10     17     29
      2        7      15     16     26      4      9      15     X
      3        6      11     8      X       3      5      X      X
      4        4      X      X      X       3      X      X      X
Fig. 4-2: Simulation results of sequence mobile with different packet loss rates
Fig 4-3: Simulation results of sequence crew with different packet loss rates
4.2 Simulations for Dynamic FEC-Distortion Optimization
4.2.1 Environment
We first encode two sequences, “football” and “soccer”, both at a frame rate of 30 fps, to compare the protection performance of FFDO and DFDO. The encoding parameters are listed in table 4-3. The packet loss rate is 0.285714, the α factor used in FFDO is 0.15, and the β factor used in DFDO is 0.75. The bandwidth distributions for the two videos are given in figure 4-4.
Fig 4-4a: Given bandwidth for football @ 30 fps
Fig 4-4b: Given bandwidth for soccer @ 30 fps
Table 4-3: Encoding parameters and PSNR for sequences football and soccer at 30 fps
Layer ID   Resolution   Bit-rate (kbit/sec)   PSNR football (dB)   PSNR soccer (dB)
0          QCIF         200                   27.5387              28.3235
1          QCIF         400                   28.9634              28.9213
2          CIF          600                   31.0809              30.0142
3          CIF          800                   32.1219              30.4611
4          4CIF         1000                  32.3232              32.1135
Figures 4-6a and 4-6b show the simulation results of football with FFDO and DFDO, respectively. For ease of comparison, we also plot the PSNR difference between DFDO and FFDO in figure 4-6c, where the positive area indicates that DFDO outperforms FFDO, by 0.4 dB on average. In figure 4-6d, we compare both algorithms on the sequence soccer; DFDO outperforms FFDO by 0.7 dB on average.
From figure 4-6c, we can observe that DFDO outperforms FFDO even more when a GOP has fewer video layers protected. This is because the percentage of the number of
observed from the protection result as shown in table 4-5.
Further, in order to examine the performance of DFDO when the video content has wide and global motion, we down-sample the sequences football and soccer from 30 fps to 15 fps. The encoding parameters are listed in table 4-4; the other settings, such as the packet loss rate and both leaky factors, remain the same. The given bandwidth is shown in figure 4-5. The simulation results for both videos are shown in figures 4-7a and 4-7b, and DFDO outperforms FFDO by 0.5 to 0.7 dB. In the case of football, the gain is almost twice that of the 30 fps version.
Fig 4-5a: Given bandwidth for football @ 15 fps
Fig 4-5b: Given bandwidth for soccer @ 15 fps
Table 4-4: Encoding parameters and PSNR for sequences football and soccer at 15 fps
Layer ID   Resolution   Bit-rate (kbit/sec)   PSNR football (dB)   PSNR soccer (dB)
0          QCIF         200                   28.4286              28.6826
1          QCIF         400                   29.9313              29.1768
2          CIF          600                   32.7111              30.3753
3          CIF          800                   34.2528              30.8541
4          4CIF         1000                  34.4149              32.9559
We also compare both algorithms when the packet loss rate is misestimated. Figure 4-8 shows the simulation results on the 30 fps and 15 fps football sequences, where the x axis is the actual packet loss rate of the channel and the y axis is the PSNR gain of DFDO over FFDO. The estimated packet loss rate used for protection is 0.285714. We can observe that, although the packet loss rate is not estimated accurately, DFDO still outperforms FFDO by about 0.25 to 0.75 dB.
4.2.2 Simulation Results
Table 4-5: Protection results (each entry is the number of protecting symbols)
                       DFDO
GOP   Layer   FFDO     M0    M1    M2    M3    M4
10    0       15       0     0     0     0     15
      1       12       0     0     0     0     12
      2       10       0     0     0     0     10
      3       5        0     0     0     5     0
11    0       14       0     0     0     0     14
      1       8        0     0     0     7     1
      2       5        0     0     0     5     0
12    0       14       0     0     0     0     14
      1       4        0     0     4     0     0
13    0       4        0     0     4     0     0
14    0       14       0     0     0     0     14
      1       4        0     0     4     0     0
15    0       13       0     0     0     0     13
      1       9        0     0     0     0     9
      2       4        0     0     4     0     0
16    0       15       0     0     0     0     15
      1       11       0     0     0     0     11
      2       11       0     0     0     0     11
      3       5        0     0     0     5     0
17    0       18       0     18    X     X     X
      1       16       0     16    X     X     X
      2       15       0     15    X     X     X
      3       12       0     12    X     X     X
      4       5        5     0     X     X     X
Fig 4-6a: Simulation result of FFDO with football, average PSNR is 28.4065 dB
Fig 4-6b: Simulation result of DFDO with football, average PSNR is 28.8179 dB
Fig 4-6c: The difference of simulation results between DFDO and FFDO, football @ 30 fps
Fig 4-6d: The difference of simulation results between DFDO and FFDO, soccer @ 30 fps
Fig 4-7a: Simulation result between DFDO and FFDO, football @ 15 fps
Fig 4-7b: Simulation result between DFDO and FFDO, soccer @15 fps
Fig 4-8: Simulation result with different packet loss rate, sequence football
Fig 4-9a: mobile Fig 4-9b: crew Fig 4-9c: harbour
Fig 4-9d: football Fig 4-9e: ice Fig 4-9f: soccer
V. Conclusion
In multimedia transmission, forward error correction (FEC) is a useful technique for avoiding the retransmission of lost packets, which may cause a large delay and is not feasible for real-time playback. An erasure code protects data by adding redundancy to the source information. For a video stream transmitted over a lossy channel with limited bandwidth, it is important to distribute the bandwidth between video data and protection data efficiently in order to achieve good visual quality.
In this paper, we first modify the algorithm of [5] into the flat FEC-distortion optimization (FFDO) algorithm, which not only adapts to scalable video streams encoded with the H.264/MPEG-4 AVC scalable extension but also takes inter-layer prediction into account. We then propose the dynamic FEC-distortion optimization (DFDO) algorithm, which further improves FFDO so that the pictures within the same video layer of a GOP are protected according to their importance. Thus, when a video layer of a GOP cannot be completely received, DFDO has a better chance of recovering important pictures than FFDO does.
The simulation results show that DFDO outperforms FFDO by about 0.4 dB in average PSNR. When the motion between consecutive pictures is made larger by down-sampling from 30 fps to 15 fps, DFDO outperforms FFDO by about 0.8 dB. We also investigate the performance of DFDO when the packet loss rate is misestimated, and the simulation results show that DFDO still outperforms FFDO. We therefore conclude that protecting the pictures within the same video layer according to their importance can improve the visual quality.
References
[1] ETSI, “Digital Video Broadcasting (DVB): Transmission systems for handheld terminals,” ETSI standard, EN 302 304 V1.1.1, 2004.
[2] J. Byers, M. Luby, and M. Mitzenmacher, “A Digital Fountain Approach to Asynchronous Reliable Multicast,” IEEE Journal on Selected Areas in Communications, 20(8), pp. 1528-1540, October 2002.
[3] M. Luby et al., “Raptor Codes for Reliable Download Delivery in Wireless Broadcast Systems,” IEEE CCNC, Las Vegas, NV, Jan. 2006.
[4] 3GPP TS 26.346 V6.4.0, “Technical Specification Group Services and System Aspects; Multimedia Broadcast/Multicast Service (MBMS); Protocols and Codecs,” Mar. 2006.
[5] W.-T. Tan and A. Zakhor, “Video multicast using layered FEC and scalable compression,” IEEE Transactions on Circuits and Systems for Video Technology, March 2001.
[6] H.-F. Hsiao, A. Chindapol, J. A. Ritcey, and J.-N. Hwang, “Adaptive FEC Scheme for Layered Multimedia Streaming over Wired/Wireless Channels,” Workshop on Multimedia Signal Processing, IEEE, pp. 1-4, Oct. 2005.
[7] S.-R. Kang and D. Loguinov, “Modeling Best-Effort and FEC Streaming of Scalable Video in Lossy Network Channels,” IEEE/ACM Trans. on Networking, Feb. 2007.
[8] W.-T. Tan and A. Zakhor, “Real-Time Internet Video Using Error Resilient Scalable Compression and TCP-Friendly Transport Protocol,” IEEE Transactions on Multimedia, Vol. 1, No. 2, June 1999.
[9] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the Scalable H.264/MPEG4-AVC Extension,” International Conference on Image Processing, IEEE, Oct., 2006.
[10] L. Rizzo, "Effective Erasure Codes for Reliable Computer Communication Protocols",
[11] ITU-T Rec. & ISO/IEC 14496-10 AVC, “Advanced Video Coding for Generic Audiovisual Services,” version 3, 2005.
[12] San Ling and Chaoling Xing, “Coding Theory, A First Course”, Cambridge University Press
[13] http://ip.hhi.de/imagecom_G1/savce/index.htm
[14] Yao Wang, Jorn Ostermann, and Ya-Qin Zhang, “Video Processing And Communications”, Prentice Hall, pp. 349
[15] S. R. Hankerson, D. G. Hoffman, D. A. Leonard, C. C. Lindner, K. T. Phelps, C. A. Rodger, J. R. Wall, “Coding Theory And Cryptography, The Essentials, 2nd edition”, Marcel Dekker, Inc.
[16] Jerome Lacan and Jerome Fimes, “Systematic MDS Erasure Codes Based on Vandermonde Matrices”, IEEE Communications Letters, Vol. 8, No. 9, Sep. 2004
[17] I.Gohberg and V.Olshevsky, “Fast algorithms with preprocessing for matrix-vector multiplication problems”, Journal of Complexity, 1994