Chapter 2 Error Resilience and Intra Refreshment for MPEG-4 Video Coding
2.5 Remarks
We have described existing works on constructing a streaming video system with error resilience techniques. To get better quality when packets are lost, error concealment algorithms and error robustness schemes are employed. The streaming system is integrated into a multimedia test bed that can simulate a variety of network conditions and evaluate error resilience performance for on-line testing.
Chapter 3 Rate-Distortion Optimized Intra Refreshment
The proposed Rate-Distortion Intra Refreshment (RDIR) framework is realized on the MPEG-4 reference encoder. Within the RDIR framework, we introduce the concept and rationale of the intra refreshment tools, including the R-D cost, the historical record, the systematic insertion of intra MBs, and the run length of successive intra MBs.
3.1 Encoder with Intra Refreshment
Fig. 18 shows the overall architecture of the MPEG-4 Simple Profile encoder with the RDIR intra refreshment mechanism, which acts on the mode decision of motion estimation (ME).
The mode modification module collects information from the R-D cost, the history record, and the intra run to choose a better mode for transmitting video content over channels with packet loss. The R-D cost module calculates the Lagrangian cost of the inter and intra modes based on information from two additional encoding processes, one using inter coding and one using intra coding.
The derivation of the R-D cost is based on real bits and distortion. The two additional coding processes, which apply the intra and inter modes to the whole frame, are executed before the R-D cost calculation. During the third coding process, the conditions on the history record and the intra run are also checked. The history record helps stop long reference chains and the deterioration caused by error drift. The intra run helps distribute the overhead to different frames of the sequence, which can improve concealment efficiency because intra MBs contain no motion information for the temporal concealment process.
Fig. 18. Architecture of encoder with intra refreshment
3.2 Mode Decision for Intra Refreshment
Considering packet loss during transmission, our coding system makes the intra refreshment mode decision based on the error location. As shown in Table 1, there are four conditions when encoding each pair of frames. To reach a better mode decision by inserting a proper number of intra MBs into the bitstream of the frame being processed, we simplify the mode decision by assuming that only the previous frame has MBs lost to packet loss and that the current frame has no packet loss. Under this assumption, both types of real distortion, the quantization error and the concealed error from the previous frame, are used to refine the mode decision for video delivery over the packet-switched network.
Table 1. The four conditions for the mode decision of intra refreshment.
                     Current frame
Previous frame       Packet loss    No packet loss
Packet loss          X              V
No packet loss       X              X
3.3 Algorithm Descriptions
The intra refreshment technique inserts intra-coded blocks instead of inter-coded blocks in P frames to prevent serious error propagation caused by packet loss during transmission over an error-prone network. Since an intra-coded block uses more bits, coding efficiency suffers, and the penalty is least justified when the packet loss rate is low. To balance coding efficiency and error robustness, intra block insertion with rate-distortion optimization that adapts to channel conditions can reduce the total amount of overhead and retain the coding performance at the same time.
The RDIR enhances the robustness of the encoded bitstreams through the following approaches: the use of a history record, the inclusion of motion and texture information in the record update, the coding bitrate of the previous frame, and the number of successive intra MBs.
3.3.1 Rate-Distortion Model
The rate-distortion (R-D) model has been proven to fit mode decision effectively in coding systems for error-free channels [33]. For error-prone environments, mode selection based on an R-D framework that accounts for packet loss has been proposed in [25].
We add an R-D model to our coding system that derives the distortion from both the concealment error and the quantization error. Inter and intra modes are used in our mode decision framework. The intra mode represents coding without temporal prediction, and the inter mode uses motion-compensated temporal prediction. For each MB, we collect the coding bits and the quantization distortion for inter and intra coding individually, and we calculate the costs for intra and inter blocks, denoted $J_{intra}$ and $J_{inter}$ respectively, by the Lagrangian formula

$$J = D_q + \lambda \cdot R,$$

where $J$ indicates the Lagrangian cost. For each block, $J = J_{intra}$ for the intra coding mode and $J = J_{inter}$ for the inter coding mode. $\lambda$ is the parameter used to control the coding bit rate in the encoding process, $D_q$ is the distortion induced by residual quantization, and $R$ is the number of bits used in coding an MB.
After the cost $J$ is decided, the mode with the minimal value of $J$ is chosen as the current MB coding mode. In an error-prone environment, the distortion $D$ is increased by lost packets.
When any error occurs, temporal prediction allows the error to propagate if the inter mode is chosen. Using the intra coding mode stops error propagation but decreases the coding efficiency at the same time. To achieve R-D optimization under the proposed intra refreshment encoding, the parameter $\lambda$ needs to be updated every frame to control the bits used under the same distortion by
[Equation not recoverable from the extracted text: per-frame update of $\lambda$ with parameter $\alpha$]
where the parameter $\alpha$ is determined by extensive simulations of buffer control. The packet loss rate is used to model transmission over the Internet protocol. Using network conditions to model the situation at the decoder end is expected to yield better reconstructed image quality. If the model matched the channel exactly, we would obtain the same quality as that of the transmitted video in the error-prone environment.
In our proposed RDIR, for each MB, we compute the costs for intra and inter blocks, $J_{intra}$ and $J_{inter}$, by the Lagrangian formula

$$J = D + \lambda \cdot R,$$

where $J$ is the Lagrangian cost, $\lambda$ is the parameter used to control the coding bit rate in the encoding process, $D$ is the distortion, and $R$ is the number of bits used in coding an MB. We use a different $D$ for each mode. For the inter mode, we account for the concealment distortion of a concealed block under a variable packet loss rate by

$$D = D_q \times (1 - p) + D_c \times p,$$

where $D_q$ is the quantization error and $D_c$ is the concealment error from the previous frame. For the intra mode, the distortion is

$$D = D_q,$$

since the intra mode does not refer to the previous frame and the concealment error is out of consideration. The parameter $p$ is the estimated channel packet loss rate, which is 10% here.
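The loss-aware mode decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the tuple layout, and the numeric values in the usage note are hypothetical and are not taken from the reference software; ties are resolved in favor of the inter mode as an arbitrary choice.

```python
def expected_distortion(mode, d_q, d_c, p):
    """Distortion D of one MB: D_q for intra, D_q*(1-p) + D_c*p for inter."""
    if mode == "intra":
        return d_q
    return d_q * (1.0 - p) + d_c * p


def lagrangian_cost(distortion, rate_bits, lam):
    """Lagrangian cost J = D + lambda * R."""
    return distortion + lam * rate_bits


def choose_mode(intra, inter, d_c, p, lam):
    """Pick the mode with the smaller cost.

    intra and inter are (d_q, rate_bits) pairs measured for the same MB
    (hypothetical layout). Returns (mode, j_intra, j_inter).
    """
    j_intra = lagrangian_cost(
        expected_distortion("intra", intra[0], d_c, p), intra[1], lam)
    j_inter = lagrangian_cost(
        expected_distortion("inter", inter[0], d_c, p), inter[1], lam)
    # Assumption: on a tie the cheaper-to-code inter mode is kept.
    mode = "intra" if j_intra < j_inter else "inter"
    return mode, j_intra, j_inter
```

With made-up measurements `intra=(100, 400)`, `inter=(120, 100)`, `d_c=2000` and `lam=0.5`, the sketch keeps the inter mode at `p=0` (costs 300 versus 170) and switches to the intra mode at `p=0.1` (costs 300 versus 358), which is the behavior the concealment term is meant to produce.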
3.3.1.1 Use of Error Concealed Data
The concealment error will be calculated when we get the motion information. We will reconstruct one ‘concealed’ frame by assuming one MB lost each time and doing frame reconstruction with the same concealment strategy in decoder end. The concealment error will be derived by comparing the ‘concealed’ frame with the original reconstructed frame.
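A minimal sketch of this per-MB concealment error follows. It assumes zero-motion copy from the previous reconstructed frame as the concealment strategy and the sum of squared differences as the distortion measure; both are assumptions made for illustration, as are the function name and the 2-D list frame layout.

```python
def concealment_error_map(prev_recon, cur_recon, mb_size=16):
    """Return {(mb_row, mb_col): D_c} for every MB of the current frame.

    Each MB is assumed lost on its own and concealed by copying the
    co-located MB of the previous reconstructed frame (assumed strategy).
    D_c is the sum of squared differences against the reconstructed MB.
    """
    height, width = len(cur_recon), len(cur_recon[0])
    errors = {}
    for top in range(0, height, mb_size):
        for left in range(0, width, mb_size):
            ssd = 0
            for y in range(top, min(top + mb_size, height)):
                for x in range(left, min(left + mb_size, width)):
                    diff = prev_recon[y][x] - cur_recon[y][x]
                    ssd += diff * diff
            errors[(top // mb_size, left // mb_size)] = ssd
    return errors
```

Because only one MB is replaced at a time, the comparison of the 'concealed' frame with the reconstructed frame reduces to a comparison inside that MB, which is why the sketch never builds the full concealed frame.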
After the cost $J$ is decided, the mode with the minimal value of $J$ is chosen as the current MB coding mode. In an error-prone environment, the distortion $D$ suffers a more serious quality loss. The quality loss comes from the original quantization error and the errors introduced when concealing the lost MB from nearby MBs. To account for the error-concealed distortion, the metric $J$ is quantified by

$$J = D_q \cdot (1 - p) + D_c \cdot p + \lambda \cdot R,$$

where $D_q$ is the distortion induced by residual quantization, $D_c$ is the distortion induced by a specified concealment algorithm, and $p$ indicates the packet loss rate.
3.3.2 Historical Record
The motion information is used for selecting better coding modes after R-D cost calculation.
As in [38] , pixel level inter-frame dependence history has been adopted to enhance the intra refreshment. We take some record about reference times as another intra refreshment criterion for the MBs that are assigned as inter-mode by R-D decision.
Initially, we set the values of all MBs' reference history chains to 0. The setting starts from the 1st P-frame, and the record values are updated for each subsequent frame under two major considerations: the weighted summation of the record values of the previous frame, and the use of the current motion and coding information. The update continues until the last coded frame.
The record is updated while encoding each frame. If the mode of an MB is intra, the corresponding record is reset to 0; otherwise, we use the motion vector to compute the record as an area-weighted average over the reference frame by
[Equation not recoverable from the extracted text: area-weighted average of HR1-HR4 with weights A1-A4]
where A1-A4 are the areas of the MBs in the previous frame that are referenced by the current MB in the current frame, and HR1-HR4 are the records of each referenced MB. Intra mode is used if the record shows that the current MB is in a long reference chain.
We treat motion-intensive areas as a specific condition that should be refreshed more frequently, because the concealment distortion is more severe when such an area is lost. The ratio of the motion vector amplitude (mv) to the specified search range (SR), $mv / SR$, is added into the calculation of the current MB's history record. If the record of an MB is larger than a specified threshold and the current MB is set as inter coded by the mode decision module, the MB being processed is changed to intra coded. The threshold giving the better intra refreshment performance is found empirically.
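The record update can be illustrated with the following sketch. Several details are assumptions rather than the method of this work: the Euclidean norm for the motion vector amplitude, full-pixel motion vectors, and in particular the `chain_step` term, which is added here only so that the record grows along a reference chain and can be set to 0 to disable it. All function and parameter names are hypothetical.

```python
MB_SIZE = 16


def overlap_areas(mv_x, mv_y, mb_size=MB_SIZE):
    """Areas A1..A4 (pixels) of the reference MBs covered by the block.

    The displaced block covers a 2x2 group of MBs in the previous frame;
    areas are ordered top-left, top-right, bottom-left, bottom-right.
    """
    off_x = mv_x % mb_size
    off_y = mv_y % mb_size
    return [
        (mb_size - off_x) * (mb_size - off_y),
        off_x * (mb_size - off_y),
        (mb_size - off_x) * off_y,
        off_x * off_y,
    ]


def update_record(areas, ref_records, mv_x, mv_y, search_range,
                  chain_step=1.0):
    """Area-weighted average of HR1..HR4 plus the mv/SR motion term.

    chain_step is an assumed per-frame increment, not taken from the text.
    """
    weighted = sum(a * hr for a, hr in zip(areas, ref_records)) / sum(areas)
    mv_amplitude = (mv_x ** 2 + mv_y ** 2) ** 0.5  # assumed Euclidean norm
    return weighted + chain_step + mv_amplitude / search_range


def refresh_decision(rd_mode, record, threshold):
    """Return (final_mode, stored_record) for one MB."""
    if rd_mode == "intra" or record > threshold:
        return "intra", 0.0  # intra MBs reset their record to 0
    return "inter", record
```

For a motion vector of (4, 0), the block covers 192 pixels of one reference MB and 64 pixels of its right neighbor; with records 2.0 and 6.0 and a search range of 16, the sketch gives 3.0 + 1.0 + 0.25 = 4.25, which would trigger a refresh for any threshold below that value.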
Fig. 19. History record update.
3.3.3 Uniform Allocation of Overhead Bits
The bits used for coding previous frames are considered in order to spread the overhead bits of intra refreshed MBs more uniformly over the sequence being processed. To alleviate the increase of the overall bit budget, we multiply the history record value of the current frame by a fractional number when the used bits exceed a specified threshold. The fractional value, a floating-point number less than unity, controls the number of bits spent. In our simulations, the record values are recalculated by
[Equation not recoverable from the extracted text: piecewise definition in which $HR'_n = \alpha \cdot HR_n$ when $R_{n-1}$ exceeds a threshold formed from $R_T$ and $\Delta R$, and $HR'_n = HR_n$ otherwise]
where $HR_n$ indicates the original history record and $HR'_n$ is the reduced history record. $R_{n-1}$ is the total number of bits used for coding the previous $(n-1)$ frames, $R_T$ is the target bit rate in bits/sec, and $\Delta R$ is used to control the increase of overhead bits from the insertion of more intra MBs. The floating-point number $\alpha$ is set to 0.9 in our simulations.
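The damping step can be sketched as below. Since the exact threshold expression is not available here, the sketch takes the threshold as a plain argument (`bit_threshold`, a hypothetical name) and leaves its derivation from the target bit rate and the overhead margin to the caller.

```python
def damp_records(records, bits_used, bit_threshold, alpha=0.9):
    """Scale all history records by alpha when the bit budget is exceeded.

    records:       history record values of the current frame
    bits_used:     total bits spent on the previously coded frames
    bit_threshold: budget limit supplied by the caller (assumed input)
    alpha:         fractional factor below 1 (0.9 in the simulations)
    """
    if bits_used > bit_threshold:
        return [alpha * hr for hr in records]
    return list(records)
```

Lowering the records rather than the refresh threshold has the same effect on the decision, but it also carries over to later frames because the reduced values are what the next update starts from.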
3.3.4 Uniform Location of Successive Intra MBs
For streaming over the Internet and wireless channels, the video bitstreams are packed into packets. Packet loss may occur during transmission over these channels, and a lost packet may drop several successive intra MBs. To avoid quality degradation of the received video caused by burst packet loss, the intra refreshment does not change inter MBs to intra MBs once a run of successive intra MBs has been detected. The record value is retained for the following update. This helps distribute the overall overhead to different frames and improves error concealment, because successive intra MBs contain no motion information and therefore degrade the error concealment performance, especially in temporal concealment.
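The run-length rule can be illustrated as follows. The sketch assumes MBs are visited in raster-scan order and that a run counts every intra MB, whether it was chosen by the R-D decision or by the refresh; `max_run` is a hypothetical parameter, since no value is given in the text.

```python
def apply_intra_run_limit(rd_modes, records, threshold, max_run):
    """Refresh inter MBs whose record exceeds the threshold, unless a run
    of max_run successive intra MBs has just been produced.

    Returns (final_modes, new_records). A skipped refresh keeps the MB
    inter coded and retains its record for the following update.
    """
    final_modes = []
    new_records = []
    run = 0  # number of successive intra MBs ending at the previous MB
    for mode, record in zip(rd_modes, records):
        if mode == "inter" and record > threshold and run < max_run:
            mode = "intra"
        if mode == "intra":
            run += 1
            new_records.append(0.0)
        else:
            run = 0
            new_records.append(record)
        final_modes.append(mode)
    return final_modes, new_records
```

In a row where two intra MBs are followed by two inter MBs that both exceed the threshold, a limit of `max_run=2` leaves the first of them inter coded and refreshes the second, so the intra MBs no longer form one contiguous block that a single lost packet could remove.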
3.3.5 Application to MPEG-4 Video Reference Software
In RDIR, we incorporate the R-D framework, the concealment error, and the historical record into the mode decision process of the coding standard. In addition, we consider the packet loss rate when deriving the distortion of inter-coded MBs. An R-D framework with concealment error is more efficient in mode decision. We try to stop the error propagation caused by packet loss, since the concealment error is usually larger than the quantization distortion. The historical record is recommended for enhancing the quality of decoded video: a long series of P-frame references is stopped once the historical record exceeds the threshold. As shown in Fig. 20, the RDIR is performed on each MB of every frame. The MB-based RDIR suits video content delivery over error-prone transmission channels.
[Flow chart: begin of i-th P-frame; for each MB, intra coding if Record > threshold; go to next P-frame]
Fig. 20. The flow chart of RDIR encoding
3.4 Summary
RDIR inserts intra MBs based on video characteristics, the properties of packet-switched streaming schemes, and network conditions, which further improves the visual quality and balances the coding efficiency against the error robustness of the bitstream.
Chapter 4 Experimental Results
Experimental results using the proposed RDIR algorithm under various bit rates and different network conditions are provided. We will perform both off-line and on-line testing.
For off-line testing, a decoding process with the hybrid concealment algorithms is used to investigate the overall performance of an error resilience system. For on-line testing, the retransmission scheme in the streaming system is taken into consideration to provide a better quality of service.
4.1 Off-line Testing
4.1.1 Experimental environment
We use the MoMuSys version of the MPEG-4 Simple Profile reference software to realize the RDIR algorithm. We analyze the effect of the RDIR algorithm with different test sequences covering slow motion, fast motion, and D1 (720x480) resolution. For the simulations, the encoding frame rate is 30 frames per second (fps), the decoding frame rate is 10 fps, the GOP structure is I-P-P…, and the packet size is 1000 bits. To simulate packet loss conditions off-line and observe the impact of packet loss rates on the visual quality of reconstructed video sequences, we randomly drop MBs to simulate different packet loss rates. The dropping probability follows a uniform distribution. We average ten simulation results to obtain the average performance, because different lost packets in the bitstream cause different quality degradation.
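The off-line loss simulation can be sketched as follows. This is an illustration only: each MB is dropped independently with probability equal to the loss rate, and `evaluate` is a hypothetical callback standing in for decoding with concealment and measuring the quality of one run.

```python
import random


def drop_macroblocks(num_mbs, loss_rate, rng):
    """Return one boolean per MB, True where the MB is dropped."""
    return [rng.random() < loss_rate for _ in range(num_mbs)]


def average_over_runs(evaluate, num_mbs, loss_rate, runs=10, seed=0):
    """Average evaluate(loss_pattern) over several random loss patterns.

    evaluate is supplied by the caller (hypothetical); a fixed seed makes
    the ten patterns reproducible between encoder configurations.
    """
    rng = random.Random(seed)
    scores = [
        evaluate(drop_macroblocks(num_mbs, loss_rate, rng))
        for _ in range(runs)
    ]
    return sum(scores) / len(scores)
```

Reusing the same seed for the 'NoIntra' and RDIR bitstreams means both are exposed to the same ten loss patterns, which removes one source of variance from the comparison; a CIF frame has 22 x 18 = 396 MBs, so `num_mbs=396` per frame.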
4.1.2 RDIR performance
We evaluate the RDIR performance with different tools and compare with the encoding schemes without intra refreshment. The test sequence is Foreman with CIF (352x288) resolution under 10% packet loss rate. The concealment algorithm adopted in the decoding loop of the encoder is spatial copy in I-VOPs and zero motion in P-VOPs.
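The PSNR figures reported in this section are assumed here to follow the usual definition for 8-bit samples, $10 \log_{10}(255^2 / MSE)$. The sketch below computes it for a single frame given as a 2-D list; whether only the luminance plane is measured and how frames are averaged are not stated in the text, so those choices are left to the caller.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized frames (2-D lists)."""
    total = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            total += (r - d) ** 2
            count += 1
    if total == 0:
        return math.inf  # identical frames
    mse = total / count
    return 10.0 * math.log10(peak * peak / mse)
```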
Table 2. PSNR improvement (dB) of different tools at different bitrates
Bitrate         Error Free   NoIntra   RD      RD+HR   RD+HR+MV   Total gain
128k   PSNR     30.19        19.25     21.79   24.47   24.78
       Gain                  --        2.54    2.68    0.31       5.53
256k   PSNR     33.58        19.16     22.26   25.43   25.63
       Gain                  --        3.1     3.17    0.2        6.47
512k   PSNR     36.82        19.03     22.44   26.02   26.20
       Gain                  --        3.41    3.58    0.18       7.17
The column 'Error Free' shows the original bitstream without intra refreshment in an error-free environment, and the column 'NoIntra' presents the original bitstream transmitted over channels with packet loss. The next three columns show the RDIR algorithm as each additional tool is enabled in the simulations, and each gain entry is measured against the column to its left. Across the bitrates, the total gain compared with the 'NoIntra' condition is about 5.53 ~ 7.17 dB. The visual quality of the 61st frame with different tools at 128k bits per second (bps) is shown in Fig. 21.
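As a worked check of how the gain entries relate to the PSNR entries, the sketch below recomputes the step-by-step and total gains for the 128k row of Table 2. The function name is hypothetical; the numbers are those of the table.

```python
def incremental_gains(psnr_by_tool):
    """Gains between consecutive tools and the total gain, in dB.

    psnr_by_tool is ordered NoIntra, RD, RD+HR, RD+HR+MV.
    """
    steps = [
        round(after - before, 2)
        for before, after in zip(psnr_by_tool, psnr_by_tool[1:])
    ]
    total = round(psnr_by_tool[-1] - psnr_by_tool[0], 2)
    return steps, total


steps, total = incremental_gains([19.25, 21.79, 24.47, 24.78])
```

For this row the steps come out as 2.54, 2.68 and 0.31 dB with a total of 5.53 dB, so the R-D cost and the history record contribute most of the improvement and the motion term adds a small remainder.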
[Panels: NoIntra, RD, RD+HR, RD+HR+MV]
Fig. 21. Visual quality improvement of RDIR
For slow-motion video, the Akiyo sequence is used for simulations at different bitrates and packet loss rates. We obtain a gain of 0.88~6.32 dB in PSNR.
Table 3. PSNR improvement (dB) of slow motion simulation Akiyo
Bitrate (bits/sec)   50k                         100k                        200k
                     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain
Error Free           37.63     --       --       40.34     --       --       42.76     --       --
PLR 1%               36.05     35.17    0.88     37.22     36.15    1.07     38.96     36.85    2.11
PLR 5%               31.75     27.16    4.59     33.30     29.82    3.48     33.60     29.18    6.32
PLR 10%              28.61     23.15    5.46     29.52     24.22    5.3      30.87     24.88    5.99
For fast-motion video, the Foreman sequence is used for simulations at different bitrates and packet loss rates. The RDIR improves the PSNR of the 'NoIntra' scheme by 1.21~6.95 dB.
Table 4. PSNR improvement (dB) of fast motion simulation Foreman
Bitrate (bits/sec)   128k                        256k                        512k
                     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain
Error Free           30.19     --       --       33.58     --       --       36.82     --       --
PLR 1%               28.26     27.05    1.21     31.00     27.95    3.05     33.55     28.87    4.68
PLR 5%               26.57     20.84    5.73     28.10     22.35    5.75     28.98     22.84    6.14
PLR 10%              25.01     19.10    4.91     25.77     19.16    6.61     26.37     19.42    6.95
For D1 resolution video, the Crew sequence is used for simulations at different bitrates and packet loss rates. The results show that the RDIR obtains a gain of 1.65~4.55 dB in PSNR.
Table 5. PSNR improvement (dB) of D1 resolution simulation Crew
Bitrate (bits/sec)   600k                        900k                        1200k
                     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain     Proposed  NoIntra  Gain
Error Free           32.11     --       --       34.40     --       --       35.79     --       --
PLR 1%               31.21     29.56    1.65     32.86     30.53    2.33     33.92     31.51    2.41
PLR 5%               28.70     25.31    3.39     29.62     25.38    4.24     29.82     25.47    4.35
PLR 10%              26.65     22.95    3.60     26.88     22.96    3.92     27.34     22.79    4.55
4.1.3 Combination with hybrid concealment
To evaluate the performance of the error resilience system, 3 series of experiments are adopted. For VOD applications under various network conditions, the error resilient capabilities are examined at 3 bitrates including 256, 550 and 700 kilo-bits per second (kbps).
In each experiment, the 4 error resilience systems listed in Table 6 are compared. Type 1 represents the original reference encoder and decoder system with the default error resilience tools, including Resync_Markers and zero-motion-vector spatial copy. Type 2 enables RDIR in the encoder. Type 3 enables the error robustness and hybrid concealment algorithms. Type 4 is the error resilience system with RDIR in the encoder and hybrid concealment in the decoder. Three different network conditions, with packet loss rates (PLR) of 1%, 5% and 10% under a uniform dropping probability model, are used for testing. The test sequence is encoded with one I-VOP and 99 P-VOPs.
The results are shown in Table 7 to Table 12. We find that Type 2 achieves a gain over Type 1 of 3.64-8.45 dB in PSNR, and Type 3 achieves a gain over Type 1 of 5.16-10.62 dB in PSNR. With the error resilient decoder combined with the rate-distortion intra refresh encoder, we obtain a gain over Type 1 of 6.33-13.22 dB in PSNR. Fig. 22 to Fig. 27 show rate-distortion (R-D) curves that compare the performance of the four error resilience systems at the different packet loss rates (PLR). Fig. 28 shows that Type 4 improves on the other 3 types in both visual quality and objective quality.
Table 6. The 4 types of system with embedded error resilience tools
Type   Encoder                    Decoder
1      Resynchronization marker   Zero motion for P-VOPs and spatial copy for I-VOPs
2      Intra-refresh              Zero motion for P-VOPs and spatial copy for I-VOPs
3      Resynchronization marker   Proposed hybrid concealment
4      Intra-refresh              Proposed hybrid concealment
Table 7. The reconstructed image quality for Foreman with 260 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   33.59    -        -        -
       Gain^1   -        -        -        -
1%     PSNR     28.26    30.22    30.92    31.70
       Gain^1   0        1.96     2.66     3.44
5%     PSNR     21.80    26.69    26.60    28.65
       Gain^1   0        4.89     4.8      6.85
10%    PSNR     19.03    23.67    23.78    25.78
       Gain^1   0        4.64     4.75     6.75
^1 Gain: the difference compared to the PSNR of Type 1 (unit: dB)
^2 PLR: packet loss rate
^3 unit: dB
Table 8. The reconstructed image quality for Foreman with 550 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   36.94    -        -        -
       Gain^1   -        -        -        -
1%     PSNR     28.87    32.78    31.90    34.55
       Gain^1   0        3.91     3.03     5.68
5%     PSNR     22.00    28.00    26.94    30.75
       Gain^1   0        6.00     4.94     8.75
10%    PSNR     19.64    26.32    24.55    28.75
       Gain^1   0        6.68     4.91     9.23
Table 9. The reconstructed image quality for Foreman with 700 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   38.04    -        -        -
       Gain^1   -        -        -        -
1%     PSNR     30.10    34.29    34.03    35.70
       Gain^1   0        4.19     3.93     5.6
5%     PSNR     21.60    28.46    27.05    31.63
       Gain^1   0        6.86     5.45     10.03
10%    PSNR     19.41    26.74    25.23    29.54
       Gain^1   0        7.33     5.82     10.13
Table 10. The reconstructed image quality for Akiyo with 130 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   41.2     -        -        -
       Gain^1   -        -        -        -
1%     PSNR     22.57    39.83    33.86    39.95
       Gain^1   0        17.26    11.29    17.38
5%     PSNR     21.68    37.5     30.18    39.21
       Gain^1   0        15.82    8.5      17.53
10%    PSNR     20.62    35.55    28.03    36.6
       Gain^1   0        14.93    7.41     15.98
Table 11. The reconstructed image quality for Akiyo with 200 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   43.15    -        -        -
       Gain^1   -        -        -        -
1%     PSNR     22.61    41.34    33.93    42.1
       Gain^1   0        18.73    11.32    19.49
5%     PSNR     21.7     38.68    30.12    40.13
       Gain^1   0        16.98    8.42     18.43
10%    PSNR     20.63    36.28    28.78    37.38
       Gain^1   0        15.65    8.15     16.75
Table 12. The reconstructed image quality for Akiyo with 310 kbps
PLR^2           Type 1   Type 2   Type 3   Type 4
0%     PSNR^3   44.89    -        -        -
       Gain^1   -        -        -        -
1%     PSNR     22.55    39.83    34.48    43.74
       Gain^1   0        17.28    11.93    21.19
5%     PSNR     21.72    37.5     31.02    40.59
       Gain^1   0        15.78    9.3      18.87
10%    PSNR     20.64    35.55    29.26    37.75
       Gain^1   0        14.91    8.62     17.11
[Plot: PSNR versus bitrate; curves Type1_PLR10, Type2_PLR10, Type3_PLR10, Type4_PLR10]
Fig. 22. R-D curve of reconstructed image quality for Akiyo sequence with PLR 10%
[Plot: PSNR versus bitrate; curves Type1_PLR5, Type2_PLR5, Type3_PLR5, Type4_PLR5]
Fig. 23. R-D curve of reconstructed image quality for Akiyo sequence with PLR 5%
[Plot: PSNR versus bitrate; curves Type1_PLR1, Type2_PLR1, Type3_PLR1, Type4_PLR1]
Fig. 24. R-D curve of reconstructed image quality for Akiyo sequence with PLR 1%
[Plot: PSNR versus bitrate; curves Type1_PLR10, Type2_PLR10, Type3_PLR10, Type4_PLR10]
Fig. 25. R-D curve of reconstructed image quality for Foreman sequence with PLR 10%
[Plot: PSNR versus bitrate; curves Type1_PLR5, Type2_PLR5, Type3_PLR5, Type4_PLR5]
Fig. 26. R-D curve of reconstructed image quality for Foreman with PLR 5%
[Plot: PSNR versus bitrate; curves Type1_PLR1, Type2_PLR1, Type3_PLR1, Type4_PLR1]
Fig. 27. R-D curve of reconstructed image quality for Foreman with PLR 1%
Fig. 22 to Fig. 27 show a tendency that the PSNR values at high bitrate are lower than the