Chapter 2: Previous Work
2.2 Side Information Generation
Side information quality directly affects the coding efficiency of a DVC codec; therefore, much DVC research is devoted to side information generation algorithms. The experiments in [17] show the relation between the PSNR of the side information (measured against the original W-Z frame) and the number of W-Z bits requested. In summary, when the PSNR of the side information is higher, fewer W-Z bits are needed and the compression ratio is higher.
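As a reminder of how the side information quality is measured, the following is a minimal sketch of a PSNR computation between an original W-Z frame and its side information. The function name psnr, the 2-D list representation of frames, and the peak value of 255 (8-bit samples) are assumptions made for illustration; they are not taken from [17].

```python
import math


def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized frames given as 2-D lists."""
    total = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            total += (r - e) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


# Toy 2x2 frames (hypothetical values).
wz_frame = [[100, 102], [98, 101]]
side_info = [[101, 102], [97, 103]]
quality = psnr(wz_frame, side_info)
print(round(quality, 2))
```

A higher value means the side information is closer to the original W-Z frame, which is the condition under which fewer W-Z bits are requested.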
2.2.1 Interpolation and Symmetric Motion Model
Side information can be interpolated or extrapolated from key frames and previously reconstructed W-Z frames. This is similar to motion-compensated frame interpolation, which is used to increase the frame rate at the decoder side. Many models have been proposed for motion-compensated interpolation. For example, Liu et al. [29] assume that every frame has the same motion field. This model is too simple, so the side information it generates is poor. The symmetric motion model is also simple, but its performance is acceptable in some cases. Many researchers adopt the symmetric motion model and use additional processing to enhance its performance [7][18].
The algorithm of the symmetric motion model is as follows. Let the W-Z frame be frame Y, its previous key frame be frame X, and its next key frame be frame Z. The symmetric motion model assumes that each object moves at a constant speed. So if the position of object O is (x1, y1) in frame X and (x1+mvx, y1+mvy) in frame Z, then its position in frame Y should be (x1+mvx/2, y1+mvy/2). When the decoder generates the side information of frame Y, it has already received frames X and Z, so it can perform traditional motion estimation between them. For macroblock M at (x1, y1) in frame Z, motion estimation finds a best matched macroblock N at (x1+mvx, y1+mvy) in frame X, where (mvx, mvy) now denotes the motion vector estimated for M. The decoder then projects the average of M and N onto frame Y at (x1+mvx/2, y1+mvy/2), the midpoint of the motion path between M and N.
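The projection step described above can be sketched for a single macroblock as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited codec: frames are 2-D lists, matching uses a full search with the sum of absolute differences (SAD), ties favour the shorter vector, and half vectors are rounded down. The helper names sad, best_match, project_block and make_frame are hypothetical.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size-by-size blocks."""
    return sum(
        abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def best_match(frame_z, frame_x, bx, by, size, search):
    """Full search in frame X for the block of frame Z at (bx, by)."""
    height, width = len(frame_x), len(frame_x[0])
    # Visit small displacements first so ties favour the shorter vector.
    candidates = sorted(
        ((mvx, mvy) for mvy in range(-search, search + 1)
         for mvx in range(-search, search + 1)),
        key=lambda mv: abs(mv[0]) + abs(mv[1]),
    )
    best_cost, best_mv = None, (0, 0)
    for mvx, mvy in candidates:
        nx, ny = bx + mvx, by + mvy
        if 0 <= nx <= width - size and 0 <= ny <= height - size:
            cost = sad(frame_z, bx, by, frame_x, nx, ny, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (mvx, mvy)
    return best_mv


def project_block(frame_x, frame_z, frame_y, bx, by, size, search):
    """Write the average of M (in Z) and N (in X) at the path midpoint in Y."""
    mvx, mvy = best_match(frame_z, frame_x, bx, by, size, search)
    px, py = bx + mvx // 2, by + mvy // 2  # assumption: round half vectors down
    for dy in range(size):
        for dx in range(size):
            m = frame_z[by + dy][bx + dx]
            n = frame_x[by + mvy + dy][bx + mvx + dx]
            frame_y[py + dy][px + dx] = (m + n) / 2.0
    return (mvx, mvy), (px, py)


def make_frame(width, height, obj_x, obj_y):
    """Black frame with a textured 2x2 object at (obj_x, obj_y)."""
    frame = [[0] * width for _ in range(height)]
    patch = [[10, 20], [30, 40]]
    for dy in range(2):
        for dx in range(2):
            frame[obj_y + dy][obj_x + dx] = patch[dy][dx]
    return frame


frame_x = make_frame(6, 6, 0, 2)      # object at x = 0 in key frame X
frame_z = make_frame(6, 6, 4, 2)      # object at x = 4 in key frame Z
frame_y = [[0.0] * 6 for _ in range(6)]
mv, pos = project_block(frame_x, frame_z, frame_y, 4, 2, size=2, search=4)
print(mv, pos)
```

In this toy case the object moves four pixels between X and Z, so the block is written two pixels along the path, which is where a constant-speed object would be in frame Y.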
Of course, there are many problems when using this model to implement a side information generator. First, some macroblocks may not satisfy the single-object, constant-motion model: a macroblock may contain two or more objects, or objects that do not move at a constant speed. Second, when the decoder projects every matched macroblock pair onto the W-Z frame, some pixels may receive no projection, while others receive two or more. For pixels that are projected two or more times, it is not trivial to choose the best projection pair without knowledge of the original W-Z frame. For pixels that have no projection, the decoder must interpolate their values by some other algorithm.
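The coverage problem can be made concrete by counting how many projections land on each pixel. The sketch below assumes that motion vectors have already been estimated and are given as a dictionary; the two fallback rules it uses (averaging all overlapping projections, and filling holes with the average of the co-located key frame pixels) are naive placeholders chosen for illustration, not methods proposed in the cited works. The name project_all is hypothetical.

```python
def project_all(frame_x, frame_z, vectors, size):
    """Project every block pair onto frame Y and count hits per pixel.

    vectors maps the (bx, by) position of a block in frame Z to the
    (mvx, mvy) displacement of its best match in frame X.
    """
    height, width = len(frame_x), len(frame_x[0])
    acc = [[0.0] * width for _ in range(height)]
    hits = [[0] * width for _ in range(height)]
    for (bx, by), (mvx, mvy) in vectors.items():
        px, py = bx + mvx // 2, by + mvy // 2
        for dy in range(size):
            for dx in range(size):
                m = frame_z[by + dy][bx + dx]
                n = frame_x[by + mvy + dy][bx + mvx + dx]
                acc[py + dy][px + dx] += (m + n) / 2.0
                hits[py + dy][px + dx] += 1
    side = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if hits[y][x] > 0:
                # Overlap: average all candidates (naive assumption).
                side[y][x] = acc[y][x] / hits[y][x]
            else:
                # Hole: average of co-located key frame pixels (naive assumption).
                side[y][x] = (frame_x[y][x] + frame_z[y][x]) / 2.0
    return side, hits


frame_x = [[0, 0, 8, 8, 4, 4] for _ in range(2)]
frame_z = [[8, 8, 2, 2, 4, 4] for _ in range(2)]
vectors = {(0, 0): (2, 0), (2, 0): (0, 0), (4, 0): (0, 0)}
side, hits = project_all(frame_x, frame_z, vectors, size=2)
print(hits[0])
print(side[0])
```

In this example the first column receives no projection and the third column receives two, which are exactly the two cases the decoder has to resolve.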
Klomp et al. [5] believe that more accurate motion estimation may increase side information quality. However, due to motion model mismatch, the actual improvement could be very small. Li and Delp [20] conducted experiments on the number of macroblocks in key frames used to generate side information. They found that the more macroblocks are used, the better the side information, especially when these macroblocks come from different key frames.
In the above two proposals, the decoder must perform more computation, either to increase motion estimation accuracy or to increase the number of reference macroblocks used. Ascenso et al. of the DISCOVER team [18] adopt a different approach and develop a pixel-domain DVC codec, IST-PDWZ, and a transform-domain DVC codec, IST-TDWZ. Their technique is also based on the Stanford framework with the symmetric motion model, but before motion estimation is performed on the key frames, a low-pass filter is applied to make the estimated motion vectors less noisy. After the motion vectors are obtained, macroblock Y would be the average of macroblocks X and Z. Bi-directional motion estimation is then used to fine-tune the motion vectors. After that, a smoothing filter is applied to the motion vectors because the motion field should be smooth. As the R-D performance in Figure 2 shows, this adjustment indeed improves performance.
Figure 2. R-D performance of the DVC scheme in [18]
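To illustrate what smoothing a motion field can mean, the sketch below applies a plain vector median filter over a 3-by-3 neighbourhood of block motion vectors. The text above does not specify which smoothing filter is used in [18], so the choice of a vector median with an L1 distance is an assumption made for illustration only; the names vector_median and smooth_field are hypothetical.

```python
def vector_median(vectors):
    """Return the vector with the smallest summed L1 distance to the others."""
    def cost(v):
        return sum(abs(v[0] - w[0]) + abs(v[1] - w[1]) for w in vectors)
    return min(vectors, key=cost)


def smooth_field(field):
    """Replace each motion vector by the vector median of its 3x3 window."""
    rows, cols = len(field), len(field[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                field[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = vector_median(window)
    return out


# A uniform field with one outlier vector in the centre (hypothetical values).
field = [
    [(2, 0), (2, 0), (2, 0)],
    [(2, 0), (-7, 5), (2, 0)],
    [(2, 0), (2, 0), (2, 0)],
]
smoothed = smooth_field(field)
print(smoothed[1][1])
```

Because the output is always one of the input vectors, an isolated outlier is replaced by the dominant neighbouring motion rather than by a blend of the two.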
2.2.2 Hash Information as Motion Cue
In the previous section we saw how the symmetric motion model uses key frames to generate side information, and that this model is limited. To further increase the quality of the side information, the encoder can compute and transmit some extra information to help the decoder. Some researchers refer to this information as a hash [13]. When the decoder generates side information, it does not perform motion estimation on neighboring key frames. Instead, it obtains motion vectors by directly comparing the hashes of macroblocks. When the hashes of two macroblocks are similar, the contents of these two macroblocks are expected to be similar too.
Traditional hashes, such as CRC and MD5, are very sensitive to any change in content. Hashes used to compare image similarity, however, should be sensitive only to perceptual changes in the pixel data. This kind of hash is called a media hash, robust hash, soft hash, or image fingerprint. The distance between hashes can be used as a measure of content similarity. If the hash is good enough, that is, a decent representation of the macroblock, motion estimation by comparing hashes should give a good result.
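Motion estimation by hash comparison can be sketched as follows. The decoder receives the hash of a W-Z block and searches a key frame for the candidate block whose hash is closest. The hash used here (the quantized mean of each quadrant of a 4-by-4 block) and the L1 hash distance are toy stand-ins chosen for illustration, not the hash of any cited paper; block_hash, hash_distance, hash_search and make_frame are hypothetical names.

```python
def block_hash(frame, x, y, size=4, step=16):
    """Toy robust hash: coarsely quantized mean of each block quadrant."""
    half = size // 2
    values = []
    for qy in (0, half):
        for qx in (0, half):
            total = sum(
                frame[y + qy + dy][x + qx + dx]
                for dy in range(half)
                for dx in range(half)
            )
            values.append(total // (half * half) // step)
    return tuple(values)


def hash_distance(a, b):
    """L1 distance between two hashes."""
    return sum(abs(p - q) for p, q in zip(a, b))


def hash_search(received_hash, key_frame, x, y, size=4, search=2):
    """Find the displacement whose key frame block hash is closest."""
    height, width = len(key_frame), len(key_frame[0])
    best_cost, best_mv = None, (0, 0)
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            cx, cy = x + mvx, y + mvy
            if 0 <= cx <= width - size and 0 <= cy <= height - size:
                cost = hash_distance(
                    received_hash, block_hash(key_frame, cx, cy, size)
                )
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (mvx, mvy)
    return best_mv


def make_frame(px, py):
    """8x8 black frame with a 4x4 patch made of four flat quadrants."""
    frame = [[0] * 8 for _ in range(8)]
    quadrant_values = [[32, 96], [160, 224]]
    for dy in range(4):
        for dx in range(4):
            frame[py + dy][px + dx] = quadrant_values[dy // 2][dx // 2]
    return frame


key_frame = make_frame(3, 2)            # patch at x = 3 in the key frame
wz_frame = make_frame(2, 2)             # same patch at x = 2 in the W-Z frame
received = block_hash(wz_frame, 2, 2)   # computed at the encoder
mv = hash_search(received, key_frame, 2, 2)
print(received, mv)
```

Note that the decoder never sees the W-Z pixels in this sketch; it only compares the received hash against hashes it computes itself from the key frame.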
Media hash design is an active research topic, and many papers have been published. Most of them, however, design media hashes for content-based retrieval, watermarking, image authentication, and image database management. A media hash can be computed from the histogram of an image [34], the positions of edges [35], or feature points [36]. DCT sign information is also useful [37]. These researchers want the media hash to be robust to geometric transformation and compression, so some media hashes are too complicated to compute and are not suitable for DVC. Furthermore, these applications are not concerned with the size of the hash, whereas in DVC the hash size affects the bitrate.
For DVC, simple and small hashes are considered. The use of a hash to help the decoder generate side information is first mentioned in [13]. The authors use a sub-sampled and coarsely quantized version of each macroblock in the W-Z frame as its hash. To reduce the hash overhead, when the co-located macroblock in the previous key frame is similar to the macroblock in the current W-Z frame, a no-hash bit is sent instead.
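A hash of this kind and the accompanying encoder decision can be sketched as below. The sub-sampling factor, the quantization step, the use of SAD as the similarity measure, the threshold, and the reading of the no-hash bit as a one-bit flag are all assumptions made for illustration; [13] is not quoted on any of these values. The names make_hash and encode_block are hypothetical.

```python
def make_hash(block, subsample=2, step=32):
    """Keep every subsample-th pixel in each direction and quantize coarsely."""
    return [
        block[y][x] // step
        for y in range(0, len(block), subsample)
        for x in range(0, len(block[0]), subsample)
    ]


def encode_block(wz_block, colocated_key_block, threshold):
    """Send a hash only when the block differs enough from the key frame."""
    difference = sum(
        abs(a - b)
        for row_a, row_b in zip(wz_block, colocated_key_block)
        for a, b in zip(row_a, row_b)
    )
    if difference <= threshold:
        return {"no_hash": True, "hash": None}   # flag only, no hash payload
    return {"no_hash": False, "hash": make_hash(wz_block)}


key_block = [[100] * 4 for _ in range(4)]
static_block = [[101] * 4 for _ in range(4)]               # nearly unchanged
moving_block = [[40, 40, 200, 200] for _ in range(4)]      # clearly different

static_result = encode_block(static_block, key_block, threshold=64)
moving_result = encode_block(moving_block, key_block, threshold=64)
print(static_result)
print(moving_result)
```

With these toy settings a 4-by-4 block is reduced to four coarse values, which shows why such a hash is cheap both to compute and to transmit.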
Girod et al. [8] also mention the possibility of using the high-frequency portion of a macroblock as a distinctive feature of the block. For an 8-by-8 block, the 54 highest-frequency coefficients (most of which may be zero) are compressed with run-length and Huffman coding and sent to the decoder as the hash. The remaining 10 low-frequency coefficients are channel encoded and decoded. Although this idea seems reasonable, in practice it is not trivial for the decoder to perform motion estimation based on high-frequency coefficients.
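The split into 10 low-frequency and 54 high-frequency coefficients can be sketched as follows. The text does not say how coefficients are ranked by frequency, so the JPEG-style zigzag scan used here is an assumption, as is the simple (zero run, value) run-length format; Huffman coding is omitted. The names zigzag_order, split_coefficients and run_length are hypothetical.

```python
def zigzag_order(n=8):
    """Positions (row, col) of an n-by-n block in JPEG-style zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (
            rc[0] + rc[1],
            rc[0] if (rc[0] + rc[1]) % 2 else -rc[0],
        ),
    )


def split_coefficients(block, low_count=10):
    """Return (low-frequency list, high-frequency list) in zigzag order."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    return scanned[:low_count], scanned[low_count:]


def run_length(values):
    """Encode as (number of preceding zeros, non-zero value) pairs."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    pairs.append((zeros, 0))  # trailing zeros, used here as an end marker
    return pairs


# Hypothetical quantized DCT block with a few non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0] = 50
block[0][1] = -3
block[3][1] = 2
block[7][7] = 1

low, high = split_coefficients(block)
hash_pairs = run_length(high)
print(len(low), len(high))
print(hash_pairs)
```

Since most high-frequency coefficients are zero after quantization, the run-length stage collapses the 54 values into a handful of pairs, which keeps the hash overhead small.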