
Chapter 4: System Architecture

4.1 System Block Diagram

Figure 17 shows the system block diagram of the proposed frequency domain DVC codec. First, the sequence is divided into key frames and Wyner-Ziv (WZ) frames.

Odd frames are key frames and even frames are WZ frames. Key frames are encoded and decoded with the H.264 main profile intra coder; the H.264 codec version used is JM 9.0. The QP values are determined from the quantization matrices used for the WZ frames.

The decoder uses the key frames it has received to generate side information for each WZ frame. It then classifies the macroblocks into two groups, S_A and S_B, based on side information quality: S_A contains the 25% of macroblocks with the worst side information quality, and S_B contains the remaining 75%, which have less side information error. The decoder never sees the original WZ frame, so it can classify macroblocks only from the cues available to it. The classification result is sent to the encoder. The encoder then groups the macroblocks of the WZ frame, and the decoder groups the macroblocks of the side information, according to this result, so that macroblocks in the same group are gathered together.
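The classification step can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes the cue is one SAD value per macroblock (the SAD recorded during side information generation) and that a larger SAD means worse side information. The names `classify_macroblocks` and `gather` are hypothetical.

```python
def classify_macroblocks(sad_per_mb, worst_fraction=0.25):
    """Split macroblock indices into S_A (worst side information) and S_B.

    Assumption: a larger SAD indicates worse side information quality.
    """
    n = len(sad_per_mb)
    n_a = round(n * worst_fraction)
    # Rank macroblock indices from largest SAD to smallest.
    order = sorted(range(n), key=lambda i: sad_per_mb[i], reverse=True)
    s_a = sorted(order[:n_a])   # 25% with the largest SAD
    s_b = sorted(order[n_a:])   # remaining 75%
    return s_a, s_b


def gather(macroblocks, s_a, s_b):
    """Reorder macroblocks so that members of the same group are contiguous."""
    return [macroblocks[i] for i in s_a] + [macroblocks[i] for i in s_b]
```

Because the encoder and the decoder both apply the same index lists, the WZ macroblocks and the side information macroblocks end up in the same order on both sides.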

The macroblocks of the WZ frame and of the side information are then transformed and quantized, by the encoder and the decoder respectively. The encoder calculates the quantization interval for each band and sends these values to the decoder, so that both sides quantize the coefficients with the same quantization intervals.

The quantized coefficients of WZ frame and side information are then split into bit-planes. Each bit-plane of a WZ frame will be LDPCA encoded by the encoder.

The WZ bits are stored in a buffer and they will be requested by the decoder.
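As an illustration of the bit-plane split, the sketch below assumes the quantized coefficients of one band are available as non-negative integer indices and that planes are ordered from most significant to least significant. Both points are assumptions, and the function names are hypothetical. LDPCA encoding itself is not shown.

```python
def to_bitplanes(indices, num_bits):
    """Split non-negative quantization indices into bit-planes, MSB first."""
    planes = []
    for b in range(num_bits - 1, -1, -1):
        planes.append([(v >> b) & 1 for v in indices])
    return planes


def from_bitplanes(planes):
    """Rebuild the indices from MSB-first bit-planes (inverse of to_bitplanes)."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values
```

Each list returned by `to_bitplanes` corresponds to one bit-plane that would be passed to the channel encoder.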

Figure 17. System flow of our transform domain DVC codec

The decoder requests WZ bits from the buffer and performs LDPCA decoding; the details are described later. Only macroblocks in S_A are decoded.

After every bit-plane of every coefficient band has been decoded, the decoded coefficients are inverse transformed and the macroblocks are rearranged into their original order. The following sections describe each step of the proposed algorithm in detail.

4.2 Side Information Generation

As described above, key frames are coded by the H.264 intra coder and sent to the decoder. The decoder uses the two neighboring key frames to interpolate the side information of the WZ frame between them.

The steps of side information generation are described in this section.

Figure 18. Motion estimation for neighboring key frames

Figure 19. Bi-directional motion adjustment

In step one, motion estimation is performed between the two neighboring key frames, as shown in Figure 18. The macroblock size is 16 by 16, the search range is ±32, and the motion vectors have full-pixel precision. The search range is larger than in a traditional video codec because the temporal distance between key frames is 2. In this step, only forward motion estimation is used to estimate the motion field of the WZ frame. Much could still be done to bring the motion field closer to the true motion.
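A full-search block matcher with the SAD criterion could look like the sketch below. It assumes frames are lists of rows of luma values and that blocks are addressed by their top-left corner. The direction of the search (which key frame supplies the block and which one is searched) is not fixed here, so the parameter names `block_frame` and `search_frame` are placeholders.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size-by-size blocks."""
    total = 0
    for r in range(size):
        row_a, row_b = frame_a[ay + r], frame_b[by + r]
        for c in range(size):
            total += abs(row_a[ax + c] - row_b[bx + c])
    return total


def full_search(block_frame, search_frame, x, y, size=16, search=32):
    """Return ((dx, dy), cost) minimising SAD at full-pixel precision.

    The block at (x, y) in block_frame is compared with the block at
    (x + dx, y + dy) in search_frame for every displacement within +/-search.
    Candidates that fall outside the frame are skipped.
    """
    h, w = len(search_frame), len(search_frame[0])
    best = (0, 0)
    best_cost = sad(block_frame, x, y, search_frame, x, y, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue
            cost = sad(block_frame, x, y, search_frame, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

The defaults mirror the 16 by 16 macroblock and ±32 range stated above; a pure-Python loop of this size is slow and is meant only to make the procedure explicit.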

In step two, following DISCOVER’s DVC codec [18], a bi-directional motion adjustment is performed with a search range of ±10 and half-pixel precision. The half-pixel values are calculated using the H.264 six-tap filter. The search range is smaller so that the adjusted motion vector does not stray far from the original motion vector.

As shown in Figure 19, if the motion vector obtained by the motion estimation of step one is (X₁, Y₁) and the center of the macroblock in the WZ frame is (Px, Py), then during bi-directional motion adjustment the center is held fixed and the motion vectors (X₁+dx, Y₁+dy), with dx and dy within ±10, are searched. The new motion vector (X’, Y’) is the one that minimizes the SAD (sum of absolute differences) of the macroblock pair in the neighboring key frames. Every motion vector obtained in step one is adjusted in this step.
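The adjustment can be sketched as a symmetric search around the fixed WZ macroblock. The sketch makes several simplifying assumptions that are not taken from the text: it works at integer-pixel precision only (the half-pixel six-tap interpolation is omitted), it addresses blocks by top-left corner rather than center, and it parametrizes the candidate by the half vector (hx, hy), so the block in the previous key frame sits at (px - hx, py - hy) and the block in the next key frame at (px + hx, py + hy). The name `bidirectional_refine` is hypothetical.

```python
def block_sad(a, ax, ay, b, bx, by, size):
    """Sum of absolute differences between two size-by-size blocks."""
    return sum(
        abs(a[ay + r][ax + c] - b[by + r][bx + c])
        for r in range(size)
        for c in range(size)
    )


def bidirectional_refine(prev_key, next_key, px, py, hx0, hy0, size=16, search=10):
    """Refine the half vector (hx0, hy0) with the WZ block position held fixed.

    Returns ((hx, hy), cost) for the symmetric block pair with the smallest
    SAD, or (None, None) if no candidate lies inside the frame.
    """
    h, w = len(prev_key), len(prev_key[0])
    best, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            hx, hy = hx0 + dx, hy0 + dy
            ax, ay = px - hx, py - hy   # block in the previous key frame
            bx, by = px + hx, py + hy   # block in the next key frame
            if min(ax, ay, bx, by) < 0:
                continue
            if max(ax, bx) + size > w or max(ay, by) + size > h:
                continue
            cost = block_sad(prev_key, ax, ay, next_key, bx, by, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (hx, hy), cost
    return best, best_cost
```

Because both blocks move in opposite directions by the same amount, the trajectory always passes through the fixed WZ macroblock, which is the property the adjustment relies on.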

In step three, a median filter is applied to smooth the estimated motion field. This step is also suggested by the DISCOVER DVC codec [16]. After the motion estimation between neighboring key frames and the bi-directional motion adjustment, the motion vector of each macroblock is listed together with the motion vectors of its eight-connected neighbors. A median motion vector is then selected from among these motion vectors, with a weight applied to each motion vector when deciding which one is the median.

There are eight neighbors with motion vectors m₁ to m₈, and the motion vector of the center macroblock is m₀. The motion vector m₀ points from the center macroblock to two macroblocks in the neighboring key frames, and the SAD of these two macroblocks is s₀. When m₀ is replaced by m₁ to m₈, the SADs of the corresponding macroblock pairs are s₁ to s₈. The weight wᵢ of the neighbor motion vector mᵢ is defined as s₀/sᵢ, so a motion vector mᵢ that makes the two macroblocks more similar (smaller sᵢ) receives a larger weight. Once the median motion vector is obtained, it replaces the motion vector of the center macroblock. The motion field of the WZ frame is now complete, and the side information is interpolated based on this motion field.
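One common way to define a weighted vector median is to pick the candidate that minimizes the weighted sum of distances to all candidates. The sketch below follows that definition with an L1 distance. Both choices are assumptions, since the text gives only the weights wᵢ = s₀/sᵢ and not the selection rule. The small floor on sᵢ, which avoids division by zero, is also an addition for the sketch.

```python
def weighted_vector_median(vectors, sads):
    """Select the weighted median motion vector.

    vectors[0] is the centre vector m0 and vectors[1:] are the neighbours
    m1..m8; sads[i] is the SAD s_i obtained when vector i is applied to the
    centre macroblock. Weights follow w_i = s0 / s_i.
    """
    s0 = sads[0]
    weights = [s0 / max(s, 1e-9) for s in sads]  # guard against s_i == 0

    def cost(candidate):
        # Weighted sum of L1 distances from the candidate to every vector.
        return sum(
            w * (abs(candidate[0] - v[0]) + abs(candidate[1] - v[1]))
            for w, v in zip(weights, vectors)
        )

    return min(vectors, key=cost)
```

With this rule an outlier in the center is replaced by a vector that agrees with most of its neighbors, and neighbors with a low SAD pull the result toward themselves more strongly.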

The side information is generated by averaging the motion-compensated macroblocks from the neighboring key frames. The SAD values of the macroblocks in the neighboring key frames are recorded for the macroblock classification performed later. Some pixels receive more than one projection from the neighboring key frames; for such pixels, the average of the interpolated values of all projections is used as the side information. Other pixels receive no projection at all, and these pixels remain unfilled as holes in the side information.
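The handling of multiple projections and holes can be illustrated with a pair of accumulators. The sketch assumes each projection is available as an `(x, y, value)` triple holding one interpolated pixel value, and it marks holes with `None`. Both conventions are choices made for the example.

```python
def merge_projections(height, width, projections):
    """Average all projected values per pixel; pixels with none stay holes."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x, y, value in projections:
        if 0 <= x < width and 0 <= y < height:
            total[y][x] += value
            count[y][x] += 1
    # A pixel with count 0 received no projection and is left as None.
    return [
        [total[y][x] / count[y][x] if count[y][x] else None for x in range(width)]
        for y in range(height)
    ]
```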

Two hole-filling procedures are applied to complete the side information. In the first procedure, if the Manhattan distance between a hole and the nearest filled pixel is within 25 pixels, the hole uses the motion vector of that filled pixel for motion compensation from the neighboring key frames; otherwise, the hole is left for the second procedure. The upper bound on the distance is set to 25 pixels to avoid using the motion vectors of pixels that are too far away.
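The lookup used by the first procedure can be sketched as a brute-force nearest-neighbor search under the Manhattan distance. The per-pixel motion vector field `mv_field`, with `None` at holes, is an assumed data layout; the text does not say how the vectors are stored.

```python
def nearest_filled_vector(mv_field, hx, hy, max_dist=25):
    """Return the motion vector of the nearest filled pixel, or None.

    mv_field[y][x] holds a motion vector for filled pixels and None for
    holes. Only pixels within Manhattan distance max_dist of the hole at
    (hx, hy) are considered.
    """
    h, w = len(mv_field), len(mv_field[0])
    best, best_dist = None, max_dist + 1
    for y in range(max(0, hy - max_dist), min(h, hy + max_dist + 1)):
        for x in range(max(0, hx - max_dist), min(w, hx + max_dist + 1)):
            if mv_field[y][x] is None:
                continue
            dist = abs(x - hx) + abs(y - hy)
            if dist < best_dist:
                best, best_dist = mv_field[y][x], dist
    return best
```

A return value of `None` means that no filled pixel lies within the bound, so the hole is passed on to the second procedure.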

The remaining holes are filled by the second hole-filling procedure. For each macroblock in the side information, the percentage of holes is calculated. If it is less than 40 percent, motion estimation is performed between this macroblock and the previous key frame in display order. Only filled pixels contribute to the SAD; this is achieved with a mask that ignores the difference values at holes. The macroblock size is 16 by 16 and the search range is ±32. The macroblock with the smallest SAD in the previous key frame is located, and its corresponding pixels are used to fill the holes in the side information.
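The masked SAD and the subsequent fill can be sketched as below, again assuming holes are marked with `None` and blocks are addressed by their top-left corner. The search over candidate positions in the previous key frame is the same full search as in step one and is not repeated here.

```python
def masked_sad(side_info, sx, sy, ref, rx, ry, size):
    """SAD over filled pixels only; holes (None) are masked out."""
    total = 0
    for r in range(size):
        for c in range(size):
            value = side_info[sy + r][sx + c]
            if value is None:
                continue
            total += abs(value - ref[ry + r][rx + c])
    return total


def fill_holes_from(side_info, sx, sy, ref, rx, ry, size):
    """Copy co-located pixels of the matched reference block into the holes."""
    for r in range(size):
        for c in range(size):
            if side_info[sy + r][sx + c] is None:
                side_info[sy + r][sx + c] = ref[ry + r][rx + c]
```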

If the percentage of holes is larger than 40 percent, the macroblock size is enlarged by 2 each time until it reaches 32 by 32. The percentage of holes is limited to 40 percent because, when a macroblock contains too many holes, the valid pixels available for finding the motion are in the minority and an incorrect motion vector is likely to be obtained. The macroblock used to find a motion vector for the holes cannot be too large either: when the macroblock is too large, motion estimation performs poorly because the pixels within one macroblock in practice have different motions.
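The block-size rule can be sketched as a loop that grows the block until the hole percentage is acceptable or the size limit is reached. Several details are assumptions made for the sketch: the block grows by `step` pixels per iteration, it stays centered on the original macroblock, pixels outside the frame are ignored, and a block with exactly 40 percent holes is accepted.

```python
def hole_ratio(side_info, x0, y0, size):
    """Fraction of hole pixels (None) in the block, clipped to the frame."""
    h, w = len(side_info), len(side_info[0])
    total = holes = 0
    for y in range(max(0, y0), min(h, y0 + size)):
        for x in range(max(0, x0), min(w, x0 + size)):
            total += 1
            if side_info[y][x] is None:
                holes += 1
    return holes / total if total else 1.0


def choose_block(side_info, x0, y0, base=16, max_size=32, step=2, max_ratio=0.40):
    """Return (x, y, size) of the block to use for hole-filling motion search."""
    size = base
    while True:
        offset = (size - base) // 2          # keep the block centred
        bx, by = x0 - offset, y0 - offset
        if hole_ratio(side_info, bx, by, size) <= max_ratio or size >= max_size:
            return bx, by, size
        size += step
```

The returned block would then be matched against the previous key frame with a masked SAD, so that only its filled pixels drive the search.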
