The First Phase - Research on Content-Feature-Based Motion Vector Estimation Algorithms and Architectures

3.3 Architecture

3.3.1 The First Phase

The architecture of the first phase contains a Current Macro-Block Buffer, an Edge Generator Unit, a UEPC PEs Array, a Reference Macro-Block Buffer, a Quantization Unit, an Adder Tree, and a Survived Motion Vectors Selector. After edge matching, the first phase generates two survived motion vectors in each search row/column, which the second phase uses to perform more accurate matching.

Figure 3-4: Block diagram of the edge-driven two-phase motion estimation. [Diagram not reproduced; recoverable block labels: Edge, To Address Generator, The First Phase, The Second Phase.]

Edge Generator Unit

Fig. 3-5 presents the architecture of the Edge Generator Unit, which produces the edge mask and decides the search scan direction, as described in Steps 1 to 3 of Section 3.2. The unit contains two main blocks: the high-pass filter block and the edge determination block. The former calculates the gradient of each pixel in the current macro-block, as given in equation (3-1). The latter determines the edge mask and the X-Y span described in Step 3 of EFBLA.

According to (3-1), the high-pass filter calculates the gradient of a target pixel from the eight neighboring pixels around it. The data paths CMB₁, CMB₂, and CMB₃ are the input interfaces for the previous line, current line, and next line from the CMB buffer. The left and right pixels are held by simple delay elements. To avoid boundary errors when the target pixel lies on the border, the proposed architecture uses multiplexers to replace the null data outside the current macro-block with existing pixels. The black dot in each multiplexer indicates the path selected when the filter unit is processing a border pixel. Calculating the gradient of a target pixel takes six equivalent adder operations in total: five additions and one absolute-value operation, where the absolute-value operation is counted as one adder operation. The computational load of the ×8 multiplication is ignored because it can be implemented with a simple shift.
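As a rough illustration of the filter described above, the following Python sketch computes a gradient map for a macro-block. Equation (3-1) is not reproduced in this section, so the kernel used here, |8·f(x, y) − (sum of the eight neighbors)|, is an assumption, as is the choice to replace out-of-block pixels with the nearest in-block pixel. The function name `gradient_map` is hypothetical, and the sketch does not model the adder sharing that yields the six-operation count.

```python
def gradient_map(block):
    """Return |8*center - sum(8 neighbours)| for every pixel of a 2-D list.

    The kernel is an assumed form of equation (3-1). Pixels outside the
    block are replaced by the nearest pixel inside it, which is one
    possible reading of what the border multiplexers do.
    """
    rows, cols = len(block), len(block[0])

    def pixel(r, c):
        # Clamp coordinates so border pixels reuse existing data.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return block[r][c]

    grad = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neighbours = sum(
                pixel(r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # The x8 factor is a left shift by three, as in the hardware.
            grad[r][c] = abs((block[r][c] << 3) - neighbours)
    return grad
```

With this kernel and border handling, a flat block yields a zero gradient everywhere, including at the borders.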

The edge determination unit, whose structure is illustrated in the right part of Fig. 3-5, implements two main functions. The first is to find the maximum and minimum gradient values of the current macro-block and then determine the threshold value according to equations (3-2) to (3-4). The second is to decide the search scan direction described in Step 3 of EFBLA.

Figure 3-5: Architecture of Edge Generator Unit. [Diagram not reproduced; recoverable block labels: CMB₁, top boundary, High-Pass Gradient Filter, Edge-Determination, Edge.]

The scan-direction logic consists of simple OR-gate logic and a look-up table (LUT) that determine the X-Y span. The edge determination unit thus generates the edge mask and the scan direction for UEPC matching in the first phase.
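The edge determination step can be sketched in Python as follows. Equations (3-2) to (3-4) and Step 3 of EFBLA are not reproduced in this section, so the threshold is passed in as a parameter rather than derived, the comparison `g > threshold` is an assumption, and the span is assumed to be the distance between the first and last row or column that contains an edge pixel. The names `edge_mask`, `span`, and `xy_span` are hypothetical.

```python
def edge_mask(grad, threshold):
    """Mark a pixel as an edge (1) when its gradient exceeds the threshold."""
    return [[1 if g > threshold else 0 for g in row] for row in grad]


def span(bits):
    """Inclusive distance from the first to the last set bit; 0 if none set.

    In hardware this mapping would be a look-up table on the OR-ed bits.
    """
    positions = [i for i, b in enumerate(bits) if b]
    if not positions:
        return 0
    return positions[-1] - positions[0] + 1


def xy_span(mask):
    """Return (x_span, y_span) of the edge pixels in a 2-D 0/1 mask."""
    n_cols = len(mask[0])
    # OR every column and every row, as the OR gates would.
    col_or = [int(any(row[c] for row in mask)) for c in range(n_cols)]
    row_or = [int(any(row)) for row in mask]
    return span(col_or), span(row_or)
```

How the two spans map to a scan direction follows Step 3 of EFBLA and is deliberately left out of the sketch.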

RMB Buffer and Quantization Unit

Figure 3-6 illustrates the architecture of the RMB buffer and the Quantization Unit. The reference macro-block (RMB) buffer has two major functions. The first is to provide parallel data to the UEPC PEs array in the first phase. The second is to buffer the reference macro-block data for the second phase, which ensures that the data are read from the reference frame memory only once. In each clock period, the RMB buffer delivers N pixels simultaneously to the Quantization Unit, which converts them to low-resolution data for the UEPC matching procedure.

To save hardware resources, the quantization of the current macro-block shares the quantization cell used for the reference macro-block. Initially, (N + 2p − 1) × (N − 1) cycles are needed to store the reference macro-block data before the buffer is ready to provide parallel data to the PEs array. During this period, the Quantization Unit is otherwise idle and can be switched to quantize the current macro-block.

Figure 3-6: Architecture of Shift Register Array and Low-Resolution Quantization. [Diagram not reproduced; recoverable labels: N+2p-1, Q, UEPC PEs Array, RMB, Avg_k, N-1, To the AC Array in the second phase, mux, CMB.]

Processing Elements Array

The architecture of the Processing Elements Array is illustrated in Fig. 3-7. The array is composed of N-by-N processing elements that calculate the unmatched edge-pixel count criterion given in (3-7). The CMB data path at the tail of each row is linked to the head of the next row, so it takes N² cycles to shift all the quantized data of the current macro-block into the UEPC PEs array. With this linked data path, quantizing the current macro-block requires only one active Quantization Unit.
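The two cycle counts quoted in this subsection can be compared with a few lines of arithmetic. In the Python sketch below, N = 16 follows the assumption stated for Fig. 3-9, while p = 16 is only an illustrative value that this section does not specify; the function names are hypothetical.

```python
def rmb_fill_cycles(n, p):
    """Cycles needed to pre-load the RMB buffer: (N + 2p - 1) * (N - 1)."""
    return (n + 2 * p - 1) * (n - 1)


def cmb_shift_cycles(n):
    """Cycles needed to shift the quantized current macro-block in: N^2."""
    return n * n


N, P = 16, 16  # P is an assumed search-range parameter
fill = rmb_fill_cycles(N, P)   # (16 + 32 - 1) * 15 = 705
shift = cmb_shift_cycles(N)    # 16 * 16 = 256
print(fill, shift, shift <= fill)
```

Under these assumed values, the N² cycles needed for the current macro-block fit within the buffer pre-load period, which is consistent with sharing a single Quantization Unit.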

Since the first phase uses the unmatched edge-pixel count criterion, the PEs array activates a processing element only when the corresponding pixel is an edge pixel, that is, when the edge mask α(u, v) given in (3-4) equals 1. The enable signal comes from the edge mask generated by the Edge Generator Unit. The processing element, whose architecture is shown in Fig. 3-8, performs the unmatched edge-pixel comparison and outputs 1 if the quantized data of the current macro-block is not identical to that of the reference macro-block.

Figure 3-7: Architecture of Processing Element Array.

Figure 3-8: Architecture of Processing Element. [Diagram not reproduced; recoverable labels: reg, CMB, Enable, RMB, cmp, To Adder Tree.]

Each processing element contains two two-bit shift registers that store the low-resolution information of the current and reference macro-blocks. The comparison circuit in a PE can be implemented with two exclusive-OR gates and one OR gate. After matching, each processing element sends a one-bit signal to the adder tree and SMVs selector for further evaluation of the correlation between the current and reference macro-blocks.
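The comparison performed by a single processing element can be modeled at the bit level. The Python sketch below mirrors the two XOR gates and one OR gate on two-bit quantized values; gating the result with the edge-mask bit so that a disabled element outputs 0 is an assumption about how the enable signal acts, and the function name `pe_unmatched` is hypothetical.

```python
def pe_unmatched(cmb_q, rmb_q, enable):
    """Return 1 when an enabled PE sees differing 2-bit quantized values.

    cmb_q, rmb_q: integers in 0..3 (two-bit low-resolution data).
    enable: edge-mask bit for this pixel (1 = edge pixel).
    """
    diff = (cmb_q ^ rmb_q) & 0b11            # two XOR gates, one per bit
    mismatch = (diff & 1) | ((diff >> 1) & 1)  # one OR gate
    return mismatch & enable                  # assumed gating by the edge mask
```

For every pair of two-bit inputs, the gate-level result equals the plain inequality test `cmb_q != rmb_q` whenever the element is enabled.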

UEPC Accumulator and SMVs Selector

The UEPC accumulator accumulates the unmatched edge-pixel signals from the processing elements. A look-up table (LUT) in each column converts the unmatched signals into a binary number that counts the unmatched pixels in that column. These binary numbers are then summed by a parallel adder tree to obtain the total number of unmatched edge pixels in the macro-block. The SMVs selector uses these unmatched edge-pixel counts to pick two survived motion vectors in each column for more detailed matching in the second phase. The first phase therefore passes 2-by-2p survived motion vectors, the most probable candidates, to the second phase. The architecture of the UEPC accumulator and SMVs selector is shown in Fig. 3-9.

Figure 3-9: Architecture of UEPC Adder Tree and SMVs Selector. We assume that N is 16. [Diagram not reproduced; recoverable labels: UEPC PEs Array; LUT₀, LUT₁, LUT₂, ..., LUT_N-1 with bus-width annotations 16 and 4; Parallel Adder Tree; SMVs Selector; SMVs; UEPC Accumulator.]
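The accumulation and selection stages can be sketched in Python as follows. The per-column count stands in for the LUT, and the pairwise reduction stands in for the parallel adder tree. Choosing the candidates with the smallest unmatched edge-pixel count as survivors is an assumption, since this section does not state the selection rule explicitly; all function names are hypothetical.

```python
import heapq


def column_count(column_bits):
    """Count the 1s in one column of PE outputs (the role of a column LUT)."""
    return sum(column_bits)


def adder_tree(values):
    """Sum values pairwise, level by level, like a parallel adder tree."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def uepc(pe_outputs):
    """Total unmatched edge-pixel count for an N-by-N grid of PE bits."""
    n_cols = len(pe_outputs[0])
    counts = [
        column_count([row[c] for row in pe_outputs]) for c in range(n_cols)
    ]
    return adder_tree(counts)


def select_survivors(uepc_by_mv, keep=2):
    """Pick the `keep` motion vectors with the smallest UEPC (assumed rule)."""
    return heapq.nsmallest(keep, uepc_by_mv, key=uepc_by_mv.get)
```

For example, `select_survivors({(-1, 0): 5, (0, 0): 2, (1, 0): 7, (2, 0): 3})` returns `[(0, 0), (2, 0)]`, the two candidates with the lowest counts.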
