
C. Support of B-frame and 8x8 Block Search

3.6 Summary

In this paper, we have proposed a new motion estimation hardware architecture that achieves low power consumption and high bandwidth efficiency. The design is developed from a very low-complexity motion estimation algorithm called all binary motion estimation [11]. It integrates several important features: (1) MB-based pre-processing; (2) support for B-frame parallel search; (3) parallel processing of 8x8 and 16x16 LV3 block searches; (4) shared processing units to reduce hardware cost; and (5) an efficient LV2 search to reduce latency. We have also analyzed how the proposed design achieves low power and high bandwidth efficiency. Experiments show that power consumption can be as low as 763 µW for IPPPP CIF at 30 fps and 896 µW for IPBPB CIF at 30 fps. The bus bandwidth saving reaches up to 54.3% for P-frame-only forward search and 67.1% for B-frame search.

Chapter 4

Power Adaptive Iterative Binary Search (PA-IBS)

4.1 Introduction

Power-adaptive design has become an important feature, especially for portable video applications [59].¹ Unlike low-power designs, which aim to minimize power consumption, a power-adaptive design targets the efficient allocation of power resources so as to obtain equal video quality and longer battery life. In multimedia compression systems such as MPEG-1/2/4 and H.26x, motion estimation (ME) dominates the power consumption of the video encoder and therefore plays a key role in power-adaptive design. We present a power-adaptive ME design that improves power allocation and power efficiency.

Existing power-adaptive or complexity-adaptive ME algorithms and designs [56, 57, 58, 47, 48, 49, 50, 45] can be roughly categorized into two types according to their implementation methods. The first type achieves power adaptation by integrating multiple search strategies. It adopts two to three search strategies, such as three-step search, diamond search, or full search, to deliver different levels of search complexity. For example, the authors of [47] proposed a three-mode complexity-adaptive method that uses three-step search and enhanced four-step search for low-power applications, and full search for high-quality applications. Although this type of method can provide a wide range of complexity levels, its low-power modes usually incur a significant loss in coding quality.

¹ The authors would like to thank the National Chip Implementation Center (CIC) for chip fabrication (Chip No.: T18-95E-04A).

The second type achieves power adaptation by simplifying the matching criterion. Simplified criteria include bit-depth truncation, pixel decimation, etc. By keeping different bit-depths or decimated pixel resolutions for block matching, the design can achieve different levels of computational complexity and power consumption. For example, the authors of [57, 58, 49] adopt least-significant-bit truncation to design their power-adaptive ME: a pixel bit-depth of 1 or 2 serves the low-power mode, and a bit-depth of 8 serves the high-quality mode. This type can provide significant power reduction by dynamically adjusting the bit-depth, but it still suffers from significant quality loss in low-power mode.
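To make the quality loss of bit-depth truncation concrete, the following is a minimal sketch of a sum of absolute differences (SAD) computed after dropping least-significant bits. It is an illustration only, not the implementation of [57, 58, 49]; the function names `truncate` and `truncated_sad`, the 2x2 block size, and the sample pixel values are all assumptions made for this example.

```python
def truncate(pixel: int, kept_bits: int) -> int:
    """Keep only the `kept_bits` most significant bits of an 8-bit pixel."""
    return pixel >> (8 - kept_bits)


def truncated_sad(cur, ref, kept_bits: int) -> int:
    """SAD between two equally sized blocks after LSB truncation.

    `cur` and `ref` are lists of rows of 8-bit pixel values (hypothetical data).
    """
    return sum(
        abs(truncate(c, kept_bits) - truncate(r, kept_bits))
        for cur_row, ref_row in zip(cur, ref)
        for c, r in zip(cur_row, ref_row)
    )


# Hypothetical 2x2 blocks, chosen only to show the effect of truncation.
cur = [[200, 13], [97, 255]]
ref = [[190, 20], [100, 128]]

for bits in (8, 2, 1):
    print(bits, truncated_sad(cur, ref, bits))
```

With these sample values, the 8-bit SAD clearly separates the two blocks, while at a bit-depth of 1 they become indistinguishable. This is the kind of matching ambiguity that can lead to quality loss in low-power modes.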

Table 4.1 summarizes the two types of complexity-adaptive algorithms and power-adaptive designs. Both types suffer significant quality loss in low-power modes.

Table 4.1: A summary of power adaptive motion estimation designs.

| | Type I | Type II |
|---|---|---|
| Method | Multiple search strategies | Simplified matching criteria |
| Examples | PMVFAST-EPZS (2 modes) [48]; FS-3SS-E4SS (3 modes) [47]; MV refinements [45] | Pixel sub-sampling [56]; Bit-depth truncation [57, 58, 49] |
| Pro | (1) Large scale of complexity reduction | (1) Simple for VLSI implementation |
| Con | (1) Limited modes; (2) Inflexibility for VLSI implementation; (3) Significant quality loss | (1) Limited pixel bit-depths; (2) Significant quality loss; (3) Bit-plane dependency |

The bit-depth truncation method also suffers from limited pixel bit-depths and bit-plane dependency. The limited number of pixel bit-depths makes fine-grained power adaptation difficult, and bit-plane dependency makes data access and processing inefficient. To address these issues, we propose a new power-adaptive ME algorithm and hardware architecture called Power Adaptive Iterative Binary Search (PA-IBS), which has four key features:

• Frequency decomposed bit-planes design: The PA-IBS algorithm adopts a frequency decomposition method for bit-plane design. The new bit-plane design method generates directional and gradient image features in binary format, and can provide better rate-distortion performance than pixel bit-planes.

• Finer granularity of power adaptation: The number of frequency decomposed bit-planes is not limited to the pixel bit-depth. This allows finer granularity of power adaptation for smooth adjustment of power and video quality.

• Independent bit-plane processing: The frequency decomposed bit-planes can be stored individually in memory and processed independently. Therefore, we can avoid unnecessary memory access and data processing for unrelated bit-planes.

• Frequency scaling based hardware architecture: Independent bit-plane processing makes it possible to design hardware that processes a single bit-plane instead of all bit-planes. To make full use of this single-bit-plane hardware, the frequency scaling technique scales the working frequency with the number of bit-planes to be processed. Such an architecture avoids the overhead of designing the hardware for the worst case of all bit-planes, and improves hardware utilization and power adaptation performance.
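The frequency scaling idea in the last feature can be sketched with a simple back-of-the-envelope model. The sketch assumes that a single-bit-plane datapath is reused once per bit-plane, so the clock needed to sustain real-time operation grows linearly with the number of bit-planes. The linear model, the function names, and the value of `cycles_per_plane_per_mb` are hypothetical and are not taken from the proposed design; only the CIF frame size (352x288) and the 16x16 macroblock size are standard.

```python
CIF_WIDTH, CIF_HEIGHT, MB_SIZE = 352, 288, 16


def macroblocks_per_second(fps: int = 30) -> int:
    """Number of 16x16 macroblocks to process per second for CIF video."""
    return (CIF_WIDTH // MB_SIZE) * (CIF_HEIGHT // MB_SIZE) * fps


def required_clock_hz(num_bitplanes: int,
                      cycles_per_plane_per_mb: int,
                      fps: int = 30) -> int:
    """Clock needed when one single-bit-plane datapath is reused per bit-plane.

    Assumes (hypothetically) a fixed cycle count per bit-plane per macroblock.
    """
    if num_bitplanes < 1:
        raise ValueError("at least one bit-plane must be processed")
    return num_bitplanes * cycles_per_plane_per_mb * macroblocks_per_second(fps)


# Hypothetical cycle budget, chosen only for illustration.
CYCLES = 500
for planes in (1, 2, 4):
    print(planes, required_clock_hz(planes, CYCLES))
```

Under this model, a low-power mode that processes fewer bit-planes runs the same hardware at a proportionally lower clock, rather than leaving worst-case hardware idle.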

The remainder of this chapter is organized as follows. Section 2 describes our metric to measure the power adaptation performance. Section 3 describes the proposed power adaptive ME algorithm. The VLSI design issues for the proposed algorithm are addressed in Section 4. In Section 5, we show the hardware architecture, and its experimental results are demonstrated in Section 6. Section 7 gives the concluding remarks.
