
The low-power ME designs published from 1995 to 2007 are surveyed and categorized into three groups according to their design approach. The first group achieves low power through fast ME algorithms [19, 36, 37, 38, 42, 20, 21, 26]. These designs apply fast ME algorithms such as three-step search (TSS) or hierarchical search, which reduce the number of search candidates and therefore the computational power. The second group achieves low power through simplified block matching criteria [34, 35, 49, 52]. The commonly used block matching criteria are SAD (eqn. 2.1) and SSD (eqn. 2.2). Although they provide good R-D performance, they consume considerable computational power. Approaches such as matching on only the Most Significant Bits (MSBs) of each pixel, or pel-subsampling, which uses only part of the pixel data in a block, reduce the power spent on block matching operations. The third group achieves low power through efficient hardware architectures [18, 39, 40, 22, 23, 24, 25, 26, 17, 27, 29]. For example, a one-dimensional (1-D) systolic array for full search or an efficient memory hierarchy architecture can effectively reduce power consumption. Table 2.2 shows how these three design approaches affect the design metrics defined in Section 2.1.2.

28 Chapter 2: Review of Power Constrained Motion Estimation Designs

Table 2.2: Evaluation of low power designs using design metrics.

Groups                        Q  T  A  U  B  P
Reduced candidates            X  O  -  -  -  O
Simplified matching criteria  X  O  -  -  -  O
Efficient architectures       -  O  -  -  -  O

O: improved   X: degraded   -: case dependent
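The two simplifications used by the second group, pel-subsampling and MSB-only matching, can be illustrated with a minimal Python sketch. It assumes 8-bit pixels and blocks stored as lists of rows; the function name `sad` and the parameters `step` and `msb_bits` are hypothetical and are not taken from any of the cited designs.

```python
def sad(cur, ref, step=1, msb_bits=8):
    """Sum of absolute differences between two equally sized blocks.

    step > 1 applies pel-subsampling: only every step-th pixel in each
    direction is compared.
    msb_bits < 8 keeps only the most significant bits of each 8-bit pixel.
    Both parameters are illustrative assumptions, not a cited design.
    """
    shift = 8 - msb_bits
    total = 0
    for y in range(0, len(cur), step):
        for x in range(0, len(cur[0]), step):
            total += abs((cur[y][x] >> shift) - (ref[y][x] >> shift))
    return total


# Two flat 4x4 blocks with pixel values 16 and 50.
a = [[16] * 4 for _ in range(4)]
b = [[50] * 4 for _ in range(4)]

print(sad(a, b))              # 16 pixel differences of 34
print(sad(a, b, step=2))      # 2:1 subsampling in x and y: 4 pixel differences
print(sad(a, b, msb_bits=4))  # 4-bit operands: narrower subtractors and adders
```

With 2:1 subsampling in both directions the number of absolute-difference operations drops to one quarter, and MSB truncation narrows the data path. Both make the matching cost less exact, which is consistent with the degraded Q entry for this group in Table 2.2.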

The following subsections introduce frequently cited low-power designs and summarize their design-metric evaluations in Table 2.3 and Table 2.4. These works also serve as references for the proposed BBME design in later chapters.

2.2.1 Low Power by Fast ME Algorithms

Miyama et al. [36] proposed a sub-mW motion estimation processor core for mobile applications by developing a Gradient Descent Search (GDS) algorithm together with an optimized hardware architecture. The GDS algorithm reduces the computational complexity and the number of hardware cycles required for ME. The Single Instruction Multiple Data (SIMD) data path lowers the required clock frequency by maximizing parallel processing. A three-port SRAM acts as a data cache to reduce power consumption. These features allow the core to process QCIF at 15 fps at 0.85 MHz with 0.4 mW power consumption. The hexagon plot is shown in Fig. 2.5(a).

Chao et al. [30] proposed a hybrid motion estimation hardware architecture that supports both the Successive Elimination Algorithm (SEA) and Diamond Search (DS). The design resolves the irregular data flow between the two fast algorithms so that it can serve both high-quality and low-power applications. It has three modes: (1) SEA without early cut, (2) SEA with early cut (at cycle 4208, to meet CIF 30 fps at 50 MHz), and (3) DS without early cut. In the third mode, the power consumption is 223.6 mW for CIF 30 fps at a 50 MHz clock frequency. The hexagon plot is shown in Fig. 2.5(b).
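The early-cut point of cycle 4208 appears to match the number of clock cycles available per macroblock at CIF 30 fps and 50 MHz. The sketch below checks that arithmetic. It assumes CIF is 352x288 pixels, QCIF is 176x144 pixels, macroblocks are 16x16, and macroblocks are processed one at a time; the source does not state how the figure was derived, and the function name is hypothetical.

```python
def cycles_per_macroblock(clock_hz, width, height, fps, mb_size=16):
    """Whole clock cycles available per macroblock for real-time operation.

    Assumes one macroblock is processed at a time (no overlap between MBs).
    """
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return clock_hz // (mbs_per_frame * fps)


# CIF (352x288) at 30 fps with a 50 MHz clock: 396 MBs per frame.
print(cycles_per_macroblock(50_000_000, 352, 288, 30))  # 4208

# QCIF (176x144) at 15 fps with a 0.85 MHz clock, as quoted for [36].
print(cycles_per_macroblock(850_000, 176, 144, 15))     # 572
```

The same budget calculation shows why a fast algorithm matters for the sub-mW core of [36]: under these assumptions, only a few hundred cycles per macroblock are available at 0.85 MHz.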

Chen et al. [26] proposed an optimal low-power IME engine with a parallel hardware architecture that supports fast algorithms and efficient data reuse (DR), called content adaptive parallel-VBS 4SS. This design has three modes that trade video quality against power consumption: (1) high quality mode, (2) low power mode, and (3) ultra low power mode. The first mode uses two reference frames and multiple iterations to achieve high quality. The second mode uses one reference frame and multiple iterations, giving minor quality loss and low power consumption. The third mode uses one reference frame and a single iteration to achieve ultra low power consumption. In the third mode, the power consumption is 2.13 mW for CIF 30 fps at a 13.5 MHz clock frequency. The hexagon plot is shown in Fig. 2.6(a).
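The designs in this subsection all save power by evaluating fewer search candidates than full search. As a generic illustration of that idea, the following sketch implements the classic three-step search (TSS) mentioned earlier in this chapter. It is not the GDS, DS, or 4SS variant used by the cited works, and the function names, block size, and frame layout (lists of rows) are assumptions made for this example.

```python
def sad_at(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of ref
    displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Return (motion vector, SAD, candidates evaluated) for one block.

    Each step tests the 8 neighbours of the current best position at the
    current step size, then halves the step size.
    """
    h, w = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad_at(cur, ref, bx, by, 0, 0, n)
    evaluated = 1
    while step >= 1:
        cx, cy = best_mv
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if (dx, dy) == (0, 0):
                    continue  # centre already evaluated
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the frame.
                if not (0 <= bx + mx <= w - n and 0 <= by + my <= h - n):
                    continue
                cost = sad_at(cur, ref, bx, by, mx, my, n)
                evaluated += 1
                if cost < best_cost:
                    best_mv, best_cost = (mx, my), cost
        step //= 2
    return best_mv, best_cost, evaluated
```

With an initial step of 4 the search covers a range of ±7 pixels and evaluates at most 1 + 3 x 8 = 25 candidates, compared with 15 x 15 = 225 for full search over the same range. The price is that the search can settle on a local minimum, which corresponds to the degraded Q entry for reduced candidates in Table 2.2.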

2.2.2 Low Power by Simplified Block Matching Criteria

Huang et al. [37] proposed a new block matching algorithm called the Global Elimination Algorithm (GEA), together with an optimized architecture, to achieve a low-power design. GEA is derived from the Successive Elimination Algorithm (SEA), but saves more SAD computations by evaluating sub-sampled pixel data for early termination. Early termination avoids the power spent on unnecessary SAD computations. This hardware design can achieve more than CIF 30 fps at 25 MHz with 189 mW power consumption. The hexagon plot is shown in Fig. 2.6(b).
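The elimination idea that GEA builds on can be sketched with plain SEA. SEA relies on the bound |sum(cur) - sum(ref)| <= SAD(cur, ref), which follows from the triangle inequality: a candidate whose block-sum difference is already no smaller than the best SAD found so far cannot win, so its SAD need not be computed. The code below is a minimal sketch of SEA only, not of GEA itself, and all function names are hypothetical. A practical implementation would also update the block sums incrementally instead of recomputing them for each candidate.

```python
def block_sum(img, bx, by, n):
    """Sum of the pixels in the n x n block whose top-left corner is (bx, by)."""
    return sum(sum(row[bx:bx + n]) for row in img[by:by + n])


def block_sad(cur, ref, bx, by, rx, ry, n):
    """SAD between the block of cur at (bx, by) and the block of ref at (rx, ry)."""
    return sum(abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
               for y in range(n) for x in range(n))


def sea_search(cur, ref, bx, by, n=4, search=2):
    """Full-search-equivalent matching that skips candidates via the SEA bound.

    Returns (best motion vector, best SAD, number of SADs actually computed).
    """
    h, w = len(ref), len(ref[0])
    cur_sum = block_sum(cur, bx, by, n)
    best_mv, best_sad, computed = None, float("inf"), 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if not (0 <= rx <= w - n and 0 <= ry <= h - n):
                continue  # candidate block lies outside the reference frame
            # Lower bound on the SAD: if it cannot beat the best, skip the SAD.
            if abs(cur_sum - block_sum(ref, rx, ry, n)) >= best_sad:
                continue
            sad = block_sad(cur, ref, bx, by, rx, ry, n)
            computed += 1
            if sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad, computed
```

Because a candidate is skipped only when it provably cannot improve on the current best, the result is the same as that of an exhaustive search over the same window, while `computed` counts how many full SADs were actually needed.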

Wang et al. [35] proposed a low-power ME design that implements the All Binary Motion Estimation (ABME) algorithm with a hardware architecture optimized for block matching on binary bitplanes. The images to be searched are first converted to binary bitplanes, and the block matching criterion is modified to use the binary data for pattern matching. Pattern matching on binary data greatly reduces the computational complexity and thus saves power. The power consumption for CIF 30 fps is 2.2 mW. The hexagon plot is shown in Fig. 2.7(a).
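Matching on binary bitplanes replaces each absolute difference with a single XOR and the accumulation with a bit count. The sketch below shows this under a strong simplifying assumption: the bitplane is produced by thresholding every pixel against the frame mean. The binarization actually used in ABME is not described in this section and may differ, and all function names are hypothetical.

```python
def to_bitplane(frame):
    """Binarize a frame: 1 where the pixel is at or above the frame mean.

    The mean threshold is an assumption made for this sketch only.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [[1 if p >= mean else 0 for p in row] for row in frame]


def pack_block(bitplane, bx, by, n):
    """Pack the n x n block at (bx, by) into one integer, one bit per pixel."""
    word = 0
    for y in range(n):
        for x in range(n):
            word = (word << 1) | bitplane[by + y][bx + x]
    return word


def mismatch(word_a, word_b):
    """Matching cost: number of differing bits (XOR, then population count)."""
    return bin(word_a ^ word_b).count("1")


frame = [[0, 255], [255, 0]]
plane = to_bitplane(frame)
word = pack_block(plane, 0, 0, 2)
print(plane, word, mismatch(word, 0b1001))
```

A 16x16 block packs into 256 bits, so one candidate comparison reduces to a 256-bit XOR and a population count rather than 256 eight-bit subtractions.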

2.2.3 Low Power by Efficient Hardware Architecture

Shen et al. [39] proposed a low-power full-search block matching (FSBM) motion estimation design for H.263+. To minimize power consumption, techniques such as clock gating and dual supply voltages are used. This design processes CIF at 36 fps at 60 MHz with a power consumption of 423.8 mW. The hexagon plot is shown in Fig. 2.7(b).

Chen et al. [66] proposed a parallel SAD tree with a shared reference buffer for H.264 integer motion estimation (IME). To address the large memory bandwidth required by H.264 IME, an efficient memory architecture is proposed that saves 99.9% of the off-chip memory bandwidth and 99.22% of the on-chip memory bandwidth. The design can process 720p at 30 fps at 108 MHz with a 330.2k gate count and 208k bits of on-chip memory.

Yap et al. [40] proposed a new 1-D VLSI architecture for H.264 IME. The SAD computation reuses the results of smaller sub-block computations to save computation and power. This reuse is combined with a shuffling mechanism within each processing element to process up to 41 MV sub-blocks in the same number of clock cycles. The design supports a CIF processing rate of 191 fps at 294 MHz, and the power consumption is 0.008 mW/MB/fps. The hexagon plot is shown in Fig. 2.8(a).
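The reuse of smaller sub-block SADs can be sketched in software. The code below computes the sixteen 4x4 SADs of a 16x16 macroblock once and then sums them into the larger H.264 partitions, which gives 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 sub-block SADs per candidate. It illustrates the reuse principle only: it does not model the shuffling mechanism or the 1-D array of [40], and the function names are hypothetical. Partition names follow the width x height convention.

```python
def sad_4x4_grid(cur, ref):
    """SADs of the sixteen 4x4 sub-blocks of a 16x16 macroblock, as a 4x4 grid."""
    grid = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            grid[y // 4][x // 4] += abs(cur[y][x] - ref[y][x])
    return grid


def merge(grid, bh, bw):
    """Sum 4x4 SADs into partitions spanning bh x bw grid cells (row-major)."""
    return [sum(grid[r + i][c + j] for i in range(bh) for j in range(bw))
            for r in range(0, 4, bh) for c in range(0, 4, bw)]


def all_partition_sads(cur, ref):
    """SADs of all 41 H.264 sub-blocks, derived from the 4x4 SADs alone."""
    grid = sad_4x4_grid(cur, ref)
    # name (width x height) -> (rows of cells, columns of cells)
    shapes = {"4x4": (1, 1), "4x8": (2, 1), "8x4": (1, 2), "8x8": (2, 2),
              "8x16": (4, 2), "16x8": (2, 4), "16x16": (4, 4)}
    return {name: merge(grid, bh, bw) for name, (bh, bw) in shapes.items()}


cur = [[1] * 16 for _ in range(16)]
ref = [[0] * 16 for _ in range(16)]
sads = all_partition_sads(cur, ref)
print(sum(len(v) for v in sads.values()))  # 41
```

Each pixel difference is computed exactly once; every larger partition costs only a few additions on top of the 4x4 results.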

Ou et al. [67] proposed a new 2-D VLSI architecture for H.264 IME. Using a 2-D systolic array architecture, this design is able to complete an MB search in 256 cycles with 100% PE utilization. The power consumption is 20.48 mW. The hexagon plot is shown in Fig. 2.8(b).
