Proposed Low Memory Cost Access Approach - Analysis and Design of Baseline Belief Propagation

3.2 Analysis and Design of Baseline Belief Propagation

3.2.2 Proposed Low Memory Cost Access Approach

In a rectangular node plane, the memory cost consists of the messages and the data cost. In this dissertation, we focus on the messages, which account for most of the cost. A straightforward memory access approach for the messages is the ping-pong buffer approach, which needs a pair of node planes and requires 8HWL memory. Unfortunately, this cost is too large to keep on-chip. Even if the messages are stored in external memory, the required bandwidth is still impractical, especially for image-scale node planes.
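The 8HWL figure can be read as a simple product. The decomposition below is a sketch under stated assumptions, not a formula quoted from the text: it assumes an H × W node plane, L labels per message, and four directional messages per node (a 4-connected grid), which is consistent with the 4HWL quoted later for a single node plane.

```latex
M_{\text{ping-pong}}
  = \underbrace{2}_{\text{node planes}}
    \times \underbrace{4}_{\text{messages per node (assumed)}}
    \times H \times W \times L
  = 8HWL
```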

1. Previous Work

To reduce the memory cost of messages, Yu et al. [35] compressed the messages with the envelope point transform method, which achieves eight-fold compression without significant degradation of the disparity maps. However, this method incurs the overhead of compression and decompression.

On the other hand, much previous work focuses on the computing order of message passing on the node plane, so that the node plane can be resized to reduce memory cost. Park et al. [34] proposed the fast BP structure approach, which resizes the pair of node planes from HW to TW, where T is usually smaller than H. In our previous work [38], we proposed the in-place message update approach, which resizes one of the pair of node planes from HW to 3W to buffer partial new messages temporarily.

Felzenszwalb and Huttenlocher [25] introduced the bipartite scan, which needs only one node plane and also halves the computation. Unlike the above computing orders, Szeliski et al. [26] proposed the BP-M scan, which updates messages direction by direction over the whole node plane to accelerate convergence and needs only one node plane. Although the BP-M scan converges faster than the others, its memory cost is still too high and cannot be reduced further because of its iterative directional process and the overlapping data lifetimes of all messages. Thus, the BP-M scan is not discussed in this dissertation.

Excluding the BP-M scan, the memory access in the previous work belongs to the fixed memory access approach, which binds messages to fixed memory positions and thus limits the possibility of reducing memory cost. Figure III-3 shows the data dependency of the traditional fixed memory access approach between successive iterations in a simplified 1-D node line, where each square represents a memory position, the arrow inside a square represents a stored message, and a line linking two messages (e.g. m3 at t1 to m2 at t2) represents a data dependency between them. In the traditional approach, each node’s messages are stored at fixed memory positions. For example, the messages m3 of node n3 are always located at the same memory position pos3 in all iterations. These messages m3 are used to calculate the new messages m2 and m4 of the neighboring nodes n2 and n4 for the next iteration. However, the new messages cannot overwrite their old ones at the memory positions pos2 and pos4, since the old ones are still needed to compute new messages at other nodes. Thus, an access conflict occurs between the old and new messages of the neighboring nodes. A straightforward way to resolve the access conflict is to allocate additional memory to buffer the new messages, but this increases the cost.

Figure III-3 Traditional fixed memory access approach in a 1-D node line for node n3 computation

2. Spinning-Message Approach

To address the access conflict and reduce memory cost, we propose the spinning-message approach, which removes the binding between messages and memory positions and eliminates the extra memory. In addition, the proposed approach can be applied to the reduction techniques mentioned in the previous sub-section to save a further 50% of the memory cost.

Figure III-4 (a) shows the main idea of the proposed approach. Once the old messages of the center node have been used to calculate the new messages of the neighboring nodes, their data lifetime ends. Therefore, the new messages of the neighboring nodes can overwrite the outdated messages without access conflict, and they are stored at the center memory position instead of the neighboring memory positions.

Based on this idea, Figure III-4 (b) shows the details of the proposed spinning-message approach in a 1-D node line, using the node n3 as an example. Other nodes follow the same procedure.

At iteration t1, the messages m3 are stored at the center memory position pos3; this is the centralized mode. For the transition to iteration t2, the messages m3 are used to calculate the new messages m2 and m4 of the neighboring nodes n2 and n4. The old messages m3 can be replaced by the new messages at the center memory position pos3 without access conflict. After the calculation and replacement, the centralized mode changes to the distributed mode, since every node’s messages are now distributed over its neighboring memory positions (e.g. m3 at pos2 and pos4) at iteration t2. Then, the distributed messages m3 are used to calculate the new messages m2 and m4, and the distributed messages m3 can again be replaced by the new messages without access conflict. After another calculation and replacement, every node’s messages return to the centralized mode at iteration t3.

In summary, the messages are centralized at their own memory positions in odd iterations and distributed over their neighboring memory positions in even iterations. With this approach, we save memory while avoiding the access conflict. Figure III-5 shows the proposed approach extended to a 2-D node plane.
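The alternation between the centralized and distributed modes can be checked with a small simulation. The following is a minimal sketch, not the dissertation's implementation: it assumes a 1-D node line, a min-sum message update with a truncated-linear smoothness term, zero messages at the line ends, and made-up integer data costs. All function and variable names are hypothetical. It compares an in-place spinning update, which keeps two message vectors per memory position, against a ping-pong reference that needs two full buffers.

```python
def update(data_cost, incoming, tau=2):
    """Assumed min-sum update: min over k of D(k) + incoming(k) + V(l, k)."""
    labels = range(len(data_cost))
    return [min(data_cost[k] + incoming[k] + min(abs(l - k), tau) for k in labels)
            for l in labels]


def pingpong_step(data, old):
    """Fixed access: old[i] = [msg from i-1, msg from i+1]; writes a second buffer."""
    n, zeros = len(data), [0] * len(data[0])
    new = [[zeros, zeros] for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            new[i + 1][0] = update(data[i], old[i][0])  # message i -> i+1
        if i > 0:
            new[i - 1][1] = update(data[i], old[i][1])  # message i -> i-1
    return new


def incoming(slots, centralized, i):
    """Messages arriving at node i, wherever they are currently stored."""
    n, zeros = len(slots), [0] * len(slots[0][0])
    if centralized:                      # stored at node i's own position
        left, right = slots[i]
    else:                                # stored at the neighbours' positions
        left = slots[i - 1][1] if i > 0 else zeros
        right = slots[i + 1][0] if i + 1 < n else zeros
    return [left if i > 0 else zeros, right if i + 1 < n else zeros]


def spin_step(data, slots, centralized):
    """In-place update; new messages overwrite the ones they were computed from."""
    n = len(data)
    for i in range(n):
        from_left, from_right = incoming(slots, centralized, i)
        to_right = update(data[i], from_left)
        to_left = update(data[i], from_right)
        if centralized:
            slots[i] = [to_left, to_right]   # overwrite node i's own position
        else:
            if i + 1 < n:
                slots[i + 1][0] = to_right   # overwrite the consumed message
            if i > 0:
                slots[i - 1][1] = to_left
    return not centralized


# Made-up data costs: 5 nodes, 4 labels each.
data = [[3, 0, 2, 5], [1, 4, 0, 2], [2, 2, 3, 0], [0, 1, 4, 3], [5, 0, 1, 2]]
zeros = [0] * len(data[0])
ref = [[zeros, zeros] for _ in data]
slots = [[zeros, zeros] for _ in data]
centralized, all_match = True, True
for _ in range(6):
    ref = pingpong_step(data, ref)
    centralized = spin_step(data, slots, centralized)
    all_match &= all(incoming(slots, centralized, i) == ref[i]
                     for i in range(len(data)))
print(all_match)
```

In this sketch each in-place write targets a slot whose old content is read only by the node doing the write, which is why no second buffer is needed.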

Figure III-4 Proposed spinning-message approach (a) main idea; (b) memory access in a 1-D node line for node n3 computation.

Figure III-5 Proposed spinning-message approach in a 2-D node plane for node n3 computation

3. Applications

The proposed spinning-message approach can be applied to different types of node planes to further reduce their memory cost.


Figure III-6 Comparison of memory access approaches in different node planes (a) traditional fixed memory access approach; (b) proposed spinning-message approach.


(1) Sliding Node Plane

In the original BP, the messages in a node plane are iteratively updated in the space-first (x-y plane) computing order, and the node plane moves along the iteration axis, as shown by the ping-pong buffer approach in Figure III-6 (a). In contrast, the sliding node plane moves orthogonally to the iteration axis, and its messages are updated in the iteration-first computing order. The size of the sliding node plane is its projected area on the x-y plane, which is smaller than the original node plane.

Figure III-7 shows three sliding directions. The node plane size is WT for vertical sliding and HT for both horizontal and diagonal sliding. The vertical sliding node plane was proposed by the fast BP structure approach in [28]. However, its size is larger than the other two because W is usually larger than H. Therefore, we recommend the horizontal sliding node plane, which requires 8HTL memory in total for messages.

Figure III-7 Sliding node plane in different directions (a) vertical sliding; (b) horizontal sliding; (c) diagonal sliding.

The memory cost can be further reduced to 4HTL by the proposed spinning-message approach, as shown in Figure III-6 (b). Figure III-8 shows the details of the spinning-message approach applied to the horizontal sliding node plane. The initial state of the messages is shown in Figure III-8 (a), where the front of the node plane has arrived at the node n6. Then, in Figure III-8 (b), the new messages in the node plane are computed from the node n7 to n2 step by step. With the spinning-message approach, the new messages can overwrite the old messages at the same memory positions. After that, in Figure III-8 (c), the front of the node plane slides to the node n7. Following this flow, the spinning-message approach works well with the sliding node plane and saves a further 50% of the memory cost.

Figure III-8 Sliding node plane with the spinning-message approach (a) the node plane slides to the node n6; (b) the computing order of the message passing; (c) the node plane slides to the node n7.

(2) Bipartite Node Plane

The bipartite node plane, proposed in [25], divides the nodes into two parts like a chessboard, as shown in Figure III-6 (a). One part is computed at odd iterations, and the other is computed at even iterations. Its memory cost is reduced from the pair of node planes of the ping-pong buffer approach to only one node plane, i.e. 4HWL.

This memory cost can be further reduced to 2HWL by the proposed spinning-message approach, as shown in Figure III-6 (b). Figure III-9 shows the spinning-message approach applied to the bipartite node plane at odd and even iterations. At the odd iteration in Figure III-9 (a), the messages of the white nodes are used to calculate the new messages of the black nodes, and these new messages of the black nodes can overwrite those of the white nodes. The state of the node plane then changes to that of Figure III-9 (b). Similarly, after the even iteration the messages return to the state of the next odd iteration. Thus, with the spinning-message approach, only the white nodes need memory, and 50% of the memory cost is saved.
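The chessboard split and the resulting storage count can be illustrated with a short check. This is only a sketch under assumptions: the coloring rule `(x + y) % 2`, the choice of which color is called white, four messages per node, and the small plane size are all hypothetical and chosen for illustration.

```python
def is_white(x, y):
    """Assumed chessboard colouring: white where x + y is even."""
    return (x + y) % 2 == 0


def neighbours(x, y, W, H):
    """4-connected neighbours that lie inside the W x H node plane."""
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [(u, v) for u, v in candidates if 0 <= u < W and 0 <= v < H]


H, W, L = 4, 6, 3  # arbitrary small plane and label count
white = [(x, y) for y in range(H) for x in range(W) if is_white(x, y)]

# Every neighbour of a white node is black, so the two parts never
# exchange messages within their own colour.
only_black_neighbours = all(not is_white(u, v)
                            for x, y in white
                            for u, v in neighbours(x, y, W, H))

# Storage if only the white positions hold messages (4 per node, L entries each).
white_only_entries = len(white) * 4 * L
print(only_black_neighbours, white_only_entries, 2 * H * W * L)
```

For a plane with an even number of nodes, the white-only count matches the 2HWL quoted in the text.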

Figure III-9 Bipartite node plane with the spinning-message approach (a) message passing for white nodes at odd iterations; (b) message passing for black nodes at even iterations.

(3) Proposed Sliding-Bipartite Node Plane

By combining the sliding node plane and the bipartite node plane described above, the memory cost can be reduced to 4HTL. Furthermore, by applying the proposed spinning-message approach, the memory cost can be reduced to 2HTL, as shown in Figure III-6 (b). Figure III-10 shows the spinning-message approach applied to the sliding-bipartite node plane. In a similar way to the sliding node plane, the front of the sliding-bipartite node plane can slide from the node n6 to n8 by following the computing order in Figure III-10 (b). Therefore, the proposed sliding-bipartite node plane combines the advantages of the sliding node plane and the bipartite node plane to reduce memory cost.
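The memory figures quoted in this sub-section follow one pattern: start from the ping-pong cost and halve it once for the bipartite split and once for the spinning-message approach, with T replacing W for the horizontal sliding node plane. The helper below is a hypothetical sketch that only restates this arithmetic. The function name, its parameters, and the example sizes are assumptions, and the unit is message entries rather than bytes.

```python
def message_memory(H, W, L, T=None, bipartite=False, spinning=False):
    """Message entries for one configuration, following the factors in the text."""
    width = W if T is None else T   # horizontal sliding keeps an H x T plane
    entries = 8 * H * width * L     # ping-pong pair of node planes: 8HWL or 8HTL
    if bipartite:
        entries //= 2               # one node plane instead of a pair
    if spinning:
        entries //= 2               # new messages overwrite consumed ones
    return entries


H, W, T, L = 8, 12, 3, 4  # arbitrary illustrative sizes, not from the source
configs = [
    ("ping-pong buffer", dict()),
    ("sliding", dict(T=T)),
    ("sliding + spinning", dict(T=T, spinning=True)),
    ("bipartite", dict(bipartite=True)),
    ("bipartite + spinning", dict(bipartite=True, spinning=True)),
    ("sliding-bipartite", dict(T=T, bipartite=True)),
    ("sliding-bipartite + spinning", dict(T=T, bipartite=True, spinning=True)),
]
for name, kwargs in configs:
    print(f"{name:30s} {message_memory(H, W, L, **kwargs)}")
```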

Figure III-10 Proposed sliding-bipartite node plane (a) the node plane slides to the node n6; (b) the computing order of the message passing; (c) the node plane slides to the node n8.
