5.1. Overview
External memory bandwidth and internal memory size are major bottlenecks in designing VLSI architectures for real-time stereo matching hardware because of the large amount of pixel data and the wide disparity range. To address these bottlenecks, this chapter explores the impact of disparity-order and pixel-order data reuse in combination with the partial column reuse (PCR) and vertically expanded row reuse (VERR) techniques that we propose. The analysis suggests that disparity-order reuse with both PCR and VERR suits designs that target low memory cost and low external bandwidth, whereas pixel-order reuse with both techniques suits designs that target a low computation resource requirement. However, implementing disparity-order reuse requires high internal bandwidth. Hence, our final implementation adopts a hybrid of disparity-order and pixel-order reuse with the VERR technique.
5.2. Architecture Overview
Fig. 5-1 Overview of the hardware architecture
When implementing an aggregation-based method under a real-time constraint, there are many possible solutions to the data reuse problem. We use the hardware architecture shown in Fig. 5-1 to explain the different solutions.
In the matching cost computation, if data reuse along the disparity axis is preferred, all the matching costs of a pixel are computed before moving to the next pixel. This allows the data within the matching cost support window to be reused. However, the cost aggregation sums the initial matching costs of the same disparity, so it prefers the initial costs to be output along the spatial X-Y plane rather than along the disparity axis. As a result, to compute the aggregated cost within an aggregation window, the matching costs at every disparity must be stored before the aggregation can be performed. These initial matching costs form a cuboid in the disparity-spatial D-X-Y space, and the volume of this cuboid represents the memory size needed to store the initial costs.
One way to reduce the storage requirement is to avoid the conflict in data reuse direction, for instance by changing the reuse direction in the matching cost computation to the X-Y plane so that it matches the processing direction of the cost aggregation. Although doing so removes the conflict between the matching cost computation and the cost aggregation, a conflict between the cost aggregation and the disparity computation remains. To determine the disparity of a pixel, the disparity computation needs the aggregated matching costs at every disparity for that pixel. However, the aggregated costs are generated along the X-Y plane, which differs from the direction preferred by the disparity computation. Consequently, additional storage is required for the aggregated costs. These conflicts between the data generation and reuse directions play a key role in determining the storage requirement, so it is important to derive a data reuse strategy that resolves them and minimizes the storage requirement.
5.3. Matching Cost Computation Reuse
The data reuse in the matching cost computation can be categorized into two types according to the reuse order. The details of these data reuse methods are explained below.
5.3.1. Disparity-Order Reuse
Fig. 5-2 The two data reuse directions with different support window sizes: (a) matching cost generated in the disparity direction; (b) matching cost generated in the X-Y plane
The disparity-order reuse reuses the data in the matching windows of different disparities. Fig. 5-2(a) illustrates how it works. When we compute the disparity of a pixel in the left image, the matching window in the right image slides leftward within the disparity range. In other words, the matching costs at all disparities for one pixel in the left image are computed first, and then the matching cost computation of the next pixel in the left image is performed. With the disparity-order reuse, the overlapped data within the matching windows in the right image, shown in Fig. 5-2(a), can be reused to compute the matching costs at different disparities. As a result, if the pixel data are stored in external memory, the overlapped pixels need not be accessed repeatedly, and the bandwidth requirement to external memory is reduced. However, the order in which the matching costs are generated differs from the order in which they are consumed by the following cost aggregation step, which results in an additional memory storage requirement.
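The saving in external accesses can be estimated with a simple count. The following Python sketch is illustrative only. It assumes that one left-image pixel is matched against `num_disp` right-image windows of `win_w` x `win_h` pixels, each shifted by one pixel from the previous one, and it ignores image borders and any reuse between neighboring left-image pixels. The function names are hypothetical.

```python
def right_reads_without_reuse(win_w, win_h, num_disp):
    """Right-image pixels fetched if every disparity reloads its whole window."""
    return win_w * win_h * num_disp


def right_reads_with_disparity_reuse(win_w, win_h, num_disp):
    """Right-image pixels fetched if overlapped pixels are kept on chip.

    The num_disp windows are shifted by one pixel each (assumption), so
    together they cover a strip that is (win_w + num_disp - 1) pixels wide.
    """
    return (win_w + num_disp - 1) * win_h


if __name__ == "__main__":
    # 9x9 support window and 64 disparity levels, the values used in Section 5.5.
    print(right_reads_without_reuse(9, 9, 64))
    print(right_reads_with_disparity_reuse(9, 9, 64))
```

Under these assumptions the count drops from 5184 to 648 right-image pixels per left-image pixel, which is the kind of reduction the disparity-order reuse aims at.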
5.3.2. Pixel-Order Reuse
Compared with the disparity-order reuse, the pixel-order reuse reuses the data overlapped by neighboring matching windows in both the left and right images. Fig. 5-2(b) illustrates the pixel-order reuse in detail. The matching cost of the same disparity is computed for every pixel first, and then the cost of the next disparity is computed for every pixel. As a result, the matching windows in the left and right images slide synchronously with the same disparity offset. With the pixel-order reuse, the overlapped data within the matching windows shown in Fig. 5-2(b) can be reused. Therefore, the pixel-order reuse can also reduce the external memory bandwidth requirement. In contrast to the disparity-order reuse, the matching costs are generated in the same order in which they are consumed by the following cost aggregation step. Hence, the buffer size between the two steps can be reduced.
However, the data reuse can only be exploited during the cost computation of a single disparity; there is no data reuse between the computations of different disparities. Once the computation of one disparity has been completed for all the pixels in the whole image, the pixel data have to be read from the external memory again. Unless all the previously read pixel data can be kept in the internal memory, repeated external memory accesses are inevitable.
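The two traversal orders of Sections 5.3.1 and 5.3.2 can be contrasted as two loop nests. The sketch below is a simplified software illustration, not the hardware design: it uses a single-pixel absolute difference instead of a support window, clamps out-of-range columns to column 0, and all function names are hypothetical. Both orders produce the same costs and differ only in the sequence in which the costs are generated.

```python
def absolute_difference(left, right, y, x, d):
    """Single-pixel matching cost; columns left of the image are clamped to 0."""
    xr = max(x - d, 0)
    return abs(left[y][x] - right[y][xr])


def disparity_order(left, right, num_disp):
    """For each pixel, visit every disparity before moving to the next pixel."""
    height, width = len(left), len(left[0])
    costs, order = {}, []
    for y in range(height):
        for x in range(width):
            for d in range(num_disp):
                costs[(y, x, d)] = absolute_difference(left, right, y, x, d)
                order.append((y, x, d))
    return costs, order


def pixel_order(left, right, num_disp):
    """For each disparity, visit every pixel before moving to the next disparity."""
    height, width = len(left), len(left[0])
    costs, order = {}, []
    for d in range(num_disp):
        for y in range(height):
            for x in range(width):
                costs[(y, x, d)] = absolute_difference(left, right, y, x, d)
                order.append((y, x, d))
    return costs, order


if __name__ == "__main__":
    left = [[10, 12, 14], [11, 13, 15]]
    right = [[9, 12, 13], [11, 12, 16]]
    _, order_d = disparity_order(left, right, 2)
    _, order_p = pixel_order(left, right, 2)
    print(order_d[:3])  # consecutive costs belong to the same pixel
    print(order_p[:3])  # consecutive costs belong to the same disparity
```

In the pixel-order loop nest the whole image is traversed once per disparity, which is why the pixel data must be fetched again for each disparity unless they fit in internal memory.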
5.4. Cost Aggregation Data Reuse
In addition to the data reuse in the matching cost computation, there are two data reuse methods in the cost aggregation. The details of these two data reuse methods are explained as follows.
5.4.1. Partial Column Reuse (PCR)
The partial column reuse method reduces the local memory size in the cost aggregation by distributing the computation of the aggregated cost over the columns. Instead of computing the aggregated cost after all the initial costs in an aggregation window are available, the PCR computes the partial sum of a column as soon as the initial costs of that column are available. As a result, the local memory can be reduced from a window to a single column. Moreover, the partial sum of each column contributes to the aggregated costs of multiple overlapped windows, and storing the partial column sums requires less local memory than storing all the initial matching costs in a column.
Fig. 5-3 illustrates an example of the PCR with a 5x5 aggregation window. An aggregated cost requires the partial sums of five initial cost columns. With the PCR, the current partial column sum in Fig. 5-3 can be reused to contribute to the aggregated costs of windows 1 to 5.
Fig. 5-3 The partial column reuse (PCR) in a 5x5 aggregation window
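The idea behind the PCR can be sketched in software as follows. This is a minimal illustration for one disparity and one row of aggregated costs, assuming a square n x n window and plain Python lists; the function names are hypothetical and the code is not the hardware implementation. Each column sum is computed once and then added to every window that covers that column, which for n = 5 corresponds to windows 1 to 5 in the example above.

```python
def aggregate_row_pcr(cost_rows, n):
    """Aggregate one row of n x n windows from n rows of initial costs.

    cost_rows holds n rows of initial costs at a single disparity.
    Returns the aggregated cost of every window that fits in the row.
    """
    width = len(cost_rows[0])
    aggregated = [0] * (width - n + 1)
    for x in range(width):
        # Partial sum of column x, computed once.
        column_sum = sum(row[x] for row in cost_rows)
        # The window starting at column s covers columns s .. s + n - 1,
        # so column x contributes to at most n overlapped windows.
        first = max(0, x - n + 1)
        last = min(x, width - n)
        for s in range(first, last + 1):
            aggregated[s] += column_sum
    return aggregated


def aggregate_row_direct(cost_rows, n):
    """Reference version that sums every window from scratch."""
    width = len(cost_rows[0])
    return [
        sum(row[x] for row in cost_rows for x in range(s, s + n))
        for s in range(width - n + 1)
    ]


if __name__ == "__main__":
    rows = [[(3 * y + x) % 7 for x in range(9)] for y in range(5)]
    print(aggregate_row_pcr(rows, 5))
    print(aggregate_row_direct(rows, 5))
```

Only one column sum is live at a time in `aggregate_row_pcr`, whereas the direct version needs all n x n initial costs of a window before it can start.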
5.4.2. Vertically Expanded Row Reuse (VERR)
The vertically expanded row reuse reduces the bandwidth requirement of the cost aggregation engine by deliberately accessing additional rows of initial costs. Without the VERR, when the aggregation finishes processing the current row and moves to the next row, the data overlapped between the windows of the previous row and those of the current row have to be read from the cost computation engine again. Fig. 5-4 shows an example of such overlapped data. To avoid re-reading costs that have already been accessed, the VERR vertically expands the rows of initial costs to be read so that they can be reused to compute multiple rows of aggregated costs.
Fig. 5-4 Vertically expanded row reuse (VERR)
Fig. 5-4 shows how the VERR reduces redundant accesses of the overlapped data. Without the VERR, most of the data in the windows are overlapped many times, so they are read repeatedly. In contrast, with the VERR, the portion of overlapped data is much smaller, and the overlapped data overlap only once. This implies that the VERR requires fewer repeated accesses of the overlapped data than the case without it.
Fig. 5-5 plots the relationship between the average access count of an initial matching cost and the value k given an aggregation window size of 25x25. The value k represents the number of expanded rows. It can be observed that the average access count decreases as k increases. This suggests that with more rows expanded, less bandwidth is needed. However, increasing the value of k will also increase the local memory size and computing resource requirement.
Fig. 5-5 The average access count versus the number of expanded pixels
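One simple way to model the trend in Fig. 5-5 is sketched below. The model is an assumption and not necessarily the formula used to produce the figure: it supposes that a strip of n + k rows of initial costs is read once and yields k + 1 rows of aggregated costs for an n x n window, so that each initial cost is read (n + k) / (k + 1) times on average. Image borders are ignored and the function names are hypothetical.

```python
from fractions import Fraction


def average_access_count(n, k):
    """Average reads per initial cost for an n x n window with k expanded rows.

    Assumed model: n + k rows are read to produce k + 1 aggregated rows.
    """
    return Fraction(n + k, k + 1)


def strip_rows(n, k):
    """Rows of initial costs that must be available at once in this model."""
    return n + k


if __name__ == "__main__":
    n = 25  # aggregation window height used in the text
    for k in (0, 1, 3, 8, 24):
        print(k, float(average_access_count(n, k)), strip_rows(n, k))
```

In this model the access count falls from 25 at k = 0 toward 1 as k grows, while the number of rows that must be held grows linearly with k, which reflects the trade-off between bandwidth and local memory described above.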
5.5. Comparison
TABLE 5-1 compares the estimated memory size and bandwidth requirement of the disparity-order and pixel-order reuse methods. The target disparity image is 352x288 pixels with 64 disparity levels. The real-time constraint is 30 fps. The architecture is assumed to operate at a 100 MHz clock with a 32-bit data port to the external memory. The support window sizes in the matching cost computation and the cost aggregation are 9x9 and 25x25 pixels, respectively.
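The stated parameters imply a throughput budget that can be worked out directly. The short Python calculation below is only a back-of-the-envelope sketch: the peak bandwidth figure assumes one 32-bit transfer per clock cycle, which the text does not state, and the function and key names are hypothetical.

```python
def throughput_budget(width=352, height=288, fps=30, num_disp=64,
                      clock_hz=100_000_000, port_bytes=4):
    """Derive per-second workload and per-pixel budgets from the parameters."""
    pixels_per_second = width * height * fps
    costs_per_second = pixels_per_second * num_disp
    return {
        "pixels_per_second": pixels_per_second,
        "costs_per_second": costs_per_second,
        # Clock cycles available to produce one disparity pixel.
        "cycles_per_pixel": clock_hz / pixels_per_second,
        # Aggregated costs that must be produced per clock cycle.
        "costs_per_cycle": costs_per_second / clock_hz,
        # Assumes one full-width transfer on every clock cycle.
        "peak_bytes_per_second": clock_hz * port_bytes,
    }


if __name__ == "__main__":
    for name, value in throughput_budget().items():
        print(name, value)
```

With these numbers the design has roughly 33 clock cycles per output pixel, which means close to two aggregated costs must be produced every cycle to cover all 64 disparity levels at 30 fps.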
5.6. Summary
This chapter explores the impact of disparity-order and pixel-order data reuse in the matching cost computation and proposes the partial column reuse (PCR) and vertically expanded row reuse (VERR) techniques for the cost aggregation. The analysis and comparison show that the architecture using the disparity-order reuse with both the PCR and VERR techniques suits designs with low memory cost and high computation resources. On the other hand, the architecture using the pixel-order reuse with the VERR technique requires less computation resource but needs a large internal memory to store the aggregated costs.
TABLE 5-1 The result of approximated color distance
                                        Disparity-Order                          Pixel-Order
Section  Property                       Original  +PCR      +VERR     +PCR+VERR  Original  +PCR      +VERR     +PCR+VERR
Step 1   Internal Memory Size (KBytes)  2.4       2.4       2.6       2.6        2.2       2.2       2.4       2.4
Step 2   Internal Memory Size (KBytes)  40.0      1.6       44.8      1.8        0.6       0         1.8       0.1
Step 3   Internal Memory Size (KBytes)  0.1       0.1       0.1       0.1        228.1     0.0       228.1     228.1
Constraint (30 fps)                     Meet      Meet      Meet      Meet       Fail      Fail      Meet      Meet