6. HARDWARE IMPLEMENTATION
6.2. FUNCTIONAL BLOCK
This section introduces the details of the hardware implementation, including the input and output control, census transform, weight generation, aggregation, and winner-takes-all.
6.2.1. Mini-Census Transform
Fig. 6-2. The census transform module for the left and right images
Fig. 6-2 shows the architecture of the mini-census transform. This architecture contains three blocks: input image buffer, update control and mini-census transform.
The mini-census transform compares 7 pixels distributed within a 5x5 window to calculate one census result. Generating one census result requires multiple loads from the input image data. Therefore, to reduce the number of loads from the input image, the input is buffered and reused. The input control first stores the input image data in a register. After one word of data has been stored in the register, it is transferred to the memory buffer. The output control reads the data from the buffer into the register and the census block. The register stores the center pixels, and the other pixels are transferred to the census block. The census block compares these pixels with the center pixels and generates the comparison result. The update control maintains the content of the memory buffer. It contains a table that stores the validity of each column of the memory buffer. The input and output controls may not access the memory buffer without first checking the status of this validation table, which keeps the input and output controls synchronized.
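The comparison performed by the census block can be illustrated with a minimal software sketch. It assumes that the 7 pixels are the center pixel plus six neighbors and that a result bit is set when the neighbor is darker than the center; the offsets in OFFSETS are hypothetical placeholders, since the actual sampling positions are not listed here. The function name mini_census is likewise illustrative and does not correspond to a hardware signal.

```python
from typing import Sequence, Tuple

# Hypothetical sampling pattern: six (dy, dx) offsets inside a 5x5 window.
# The real positions used by the hardware are not given in the text.
OFFSETS: Tuple[Tuple[int, int], ...] = (
    (-2, 0), (-1, -1), (-1, 1), (1, -1), (1, 1), (2, 0),
)


def mini_census(image: Sequence[Sequence[int]], y: int, x: int,
                offsets: Sequence[Tuple[int, int]] = OFFSETS) -> int:
    """Return a bit vector with one bit per sampled neighbor.

    A bit is 1 when the neighbor is darker than the center pixel
    (assumed comparison direction). The caller must keep (y, x) at
    least two pixels away from the image border.
    """
    center = image[y][x]
    code = 0
    for dy, dx in offsets:
        bit = 1 if image[y + dy][x + dx] < center else 0
        code = (code << 1) | bit
    return code
```

In this sketch every call re-reads the neighbors from the image, which is exactly the repeated loading that the input buffer in the hardware is meant to avoid.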
6.2.2. Weight Generation
Fig. 6-3. The weight generation module for the vertical and horizontal weights
Fig. 6-3 shows the architecture of the weight generation. It is similar to the architecture of the census transform discussed in 6.2.1, with two differences. First, the input covers the three dimensions of the color space, so there are three input controls and three input buffers. Second, there is an additional buffer for the output control, which is used for horizontal weight generation. The input control is similar to the one in the census transform; only the address control and the data size are slightly different. After the input buffer is ready, the weight generation block starts to calculate the vertical and horizontal weights.
The weight generation block first loads the image data from the input image buffer to generate the vertical weight by looking up the weight table. The input Y, U, and V images are also stored in BUFFYUV during the generation of the vertical weight. After the vertical weight is generated, the horizontal weight is generated by reading BUFFYUV.
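The table lookup can be sketched in software as follows. This is only an illustration under several assumptions that are not stated above: the table is indexed by the sum of absolute Y, U, and V differences between a pixel and the center pixel, each entry is a shift amount, and the table contents (step and max_shift) are hypothetical. The same lookup would serve both passes, once along a column for the vertical weights and once along a row read back from BUFFYUV for the horizontal weights.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (Y, U, V), 8 bits per component


def build_weight_table(step: int = 16, max_shift: int = 7) -> List[int]:
    """Hypothetical table: entry d is the shift amount for color difference d."""
    return [min(d // step, max_shift) for d in range(3 * 255 + 1)]


def lookup_weight(table: Sequence[int], center: Pixel, pixel: Pixel) -> int:
    """Look up the weight from the summed absolute YUV difference (assumed index)."""
    diff = sum(abs(a - b) for a, b in zip(center, pixel))
    return table[diff]


def line_weights(table: Sequence[int], pixels: Sequence[Pixel],
                 center_index: int) -> List[int]:
    """Weights of all pixels on one line (a column or a row) against its center."""
    center = pixels[center_index]
    return [lookup_weight(table, center, p) for p in pixels]
```

With this hypothetical table, a pixel identical to the center receives a shift of 0 and increasingly different pixels receive larger shifts, up to max_shift.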
6.2.3. Aggregation and Winner-Takes-All
Fig. 6-4. The cost aggregation module and its processing element: (a) aggregation and winner-takes-all, (b) processing element of cost aggregation
Fig. 6-4(a) shows the architecture of the aggregation and winner-takes-all (WTA). First, the Hamming distance is calculated from the left and right census results, which are CSL and CSR in the figure. The 0~30 Hamming distances, also called the initial costs, are sent to the processing element (PE). The vertical aggregated cost is then calculated by summing the shifted initial costs. Fig. 6-4(b) shows the detail of the PE. The initial costs are first shifted by their associated weights and then summed together. The calculated vertical aggregated cost is stored in a ping-pong buffer. The second-pass aggregation reads the vertical aggregated cost from the ping-pong buffer. In the same way, the horizontal aggregated cost is obtained by shifting and summing. The final cost is sent to the winner-takes-all block, which compares the cost with the minimum cost. If the aggregated cost is smaller than the minimum cost, it replaces the minimum cost and becomes the disparity candidate, which is stored in the disparity register. The final depth is the shifted disparity, normalized to the range of the luminance.
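The three steps of this data path (Hamming distance, shift-and-sum in the PE, and the minimum search of the WTA) can be sketched as below. The function names are illustrative, and the shift direction is an assumption: the sketch shifts right, so that a larger weight value reduces the contribution of a cost.

```python
from typing import Sequence, Tuple


def hamming_distance(csl: int, csr: int) -> int:
    """Initial cost: number of differing bits between two census results."""
    return bin(csl ^ csr).count("1")


def processing_element(costs: Sequence[int], weights: Sequence[int]) -> int:
    """Shift each cost by its weight (right shift assumed), then sum."""
    return sum(c >> w for c, w in zip(costs, weights))


def winner_takes_all(aggregated_costs: Sequence[int]) -> Tuple[int, int]:
    """Return (disparity, minimum cost) over the candidate disparities.

    A candidate replaces the current minimum only when its cost is strictly
    smaller, mirroring the compare-and-replace step on the disparity register.
    """
    if not aggregated_costs:
        raise ValueError("at least one aggregated cost is required")
    disparity, min_cost = 0, aggregated_costs[0]
    for d, cost in enumerate(aggregated_costs):
        if cost < min_cost:
            disparity, min_cost = d, cost
    return disparity, min_cost
```

For example, processing_element([8, 4, 6], [0, 1, 2]) sums 8, 2, and 1. The same function can be applied twice, first to the initial costs with the vertical weights and then to the vertical aggregated costs with the horizontal weights, which corresponds to the two aggregation passes.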
Fig. 6-5 shows the detail of the ping-pong buffer in Fig. 6-4. Each buffer has 48 entries, and the figure shows the status of each entry. White, light blue, deep blue, and orange mean that the entry is empty, being written, ready for reading, and being read, respectively. At first, all the entries are empty, and then the vertical aggregated cost is written into the buffer. After all the entries of buffer 1 are ready, the vertical weight is written into buffer 2. To generate three horizontal aggregated costs, 33 ready entries are required. Therefore, the vertical weight is calculated after 40 entries are ready. After that, three entries are cleared since their data are no longer needed. The update rate and the consumption rate of the buffer are balanced. Hence, the weight can be calculated continuously.
Fig. 6-5. The ping-pong buffer of the cost aggregation module
6.2.4. Input and Output Control
Fig. 6-6 shows the concept of the input and output control used by most of the modules in this design. The control handles the handshaking mechanism, which will be discussed in 6.3. Initially, the input control is in the WAIT state, where it waits for an invalid column of the internal memory to require an update, which will be discussed in 6.5.1. Once the internal memory needs an update, the input control sends the request signal to the transmitter and waits for the data in the REQUEST state. The state changes to the SEND state while receiving the data. Finally, it returns to the WAIT state after the transaction.
Conversely, the output control waits in the WAIT state for the internal memory to become valid. Once the internal memory is valid, it sends the ready signal to the receiver and waits for the request in the READY state. It switches to the SEND state once the request signal is received. Likewise, it returns to the WAIT state after the transaction.
Fig. 6-6. The finite-state machine of the input and output control
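The two state machines described in this subsection can be written as next-state functions. The sketch below is a behavioral model only; the state names follow the text, while the condition names (needs_update, data_arriving, memory_valid, request_received, done) are hypothetical labels for the events described above and not actual signal names of the design.

```python
from enum import Enum


class State(Enum):
    WAIT = "WAIT"
    REQUEST = "REQUEST"
    READY = "READY"
    SEND = "SEND"


def input_control_next(state: State, needs_update: bool = False,
                       data_arriving: bool = False, done: bool = False) -> State:
    """Input control: WAIT -> REQUEST -> SEND -> WAIT."""
    if state is State.WAIT and needs_update:
        return State.REQUEST  # request signal is sent to the transmitter
    if state is State.REQUEST and data_arriving:
        return State.SEND     # data is being received
    if state is State.SEND and done:
        return State.WAIT     # transaction finished
    return state


def output_control_next(state: State, memory_valid: bool = False,
                        request_received: bool = False, done: bool = False) -> State:
    """Output control: WAIT -> READY -> SEND -> WAIT."""
    if state is State.WAIT and memory_valid:
        return State.READY    # ready signal is sent to the receiver
    if state is State.READY and request_received:
        return State.SEND     # request signal has arrived
    if state is State.SEND and done:
        return State.WAIT     # transaction finished
    return state
```

When no condition relevant to the current state holds, both functions keep the state unchanged, which models the waiting behavior in each state.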