PIPELINE - ARCHITECTURE DESIGN - JPEG2000 Block Codec Chip Design with Parallel Processing

CHAPTER 4. ARCHITECTURE DESIGN

4.4. PIPELINE

Recall the column-based registers described in Section 4.1. In encoding, register B is coded with Pass 3, and register D is coded with Pass 1 and Pass 2. A sample that has been coded by all three coding passes shifts to register A, and SMW updates the significance memory with the data of the four samples in register A. Register F holds the data that RG loads from memory. Figure 4-14 shows the relation between the five blocks (P1M, P2M, P3M, RG, and SMW) and the six registers in encoding.

Figure 4-14 Relation of five blocks and six registers in encoding (registers A to F; blocks P1M, P2M, P3M, RG, and SMW; column labels n-3 to n+2 at Time N)

In decoding, the Pass 2 coding module is delayed by two columns and works on register B, and register A is also processed by CMW. Figure 4-15 shows the relation between the six blocks (P1M, P2M, P3M, RG, SMW, and CMW) and the six registers.

Figure 4-15 Relation of six blocks and six registers in decoding

Figures 4-14 and 4-15 show that once every module has finished its work on its corresponding register, the register data shifts left by one register. With this rule, the pipeline architecture is straightforward to implement.
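The shift rule can be illustrated with a minimal Python sketch. It is a behavioral model only, not the hardware description: the names `shift_left` and `active_columns` are hypothetical, register F is treated as a single register (the walk-through later splits it into F1 and F2), and the decoding map assumes that the modules not mentioned for decoding stay on the same registers as in encoding.

```python
REGISTERS = ["A", "B", "C", "D", "E", "F"]

# Module-to-register assignment in encoding, as stated in the text.
ENCODE_MAP = {"SMW": "A", "P3M": "B", "P1M": "D", "P2M": "D", "RG": "F"}
# Decoding: P2M moves to register B and CMW also works on register A.
# Keeping the other modules unchanged is an assumption of this sketch.
DECODE_MAP = dict(ENCODE_MAP, P2M="B", CMW="A")


def shift_left(regs, incoming):
    """Shift every register one place toward A; RG fills F with `incoming`."""
    return regs[1:] + [incoming]


def active_columns(regs, module_map):
    """Return the column each module works on, skipping empty registers."""
    result = {}
    for module, reg in module_map.items():
        column = regs[REGISTERS.index(reg)]
        if column is not None:
            result[module] = column
    return result


# Run Time 0 to Time 5 of the first stripe: one shift per time step.
regs = [None] * len(REGISTERS)
for column in range(6):
    regs = shift_left(regs, column)

print(active_columns(regs, ENCODE_MAP))
```

In this model, the state after six shifts corresponds to Time 5 of the encoding walk-through: SMW sees column 0 in register A while RG fills register F with column 5.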

Figure 4-17 is the flow chart of the pipeline architecture. The code-block size is 8 × 7 (8 rows and 7 columns). We define the index of every sample as follows.

We name each column by the index of the first sample in that column. For example, the light yellow ellipse named 0 represents the column composed of samples 0, 32, 64, and 96, and the pink ellipse named 1 represents the column composed of samples 1, 33, 65, and 97.
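The naming convention can be sketched in Python as follows. The row stride of 32 is not stated explicitly; it is inferred from the examples (samples 0, 32, 64, and 96 share a column, and the second stripe starts at column 128). The function names are hypothetical.

```python
ROW_STRIDE = 32     # inferred: vertically adjacent samples differ by 32
STRIPE_HEIGHT = 4   # each column holds four samples


def sample_index(row, col):
    """Index of the sample at (row, col) under the inferred row stride."""
    return row * ROW_STRIDE + col


def column_name(stripe, col):
    """A column is named by the index of its first (top) sample."""
    return sample_index(stripe * STRIPE_HEIGHT, col)


def column_samples(name):
    """The four sample indices that make up the column called `name`."""
    return [name + r * ROW_STRIDE for r in range(STRIPE_HEIGHT)]


print(column_samples(0))   # [0, 32, 64, 96]
print(column_samples(1))   # [1, 33, 65, 97]
print(column_name(1, 0))   # 128
```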

Figure 4-17 Pipeline architecture of encoding and decoding in normal case

Take encoding as an example. At the beginning (Time 0 and Time 1), RG loads the data of column 0 and column 1 from memory and stores it into registers F1 and F2. At Time 2, the register data shifts left, so the data of column 0 and column 1 is stored in register D and register E. P1M and P2M encode register D (column 0), while RG keeps loading data from memory and storing it into register F1 or F2. Once P1M and P2M have finished encoding register D and RG has loaded the data of column 2, the work at Time 2 is finished.

At Time 3, the register data shifts left again. The data of column 0 is stored in register C, and the data of columns 1 and 2 is stored in registers D and E. Once P1M and P2M have finished coding column 1 in register D and RG has stored the data of column 3 into its register, the work at Time 3 is finished.

At Time 4, the data of column 0 shifts left to register B, and the data of columns 1, 2, and 3 shifts to registers C, D, and E. P3M starts working at this time and codes column 0. Once P1M and P2M have finished coding column 2, P3M has finished coding column 0, and RG has stored the data of column 4 into its register, the work at Time 4 is finished.

At Time 5, SMW starts working and processes column 0 in register A.
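The walk-through above can be condensed into a small timetable. The following Python sketch covers the first stripe of encoding only; the start times are read off the description (RG at Time 0, P1M and P2M at Time 2, P3M at Time 4, SMW at Time 5), and the names `FIRST_TIME`, `column_at`, and `timetable` are hypothetical.

```python
# Time at which each module first works on column 0 (first stripe, encoding).
FIRST_TIME = {"RG": 0, "P1M": 2, "P2M": 2, "P3M": 4, "SMW": 5}


def column_at(module, time, width=7):
    """Column a module works on at `time`, or None if it is idle."""
    column = time - FIRST_TIME[module]
    return column if 0 <= column < width else None


def timetable(last_time, width=7):
    """Per-time assignment of columns to modules, from Time 0 to last_time."""
    return {
        t: {module: column_at(module, t, width) for module in FIRST_TIME}
        for t in range(last_time + 1)
    }


for t, row in timetable(5).items():
    print("Time", t, row)
```

The sketch does not model the stripe boundary or stalls, so it should only be read as a summary of Time 0 to Time 5.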

The pipeline continues in this way until Time 8. After the work at Time 7 is finished, the data of all registers except register F2 shifts left. Note that column 6 is the last column in the first stripe. Since there is no column to its right, the right neighbors of column 6 are considered insignificant. In other words, any neighbor that falls outside the code-block is considered insignificant.

Figure 4-18 If the context window extends outside the code-block, the samples that do not actually exist are considered insignificant.

The same applies to column 128. Since column 128 is the first column in the second stripe, its left neighbors are also considered insignificant.

For this reason, the data of column 128 must lag the data of column 6 by one column.

Hence, the data in register F2 (column 128) does not shift left after the work at Time 7 is finished, but registers A, B, C, D, and E must shift left. The pipeline then continues as described above until a bit-plane has been coded.
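The boundary rule described above can be sketched in Python as a lookup that returns "insignificant" for any position outside the code-block. The significance map is modeled here as a plain list of rows, and the function names are hypothetical; the hardware obtains the same effect through the register arrangement rather than through an explicit bounds check.

```python
def significance(sig, row, col):
    """Significance state at (row, col); 0 if outside the code-block."""
    if 0 <= row < len(sig) and 0 <= col < len(sig[0]):
        return sig[row][col]
    return 0


def neighbor_states(sig, row, col):
    """States of the eight neighbors around (row, col), row by row."""
    return [
        significance(sig, row + dr, col + dc)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]


# 8 x 7 code-block in which every existing sample is significant.
sig = [[1] * 7 for _ in range(8)]

# Top sample of column 6: the upper row and the right column do not exist.
print(neighbor_states(sig, 0, 6))
# Top sample of column 128 (row 4, column 0): the left column does not exist.
print(neighbor_states(sig, 4, 0))
```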

If the width of a code-block is less than 7, care must be taken over when RG loads the data of column 128. Figure 4-19 shows the pipeline for an 8 × 6 code-block.

At Time 4, although register F2 is empty, RG cannot load the data of column 128; it must wait until Time 6 to load from memory.

Figure 4-19 Pipeline architecture of encoding and decoding in special case

Coding a sample requires the nine samples of its context window, so loading the data of column 128 also requires the data of sample 96. Notice that at Time 4, column 0 is still being coded by P3M. If RG loaded the data of sample 96 from memory at this time, it would get stale data, because the memory has not been updated yet (the significance state of a sample may change during the three coding passes). The same holds at Time 5. To get the correct data of sample 96, RG must load it after SMW has updated the memory, so it must wait until SMW finishes its work on column 0. Likewise, before RG loads the data of column 129, it must wait until SMW finishes its work on column 1. The relative positions of the columns are illustrated in Figure 4-20.
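This read-after-write constraint can be expressed as a simple check. The Python sketch below assumes that SMW works on first-stripe column k at Time k + 5 (column 0 at Time 5, as in the walk-through), that it completes within that time step, and that the pipeline has not stalled earlier; the function names are hypothetical.

```python
SMW_DELAY = 5  # SMW works on first-stripe column k at Time k + 5


def smw_time(k):
    """Time at which SMW writes first-stripe column k back to memory."""
    return k + SMW_DELAY


def rg_may_load(k, time):
    """True if RG can safely load second-stripe column 128 + k at `time`."""
    return time > smw_time(k)


def earliest_load_time(k):
    """First time step at which column 128 + k holds up-to-date neighbors."""
    return smw_time(k) + 1


print(rg_may_load(0, 4))       # False: column 0 is still in P3M
print(rg_may_load(0, 5))       # False: SMW is updating column 0
print(earliest_load_time(0))   # 6
```

Under these assumptions the check reproduces the behavior described for the special case: column 128 cannot be loaded at Time 4 or Time 5 and is first available at Time 6.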

CHAPTER 5. EXPERIMENT RESULTS

This chapter describes the design flow, testing considerations, and experiment results.

5.1. Design Flow

We designed the JPEG2000 EBCOT following ISO/IEC FCD 15444-1: 2000, the JPEG2000 specification. The overall cell-based design flow is shown in Figure 5-1.

C model simulation

We built a verification model in C to simulate and verify the algorithm. The results generated by our C model are compared with the output of the JasPer software to verify correctness. Software simulation not only verifies the correctness of the proposed algorithm but also provides debug information for the hardware design.
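As an illustration of the comparison step only, the following Python sketch locates the first position at which two output streams differ. It is not the verification code used in this work: it does not invoke the C model or JasPer, and the function name `first_mismatch` and the representation of both outputs as byte strings are assumptions.

```python
def first_mismatch(model_out, reference_out):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(model_out, reference_out)):
        if a != b:
            return offset
    if len(model_out) != len(reference_out):
        # One stream is a strict prefix of the other.
        return min(len(model_out), len(reference_out))
    return None


print(first_mismatch(b"\x12\x34\x56", b"\x12\x34\x56"))  # None
print(first_mismatch(b"\x12\x34\x56", b"\x12\x35\x56"))  # 1
```

Reporting the offset of the first difference, rather than only pass or fail, is what makes such a comparison useful as debug information.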

RTL code design and simulation

After the architecture is determined from the C model, we proceed to the RTL (Register Transfer Level) design in VHDL. The RTL code, together with a testbench, is then simulated with the ModelSim simulator. Detailed debug information from the C model speeds up the RTL design process.
