
CHAPTER 2 ALGORITHM OF FEC

2.5 Summary

In this chapter, we introduce the encoding and decoding algorithms of each FEC block: the scrambler, interleaving, RS codes and convolutional codes. Section 2.1 introduces the three kinds of scrambler in J.83. Section 2.2 introduces both the convolutional interleaver and the convolutional deinterleaver, which have advantages over block interleaving. Section 2.3 introduces the encoding and decoding algorithms of RS codes and presents two kinds of key equation solver for RS decoding: the BM algorithm and the Euclidean algorithm. We also introduce the three RS codes in J.83: one over GF(2^7) with t = 3, and two over GF(2^8) with t = 8 and t = 10, respectively. Section 2.4 introduces convolutional codes and the Viterbi algorithm. Fortunately, J.83 defines only one convolutional code: a 16-state, non-systematic, rate-1/2 encoder with generators (G1, G2) = (25, 37) in octal.

Chapter 3

Algorithm and Architecture for Multi-Mode FEC Decoder

This chapter proposes the algorithm and architecture of two modules. The first is a multi-mode RS decoder that uses memories to store and correct the received data. The second is a memory-based universal convolutional interleaver/deinterleaver. Both modules are compatible with ITU-T J.83, DVB-T, ATSC and other digital TV systems. The scrambler and the Viterbi decoder are described only briefly, because the scrambler is very simple and there is only one kind of convolutional code.

3.1 The proposed multi-mode FEC decoder

Figure 3.1 shows the block diagram of the proposed multi-mode FEC decoder, which integrates all the systems of figure 2.1 into a single system. The labels A/B/C/D denote Annexes A/B/C/D of ITU-T J.83. Mode-controlled multiplexers select between the J.83 Annex B data path and the Annex A/C/D data path.

[Figure 3.1 blocks: input from the de-mapper; Trellis Decoder & Synchronization (B); Descrambler (B); Deinterleaver (A/B/C/D); RS Decoder (A/B/C/D); Descrambler (A/C/D); two MUXes controlled by the mode signal select the output.]

Figure 3.1: The proposed multi-mode FEC decoder

3.2 Memory-based universal convolutional interleaver/deinterleaver

Implementing the many FIFOs of a convolutional interleaver or deinterleaver directly is inefficient. It consumes a lot of power and area, and it makes routing difficult in APR (Auto Placement and Route). A better solution is to use SRAM.

The key issue then becomes how to generate the correct SRAM address for each input and output symbol. To address this, we propose a novel, low-complexity, highly flexible, memory-based method for implementing the multi-mode convolutional interleaver and deinterleaver, derived from [6][7].

3.2.1 The algorithm and architecture of memory-based universal convolutional interleaving

The idea is to rebuild the FIFO registers of the convolutional deinterleaver as a memory array. The FIFO registers of the first branch are placed somewhere in the memory array, and the FIFO registers of the second branch are appended after them. This continues until the FIFO registers of the last branch have been appended, giving the memory array shown in figure 3.2.

For writing, after the first symbol is written at the head of the memory array, the next symbol should be written at the head of the second branch. The address distance between the first and second symbols is therefore (I-1) x J, which equals the number of FIFO cells in the first branch. We call this distance the "branch address", and we call the address of the first symbol the "intra-initial address". The address distance between the second and third symbols is (I-2) x J, and so on, down to a distance of 2J between the (I-2)-th and (I-1)-th symbols. The I-th symbol passes directly to the output, because the last branch has no delay.

For reading, the first symbol read out should be at the end of the first branch in the memory array, and the second at the end of the second branch. The address distance between them is therefore (I-2) x J. Similarly, the address distance between the second and third symbols read out is (I-3) x J, and so on, down to J between the (I-2)-th and (I-1)-th symbols.

To keep writing and reading moving in the same direction, the intra-initial address is decreased by 1 for each following group of I symbols, and the above operations are repeated. Finally, the memory size must be defined: whenever an address exceeds the memory size, it is taken modulo the memory size.

[Figure 3.2: the deinterleaver branches hold I-1, I-2, I-3, ..., 1 groups of J cells. They are rebuilt as one memory array with segments of (I-1)×J, (I-2)×J, ..., 2J, J locations, and each branch has a write pointer and a read pointer.]

Figure 3.2: The memory array by rebuilding the FIFO registers of deinterleaver
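The address arithmetic above can be checked with a minimal Python sketch. The sketch assumes 0-based addresses, with the first branch starting at address 0. It also assumes the memory size J × I × (I - 1)/2 + J that is derived later in this section. For the g-th group of I symbols, it lists the write address (head of each branch) and the read address (end of each branch) of the I - 1 memory branches. The function name `group_addresses` is our own.

```python
def group_addresses(I, J, g):
    """Write/read addresses of branches 0..I-2 for the g-th group of I symbols.

    Branch I-1 has no delay, so its symbol bypasses the memory. The
    intra-initial address moves back by 1 for every new group.
    """
    mem_bound = J * I * (I - 1) // 2 + J      # memory size (assumed, see text)
    writes, reads = [], []
    head = 0                                   # head of branch b in the array
    for b in range(I - 1):
        tail = head + (I - 1 - b) * J          # end of branch b = head of branch b+1
        writes.append((head - g) % mem_bound)  # write at the head of branch b
        reads.append((tail - g) % mem_bound)   # read at the end of branch b
        head = tail
    return writes, reads


w, r = group_addresses(12, 17, 0)
print(w[:3], r[:3])   # [0, 187, 357] [187, 357, 510]
```

For the (12, 17) case, consecutive write addresses differ by 187, 170, ..., 34 and consecutive read addresses by 170, ..., 17. Every address of group 1 is one lower (modulo the memory size) than the corresponding address of group 0.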

A (12, 17) convolutional deinterleaver, which is adopted in ITU-T J.83A, C and DVB-T, is taken as an example to show how the method works. Assume the received data are 0, x, x, …, x, 12, x, x, …, x, 204, 1, x, x, …, x, 2244, 2041, …, 11, …, as shown in figure 2.4. Each number is the index of the symbol at the interleaver input, and x denotes a "don't care" symbol at the beginning.

When deinterleaving, after 0 is written to memory, the distance to the next write address is (I-1) x J = 187, as shown in figure 3.3(a). The distance from that address to the next one is (I-2) x J = 170, and so on, down to 2J = 34. These numbers equal the numbers of FIFO cells on the branches of the convolutional deinterleaver. When 12 is written to memory, writing goes back to the address "initial writing address - 1" and the previous operations are repeated.

After 202 has been written, the memory contents are as shown in figure 3.3(b). The distance between 0 and 1 is (I-2) x J = 170, the distance between 1 and 2 is (I-3) x J = 153, and so on, down to a distance of J = 17 between 9 and 10. At this point, the memory size in figure 3.3(b) is J x I x (I-1) / 2, exactly the minimum memory requirement in figure 2.3. There is no free space left to write 2244, so the memory must be enlarged; otherwise 2244 would overwrite a symbol that has not yet been read. By observation, J more memory locations are needed.

As shown in figure 3.3(c), 2244 is written into memory when 0 is read out. When 1 is read out, 2041 is written to the original position of 0, and the previous operations are repeated. As before, whenever an address exceeds the memory size, it is taken modulo the memory size. Hence, the required memory size is J x I x (I-1) / 2 + J. The maximum size is 65032 bytes, for the (128, 8) convolutional deinterleaver in J.83B. This is only 8 bytes (J) more than the original structure, so the method has the advantages of low cost and high flexibility for a multi-mode design.

[Figure 3.3 panels: (a) writing 0, 12, 24, …, where successive write addresses are spaced (I-1)×J = 187, (I-2)×J = 170, …; (b) memory contents after 202 has been written, with branch segments of (I-1)×J = 187, (I-2)×J = 170, …, J = 17 locations; (c) 2244 written into the J extra locations while 0 is read out.]

Figure 3.3: Behavior of the novel algorithm for (12, 17) convolutional deinterleaver
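The numbers in this example can be checked with a short Python sketch. The sketch assumes the standard convolutional interleaver, in which branch b delays its symbols by b × J groups of I symbols. The names `mem_bound` and `interleaved_symbol` are illustrative and do not come from the text.

```python
def mem_bound(I, J):
    """Memory size of the memory-based deinterleaver: J*I*(I-1)/2 + J."""
    return J * I * (I - 1) // 2 + J


def interleaved_symbol(p, I, J):
    """Interleaver-input index carried at position p of the interleaved
    stream, or None for a start-up "don't care" symbol (x).

    Position p uses branch p % I, and branch b delays by b*J groups of I.
    """
    b = p % I
    s = p - b * J * I
    return s if s >= 0 else None


print(mem_bound(12, 17), mem_bound(128, 8))   # 1139 65032
print([interleaved_symbol(p, 12, 17) for p in (0, 1, 12, 204, 205, 2244, 2245)])
# [0, None, 12, 204, 1, 2244, 2041]
```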

The detailed operations of the universal convolutional deinterleaver are given as pseudocode in figure 3.4, which uses the following 12 parameters:

(1) I: interleaver depth.

(2) J: delay difference between neighboring branches.

(3) in: data input.

(4) out: data output.

(5) w_addr: write address of the memory input.

(6) r_addr: read address of the memory output.

(7) w_ini_addr: intra-initial address of w_addr.

(8) r_ini_addr: intra-initial address of r_addr.

(9) branch_addr: address offset between two consecutive symbols.

(10) counter: determines when the input passes directly to the output and when w_addr and r_addr are reset.

(11) mem_bound: maximum size of the memory.

(12) mem[ ]: the memory, of size mem_bound.

The convolutional interleaver is the inverse of the convolutional deinterleaver, so it can be formulated in the same way.

Initial condition:
w_addr = w_ini_addr = 0; branch_addr = (I-1)*J;
r_addr = r_ini_addr = (I-1)*J; counter = 1;
mem_bound = J*I*(I-1)/2 + J;

while (in != NULL) {
    if (counter == I) {          /* last branch: input passes directly to output */
        out = in;
        branch_addr = (I-1)*J;   /* branch_addr returns to its initial value */
        counter = 1;             /* reset the counter */
        w_ini_addr = (w_ini_addr - 1) mod mem_bound;  /* step both intra-initial */
        r_ini_addr = (r_ini_addr - 1) mod mem_bound;  /* addresses back by 1     */
        w_addr = w_ini_addr;     /* restart writing at w_ini_addr */
        r_addr = r_ini_addr;     /* restart reading at r_ini_addr */
    } else {
        out = mem[r_addr];       /* read out from memory first */
        mem[w_addr] = in;        /* then write the input into memory */
        w_addr = (w_addr + branch_addr) mod mem_bound;
        branch_addr = branch_addr - J;
        r_addr = (r_addr + branch_addr) mod mem_bound;
        counter = counter + 1;
    }
}
/* mod returns a non-negative remainder: (0 - 1) mod mem_bound = mem_bound - 1 */

Figure 3.4: Pseudocode of the universal convolutional deinterleaver
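The pseudocode translates almost line by line into Python. The sketch below is an illustrative port, not the author's implementation. It makes three changes: `mod` becomes Python's `%`, which is non-negative and matches the intended wrap-around; the `while (in != NULL)` loop becomes a loop over a list; and the memory starts zeroed. A plain FIFO model, `fifo_convolutional` (a name we introduce), serves as a reference for checking.

```python
from collections import deque


def memory_deinterleave(symbols, I, J):
    """Python port of the pseudocode in figure 3.4 (one memory, no FIFOs)."""
    mem_bound = J * I * (I - 1) // 2 + J
    mem = [0] * mem_bound                      # memory contents, zero at reset
    w_addr = w_ini_addr = 0
    r_addr = r_ini_addr = (I - 1) * J
    branch_addr = (I - 1) * J
    counter = 1
    out = []
    for x in symbols:
        if counter == I:                       # last branch: no delay
            out.append(x)
            branch_addr = (I - 1) * J
            counter = 1
            # step both intra-initial addresses back by one (wraps to mem_bound - 1)
            w_ini_addr = (w_ini_addr - 1) % mem_bound
            r_ini_addr = (r_ini_addr - 1) % mem_bound
            w_addr, r_addr = w_ini_addr, r_ini_addr
        else:
            out.append(mem[r_addr])            # read the oldest symbol first
            mem[w_addr] = x                    # then write the new one
            w_addr = (w_addr + branch_addr) % mem_bound
            branch_addr -= J
            r_addr = (r_addr + branch_addr) % mem_bound
            counter += 1
    return out


def fifo_convolutional(symbols, delays):
    """Reference FIFO model: symbol n enters branch n % len(delays), whose
    FIFO holds delays[b] cells (0 = direct path); all cells start at 0."""
    fifos = [deque([0] * d) for d in delays]
    out = []
    for n, x in enumerate(symbols):
        fifo = fifos[n % len(delays)]
        if fifo:
            fifo.append(x)
            out.append(fifo.popleft())
        else:
            out.append(x)
    return out


data = [(n * 37 + 11) % 256 for n in range(6000)]
I, J = 12, 17
same = memory_deinterleave(data, I, J) == fifo_convolutional(
    data, [(I - 1 - b) * J for b in range(I)])
```

With zero-initialized memory, the two models produce identical output streams, including the start-up symbols. This makes the FIFO model a convenient check when I and J are changed for another mode.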

The architecture of the address generator for the proposed convolutional interleaving algorithm is depicted in figure 3.5. An FSM controls the branch address generator and the intra-initial address generator. The final memory address is formed by combining the branch address and the intra-initial address, followed by the modulo operation.

[Figure 3.5: an FSM drives the write/read branch address generators and the write/read intra-initial address generators. Each path passes through a MUX, a mod unit and a D flip-flop to produce the final write address and the final read address of the SRAM.]

Figure 3.5: The architecture of the address generator for convolutional interleaving
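The text notes that the interleaver can be formulated in the same way. The Python sketch below shows one possible formulation; it is an assumption, not the author's design. It mirrors figure 3.4 as follows:

- Branch 0 is the direct path.
- The branch segments of J, 2J, …, (I - 1)J cells are laid out in sequence.
- The branch address grows by J instead of shrinking.
- Both intra-initial addresses step back by one per group.

The memory size is again J × I × (I - 1)/2 + J. The FIFO reference model is repeated so that the block is self-contained.

```python
from collections import deque


def memory_interleave(symbols, I, J):
    """Memory-based convolutional interleaver (hypothetical mirror of figure 3.4).

    Branch 0 has no delay; branch b (1..I-1) acts like a FIFO of b*J cells.
    """
    mem_bound = J * I * (I - 1) // 2 + J
    mem = [0] * mem_bound
    w_ini, r_ini = 0, J                    # intra-initial write/read addresses
    w_addr, r_addr, branch_addr = w_ini, r_ini, J
    out = []
    for n, x in enumerate(symbols):
        b = n % I
        if b == 0:                         # first branch: direct path
            out.append(x)
            continue
        out.append(mem[r_addr])            # read before write, as in figure 3.4
        mem[w_addr] = x
        w_addr = (w_addr + branch_addr) % mem_bound
        branch_addr += J                   # next branch segment is J longer
        r_addr = (r_addr + branch_addr) % mem_bound
        if b == I - 1:                     # group finished: step back by one
            w_ini = (w_ini - 1) % mem_bound
            r_ini = (r_ini - 1) % mem_bound
            w_addr, r_addr, branch_addr = w_ini, r_ini, J
    return out


def fifo_convolutional(symbols, delays):
    """Reference FIFO model (cells start at 0, delay 0 = direct path)."""
    fifos = [deque([0] * d) for d in delays]
    out = []
    for n, x in enumerate(symbols):
        fifo = fifos[n % len(delays)]
        if fifo:
            fifo.append(x)
            out.append(fifo.popleft())
        else:
            out.append(x)
    return out


data = [(n * 37 + 11) % 256 for n in range(6000)]
tx = memory_interleave(data, 12, 17)
rx = fifo_convolutional(tx, [(11 - b) * 17 for b in range(12)])  # deinterleave
# end-to-end delay is I*(I-1)*J = 2244 symbols
```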

3.3 The multi-mode RS decoder

To design a multi-mode RS decoder, we first need a finite field multiplier (FFM) that supports different finite field definitions. The four steps of the RS decoding process [4] can then be carried out. Accordingly, this section first proposes the multi-mode FFM, followed by the multi-mode syndrome calculator, key equation solver, Chien search and error value evaluator. The multi-mode RS decoder can be used in many applications, such as ITU-T J.83 and DVB-T systems.

3.3.1 Multi-Mode Finite Field Multiplier

Different RS codes use different primitive polynomials, which makes it challenging to design a single FFM. However, finite field multiplication can be split into a polynomial multiplication followed by a modular reduction, and the primitive polynomial affects only the modular reduction. Therefore, programmability is needed only in the modular reduction. The proposed multi-mode FFM is shown in figure 3.6, where p_i(x) and p_j(x) are different primitive polynomials over GF(2^m).

Figure 3.6: Multi-mode FFM over GF(2^m)
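The split into a field-independent multiplication and a field-dependent reduction can be sketched in software. The sketch assumes the primitive polynomials commonly cited for J.83: x^8 + x^4 + x^3 + x^2 + 1 for GF(2^8) and x^7 + x^3 + 1 for GF(2^7). The mode keys and the `ffm` and `element_order` functions are hypothetical.

```python
# Assumed field definitions (verify against the J.83 text):
#   GF(2^8): p(x) = x^8 + x^4 + x^3 + x^2 + 1  -> 0x11D  (Annexes A, C, D)
#   GF(2^7): p(x) = x^7 + x^3 + 1              -> 0x89   (Annex B)
FIELDS = {"A": (0x11D, 8), "C": (0x11D, 8), "D": (0x11D, 8), "B": (0x89, 7)}


def ffm(a, b, mode):
    """Multi-mode finite field multiply. Step 1 is the same for every field;
    only step 2 (the modular reduction) uses the selected polynomial."""
    poly, m = FIELDS[mode]
    prod = 0
    for i in range(m):                       # step 1: carry-less multiply
        if (b >> i) & 1:
            prod ^= a << i
    for i in range(2 * m - 2, m - 1, -1):    # step 2: reduce modulo p(x)
        if (prod >> i) & 1:
            prod ^= poly << (i - m)
    return prod


def element_order(a, mode):
    """Smallest k with a^k = 1; equals 2^m - 1 for a primitive element."""
    x, k = a, 1
    while x != 1:
        x, k = ffm(x, a, mode), k + 1
    return k


print(hex(ffm(2, 0x80, "A")))   # alpha^8 in GF(2^8): 0x1d
```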

3.3.2 Syndrome Calculator

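Let R_{n-1}, …, R_0 be the received symbols. To calculate the syndromes S_i = R(α^i), we can use Horner's rule, which needs only one multiply-accumulate per received symbol:

$$S_i = R(\alpha^i) = \Big(\cdots\big((R_{n-1}\,\alpha^i + R_{n-2})\,\alpha^i + R_{n-3}\big)\alpha^i + \cdots + R_1\Big)\alpha^i + R_0$$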

[Figure 3.7 panels: (a) cell SC_i, which updates S_i to S_i·α^i + R_j; (b) cell SC2_i, the same update with a mode-controlled MUX that selects the GF(2^8) or GF(2^7) constant multiplier; (c) 20 × 8 registers holding cells SC0, SC2_1, …, SC2_6, SC7, …, SC19, fed by R_j under mode control.]

Figure 3.7: Multi-mode syndrome calculator: (a) Basic cell SC_i for GF(2^8). (b) Basic cell SC2_i for dual-mode purpose (GF(2^8) and GF(2^7)). (c) The overall structure of the multi-mode syndrome calculator.

A basic cell that computes a syndrome with Horner's rule is therefore proposed first. The J.83 simulation platform uses two finite fields, GF(2^8) and GF(2^7). In addition, the roots of the generator polynomial are α^0 to α^(2t-1) in J.83A, C and D, but α^1 to α^(2t) in J.83B. Hence, two kinds of basic cells, SC_i and SC2_i, are proposed, as shown in figure 3.7(a) and (b). SC_i is for GF(2^8), while SC2_i serves either GF(2^8) or GF(2^7), depending on the current mode.

The architecture of the multi-mode syndrome calculator is shown in figure 3.7(c). Each specification selects a specific group of cells:

- J.83A and C: SC0, SC1, …, SC15.
- J.83B: SC2_1, SC2_2, …, SC2_6.
- J.83D: all the basic cells.

Moreover, based on [9], if the first t syndromes are all zero, then all syndromes are zero, provided that no more than t symbol errors have occurred. This simplifies the error detection procedure, reducing both power consumption and complexity.
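A minimal Python model of the SC_i cell update makes three assumptions. First, α is represented by x (the integer 2). Second, the field polynomials are 0x11D for GF(2^8) and 0x89 for GF(2^7). Third, symbols arrive highest degree first. The test codeword is built non-systematically as m(x)g(x). This is enough to show that all syndromes of a valid codeword vanish; it is not how J.83 encodes.

```python
def gf_mul(a, b, poly, m):
    """Shift-and-add multiply in GF(2^m); poly includes the x^m term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:                 # reduce as soon as degree m appears
            a ^= poly
    return r


def gf_pow(a, e, poly, m):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a, poly, m)
    return r


def syndromes(received, first_root, two_t, poly, m):
    """S_i = R(alpha^i) for i = first_root .. first_root + 2t - 1 (Horner's rule).

    received[0] is the first symbol to arrive, i.e. R_{n-1}. Each inner step
    is one cell update: S_i <- S_i * alpha^i + R_j.
    """
    out = []
    for i in range(first_root, first_root + two_t):
        a_i = gf_pow(2, i, poly, m)            # alpha is represented by x (= 2)
        s = 0
        for r in received:
            s = gf_mul(s, a_i, poly, m) ^ r
        out.append(s)
    return out


# Helpers used only to build a test codeword (coefficients highest degree first).
def poly_mul(p, q, poly, m):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b, poly, m)
    return out


def generator_poly(first_root, two_t, poly, m):
    g = [1]
    for i in range(first_root, first_root + two_t):
        g = poly_mul(g, [1, gf_pow(2, i, poly, m)], poly, m)   # times (x + alpha^i)
    return g


g = generator_poly(0, 16, 0x11D, 8)                           # J.83A/C-style, t = 8
codeword = poly_mul(list(range(1, 189)), g, 0x11D, 8)         # 188 + 16 = 204 symbols
print(syndromes(codeword, 0, 16, 0x11D, 8) == [0] * 16)       # True
```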

3.3.3 Key Equation Solver

To solve the key equation, the Berlekamp-Massey (BM) algorithm is used because of its regular operation. For a given t, it needs 2t iterations to find the error locator polynomial σ(x). Based on the proposed multi-mode FFM and the modified decomposed algorithm [4][9] mentioned in Section 2.3.2, the architecture of the multi-mode key equation solver is proposed as shown in figure 3.8. Computing Ω(x) after σ(x) requires fewer multiplications and additions than the original BM algorithm. The design contains only one key equation solver with three multi-mode FFMs to compute σ(x) and Ω(x), which reduces the hardware complexity.

[Figure 3.8: registers for σ(x) and τ(x), syndrome input S_i, discrepancy δ, adders, a MUX and three multi-mode FFMs.]

Figure 3.8: Multi-mode key equation solver
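For reference, the sketch below is the textbook Berlekamp-Massey iteration in Python, not the modified decomposed architecture of figure 3.8. It runs 2t iterations and then forms Ω(x) = S(x)σ(x) mod x^(2t) once σ(x) is known. The exp/log-table class `GF`, the field polynomials and the test helper `syndromes_of_errors` are assumptions made for illustration.

```python
class GF:
    """GF(2^m) with exp/log tables; `poly` includes the x^m term (e.g. 0x11D)."""
    def __init__(self, poly, m):
        self.n = (1 << m) - 1
        self.exp = [0] * (2 * self.n)
        self.log = [0] * (self.n + 1)
        x = 1
        for i in range(self.n):
            self.exp[i] = self.exp[i + self.n] = x
            self.log[x] = i
            x <<= 1
            if x >> m:
                x ^= poly

    def mul(self, a, b):
        return 0 if a == 0 or b == 0 else self.exp[self.log[a] + self.log[b]]

    def inv(self, a):
        return self.exp[self.n - self.log[a]]

    def alpha(self, e):
        return self.exp[e % self.n]


def berlekamp_massey(S, gf):
    """sigma(x), lowest degree first with sigma_0 = 1, from the 2t syndromes
    S[0..2t-1] = S_b .. S_{b+2t-1}; one iteration per syndrome."""
    sigma, prev = [1], [1]
    L, shift, prev_d = 0, 1, 1
    for n in range(len(S)):
        d = S[n]                                  # discrepancy delta
        for i in range(1, L + 1):
            d ^= gf.mul(sigma[i], S[n - i])
        if d == 0:
            shift += 1
            continue
        coef = gf.mul(d, gf.inv(prev_d))
        old = sigma[:]
        sigma = sigma + [0] * max(0, len(prev) + shift - len(sigma))
        for i, p in enumerate(prev):              # sigma += (d/prev_d) x^shift prev
            sigma[i + shift] ^= gf.mul(coef, p)
        if 2 * L <= n:                            # length change
            L, prev, prev_d, shift = n + 1 - L, old, d, 1
        else:
            shift += 1
    return sigma[:L + 1]


def omega_poly(S, sigma, gf):
    """Omega(x) = S(x) sigma(x) mod x^(2t), computed after sigma(x)."""
    out = [0] * len(S)
    for i, s_i in enumerate(sigma):
        for j, S_j in enumerate(S):
            if i + j < len(S):
                out[i + j] ^= gf.mul(s_i, S_j)
    return out


def poly_eval(coeffs, x, gf):                     # lowest degree first
    acc = 0
    for c in reversed(coeffs):
        acc = gf.mul(acc, x) ^ c
    return acc


def syndromes_of_errors(errors, first_root, two_t, gf):
    """Test helper: S_j = sum_k Y_k X_k^j with X_k = alpha^position."""
    S = []
    for j in range(first_root, first_root + two_t):
        s = 0
        for pos, val in errors.items():
            s ^= gf.mul(val, gf.alpha(pos * j))
        S.append(s)
    return S


gf8 = GF(0x11D, 8)
errs = {3: 0x55, 50: 0x01, 100: 0xC8}            # position -> error value
S = syndromes_of_errors(errs, 0, 16, gf8)        # 2t = 16, first root alpha^0
sigma = berlekamp_massey(S, gf8)
omega = omega_poly(S, sigma, gf8)
```

For three errors, σ(x) has degree 3 and vanishes at α^(-3), α^(-50) and α^(-100), while Ω(x) has degree at most 2.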

3.3.4 Chien Search

As with the syndrome calculator, the multi-mode Chien search must handle different finite fields (GF(2^7) and GF(2^8)) and different error-correcting capabilities t. Two kinds of basic cells, C_i and C2_i, are therefore proposed, as shown in figure 3.9(a) and (b). C_i is designed only for GF(2^8), while C2_i is designed for both GF(2^8) and GF(2^7). The architecture of the multi-mode Chien search is depicted in figure 3.9(c).

For each specification, the outputs of the appropriate cells are summed:

- J.83B: C2_0, C2_1, C2_2 and C2_3.
- J.83A and C: C2_0, C2_1, C2_2, C2_3, C4, C5, …, C8.
- J.83D: C2_0, C2_1, C2_2, C2_3, C4, C5, …, C10.

The cell C2L computes the location currently being tested. If the sum equals zero, that location is stored in the registers.

Figure 3.9: Multi-mode Chien search. (a) Basic cell C_i for GF(2^8). (b) Basic cell C2_i for dual-mode purpose (GF(2^8) and GF(2^7)). (c) The overall structure of the multi-mode Chien search.
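In software, the cell structure of figure 3.9 corresponds to keeping one register per coefficient σ_i and multiplying it by a constant at every step. With the constant α^(-i), the sum of the registers at step p equals σ(α^(-p)). The Python sketch below assumes that location p corresponds to the root α^(-p); the hardware may scan in a different order. The helper `sigma_from_positions` builds σ(x) from known error positions and is used only for testing.

```python
class GF:
    """GF(2^m) with exp/log tables; `poly` includes the x^m term."""
    def __init__(self, poly, m):
        self.n = (1 << m) - 1
        self.exp = [0] * (2 * self.n)
        self.log = [0] * (self.n + 1)
        x = 1
        for i in range(self.n):
            self.exp[i] = self.exp[i + self.n] = x
            self.log[x] = i
            x <<= 1
            if x >> m:
                x ^= poly

    def mul(self, a, b):
        return 0 if a == 0 or b == 0 else self.exp[self.log[a] + self.log[b]]

    def alpha(self, e):
        return self.exp[e % self.n]


def sigma_from_positions(positions, gf):
    """Test helper: sigma(x) = prod (1 + alpha^p x), lowest degree first."""
    sigma = [1]
    for p in positions:
        x_p = gf.alpha(p)
        sigma = [a ^ gf.mul(x_p, b) for a, b in zip(sigma + [0], [0] + sigma)]
    return sigma


def chien_search(sigma, n, gf):
    """Return the locations p in 0..n-1 where sigma(alpha^-p) == 0."""
    regs = list(sigma)                                  # reg i = sigma_i * alpha^(-i*p)
    steps = [gf.alpha(-i) for i in range(len(sigma))]   # constant multiplier of cell i
    found = []
    for p in range(n):
        total = 0
        for reg in regs:                                # sum of the selected cells
            total ^= reg
        if total == 0:
            found.append(p)                             # store the current location
        regs = [gf.mul(reg, s) for reg, s in zip(regs, steps)]
    return found


gf8 = GF(0x11D, 8)
print(chien_search(sigma_from_positions([0, 7, 120], gf8), 204, gf8))   # [0, 7, 120]
```

Only len(σ) = t + 1 registers are active, which matches the mode-dependent choice of cells.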

3.3.5 Error Value Evaluator

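The error values are computed with the Forney algorithm. Let β_j be the j-th root of the error locator polynomial σ(x). For J.83A, C and D, the error value is

$$e_i = \frac{\Omega(\beta_j)}{\beta_j\,\sigma'(\beta_j)}$$

For J.83B, the error value is

$$e_i = \frac{\Omega(\beta_j)}{\sigma'(\beta_j)} \qquad (3.3)$$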

Based on these equations, the architecture of the multi-mode error value evaluator is proposed as shown in figure 3.10. It calculates σ'(β_j) and Ω(β_j) at the same time: the left multiplexer selects β_j^2 and the bottom multiplexer selects β_j. The left multiplexer uses β_j^2 because over GF(2^m) the derivative keeps only the odd-degree terms of σ(x), so σ'(β_j) = Σ_k σ_{2k+1} β_j^{2k}. After σ'(β_j) has been calculated, it is multiplied by β_j for J.83A, C and D. The "( )^-1" block is implemented as a look-up table. To calculate the final error value, the bottom multiplexer selects the upper path.

[Figure 3.10: coefficient inputs σ_{2k+1} and k-indexed terms, MUXes selecting among β_j^2, β_j and 1, two FFMs, adders and a ( )^-1 table.]

Figure 3.10: Multi-mode error value evaluator
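The two error-value formulas differ only in the extra factor β_j in the denominator for J.83A, C and D. The Python sketch below makes two assumptions. First, Ω(x) = S(x)σ(x) mod x^(2t), where S(x) is built from the 2t syndromes starting at the first root of g(x): α^0 for J.83A/C/D and α^1 for J.83B. Second, σ'(β_j) is evaluated from the odd coefficients in β_j^2, as in the hardware. The names `forney`, `sigma_derivative` and the test helpers are our own.

```python
class GF:
    """GF(2^m) with exp/log tables; `poly` includes the x^m term."""
    def __init__(self, poly, m):
        self.n = (1 << m) - 1
        self.exp = [0] * (2 * self.n)
        self.log = [0] * (self.n + 1)
        x = 1
        for i in range(self.n):
            self.exp[i] = self.exp[i + self.n] = x
            self.log[x] = i
            x <<= 1
            if x >> m:
                x ^= poly

    def mul(self, a, b):
        return 0 if a == 0 or b == 0 else self.exp[self.log[a] + self.log[b]]

    def inv(self, a):
        return self.exp[self.n - self.log[a]]

    def alpha(self, e):
        return self.exp[e % self.n]


def poly_eval(coeffs, x, gf):
    """Horner's rule, coefficients lowest degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf.mul(acc, x) ^ c
    return acc


def sigma_derivative(sigma, beta, gf):
    """sigma'(beta) = sigma_1 + sigma_3 beta^2 + sigma_5 beta^4 + ... over GF(2^m)."""
    return poly_eval(sigma[1::2], gf.mul(beta, beta), gf)


def forney(S, sigma, positions, first_root, gf):
    """first_root 0: e = Omega(beta) / (beta * sigma'(beta))  (J.83A, C, D)
    first_root 1: e = Omega(beta) / sigma'(beta)           (J.83B)"""
    assert first_root in (0, 1)
    two_t = len(S)
    omega = [0] * two_t                        # Omega(x) = S(x) sigma(x) mod x^2t
    for i, s_i in enumerate(sigma):
        for j, S_j in enumerate(S):
            if i + j < two_t:
                omega[i + j] ^= gf.mul(s_i, S_j)
    values = {}
    for p in positions:
        beta = gf.alpha(-p)                    # root of sigma(x) for location p
        den = sigma_derivative(sigma, beta, gf)
        if first_root == 0:
            den = gf.mul(den, beta)            # extra multiply by beta_j
        values[p] = gf.mul(poly_eval(omega, beta, gf), gf.inv(den))
    return values


# Test helpers: syndromes and sigma(x) of a known error pattern.
def syndromes_of_errors(errors, first_root, two_t, gf):
    return [0 if not errors else
            __import__("functools").reduce(
                lambda acc, pv: acc ^ gf.mul(pv[1], gf.alpha(pv[0] * j)),
                errors.items(), 0)
            for j in range(first_root, first_root + two_t)]


def sigma_from_positions(positions, gf):
    sigma = [1]
    for p in positions:
        x_p = gf.alpha(p)
        sigma = [a ^ gf.mul(x_p, b) for a, b in zip(sigma + [0], [0] + sigma)]
    return sigma


gf8 = GF(0x11D, 8)
errs = {3: 0x55, 50: 0x01, 100: 0xC8}
S = syndromes_of_errors(errs, 0, 16, gf8)
print(forney(S, sigma_from_positions(errs, gf8), errs, 0, gf8) == errs)   # True
```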

3.3.6 Memory structure to correct the RS codeword

In the proposed architecture, the output latency makes the memory requirement four times the packet length K. For the same reason, the memory is built as two interleaved banks, so that writing the current RS codeword and correcting a previous RS codeword never access the same bank at the same time, as shown in figure 3.11. In this interleaved structure, packet 0 is written into bank 0, packet 1 into bank 1, packet 2 into bank 0, and packet 3 into bank 1.

Because of the output latency, the error locations and error values of RS codeword 0 are not known until packet 3 is being written into memory. Likewise, while packet 1 in bank 1 is being corrected, packet 4 is written into bank 0, and so on. Hence, the same bank is never accessed twice at the same time. With this interleaved memory structure, the multi-mode RS decoder requires 752 bytes of memory (two banks of 188 x 2 bytes), since the maximum K is 188.

[Figure 3.11: timeline of packets 0 to 4 alternating between bank 0 and bank 1. Packet 0 is corrected and read out while packet 3 is written, and packet 1 while packet 4 is written.]

Figure 3.11: The operation of accessing memory in multi-mode RS decoder
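The bank schedule can be checked with a small Python model. It assumes, as described above, that packet k is written into bank k mod 2 while packet k - 3 is being corrected. The names `bank_schedule` and `peak_live_per_bank` are illustrative.

```python
def bank_schedule(num_packets, latency=3, banks=2):
    """For each packet slot k, return (k, write bank, corrected packet,
    bank of the corrected packet); None while nothing is corrected yet."""
    slots = []
    for k in range(num_packets):
        corr = k - latency
        slots.append((k, k % banks, corr if corr >= 0 else None,
                       corr % banks if corr >= 0 else None))
    return slots


def peak_live_per_bank(latency=3, banks=2):
    """Packets k-latency .. k are all held while packet k is written."""
    live = range(latency + 1)
    return max(sum(1 for p in live if p % banks == b) for b in range(banks))


schedule = bank_schedule(12)
print(schedule[3], schedule[4])     # (3, 1, 0, 0) (4, 0, 1, 1)
print(2 * peak_live_per_bank() * 188)   # 752
```

Because the latency of three packets is odd, the bank being written and the bank being corrected always differ. At most two packets live in each bank at once, which gives 2 × 2 × 188 = 752 bytes.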

3.4 Other Components

3.4.1 De-scrambler

The scrambler circuit is so simple that dedicated hardware is suggested for each annex. The scrambler of J.83A and C is "serial in, serial out", so its structure can be transformed into the "symbol in, symbol out" structure shown in figure 3.12. As a result, the serial-to-parallel and parallel-to-serial converters can be omitted. The other annexes are implemented with the original structure.

[Figure 3.12: a 15-stage register (cells 1 to 15) loaded with the initialization sequence 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 under Enable. The register shifts 8 bits every cycle, takes an 8-bit data input (B8 HEX shown) and produces an 8-bit data output.]

Figure 3.12: The transformed structure of scrambler in J.83A and C
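The transformation works because the feedback taps sit at cells 14 and 15. Since 8 < 14, all 8 bits of the next output byte depend only on the current register contents. The Python sketch below assumes the J.83A/C generator 1 + x^14 + x^15 with the initialization sequence of figure 3.12 and packs bits MSB-first; both are assumptions of the sketch. It also ignores the sync-byte handling. The function names are hypothetical.

```python
# Assumed generator 1 + x^14 + x^15, cells 1..15 loaded as in figure 3.12.
INIT = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # index 0 = cell 1


def prbs_serial(reg, nbits):
    """Original bit-serial LFSR: one output bit per clock."""
    reg, out = list(reg), []
    for _ in range(nbits):
        fb = reg[13] ^ reg[14]          # cells 14 and 15
        out.append(fb)
        reg = [fb] + reg[:-1]           # shift by one; feedback enters cell 1
    return out, reg


def prbs_byte(reg):
    """Transformed structure: 8 output bits per clock, computed in parallel."""
    out = [reg[13 - k] ^ reg[14 - k] for k in range(8)]
    return out, out[::-1] + list(reg[:7])   # register after 8 shifts


def prbs_bytes(reg, nbytes):
    bits = []
    for _ in range(nbytes):
        byte_bits, reg = prbs_byte(reg)
        bits += byte_bits
    return bits, reg


def descramble(data, reg=INIT):
    """XOR each data byte with the next PRBS byte (MSB-first packing assumed)."""
    out = []
    for x in data:
        bits, reg = prbs_byte(reg)
        out.append(x ^ int("".join(map(str, bits)), 2))
    return out
```

Because descrambling is an XOR with the same sequence, applying `descramble` twice returns the original bytes.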

3.4.2 Viterbi Decoder

Trellis coding is included only in J.83B, and a 16-state hard-decision Viterbi decoder meets the requirement of ITU-T J.83B.

We adopt the register-exchange method for survivor path storage management in the trellis decoder. The convolutional code in J.83B has only 16 states, so the number of registers this requires is modest. In this approach, one register is assigned to each state, and each register records the decoded output sequence along that state's survivor path, as shown in figure 3.13 [16]. The decoded sequence stored in the survival memory (SM) comes from the path that minimizes the sum of the incoming transition metric (TM) and the previous path metric (PM). At the last stage, the content of the register of the state with the minimum PM is selected as the output.

Figure 3.13: Register contents for register-exchange method

Figure 3.14: Architecture of register-exchange approach applied in SM unit. (a) Trellis diagram. (b) The connections of registers and multiplexers between each state.

The register-exchange method is simple to implement: the connections of registers and multiplexers between states follow the trellis diagram. The structure of the trellis diagram in figure 3.14(a) shows that states always come in pairs that share the same two predecessor states. Each state's SM selects, as its new decoded sequence, the path with the minimum sum of the incoming TM and the previous PM. Thus, the decoded sequences of S0 and S1 must come from S0 and S2, and the connections can be represented as in figure 3.14(b) [17].
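The following compact Python model performs hard-decision Viterbi decoding with register exchange for the (25, 37) octal, 16-state code. The state labeling (newest input bit as the register MSB) is our own assumption and need not match the labels S0, S1, S2 of figure 3.14. The list held for each state plays the role of that state's register. Path metrics here are unbounded Python numbers, so this sketch needs no rescaling.

```python
G1, G2 = 0o25, 0o37          # generators from the text (octal)
K = 5                        # constraint length -> 2**(K-1) = 16 states
NSTATES = 1 << (K - 1)


def parity(x):
    return bin(x).count("1") & 1


def branch_outputs(u, state):
    """Encoder outputs (c1, c2) and next state for input bit u (assumed
    convention: the newest bit is the MSB of the 5-bit shift register)."""
    reg = (u << (K - 1)) | state
    return parity(reg & G1), parity(reg & G2), reg >> 1


def encode(bits):
    state, out = 0, []
    for u in bits:
        c1, c2, state = branch_outputs(u, state)
        out += [c1, c2]
    return out


def viterbi_register_exchange(coded):
    """One 'register' (a list) per state holds that state's decoded sequence."""
    INF = float("inf")
    pm = [0] + [INF] * (NSTATES - 1)           # encoder starts in state 0
    regs = [[] for _ in range(NSTATES)]
    for t in range(0, len(coded), 2):
        r1, r2 = coded[t], coded[t + 1]
        new_pm, new_regs = [INF] * NSTATES, [None] * NSTATES
        for ns in range(NSTATES):
            u = ns >> (K - 2)                  # input bit that leads into ns
            for low in (0, 1):                 # the two predecessor states
                s = ((ns << 1) & (NSTATES - 1)) | low
                c1, c2, _ = branch_outputs(u, s)
                tm = (c1 ^ r1) + (c2 ^ r2)     # Hamming-distance TM
                if pm[s] + tm < new_pm[ns]:    # add-compare-select
                    new_pm[ns] = pm[s] + tm
                    new_regs[ns] = regs[s] + [u]   # copy register, append bit
        pm, regs = new_pm, new_regs
    best = min(range(NSTATES), key=lambda s: pm[s])   # state with minimum PM
    return regs[best]


msg = [((n * 2654435761) >> 7) & 1 for n in range(200)]
tx = encode(msg + [0] * 4)                     # 4 tail bits return the encoder to state 0
rx = [bit ^ (1 if i % 50 == 10 else 0) for i, bit in enumerate(tx)]  # isolated bit errors
decoded = viterbi_register_exchange(rx)[:200]
```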

Another issue in the Viterbi decoder is metric rescaling. In the Viterbi algorithm, path metrics grow without bound over time. To implement a trellis decoder, the path metrics must be confined to a finite numerical range so that they can be represented with a finite number of bits.

There are several rescaling approaches, such as "Reset", "Rescaling Subtraction", "Shift" and "Modulo Normalization". Among them, "Modulo Normalization", also called the "Two's Complement Arithmetic Approach" [18], is much more efficient than the others and can be implemented with two's complement arithmetic.

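[Figure 3.15: survivor paths with metrics PM0, PM1, PM2, …, PM_s1, PM_s2 merging within the last L stages, between t = k-L and t = k (time axis from t = 0).]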

Figure 3.15: The upper bound of PM difference

Before describing how the two's complement arithmetic approach works, we need the upper bound of the PM difference. Assume that all survivor paths selected at time k come from the same state at time k-L, as shown in figure 3.15. Then the difference between any two PMs is at most B x L, where B is the maximum value of the TM and L is the truncation length.

The key idea of the "Modulo Normalization" approach is not to avoid overflow but to accommodate it: even when overflow occurs, the PM differences are preserved.

This concept is illustrated in figure 3.16. Suppose M1 and M2 are positive numbers with |M1 - M2| < 2^(c-1), where c is the number of bits of the PM value, and let m1 = M1 mod 2^c and m2 = M2 mod 2^c. Then m1 and m2 lie within half a cycle of each other on the circle, so their difference relationship is not confused. If m1 - m2, interpreted as a c-bit two's complement number, is non-negative, then M1 ≥ M2, and the suitable survivor path can be selected.

Both the add operation "PM_new = (PM + TM) mod 2^c" and the subtract operation "(m1 - m2) mod 2^c" can be realized with 2's complement components. In principle, metric rescaling is achieved at the cost of one extra bit. In return, the method avoids redundant rescaling operations and the performance degradation caused by metric overflow.

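[Figure 3.16: circle of c-bit two's complement values from -2^(c-1) to 2^(c-1)-1, with m1 and m2 separated by the angle α; metrics increase in one direction around the circle and decrease in the other.]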

Figure 3.16: Illustration of Modulo Normalization
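The modulo normalization rule can be demonstrated with c-bit arithmetic in Python. The sketch assumes c = 8 and assumes that the true metrics being compared differ by less than 2^(c-1). The names `wrap`, `signed_diff` and `acs` are illustrative.

```python
C_BITS = 8   # hypothetical PM width c


def wrap(value, c=C_BITS):
    """Keep only the low c bits, as a c-bit PM register does."""
    return value % (1 << c)


def signed_diff(m1, m2, c=C_BITS):
    """(m1 - m2) mod 2^c, read as a c-bit two's complement number."""
    d = (m1 - m2) % (1 << c)
    return d - (1 << c) if d >= 1 << (c - 1) else d


def acs(pm_a, tm_a, pm_b, tm_b, c=C_BITS):
    """Add-compare-select on wrapped metrics, PM_new = (PM + TM) mod 2^c.
    Returns (new wrapped PM, 0 if path a survives else 1)."""
    cand_a, cand_b = wrap(pm_a + tm_a, c), wrap(pm_b + tm_b, c)
    if signed_diff(cand_a, cand_b, c) <= 0:
        return cand_a, 0
    return cand_b, 1


# True metrics 260 vs 243: path a overflows to 4, yet path b is still chosen.
print(acs(250, 10, 240, 3))   # (243, 1)
```

The comparison never needs the unwrapped metrics: as long as |M1 - M2| < 2^(c-1), `signed_diff(wrap(M1), wrap(M2))` equals M1 - M2 exactly.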

After summarizing the architecture of register-exchange and the “Modulo Normalization” approaches, we can implement the trellis decoder by combining the following components:

(1) TM: Compute all branch metrics from the received symbols.

(2) ACS: Perform the "add-compare-select" operation for each state to update its path metric. The block diagram is shown in figure 3.17.

(3) Metric Rescaling Unit: Confine all PM values to a finite range without losing their difference relationship.

(4) SM: Record all decision results according to the choices of the ACS unit, and trace back the survivor path to find the oldest data as the decoded bits. The block diagram is shown in figure 3.18.

[Figure 3.17 legend: TM: Transition Metric; ACS: Add-Compare-Select; PM: Path Metric; SM: Survivor Memory.]

Figure 3.17: The ACS module used for Viterbi decoder


Figure 3.18: Survivor memory and trace back unit

The overall implementation architecture of the Viterbi decoding algorithm is shown in figure 3.19. Although hard decisions are used here, soft demodulator decisions can give a performance advantage over hard-decision decoding, so the TM units and Metric Rescaling units could be re-designed according to what kind of
