Because channels are imperfect, errors often occur during transmission. Therefore, some measures must be taken to combat the disturbance and to raise the reliability of communication systems. One practical option is error-control coding, also called channel coding. The channel encoder in the transmitter accepts message bits and adds redundancy according to a prescribed rule, thereby producing encoded data at a higher data rate. The channel decoder, on the other hand, exploits the redundancy to decide which message bits were actually transmitted. Many different error-correcting codes have been developed. In this work, we focus on convolutional codes.
Convolutional coding has been used in communication systems including deep space communications and wireless communications. It offers an alternative to block codes for transmission over a noisy channel. Convolutional coding can be applied to a continuous input stream (which cannot be done with block codes), as well as to blocks of data. A convolutional encoder can be viewed as a finite state machine that generates a coded output data stream from an input data stream. The encoder is usually composed of shift registers and a network of XOR gates (modulo-2 adders), as shown in Figure 2.1.
In convolutional coding with rate k/n, the encoder accepts a k-bit input symbol and generates an n-bit output symbol by an operation which may be viewed as the discrete-time convolution of the input sequence with the impulse response of the encoder. The duration of the impulse response equals the memory of the encoder.
Chapter 2 4/40
Figure 2.1 Block diagram of rate 1/2-Convolutional encoder
Take the rate-1/2 code in Figure 2.1 as an example. Two shift-register stages and modulo-2 adders make up the entire encoder. The encoder accepts a one-bit input symbol, which is combined with the contents stored in the shift register to produce a two-bit output symbol. A convolutional encoder is generally characterized in (n, k, v) format, where
n is the number of bits in an output symbol of the encoder;
k is the number of bits in an input symbol of the encoder;
v is the number of memory elements in the longest shift register of the encoder.
Therefore, the encoder shown in Figure 2.1 can be described as a (2,1,2) encoder with constraint length K = v+1 = 3.
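As a minimal sketch of a (2,1,2) encoder of this kind, the following Python function shifts one input bit at a time through two memory elements and forms two output bits by modulo-2 addition. The tap connections of Figure 2.1 are not stated in the text, so the generators 1+D+D^2 and 1+D^2 used here are an assumption, as are the function and variable names.

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 feedforward convolutional encoder with v = 2 memory elements.

    g1 and g2 are ASSUMED tap vectors (coefficients of D^0, D^1, D^2);
    they are illustrative and not taken from Figure 2.1.
    """
    s1, s2 = 0, 0  # register contents: s1 = previous bit, s2 = the bit before it
    out = []
    for b in bits:
        window = (b, s1, s2)
        # Each output bit is the modulo-2 sum of the tapped positions.
        c1 = sum(g * x for g, x in zip(g1, window)) % 2
        c2 = sum(g * x for g, x in zip(g2, window)) % 2
        out.extend([c1, c2])
        s1, s2 = b, s1  # shift the register by one position
    return out
```

Feeding a single 1 followed by zeros returns the impulse response of the assumed encoder, which lasts K = 3 input steps, in line with the constraint length given above.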
Figure 2.2 Block diagram of rate 2/3-Convolutional encoder
A more complex encoder structure is shown in Figure 2.2. In (n,k,v) format it is described as (3,2,4). However, this description does not fully reflect the connections in the encoder. Hence, generator polynomials are used to characterize each path and its connections in the encoder. To be specific, the generator polynomial of the ith path is defined by
$$g^{(i)}(D) = g_0^{(i)} + g_1^{(i)} D + g_2^{(i)} D^2 + \cdots + g_v^{(i)} D^v$$
where D denotes the unit-delay variable. In the normal case, the coefficients equal 0 or 1, indicating the absence or presence of the corresponding connection.
However, this is not the case for the short-constraint-length recursive systematic convolutional (RSC) codes [10] used in turbo codes. Because of the feedback, the entries of the generator matrix of an RSC code are rational functions of D rather than polynomials with coefficients 0 or 1. Figure 2.3 is an example of a recursive systematic encoder, whose generator matrix is given below.
[generator matrix G(D)]
Figure 2.3 Example of recursive systematic convolutional (RSC) encoder
The reason for making a convolutional code recursive (i.e., feeding one or more of the tap outputs back to the input) is to make the internal state of the shift register depend on past outputs. This affects the behavior of the error patterns, whose characteristics and corresponding decoder structure are beyond the scope of this thesis. To distinguish the normal convolutional codes from RSC codes, some authors call the normal structures feedforward convolutional codes. In this thesis, we exclude RSC codes and discuss only feedforward convolutional codes. From here on, unless otherwise specified, "convolutional codes" refers to feedforward convolutional codes.
2.2 Viterbi Decoder
The decoding process can be viewed as the reverse of the encoding process. A trellis diagram is often introduced to aid decoding, since it shows how the input symbol interacts with the contents of the shift registers to produce the output symbol.
In Figure 2.4, the trellis diagram of the rate-1/2 convolutional code of Figure 2.1 is displayed. The four states on the trellis represent the possible contents of the registers.
In general, for a binary code, the number of states is N = 2^v. With the trellis diagram, the task of the decoder is to find the closest match between the received signal sequence and a candidate sequence. The most intuitive way to do this is brute-force search: list all possible signal sequences, compare each of them with the received sequence, and take the maximum-likelihood (ML) one as the estimate of the transmitted signal sequence.
Figure 2.4 Trellis diagram for rate 1/2-convolutional code
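The structure of such a trellis can be tabulated programmatically. The sketch below enumerates all N = 2^v states and both input bits, and records the next state and the output symbol of each branch. The tap vectors and the state numbering (newest register bit as the most significant bit) are assumptions made for illustration and may differ from the labelling in Figure 2.4.

```python
def build_trellis(v=2, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Tabulate (next_state, output) for every (state, input) pair.

    A state is the register contents packed into an integer, newest bit
    in the most significant position. The taps g1, g2 are ASSUMED.
    """
    n_states = 2 ** v  # N = 2^v
    trellis = {}
    for state in range(n_states):
        # Unpack the register contents, newest bit first.
        regs = [(state >> (v - 1 - j)) & 1 for j in range(v)]
        for b in (0, 1):
            window = [b] + regs
            c1 = sum(g * x for g, x in zip(g1, window)) % 2
            c2 = sum(g * x for g, x in zip(g2, window)) % 2
            # The input bit enters the register; the oldest bit drops out.
            next_regs = [b] + regs[:-1]
            next_state = int("".join(map(str, next_regs)), 2)
            trellis[(state, b)] = (next_state, (c1, c2))
    return trellis
```

For v = 2 the table has 4 states with 2 outgoing branches each, i.e. 8 entries, which is the one-step pattern that repeats at every trellis stage.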
However, brute-force search is never an efficient method. Thus, the Viterbi algorithm (VA) was proposed to eliminate the unnecessary comparisons made in the decoding process. Let m denote a message vector and c denote the corresponding code vector applied by the encoder to the input of a discrete memoryless channel. Let r denote the received vector, which may differ from the transmitted code vector due to channel noise. Given the received vector r, the decoder is required to make an estimate of the message vector. Since there is a one-to-one correspondence between the message vector m and the code vector c, the decoder may equivalently produce an estimate of the code vector. Thus, the decoding rule is to choose the estimate of the code vector that, given the received vector r, minimizes the probability of decoding error. The maximum-likelihood decoder or decision rule is described as follows:
Choose the estimate $\hat{c}$ for which the log-likelihood function $\log p(r|c)$ is maximum.
For the binary symmetric channel, the maximum-likelihood decoder reduces to a minimum-distance decoder. In such a decoder, the received vector r is compared with each possible transmitted code vector c, and the one closest to r is chosen as the transmitted code vector. For a channel with memory, special care has to be taken in the calculation of the likelihood function, for example by using soft decoders [11].
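The reduction to minimum-distance decoding on the binary symmetric channel can be checked with a short calculation. The symbols here are assumptions introduced for illustration: p is the crossover probability of the channel (with p < 1/2), L is the length of the code vector in bits, and d is the Hamming distance between r and c.

```latex
\begin{align*}
\log p(r \mid c) &= d \log p + (L - d) \log (1 - p) \\
                 &= d \log \frac{p}{1 - p} + L \log (1 - p)
\end{align*}
```

The term L log(1-p) is the same for every candidate c, and log(p/(1-p)) is negative for p < 1/2, so maximizing the log-likelihood is equivalent to minimizing the Hamming distance d.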
However, no matter which design is chosen, the Viterbi algorithm is applied.
The VA recursively finds the most likely path by using a fundamental principle of optimality first introduced by Bellman [12], which we cite here for reference:
The Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.
In the present context of Viterbi decoding, we make use of this principle as follows.
If we start accumulating branch metrics along the paths through the trellis, the following observation holds: whenever two paths merge in one state, only the most likely path (the best path or the survivor path) needs to be retained, since for all possible extensions to these paths, the path which is currently better will always stay better. For any given extension, both paths are extended by the same branch metrics. This process is described by the add-compare-select (ACS) recursion.
The path with the best path metric leading to every state is determined recursively for every step in the trellis. This can be stated more precisely in mathematical terms.
Here k and i serve only as indices. The metrics of the survivor paths for state $x_k = i$ at trellis step k are called state metrics $\gamma_{i,k}$. In order to determine the state metric $\gamma_{i,k}$, we calculate the path metrics for the paths leading to state $x_k = i$ by adding the state metrics of the predecessor states and the corresponding branch metrics. There is one predecessor state $x_{k-1}$ for each branch m of the M possible branches, where $m \in \{0, 1, \ldots, M-1\}$, leading to state i at step k. The state metric is then determined by selecting the best path:
$$\gamma_{i,k} = \mathrm{Max}\left\{ \gamma_k^{(0,i)}, \gamma_k^{(1,i)}, \ldots, \gamma_k^{(M-1,i)} \right\}$$
where $\gamma_k^{(m,i)}$ denotes the path metric of the path entering state i through branch m.
A sample ACS recursion for one state and M =2 is shown in Figure 2.5.
Figure 2.5 Illustration of ACS recursion for M=2
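The ACS recursion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the hardware structure of Figure 2.5: the function name, the argument layout, and the convention that larger metrics are better (a log-likelihood metric) are all choices made for this example.

```python
def acs_step(state_metrics, predecessors, branch_metrics):
    """One add-compare-select step for all states.

    state_metrics[j]     : metric of the survivor ending in state j at step k-1
    predecessors[i][m]   : predecessor state on branch m entering state i
    branch_metrics[i][m] : metric of branch m entering state i at step k
    ASSUMPTION: larger metrics are better (log-likelihood convention).
    """
    new_metrics, decisions = [], []
    for i in range(len(state_metrics)):
        # Add: one candidate path metric per incoming branch.
        candidates = [state_metrics[p] + bm
                      for p, bm in zip(predecessors[i], branch_metrics[i])]
        # Compare and select: keep the best candidate and record which branch won.
        best = max(range(len(candidates)), key=candidates.__getitem__)
        new_metrics.append(candidates[best])
        decisions.append(best)
    return new_metrics, decisions
```

The returned decisions are exactly the values that the survivor memory unit has to store, one per state and trellis step.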
Despite the recursive computation, there are still N = 2^v best paths pursued by the VA. The maximum-likelihood path corresponding to the estimated sequence can be determined only after reaching the last state in the trellis. In order to retrieve this path and the corresponding sequence of information symbols, either the sequences of information symbols or the sequences of ACS decisions corresponding to each of the N survivor paths, for all states i and all trellis steps k, have to be stored in the survivor memory unit (SMU) while the ACS recursion is calculated. A detailed description of the SMU is provided in Section 2.4.
So far, we have considered only the case in which the trellis diagram is terminated, i.e. the start and end states are known. If the trellis is terminated, a final decision on the overall best path is possible only at the very end of the trellis. The decoding latency of the VA is then proportional to the length of the trellis. Additionally, the size of the memory in the SMU grows linearly with the length of the trellis. Finally, in applications like broadcasting, a continuous sequence of information bits has to be decoded rather than a terminated sequence, i.e. no known start and end states exist.
That is to say, the required memory length in the SMU would have to be at least equal to the length of the entire signal sequence. This is wasteful and impractical in a hardware implementation.
Fortunately, theoretical analysis and simulations by many researchers have shown that a survivor depth of 4 or 5 times the code constraint length K is sufficient for negligible degradation from the optimal performance of the decoder. This is because, given a sufficiently long sequence, the survivor paths merge into a common tail as in Figure 2.6. Even though we do not make our decision at the final point, the path chosen will still converge to the optimal path in its earlier part. Thus, the decoded sequence can be output sooner than with the traditional VA, and the length of the memory is reduced to only 4 or 5 times the constraint length.
Figure 2.6 The generation of common tail
Figure 2.7 The entire Viterbi Decoder block diagram
In summary, the basic units of a Viterbi decoder are shown in Figure 2.7. The branch metrics are calculated from the received symbols in the Branch Metric Generator (BMG). These branch metrics are fed into the add-compare-select (ACS) unit, which performs the ACS recursion for all states. The decisions generated in the ACS unit are stored in and retrieved from the Survivor Memory Unit (SMU) in order to finally decode the source bits in the decision unit (DU) along the final survivor path.
2.4 Survivor Memory Unit (SMU)
As mentioned in Chapter 1, many modified ACS methods have been invented to improve decoding speed, at the price of extra hardware. Aiming to improve both decoding efficiency and hardware complexity, we turn our attention to the survivor memory unit. In this section, three conventional methods of SMU realization are described as the basis for the later research. These three methods are traceback management (TBM), the register exchange algorithm (REA) and the Hybrid method.
2.4.1 Traceback Management (TBM)
In traceback management, the decision value vectors d_i [7, 13] and branch label vectors u_i (or previous states), which are the outputs of the ACS unit, of the S most recent trellis stages are stored. As explained earlier, in principle all paths associated with the trellis states at a certain time step k have to be reconstructed until they have all merged, in order to find the final survivor path and thus decode the information. In practice, however, only one path is reconstructed, and the associated information at trellis step k-D is output. D must be chosen such that all paths have merged with sufficiently high probability. If D is chosen too small, substantial performance degradation results. As mentioned, a survivor depth D of 4 to 5 times the constraint length is appropriate for the common tail to appear. Thus, S has to be chosen larger than D.
The decoding is then performed by starting at a certain time step k and tracing back D steps to ensure that the merged path has been reached. The traceback is then continued from this state, and the previous state labels are read out to decode the original data.
This is illustrated in Figure 2.8. Here, S is chosen as D+H, where H is the decoding depth. Starting from step k-(D+H-1), S steps of ACS are performed and the decision value vectors $\{d_{k-(D+H-1)}, \ldots, d_k\}$ and branch label vectors $\{u_{k-(D+H-1)}, \ldots, u_k\}$ are stored, where $d_k$ and $u_k$ each consist of N values, denoted $\{d_{0,k}, \ldots, d_{N-1,k}\}$ and $\{u_k[0], \ldots, u_k[N-1]\}$. At step k, the decoder traces back D steps to reach the merged path. Then, H consecutive branch labels are read out according to the corresponding decision values and decoded.
Formally, the traceback algorithm can be stated in a C-like form as follows:
Memory:
    (D+H)*N decision bits {d_{k-(D+H-1)}, ..., d_k}, each indicating the upper or lower path
Algorithm:
    // every H trellis steps a traceback is started
    if (k-D is divisible by H) then {
        // Initialization
        traceState := startState;
        // Acquisition
        for t = k downto k-D+1 {
            traceState := Z(d_{traceState,t}, traceState);
        }
        // Decoding
        for t = k-D downto k-D-H+1 {
            decoded bit vector := u(d_{traceState,t}, traceState);
            traceState := Z(d_{traceState,t}, traceState);
        }
    }
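The traceback procedure can be made concrete with a small Python sketch. The functions Z() and u() are not defined explicitly in the text, so the versions below are assumptions for a rate-1/n code with v memory elements, using a hypothetical state numbering in which the newest input bit is the most significant bit of the state.

```python
def traceback(decisions, start_state, k, D, H, v=2):
    """Trace back from step k; return the H bits decoded for steps k-D-H+1 .. k-D.

    decisions[t][s] is the ACS decision bit of state s at step t.
    ASSUMED state convention: the newest input bit is the MSB of the state,
    so Z() re-inserts the decision bit as the LSB and u() reads the MSB.
    """
    n_states = 2 ** v

    def Z(d, s):  # predecessor state selected by decision bit d
        return ((s << 1) & (n_states - 1)) | d

    def u(d, s):  # branch label (input bit) of the branch entering s
        return s >> (v - 1)

    state = start_state
    # Acquisition: walk back D steps so that the survivor paths have merged.
    for t in range(k, k - D, -1):
        state = Z(decisions[t][state], state)
    # Decoding: read H branch labels along the merged path.
    bits = []
    for t in range(k - D, k - D - H, -1):
        bits.append(u(decisions[t][state], state))
        state = Z(decisions[t][state], state)
    bits.reverse()  # the traceback yields the newest bit first
    return bits
```

Because the traceback visits the trellis steps in reverse order, the decoded bits come out last-in first-out and have to be reversed before output, which is one source of the latency discussed for TBM.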
Figure 2.9 Different TB memory contents (a) previous states (b) decision values
An example of TBM storing previous states or decision values for a certain trellis flow is provided in Figure 2.9, which will serve as a common example for later comparisons. As seen in the figure, the decision value 0 represents the upper path and 1 represents the lower path.
Before leaving TBM for REA, we point out that TBM can be realized with small power consumption and circuit area, since few logic elements are needed. However, it suffers from long latency, since the decoder has to read in a signal sequence of sufficient length and feed it into the ACS unit before the traceback can be carried out to produce the final decoded output. Therefore, the biggest drawback of TBM is its large latency and low decoding efficiency.
2.4.2 Register Exchange Algorithm (REA)
The main motivation for REA is to overcome the large latency of TBM. Its goal is to have the decoded output ready right after the ACS operations for a sufficient length of the signal are done. In the register exchange algorithm, survivor paths are stored in N shift registers, each of length S ≥ D. The connection of the multiplexers and registers is derived from the trellis diagram, which is constructed by repeating one row containing all states of the code several times to represent consecutive time steps. By updating the entire contents of every shift register during every decoding cycle (one decoding cycle corresponds to processing one trellis stage), each shift register i always holds the survivor sequence for state i. The decoded data can be obtained from the output of the shift registers.
In order to illustrate REA, new parameters have to be introduced. We denote the branch label associated with the path belonging to state i at trellis step k as $\hat{u}_k[i]$, with the hat to distinguish it from TBM. The branch label associated with the mth branch merging into state i is denoted u(m,i). Thus, we can formally state the algorithm as follows:
Memory:
    (D+1)*N branch labels ($\hat{u}_k[i]$, ..., $\hat{u}_{k-D}[i]$)
Algorithm:
    // Update of the stored symbol sequences according to
    // the current decision bits d_{i,k} (decision bit of state i at step k)
    for t = k-D to k-1 {
        for State = 0 to N-1 {
            û_t[State] := û_t[Z(d_{State,k}, State)];
        }
    }
    // setting the first information symbol of the path
    for State = 0 to N-1 {
        û_k[State] := u(d_{State,k}, State);
    }
The corresponding hardware implementation of REA is provided below.
REA successfully reduces the decoding latency. However, as the constraint length increases, REA becomes critical in terms of area and power dissipation. The register exchange algorithm needs N × S multiplexers and registers, where N is the number of states and S is the survivor path length, and all of them are activated every cycle to update the data in memory. Thus, REA is mostly applied if latency or total memory size is critical.
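A software analogue of one register-exchange cycle is sketched below. It is an illustration only: the state numbering (newest input bit as the most significant bit of the state), the derived forms of Z() and u(), and the optional `depth` parameter standing in for the register length S are all assumptions, and Python lists play the role of the hardware shift registers.

```python
def register_exchange_step(registers, decisions, v=2, depth=None):
    """One register-exchange update for all N = 2^v states.

    registers[s] : branch labels along the survivor of state s (oldest first)
    decisions[s] : ACS decision bit of state s at the current step
    ASSUMED state convention: the newest input bit is the MSB of the state.
    """
    n_states = 2 ** v
    new_registers = []
    for s in range(n_states):
        pred = ((s << 1) & (n_states - 1)) | decisions[s]  # Z(d, s)
        label = s >> (v - 1)                               # u(d, s)
        # Copy the predecessor's survivor and append the new branch label.
        seq = registers[pred] + [label]
        if depth is not None:
            seq = seq[-depth:]  # keep only the most recent `depth` labels
        new_registers.append(seq)
    return new_registers
```

Every state copies a complete survivor sequence in every cycle, which mirrors why the hardware needs one multiplexer and one register per state and per stored position, and why no traceback is required to read the result.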
2.4.3 The Hybrid method
REA is a direct implementation whose critical path consists of one multiplexer and one latch, thus allowing high data throughput. However, area and power consumption rapidly become a critical concern as the constraint length grows. TBM, based on RAM, achieves low power and small circuit area but leads to high latency and low throughput.
These drawbacks motivated earlier researchers to look for a balance between the two. The hybrid method [13,14] was thus developed.
The underlying idea of the Hybrid method is as follows. Carrying out register exchange continuously over the whole survivor path leads to unacceptable area and power dissipation. It is nevertheless possible to perform a partial REA that generates segments of survivor paths. The segments produced by the partial REA are stored in memory until a sequence of survivor path length S has been processed. A fast TBM can then be executed, since only a few traceback steps are required. In this way, the reduced REA becomes acceptable in area and power consumption, and the high latency of TBM is improved.
An example is given in Figure 2.11. The survivor path length S is 8, but a partial REA of length 4 is applied. Therefore, the results generated at t=4 and t=8 must be saved, together with the pointers to the previous states.
Once t=8 is reached, the fast TBM can be carried out with only a one-step traceback. A detailed account of the required hardware and latency is given in Chapter 3.
Figure 2.11 Example of Hybrid method
2.5 Summary of the Chapter
In this chapter, convolutional encoders and Viterbi decoders were introduced, with emphasis on the architecture of Viterbi decoders and especially the survivor memory unit (SMU). Three conventional methods of implementing the SMU were described.
TBM has the drawbacks of low throughput and high latency. REA requires larger area and power for its multiplexers and exchange registers. These flaws motivate us to look for improved techniques. Even though the Hybrid method attempts to balance the trade-off between area and efficiency, we believe that the hardware can be reduced further. Thus, three improved methods, corresponding to the three conventional implementation techniques, are the focus of the next chapter.
Chapter 3
Three Improved SMU Designs
3.1 Overview
As mentioned in Chapter 2, the existing methods of implementing the SMU, such as TBM and REA, each suffer from different flaws. In this chapter, we aim to derive new designs that improve the low decoding efficiency of TBM and also save some memory space. For REA and the Hybrid method, on the other hand, the improvement lies mainly in reducing the hardware requirements. These improved SMU designs all originate from a concept prompted by an interesting observation.
3.2 The observation
A feedforward convolutional encoder is an encoder structure with no feedback loops. As in Figure 2.1, no feedback lines are drawn back to any of the registers. By examining all possible inputs and register states, the corresponding trellis diagram of the encoder can be generated as in Figure 2.4. With the trellis diagram, the Viterbi decoder can follow the prescribed paths to carry out add-compare-select (ACS) and then choose one ML path as the final estimate.
When a path is finally chosen, traceback has to be performed to read out the estimated input sequence. It is a process of mapping from present states to previous states on the trellis diagram and then deciding which input symbol each segment represents. In most cases, only one step of the trellis diagram is provided and used in the traceback process, since the pattern of every step in the trellis is the same.
However, when a two-step trellis for a (2,1,2) code is examined, as in Figure 3.1, an interesting phenomenon is found.
For ease of explanation, we take the rightmost step as the present step and the steps to its left as the past steps. When decoding, we normally have to know the states of both the present and the previous steps in order to decode a single input symbol. However, by observing the two-step trellis diagram, we find that even though the previous states are unknown, the decoded output is certain for