Chapter 3 SBR Range Decision
3.2 Error Concealment
This method determines the SBR range from the bit rate and the sampling rate.
The SBR range is fixed across frames. At a given bit rate, the division of load between the AAC encoder and the SBR encoder is appropriate in most frames. The occasional spectral distortion is handled by the error concealment mechanism [14][15][16][17] in the SBR decoder. This approach yields high coding efficiency, and the error concealment mechanism compensates for the lack of flexibility. Furthermore, with the help of error concealment, the bandwidth of the reconstructed signal can be extended aggressively.
Chapter 4
Related Work for Time/Frequency Grid
This chapter introduces the design of the T/F grid in the 3GPP HE-AAC encoder [15].
The block diagram of the 3GPP HE-AAC encoder is shown in Figure 17. The T/F grid decision comprises a transient detector, a frame splitter, and a T/F grid generator. The transient detector finds the start position of a transient. The frame splitter operates only on frames without a transient and decides whether the frame is divided into two envelopes. The T/F grid generator receives the information from the transient detector and determines the time borders and the associated envelope resolution.
[Figure 17 blocks: PCM signal (L/R/M); 64 ch Analysis QMF; Transient detector; Frame splitter; Tonality detector; T/F Grid Generator; Additional Control Parameters; Envelope Energy Formatter; Quantiser and T/F Huffman Encoder; Bitstream Multiplexer; Coded SBR Bitstream.]
Figure 17: Block diagram of 3GPP HE-AAC encoder [15].
4.1 Time/Frequency Grid Design
In the 3GPP SBR encoder, the frequency band table is selected according to the bit rate and sampling rate and does not change between frames. The time segments, envelope resolution, and frame classes are determined by the mechanism described below.
4.1.1 Transient Detector
The transient detector is the most important module in the 3GPP T/F grid. The frame splitter and the T/F grid generator that follow it operate on the information it provides. In short, the SBR coding quality relies on this module.
The objective of the transient detector is to determine whether a transient occurs in the present frame and to find the onset position of the transient. Its output variables, tranFlag and tranPos, record this information.
The transient detector operates on the subband samples of one frame length, starting from sample 8. Its basic principle is to estimate the energy difference between neighboring samples and to decide from these differences whether a transient exists. First, the average energy of each subband in the processed frame is calculated, and the standard deviation is derived from it. Next, for each subband, the energy difference between neighboring samples is calculated and compared with the corresponding standard deviation. If the energy difference is larger than the standard deviation, the value that exceeds it is recorded. For each time sample, this procedure is executed 64 times, once per subband, and the values indicating a “large energy difference” are summed. Finally, each sample is checked to see whether its summed energy difference exceeds the threshold. If it does, tranFlag is set to true and the position of this sample is recorded. The diagram of the transient detector is shown in Figure 18.
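The procedure described above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the 3GPP reference code: the function name detect_transient, the input layout energies[t][b], the use of the signed energy rise between neighboring samples, and the choice to accumulate the whole difference are all assumptions made here.

```python
import math

def detect_transient(energies, threshold, start=8):
    """energies[t][b] is the energy of time sample t in subband b (one frame).

    Returns (tran_flag, tran_pos). Hypothetical sketch, not the 3GPP code.
    """
    n_samples = len(energies)
    n_bands = len(energies[0])

    # Standard deviation of the energy of each subband over the frame.
    stds = []
    for b in range(n_bands):
        column = [energies[t][b] for t in range(n_samples)]
        mean = sum(column) / n_samples
        variance = sum((e - mean) ** 2 for e in column) / n_samples
        stds.append(math.sqrt(variance))

    # For each time sample, sum the "large" neighbor differences over subbands.
    for t in range(max(start, 1), n_samples):
        accumulated = 0.0
        for b in range(n_bands):
            diff = energies[t][b] - energies[t - 1][b]  # assumed: energy rise
            if diff > stds[b]:
                accumulated += diff
        if accumulated > threshold:
            return True, t       # tranFlag, tranPos
    return False, None

# Toy frame: 16 samples, 4 subbands, energy jumps at sample 12.
frame = [[1.0] * 4 for _ in range(12)] + [[100.0] * 4 for _ in range(4)]
flat = [[1.0] * 4 for _ in range(16)]
```

In the toy frame the energy rises by 99 in every subband at sample 12, which is well above each subband's standard deviation, so the sketch reports a transient there; the flat frame produces no transient.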
4.1.2 Frame Splitter
The frame splitter operates only when the transient detector has found no transient in the current frame. It decides whether the current frame is split into two envelopes of equal size and stores the result in the variable splitFlag. Like the transient detector, the frame splitter estimates energy differences, but less precisely: its estimation unit is half a frame. splitFlag is determined by comparing the energies of the two half frames.
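As a rough illustration of the half-frame comparison, the sketch below sets the split flag when the energies of the two half frames differ by more than a given ratio. The text does not state the actual comparison rule, so the ratio test, the name split_flag, and the parameter ratio_threshold are assumptions made purely for illustration.

```python
def split_flag(energies, ratio_threshold):
    """energies[t][b]: energy of time sample t in subband b (one frame).

    Returns True when the two half frames differ strongly in energy.
    The ratio criterion is an assumption, not the 3GPP rule.
    """
    half = len(energies) // 2
    first = sum(sum(row) for row in energies[:half])
    second = sum(sum(row) for row in energies[half:])
    low, high = min(first, second), max(first, second)
    if low == 0.0:
        return high > 0.0
    return high / low > ratio_threshold

steady = [[1.0, 1.0]] * 8                    # same energy in both halves
rising = [[1.0, 1.0]] * 4 + [[5.0, 5.0]] * 4  # second half five times stronger
```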
4.1.3 T/F Grid Generator
The T/F grid generator creates the time/frequency grid for one SBR frame. Its input parameters are provided by the transient detector and the frame splitter. The frame class is determined first, from the trailing frame border of the last frame (FIX or VAR) and the parameter tranFlag of the present frame. In short, if there is a transient in the current frame, the trailing frame border is VAR; otherwise it is FIX. All combinations of the leading frame border and the transient are listed in Table 1. When transients are sparse, the FIXVAR-VARFIX pair is used. The current frame is encoded with the FIXVAR portion, and the VARFIX grid is stored for the next frame. If no transient occurs in the next frame, the stored VARFIX grid is used; otherwise, new calculations are needed for the new transient and are merged with the already calculated grid, so that a VARVAR class frame is used.
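Trailing border of last frame | Transient in current frame | Frame class
FIX                           | no                         | FIXFIX
FIX                           | yes                        | FIXVAR
VAR                           | no                         | VARFIX
VAR                           | yes                        | VARVAR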
Table 1: The combination of transient and trailing frame borders.
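The frame class rule stated in the text (the leading border follows the trailing border of the last frame, and the trailing border is VAR exactly when a transient is present) can be written as a small function. The function name frame_class and its boolean arguments are hypothetical names chosen for this sketch.

```python
def frame_class(prev_trailing_is_var, tran_flag):
    """Frame class from the last frame's trailing border and tranFlag."""
    leading = "VAR" if prev_trailing_is_var else "FIX"
    trailing = "VAR" if tran_flag else "FIX"
    return leading + trailing

# A sparse transient: FIXVAR in the transient frame, VARFIX in the next one.
pair = [frame_class(False, True), frame_class(True, False)]
```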
The positions of the time borders in one frame are determined mainly by the position of the transient, i.e. the input parameter tranPos provided by the transient detector. Each transient is accompanied by three main time borders: the first lies at the onset of the transient, the second is two timeslots after the first, and the third is four timeslots after the second. In addition, if the transient is not too near the front of the current frame, an additional time border is placed ahead of the transient. Similarly, additional time borders may be placed behind the transient when the transient is near the front of the present frame. In the example of Figure 19, tranPos is 13 and the resulting main time borders lie at 13, 15, and 19. The additional time borders are at 7 and 25. Consequently, the time borders in the present FIXVAR frame lie at 7, 13, and 15, and the trailing frame border is at 19. The grid contents of the VARFIX portion are stored for the next frame.
[Figure 19 content: tranFlag = 1, tranPos = 13. Frame n is coded as FIXVAR and frame n+1 as VARFIX. Marked slot positions: 0, 7, 13, 15, 19, 25, 32. Legend: I = nominal frame boundaries, o = frame overlap region slots.]
Figure 19: An example for T/F grid generator [15].
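The three main time borders follow directly from tranPos. The helper below (the name main_time_borders is hypothetical) reproduces only that part of the rule; the placement of the additional borders is not specified precisely enough in the text to be coded here.

```python
def main_time_borders(tran_pos):
    """Three main time borders (in timeslots) that accompany a transient."""
    first = tran_pos        # onset of the transient
    second = first + 2      # two timeslots after the first border
    third = second + 4      # four timeslots after the second border
    return [first, second, third]

borders = main_time_borders(13)
```

With tranPos = 13 this gives the borders 13, 15, and 19 of the Figure 19 example.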
4.2 Summary
In conclusion, the key to the 3GPP T/F grid is the transient detector. The time borders depend closely on the onset position of the transient. If a transient is detected in the present frame, at least three time borders are created, not counting additional time borders. The number of time borders can be regarded as a measure of the required bits. Therefore, the objective of the T/F grid is to achieve satisfactory audio quality with the fewest time borders. In the 3GPP T/F grid, some time borders can be removed to save bits and improve coding efficiency.
Most existing SBR encoders, including Coding Technology [18] and US 2006/0031065 A1 [19], adopt the transient detector concept for the T/F grid.
Instead of detecting the position of the transient, this thesis analyzes the design issues of
Chapter 5
Efficient Design of Time/Frequency Grid
The time/frequency grid decides the format of the reconstruction unit in both the time and frequency domains. The resolution of the reconstruction unit determines both the accuracy of the reconstructed signal and the required bits. For a stable signal, the T/F grid should be “simple” to reduce the required bits. Conversely, to reduce the distortion of the reconstructed signal, the T/F grid should be fine. In addition, if more bits are available, the resolution of the T/F grid can be higher. Consequently, the T/F grid decision depends strongly on the bit rate and the audio content. The efficient T/F grid design proposed in this thesis focuses on these two factors.
The first issue is how to judge whether a T/F grid assignment is good or poor. This thesis introduces a method to measure the reconstruction error of the signal objectively. By analyzing this error at different bit rates, the most suitable form of T/F grid can be selected.
The T/F grid comprises the frequency table decision, the time border distribution, the envelope resolution decision, and the frame class decision. The frequency tables and the envelope resolution jointly determine the frequency resolution of the T/F grid, while the time border distribution and the frame class are responsible for the time resolution. The reconstruction error measurement is described first, followed by the designs of the other sub-modules.
5.1 Analysis of Reconstructed Error
The operation of the SBR decoder was described in Chapter 2. By simulating the reconstruction performed in the decoder, the corresponding reconstruction error can be estimated. The reconstruction error used in this thesis is defined as the error of the power spectrum. The estimation proceeds as follows.
First, the rescaling gain of each grid unit is assumed to equal the power spectrum of the original high-frequency content divided by that of the corresponding low-frequency content. Here $H_{i,g}$ denotes the power spectrum of the high-frequency band samples in one grid unit and $L_{i,g}$ the corresponding low-frequency power spectrum, with $G$ denoting the T/F grid. The reconstructed envelope error of the T/F grid, $E(G)$, is then estimated by (13), which can be expanded and rearranged further.
The error is clearly related to the energy of the original high-frequency content and to the correlation between the high-frequency bands and the replicated low-frequency bands.
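As a sketch of how the gain and the error could be written out, assume a squared-error measure and write the rescaling gain of grid unit $g$ as $\gamma_g$. Both the exponent and the symbol $\gamma_g$ are assumptions made here for illustration and are not taken from the thesis's own formulas.

```latex
\gamma_g = \frac{\sum_{i \in g} H_{i,g}}{\sum_{i \in g} L_{i,g}},
\qquad
E(G) = \sum_{g \in G} \sum_{i \in g} \left( H_{i,g} - \gamma_g L_{i,g} \right)^2
     = \sum_{g \in G} \left[ \sum_{i \in g} H_{i,g}^2
       - 2 \gamma_g \sum_{i \in g} H_{i,g} L_{i,g}
       + \gamma_g^2 \sum_{i \in g} L_{i,g}^2 \right]
```

Under this reading, the first term in the bracket is the energy of the original high-frequency content and the second is the correlation between the high-frequency samples and the replicated low-frequency samples, which matches the dependence described in the text.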
According to (17), the reconstruction error is affected more by high-energy samples than by low-energy ones. Therefore, the T/F grid selected by minimizing $E(G)$ tends to favor samples with large energy and may ignore those with small energy. However, the distortion of the small-energy samples can then be very large, which makes the reconstructed signal sound noisy. To overcome this problem, this thesis introduces the critical unit to revise the criterion. The critical unit is used for energy normalization and is defined as follows. In the time domain, each critical unit contains four samples (two timeslots). In the frequency domain, the resolution of the critical unit follows the critical band bandwidth, i.e. a critical unit contains fewer subbands at low frequencies and more subbands at high frequencies. Instead of the minimum error, the objective becomes the minimum distortion rate, and (13) is revised to (18).
Comparing (13) with (18), the measurement unit is enlarged from a sample to a critical unit. To reduce the calculation complexity, (18) can be rewritten as a radical expression, (19).
The calculation of (19) is defined as DSR (energy difference to original signal ratio).
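To make the idea of an energy-difference-to-original ratio concrete, the sketch below computes one ratio per critical unit and their average. It assumes that the DSR of a critical unit is the absolute difference between the original and the rescaled replicated energy, divided by the original energy, and that a single gain is shared by all critical units of a grid unit. These assumptions, and the names dsr_per_critical_unit and averaged_dsr, are illustrative and not taken from the thesis's formulas. Original energies are assumed to be positive.

```python
def dsr_per_critical_unit(high, low):
    """high[c], low[c]: power values of critical unit c within one grid unit.

    Returns one energy-difference-to-original ratio per critical unit.
    """
    total_high = sum(sum(unit) for unit in high)
    total_low = sum(sum(unit) for unit in low)
    gain = total_high / total_low            # assumed: one gain per grid unit
    ratios = []
    for high_unit, low_unit in zip(high, low):
        original = sum(high_unit)
        reconstructed = gain * sum(low_unit)
        ratios.append(abs(original - reconstructed) / original)
    return ratios

def averaged_dsr(high, low):
    """Average over critical units, since their number per frame can vary."""
    ratios = dsr_per_critical_unit(high, low)
    return sum(ratios) / len(ratios)

high = [[4.0, 4.0], [1.0, 1.0]]   # one loud and one quiet critical unit
low = [[1.0, 1.0], [1.0, 1.0]]
ratios = dsr_per_critical_unit(high, low)
```

In this toy case the quiet critical unit receives the larger ratio (1.5 against 0.375), which illustrates why normalizing by the original energy keeps low-energy regions from being ignored.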
Finally, the reconstruction error estimate is given by (20). Because the frame boundaries can vary, the number of critical units differs from frame to frame. Consequently, (20) is revised so that the objective is to minimize the averaged DSR.
5.2 Frequency Band Table Decision
Frequency band tables determine the resolution of the T/F grid in the frequency domain and the precision of tone addition. Hence, frequency band tables dominate the frequency resolution in SBR. The frequency band tables used in SBR include the master frequency band table, the high resolution frequency band table, the low resolution frequency band table, the noise floor frequency band table, and the limiter frequency band table. All of these tables can be built from the master frequency band table.
Consequently, the design issue is how to select the most suitable master frequency band table.
There are eight different master frequency band tables defined in SBR codec. The
also have great influence on table selection. Therefore, table decision should be considered with bit rate and audio content.
From the viewpoint of reconstruction error, the most suitable frequency table can be selected by calculating the corresponding DSR. However, a DSR-based method greatly increases the calculation complexity and may change the frequency table between frames too readily. Apart from complexity, this adaptive method suffers from shortcomings similar to those of the adaptive SBR range decision mentioned above: SBR header overhead, the tone trembling artifact, and the inability to use DPCM in the time domain.
Therefore, changing the master frequency band table between frames too often consumes more bits and may reduce the coding efficiency. Furthermore, the resolution of the frequency tables is not as flexible as that of the time borders, because an arbitrary subband cannot be chosen as a frequency band boundary. Consequently, selecting the frequency band table by DSR is inappropriate. On the other hand, the resolution of the chosen frequency band table strongly affects the precision of tone addition. To summarize, the frequency band table should be coarse most of the time to save bits, and when additional tone components are needed, a higher resolution frequency band table can be selected according to the information from the tone-addition mechanism. In this thesis, the coarsest frequency table is chosen to save bits.
5.3 Time Borders and Envelope Resolution
This sub-module determines the time resolution of the T/F grid and the corresponding envelope resolution. The former comprises the number of envelopes in one frame and the locations of the time borders. The latter defines the frequency resolution of each envelope in the frame. Under the constraints of the SBR standard, one frame contains at most four time borders and the corresponding five envelopes. By calculating the DSR for each form of T/F grid, the one with the minimum DSR can be selected. If the finest resolution is 4 time samples (2 timeslots), a frame of 32 time samples admits a large number of combinations of time borders and envelope resolutions, and the resulting calculation complexity is very high. To simplify the calculation, this thesis proposes an efficient search algorithm based on dynamic programming. The notation $DSR_{i,j}^{k,u}$ used in the following denotes the minimum DSR value for the range from the 2i-th timeslot to the 2j-th timeslot of the current frame, with k time borders and u high resolution envelopes. For example, $DSR_{2,7}^{3,2}$ is illustrated in Figure 20. Hence, the objective is to find $DSR_{0,8}^{k,u}$. Furthermore, a DSR with larger k and u is derived from those with smaller ones, i.e. $DSR_{i,j}^{1,1}$ can be derived from two possible combinations: one is $DSR_{i,t}^{0,0} + DSR_{t,j}^{0,1}$, and the other is $DSR_{i,t}^{0,1} + DSR_{t,j}^{0,0}$. The derivation over time borders and high resolution envelopes is sketched in Figure 21.
[Figure 20: a range marked from timeslot 4 to timeslot 14.]
Figure 20: Illustration of DSR notation.
[Figure 21: lattice with axes k (time borders) and u.]
Figure 21: The trellis-lattice deducing path by dynamic programming.
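$$DSR_{i,j}^{1,1} = \min_{i<t<j} \left\{ DSR_{i,t}^{0,0} + DSR_{t,j}^{0,1},\; DSR_{i,t}^{0,1} + DSR_{t,j}^{0,0} \right\}$$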
By deriving the minimum DSR of each sub-structure, the number of candidate combinations for the target structure is greatly reduced, and the complexity is much lower than that of the exhaustive search. The most time-consuming part, however, is deriving the DSR of each possible region. In the proposed dynamic programming algorithm, only the initial cases need to be calculated. From the initial DSR values, the DSR of the other possible regions can be “pieced” together easily.
Consequently, the total number of DSR calculations needed is only
$$2 \times (8 + 7 + 6 + 5 + 4 + 3 + 2 + 1) = 72 \qquad (26)$$
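The dynamic programming search can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the cost of a single envelope is supplied by a caller-provided function seg_cost (here a toy stand-in for the DSR), costs of adjacent ranges are added, and each entry is built by choosing the position and resolution of the last envelope, which is one convenient way of piecing larger cases together from smaller ones. The names min_dsr_table, seg_cost, and toy_cost are hypothetical.

```python
import math

def min_dsr_table(n_units, seg_cost, max_borders=4):
    """Return best[(i, j, k, u)]: minimum summed cost over units i..j
    with k inner time borders and u high resolution envelopes."""
    best = {}
    # Initial cases: one envelope per range, low (u = 0) or high (u = 1).
    for i in range(n_units):
        for j in range(i + 1, n_units + 1):
            for u in (0, 1):
                best[(i, j, 0, u)] = seg_cost(i, j, u == 1)
    # Cases with more borders are pieced together from smaller ones.
    for k in range(1, max_borders + 1):
        for i in range(n_units):
            for j in range(i + k + 1, n_units + 1):
                for u in range(k + 2):            # at most k + 1 envelopes
                    cand = math.inf
                    for t in range(i + k, j):     # position of the last border
                        for u_last in (0, 1):     # resolution of last envelope
                            left = best.get((i, t, k - 1, u - u_last))
                            if left is not None:
                                cand = min(cand, left + best[(t, j, 0, u_last)])
                    best[(i, j, k, u)] = cand
    return best

def toy_cost(i, j, high):
    # Hypothetical stand-in for the DSR of one envelope covering units i..j.
    return (j - i) ** 2 * (0.5 if high else 1.0)

table = min_dsr_table(8, toy_cost)
initial_cases = sum(1 for key in table if key[2] == 0)
```

With 8 units per frame, the sketch evaluates seg_cost for exactly 72 initial cases, in agreement with (26); every other entry is obtained by additions and comparisons only.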
In addition, the consumed bits need to be taken into account in the T/F grid decision. The first issue is how to estimate the bits consumed by each form of T/F grid. In the SBR bit stream, the energies of the grid units are quantized and then transmitted. Therefore, the number of grid units within the T/F grid can be taken to represent the consumed bits, i.e. the more grid units a T/F grid has, the more bits it takes. From this point of view, one time border is regarded as equivalent to one high resolution envelope, because both create the same number of grid units. Thus, the total number of time borders and high resolution envelopes represents the degree of bit consumption.
To account for the bit overhead, ten bit-consuming stages are set in the dynamic programming. Each stage indicates a different degree of bit consumption. The relation between these stages and the corresponding numbers of time borders and high resolution envelopes is described in Table 2. From the lower stages to the higher ones, the T/F grid with the minimum DSR of each stage is derived. If a DSR value falls below the threshold, the search terminates. The flow chart is shown in Figure 22.
Table 2: Combinations of bit-consuming stages.
[Figure 22 flow chart nodes: Bit-Overhead Stage = 0, Best = 0; Stage Transformation; k <0 or u >2k+1 (Y/N); Dynamic Programming; DSR < threshold (Y/N); Record this T/F grid; Bit-Overhead Stage++; Stage < 9 (Y/N); End.]
Figure 22: DP flow chart with quality constraint.
The resulting performance clearly depends strongly on the threshold. This threshold is called the “quality threshold” because it represents the reconstruction error that is considered satisfactory. Furthermore, the quality threshold should differ according to the bit rate.
With the above algorithm, the derived T/F grid is one whose error is acceptable. However, it may happen that no T/F grid meets the quality constraint. In that case, the highest resolution T/F grid may not be the best solution. Therefore, another threshold is needed. This constraint is referred to as the “efficiency threshold” because it restricts the efficiency of the consumed bits. A new form of T/F grid is adopted only when it improves on the best one so far by some percentage.
The efficiency constraint ensures that each additional time border and high resolution envelope is worth its cost. The modified flow chart with the efficiency threshold is shown in Figure 23. The proposed T/F grid decision takes quality, bit overhead, and encoding bit rate into account at the same time. The experiments in Chapter 6 will show its efficiency compared with other codecs.
[Figure 23 flow chart nodes: Bit-Overhead Stage = 0, Best = 0; DSR improving > efficiency threshold.]
Figure 23: DP flow chart with both quality and efficiency constraint.
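The staged search with both thresholds can be sketched as follows. Several points are assumptions made for illustration only: the stage index is taken to be k + u (which yields exactly ten stages, 0 to 9, for at most four time borders and five envelopes), pairs with u > k + 1 are skipped because a grid with k borders has only k + 1 envelopes, and the "percentage" improvement is read as a relative DSR reduction. The names stage_candidates and select_stage are hypothetical.

```python
def stage_candidates(stage, max_borders=4):
    """(k, u) pairs assumed to belong to one bit-consuming stage."""
    return [(k, stage - k) for k in range(max_borders + 1)
            if 0 <= stage - k <= k + 1]

def select_stage(stage_dsr, quality_threshold, efficiency_threshold):
    """stage_dsr[s]: minimum DSR found by the DP search at stage s.

    Walks from cheap to expensive stages and returns the chosen stage.
    """
    best_stage, best_dsr = 0, stage_dsr[0]
    if best_dsr < quality_threshold:
        return best_stage
    for stage in range(1, len(stage_dsr)):
        dsr = stage_dsr[stage]
        # Adopt a costlier grid only if it improves enough on the best one.
        if best_dsr > 0 and (best_dsr - dsr) / best_dsr > efficiency_threshold:
            best_stage, best_dsr = stage, dsr
        if best_dsr < quality_threshold:
            break                      # quality constraint met: stop early
    return best_stage

good_case = select_stage([10.0, 9.8, 6.0, 5.9, 2.0, 1.9], 3.0, 0.1)
hard_case = select_stage([10.0, 9.9, 9.8], 0.0, 0.1)
```

In the first call the search stops at stage 4, the first stage whose DSR falls below the quality threshold. In the second call no stage meets the quality threshold and none improves by more than 10 percent, so the cheapest grid is kept rather than the highest resolution one.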
5.4 Frame Class Decision
Four frame classes are used in the SBR codec. Each frame class offers a different degree of flexibility in describing the distribution of time borders. The frame class of each frame is determined from the positions of the time borders, the leading frame border, and the trailing frame border. Besides recording the format of the time borders, the purpose of frame classes with a variable frame border is to save bits spent on time borders. If both frame borders are fixed, two “time borders” are wasted. Figure 24(a) shows two consecutive frames and their time borders. If the frame borders are always constant, the latter frame needs an extra time border.
By comparison, in Figure 24(b) the variable trailing frame border of the first frame allows the first time border of the second frame to be removed.
[Figure 24 panels, axes T (timeslots) and F: (a) FIX frame border, marks at 0, 16, 32; (b) VAR frame border, marks at 0, 18, 32.]
Figure 24: An example for variable frame border.
Consequently, determining the position of the trailing frame border of each frame requires the time border information of the next frame. By looking ahead to the next frame and considering the distribution of time borders in the current frame, the most suitable frame class can be determined.
Chapter 6
Artifacts in SBR
6.1 Tone Trembling
The patching, which determines the correspondence between the replicated low-frequency bands and the original high-frequency bands, depends on the master frequency band table and on the SBR start and stop boundaries. If one of these three factors changes, the patching changes. For example, suppose the 8th subband is replicated for some high-frequency subband in this frame, and in the next frame the replicated low-frequency subband changes to the 10th subband. This phenomenon may make the spectrum discontinuous in the time domain. In noise-like signal, the