Chapter 2 Backgrounds
2.3 Envelope Adjuster
2.3.2 Current Envelope Estimation
In order to adjust the envelope of the present SBR frame, the envelope of the current SBR signal needs to be estimated. Two estimation modes are used in the SBR codec: interpolation and non-interpolation. With interpolation, the estimated current envelope is given by
( ) ( )
The notation is as follows:
m : QMF subband index
l : Time envelope index
k_x : The first QMF subband in the SBR range (SBR start boundary)
t_E : Contains the time borders for all SBR envelopes in the current SBR frame
M : The total number of QMF subbands in the SBR range
L_E : Number of SBR envelopes
If non-interpolation is used, the energies are averaged over every frequency band. The estimation is
( ) ( )
r : Envelope resolution
n(r(l)) : Number of frequency bands
F : Frequency band table
The difference between interpolation mode and non-interpolation mode is illustrated in Figure 8 and Figure 9. In interpolation mode, the energies are averaged over each QMF subband, and each QMF subband derives its own gain value. All the QMF subbands in one frequency band are adjusted to the same energy. In non-interpolation mode, the energies are averaged over each frequency band. All the QMF subbands in one frequency band use the same gain value, and the envelope of the replicated signal is maintained.
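The two averaging schemes can be sketched as follows. This is a minimal illustration, not the codec's implementation: the function name `estimate_envelope`, the `energy[m][i]` layout (squared magnitude of SBR subband m at time slot i within one envelope), and the representation of the frequency band table as `(start, stop)` pairs are assumptions made for the example.

```python
def estimate_envelope(energy, bands, interpolate):
    """Estimate the current envelope for one SBR envelope.

    energy[m][i] : squared magnitude of SBR subband m at time slot i (assumed layout)
    bands        : list of (start, stop) subband ranges, stop exclusive (assumed layout)
    interpolate  : True for interpolation mode, False for non-interpolation mode
    """
    num_slots = len(energy[0])
    # Time average of the energy in every QMF subband.
    per_subband = [sum(row) / num_slots for row in energy]
    if interpolate:
        # Interpolation mode: one estimate per QMF subband.
        return per_subband
    # Non-interpolation mode: one estimate per frequency band,
    # shared by every QMF subband inside that band.
    estimate = [0.0] * len(energy)
    for start, stop in bands:
        band_mean = sum(per_subband[start:stop]) / (stop - start)
        for m in range(start, stop):
            estimate[m] = band_mean
    return estimate
```

With per-subband estimates, each subband later receives its own gain; with per-band estimates, the subbands of a band share one gain, so the shape of the replicated signal inside the band is preserved.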
[Figure 8 labels: F; original high band signal; replicated low band signal; average energy E; Interpolation]
Figure 8: Envelope estimation in interpolation mode and the related adjustment
[Figure 9 labels: F; original high band signal; replicated low band signal; average energy E; Interpolation]
Figure 9: Envelope estimation in non-interpolation mode and the related adjustment
2.3.3 Additional HF Components and Gain Calculation
The noise floor scalefactor is a ratio, and in order to add the correct amount of noise, it needs to be converted to a proper amplitude value, according to the following.
$$ Q_M(m,l) = \sqrt{E_{Orig}(m,l)\,\frac{Q(m,l)}{1+Q(m,l)}}, \quad 0 \le m \le M-1,\; 0 \le l \le L_E-1 \tag{3} $$
The levels of the sinusoids are derived as
$$ S_M(m,l) = \sqrt{E_{Orig}(m,l)\,\frac{S(m,l)}{1+Q(m,l)}} \tag{4} $$
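As a small numeric sketch of the two conversions, the following assumes the usual HE-AAC formulation: the noise level is sqrt(E * Q / (1 + Q)) and the sinusoid level is sqrt(E * S / (1 + Q)), where E is the transmitted original envelope energy, Q the noise floor ratio, and S an indicator that is 1 where a sinusoid is signalled and 0 elsewhere. The function names are hypothetical.

```python
import math

def noise_level(e_orig, q):
    """Amplitude of the added noise for one subband and envelope (assumed formula)."""
    return math.sqrt(e_orig * q / (1.0 + q))

def sinusoid_level(e_orig, q, sinusoid_present):
    """Amplitude of the added sinusoid; zero when no sinusoid is signalled."""
    s = 1.0 if sinusoid_present else 0.0
    return math.sqrt(e_orig * s / (1.0 + q))
```

For example, with an envelope energy of 8 and a noise floor ratio of 1, half of the energy is assigned to noise, so `noise_level(8.0, 1.0)` evaluates to 2.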
The gain values are derived by
( )
In order to avoid unwanted noise substitution, the gain values are limited according to the limiter gain mechanism:
( ) ( )
band. Hence, the gain values are limited according to
$$ G_{lim}(m,l) = \min\left(G(m,l),\; G_{Max}(m,l)\right) \tag{7} $$
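The limiting step itself is an element-wise minimum. The sketch below assumes that the maximum gain has already been computed for every subband; the function name `limit_gains` and the flat list layout are illustrative only.

```python
def limit_gains(gains, max_gains):
    """Clamp each gain to the maximum allowed for its subband."""
    return [min(g, g_max) for g, g_max in zip(gains, max_gains)]
```

For instance, `limit_gains([0.5, 4.0], [2.0, 2.0])` leaves the first gain unchanged and reduces the second to 2.0.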
The additional noise component needs to be revised by
( ) ( ) ( )
Finally, the resulting gain values and additional components are
( ) ( ) ( )
2.3.4 HF Signal Assembling
Before the gain values are applied to the subband samples, a smoothing filter is applied to them according to the parameters extracted from the bitstream. The subband samples are then adjusted by these gain values, and the additional noise and tone components are added.
Chapter 3
SBR Range Decision
The SBR range decision determines a boundary in the frequency domain that separates the QMF data into two portions. Frequency bands below the boundary are coded by the AAC encoder, and frequency bands above it are coded by the SBR encoder. SBR replicates the subband signals encoded by AAC to reconstruct the high frequency subband signals, so the quality of the SBR encoder relies greatly on the AAC encoder. In other words, the objective of the SBR range decision is to ensure that the low frequency part encoded by AAC has satisfactory quality. The range decision divides the burden between the SBR encoder and the AAC encoder, which makes it a decisive module for HE-AAC quality. An inappropriate range may introduce perceptible artifacts and therefore reduce the audio quality.
When the selected SBR start frequency is too high, the available bits may not be enough for AAC to encode the assigned bandwidth, and HE-AAC may produce a “spectral valley” around the range boundary. Figure 10 illustrates an example of a resulting spectrum with a spectral valley. The quality of the low frequency components also influences the high frequency components reconstructed by SBR. Figure 11 shows an example of distortion in the high frequencies due to poor quality of the low frequency components. Conversely, if the bits are sufficient but the bandwidth assigned to AAC is too narrow, the overall bandwidth of HE-AAC shrinks because the maximum bandwidth of SBR is restricted, which decreases the audio quality. Bit rate and sampling rate are therefore the main factors in this range decision.
[Figure 10 region labels: AAC, SBR]
Figure 10: Spectral valley from inappropriate range decision.
[Figure 11 region labels: AAC, SBR]
Figure 11: Distortion for SBR range due to bad quality of AAC part.
Another significant factor in the SBR range decision is the signal content. The number of bits the AAC encoder requires depends on the audio content, so whether the available bits are sufficient also depends on it. Even at high bit rates, a spectral valley can still occur at the upper end of the AAC part when the audio content is hard to encode. Conversely, at low bit rates, the AAC bandwidth does not always need to be reduced.
Therefore, the SBR range decision depends strongly on bit rate and audio content. Accounting for audio content is the aggressive, active strategy, whereas accounting for bit rate is the steady, conservative one. Based on these two factors, this thesis considers two possible approaches: adaptive SBR range adjustment and error concealment.
3.1 Adaptive SBR Range Adjustment
The number of bits required for each frame in the AAC encoder differs with the signal content. Consequently, the most flexible method is to adjust the SBR range adaptively according to the state of the AAC encoder. By detecting the zero bands in the AAC bit allocation, the SBR range can be determined frame by frame. The adaptive method not only eliminates the spectral valley artifact but also achieves a good connection between the AAC encoder and the SBR encoder. Taking Figure 6 as an example, there are zero bands at the upper end of the AAC part. By detecting these zero bands, the SBR range can be moved down to avoid the spectral valley. Figure 12 shows the result.
[Figure 12 region labels: AAC, SBR]
Figure 12: The spectral valley corrected by adaptive range adjustment.
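The zero-band detection described above can be sketched as a scan from the top of the AAC spectrum downwards. The names `band_bits` and `band_offsets` and the idea of reading the bit allocation per scalefactor band are assumptions for this illustration; an actual encoder would also have to map the result onto a valid bs_start_freq value.

```python
def adaptive_sbr_start(band_bits, band_offsets):
    """Return the boundary just above the highest AAC band that received bits.

    band_bits[b]    : bits allocated to AAC band b in this frame (assumed input)
    band_offsets[b] : lower edge of band b; has len(band_bits) + 1 entries
    """
    last_coded = len(band_bits)
    # Skip the run of zero bands at the upper end of the AAC part.
    while last_coded > 0 and band_bits[last_coded - 1] == 0:
        last_coded -= 1
    return band_offsets[last_coded]
```

With `band_bits = [12, 9, 4, 0, 0]` and `band_offsets = [0, 4, 8, 12, 16, 20]`, the two trailing zero bands are skipped and the boundary moves down from 20 to 12.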
However, this method has four shortcomings: the SBR header overhead, the tone trembling artifact, reduced DPCM efficiency, and fluctuating bandwidth.
3.1.1 SBR Header Overhead
The SBR encoder uses several frequency band tables: the master frequency band table, the high resolution frequency band table, the low resolution frequency band table, the noise floor frequency band table, and the limiter frequency band table. The parameters in the SBR bitstream header define all frequency band tables as well as the SBR start and stop boundaries. If the bitstream parameters used for this frame are the same as those of the last one, the header does not need to be transmitted again; a header is transmitted only when the parameters differ from the previous ones. Therefore, adaptively revising the SBR range consumes bits for transmitting a new bitstream header. The syntax of the SBR header is illustrated in Figure 13. In the SBR header, the two parameters bs_start_freq and bs_stop_freq define the SBR range. According to Figure 13, the overhead for varying bs_start_freq and bs_stop_freq is 16 bits, the total width of the first seven header fields (1 + 4 + 4 + 3 + 2 + 1 + 1). Therefore, each change of the SBR range costs 16 bits. This header overhead may be serious when the SBR range changes often.
SBR Bitstream Header
Field                 Bits
bs_amp_res            1
bs_start_freq         4
bs_stop_freq          4
bs_xover_band         3
bs_reserved           2
bs_header_extra_1     1
bs_header_extra_2     1
bs_freq_scale         2
bs_alter_scale        1
bs_noise_bands        2
bs_limiter_bands      2
bs_limiter_gains      2
bs_interpol_freq      1
bs_smoothing_mode     1
Figure 13: The parameters included in the SBR bitstream header.
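The header cost can be tallied directly from the field widths in Figure 13. In the sketch below, the grouping of the optional fields under bs_header_extra_1 and bs_header_extra_2 follows the standard SBR header syntax as understood here; the function names and the per-frame list of (bs_start_freq, bs_stop_freq) pairs are hypothetical.

```python
# Field widths in bits, as listed in Figure 13.
BASE_FIELDS = {
    "bs_amp_res": 1, "bs_start_freq": 4, "bs_stop_freq": 4, "bs_xover_band": 3,
    "bs_reserved": 2, "bs_header_extra_1": 1, "bs_header_extra_2": 1,
}
# Sent only when bs_header_extra_1 is set (assumed grouping).
EXTRA_1_FIELDS = {"bs_freq_scale": 2, "bs_alter_scale": 1, "bs_noise_bands": 2}
# Sent only when bs_header_extra_2 is set (assumed grouping).
EXTRA_2_FIELDS = {
    "bs_limiter_bands": 2, "bs_limiter_gains": 2,
    "bs_interpol_freq": 1, "bs_smoothing_mode": 1,
}

def header_bits(extra_1=False, extra_2=False):
    """Size of one SBR header in bits."""
    bits = sum(BASE_FIELDS.values())
    if extra_1:
        bits += sum(EXTRA_1_FIELDS.values())
    if extra_2:
        bits += sum(EXTRA_2_FIELDS.values())
    return bits

def range_change_overhead(ranges):
    """Extra header bits spent because the SBR range changed between frames.

    ranges: per-frame (bs_start_freq, bs_stop_freq) tuples (hypothetical input).
    """
    changes = sum(1 for prev, cur in zip(ranges, ranges[1:]) if cur != prev)
    return changes * header_bits()
```

A sequence in which the range changes twice, such as `[(5, 9), (5, 9), (6, 9), (6, 9), (7, 8)]`, therefore costs 32 additional bits.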
3.1.2 Tone Trembling
Tone trembling is an artifact that greatly decreases the perceived audio quality. Its characteristics are described in Chapter 6.
3.1.3 Reduced Efficiency of DPCM
A single sinusoid in the frequency domain corresponds to a stable signal in the time domain, and conversely, a pulse in the time domain corresponds to a constant in the frequency domain. Figure 14 illustrates this property of audio signals. In short, most signals are stable in either the time domain or the frequency domain.
Therefore, applying delta coding in one of these two domains according to the signal characteristics can increase the coding efficiency. In Figure 15, since the SBR range of this frame differs from that of the last one, the number of subbands in the SBR range differs between the two consecutive frames. This disables time-direction DPCM for the first envelope of this frame. Consequently, the incomplete DPCM takes more bits and decreases the coding efficiency.
Figure 14: The example for the characteristic of attack signal,
[Figure 15 labels: AAC; T; F; DPCM]
Figure 15: Time-direction DPCM disability
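The effect of a changed SBR range on delta coding can be illustrated with a small sketch. It assumes quantised envelope values per band and uses the sum of absolute deltas as a rough stand-in for the Huffman cost; both the cost proxy and the function names are assumptions for this example.

```python
def freq_direction_deltas(envelope):
    """First value coded as is, every other value relative to its lower neighbour."""
    return [envelope[0]] + [b - a for a, b in zip(envelope, envelope[1:])]

def time_direction_deltas(envelope, previous):
    """Deltas against the previous envelope, or None when the band layout differs."""
    if previous is None or len(previous) != len(envelope):
        return None  # the SBR range changed, so there is nothing to difference against
    return [cur - prev for cur, prev in zip(envelope, previous)]

def choose_dpcm(envelope, previous):
    """Pick the direction with the smaller sum of absolute deltas (cost proxy)."""
    freq = freq_direction_deltas(envelope)
    time = time_direction_deltas(envelope, previous)
    if time is not None and sum(map(abs, time)) < sum(map(abs, freq)):
        return "time", time
    return "freq", freq
```

For a nearly stationary envelope, `choose_dpcm([10, 21, 30, 39], [10, 20, 30, 40])` selects the time direction with small deltas, whereas a previous envelope with a different number of bands forces the more expensive frequency direction.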
3.1.4 Fluctuated Signal Bandwidth
The number of subbands in the SBR range is restricted by the standard, as shown in Figure 16. Different maximum ranges are defined for different sampling rates. Since the SBR start boundary is adjusted adaptively, the stop boundary may need to be moved down to observe this restriction. Consequently, the bandwidth of the signal may vary from frame to frame. In addition, changing the SBR stop boundary may also cause the tone trembling artifact. Even apart from tone trembling, the fluctuating bandwidth makes the signal intermittent and reduces the perceived audio quality.
$$ k_2 - k_0 \le \begin{cases} 48, & Fs_{SBR} \le 32\ \text{kHz} \\ 35, & Fs_{SBR} = 44.1\ \text{kHz} \\ 32, & Fs_{SBR} \ge 48\ \text{kHz} \end{cases} $$
Figure 16: The range constraint of SBR [3].
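A sketch of how the constraint forces the stop boundary to follow the start boundary is given below. The limits 48, 35, and 32 are taken from Figure 16; the function names and the parameter names `k_start` and `k_stop` (QMF subband indices of the range edges) are hypothetical.

```python
def max_sbr_subbands(fs_sbr_khz):
    """Maximum number of QMF subbands in the SBR range (limits from Figure 16)."""
    if fs_sbr_khz <= 32:
        return 48
    if abs(fs_sbr_khz - 44.1) < 1e-9:
        return 35
    if fs_sbr_khz >= 48:
        return 32
    raise ValueError("sampling rate not covered by the constraint table")

def clamp_stop_boundary(k_start, k_stop, fs_sbr_khz):
    """Lower the stop boundary when the range would exceed the allowed width."""
    return min(k_stop, k_start + max_sbr_subbands(fs_sbr_khz))
```

At 44.1 kHz, a start boundary of 20 allows a stop boundary of at most 55, so a requested stop of 60 is lowered to 55, and the signal bandwidth shrinks whenever the start boundary moves down.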
Summary
To summarize the above discussion, the adaptive SBR range method faces many difficulties. The most serious problem is the bit overhead: recording each change of range requires extra bits in the bitstream. Hence, this method has the highest flexibility but loses coding efficiency. However, SBR is intended for low bit rate coding. Instead of pursuing high coding precision, SBR economizes bits while keeping the reconstructed signal acceptable. Therefore, at low bit rates, the adaptive range approach seems unsuitable.
3.2 Error Concealment
This method determines the SBR range from the bit rate and the sampling rate, and the range is fixed across frames. Given the bit rate, the division of the burden between the AAC encoder and the SBR encoder is adequate in most frames. The occasional spectral distortion is handled by the error concealment mechanism [14][15][16][17] in the SBR decoder. This approach yields high coding efficiency, and the error concealment mechanism compensates for the lack of flexibility. Furthermore, with the help of error concealment, the bandwidth of the reconstructed signal can be extended aggressively.
Chapter 4
Related Work for Time/Frequency Grid
This chapter introduces the design of the T/F grid in the 3GPP HE-AAC encoder [15]. The block diagram of the 3GPP HE-AAC encoder is illustrated in Figure 17. The T/F grid decision comprises a transient detector, a frame splitter, and a T/F grid generator. The transient detector detects the start position of a transient. The frame splitter operates only in frames without a transient and determines whether the frame is separated into two envelopes. The T/F grid generator receives the information from the transient detector and determines the time borders and the related envelope resolution.
[Figure 17 blocks: PCM signal (L/R/M); 64 ch Analysis QMF; Transient detector; Frame splitter; Tonality detector; T/F Grid Generator; Additional Control Parameters; Envelope Energy Formatter; Quantiser and T/F Huffman Encoder; Bitstream Multiplexer; Coded SBR Bitstream]
Figure 17: Block diagram of 3GPP HE-AAC encoder [15].
4.1 Time/Frequency Grid Design
In the 3GPP SBR encoder, the frequency band table is selected according to the bit rate and the sampling rate and does not change across frames. The time segments, envelope resolution, and frame classes are determined by the mechanism described below.
4.1.1 Transient Detector
The transient detector is the most important module in the 3GPP T/F grid. The subsequent frame splitter and T/F grid generator operate according to the information it provides. In short, the SBR coding quality relies on this module.
The objective of the transient detector is to determine whether a transient occurs in the present frame and to find the onset position of the transient. Its output variables, tranFlag and tranPos, record this information.
The transient detector operates on one frame length of subband samples, starting from sample 8. Its basic principle is to estimate the energy difference between samples and to decide from it whether a transient exists. First, the average energy of each subband in the processed frame is calculated, and the standard deviation is derived from it. Next, for each subband, the energy difference between neighboring samples is calculated and compared with the corresponding standard deviation. If the energy difference is larger than the standard deviation, the amount by which it exceeds the standard deviation is recorded. For each time sample, this procedure is executed 64 times, once per subband, and the values indicating a “large energy difference” are summed. Finally, each sample is checked to see whether its summed energy difference value exceeds the threshold. If it does, tranFlag is set to true and the position of this sample is recorded. The diagram of the transient detector is illustrated in Figure 18.
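The procedure above can be sketched as follows. This is a simplified reading of the description, not the 3GPP reference code: the `energy[k][t]` layout, the use of a signed energy rise between neighboring samples, the population standard deviation, and the way the threshold is passed in are all assumptions of the example.

```python
import math

def detect_transient(energy, threshold, start=8):
    """Return (tranFlag, tranPos) for one frame of subband energies.

    energy[k][t] : energy of subband k at time sample t (assumed layout)
    threshold    : limit on the summed excess per time sample (assumed parameter)
    start        : first time sample that is examined
    """
    num_samples = len(energy[0])
    first = max(start, 1)
    excess = [0.0] * num_samples
    for band in energy:
        mean = sum(band) / num_samples
        std = math.sqrt(sum((e - mean) ** 2 for e in band) / num_samples)
        for t in range(first, num_samples):
            diff = band[t] - band[t - 1]      # energy rise between neighbors
            if diff > std:
                excess[t] += diff - std       # keep only the part above std
    for t in range(first, num_samples):
        if excess[t] > threshold:
            return True, t                    # onset position of the transient
    return False, None
```

For a frame of 24 samples in which every subband jumps from 1 to 100 at sample 15, the summed excess peaks at sample 15 and the function reports a transient there; a flat frame yields no transient.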
4.1.2 Frame Splitter
The frame splitter operates only when the transient detector has found no transient in the current frame. It decides whether the current frame is split into two envelopes of equal size and stores the result in the variable splitFlag. Like the transient detector, the frame splitter estimates an energy difference, but less precisely: its estimation unit is half of the frame length. The variable splitFlag is determined by comparing the energies of the two half frames.
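A minimal sketch of the half-frame comparison is shown below. The text does not state how the two energies are compared, so the ratio test and the value of `ratio_threshold` are purely illustrative assumptions, as are the function name and the `energy[k][t]` layout.

```python
def should_split(energy, ratio_threshold=2.0):
    """Return splitFlag: True when the two half frames differ strongly in energy.

    energy[k][t]    : energy of subband k at time sample t (assumed layout)
    ratio_threshold : hypothetical limit on the energy ratio of the two halves
    """
    half = len(energy[0]) // 2
    first = sum(sum(band[:half]) for band in energy)
    second = sum(sum(band[half:]) for band in energy)
    low, high = sorted((first, second))
    if low == 0.0:
        return high > 0.0
    return high / low > ratio_threshold
```

A frame whose second half is ten times as energetic as its first half would be split into two envelopes, while a frame with constant energy would be kept as one.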
4.1.3 T/F Grid Generator
The T/F grid generator creates the time/frequency grid for one SBR frame. Its input parameters are provided by the transient detector and the frame splitter. The frame class is determined first, from the trailing frame border of the last frame (FIX or VAR) and the parameter tranFlag of the present frame. In short, if there is a transient in the current frame, the trailing frame border is VAR; otherwise, it is FIX. All combinations of the leading frame border and the transient are listed in Table 1. When transients are sparse, the FIXVAR-VARFIX pair is used. The current frame is encoded with the FIXVAR portion, and the VARFIX grid is stored for the next frame. If no transient occurs in the next frame, the stored VARFIX grid is used; otherwise, a new grid is calculated for the new transient and merged with the already calculated grid, whereby a VARVAR class frame is used.
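Leading frame border    Transient in current frame    Frame class
FIX                     no                            FIXFIX
FIX                     yes                           FIXVAR
VAR                     no                            VARFIX
VAR                     yes                           VARVAR
Table 1: The combination of transient and trailing frame borders.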
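The frame class rule can be written as a two-line function. The sketch assumes, as the text states, that the leading border of a frame equals the trailing border of the previous frame and that a transient makes the trailing border VAR; the function name is hypothetical.

```python
def frame_class(leading_border, tran_flag):
    """Combine the leading border ("FIX" or "VAR") with the transient flag."""
    if leading_border not in ("FIX", "VAR"):
        raise ValueError("leading border must be 'FIX' or 'VAR'")
    trailing_border = "VAR" if tran_flag else "FIX"
    return leading_border + trailing_border
```

A transient in a frame that starts on a fixed border gives `frame_class("FIX", True) == "FIXVAR"`, and the following frame without a transient gives `frame_class("VAR", False) == "VARFIX"`, which is the FIXVAR-VARFIX pair described above.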
The positions of the time borders in one frame are determined mainly by the position of the transient, i.e. the input parameter tranPos provided by the transient detector. Each transient is accompanied by three main time borders: the first is located at the onset of the transient, the second is two time slots after the first, and the third is four time slots after the second. In addition, if the transient is not too near the front of the current frame, there is an additional time border ahead of the transient. Similarly, additional time borders may be placed behind the transient when the transient is near the front of the present frame. In the example of Figure 19, tranPos is 13, and the resulting main time borders are located at 13, 15, and 19. The additional time borders are at 7 and 25, respectively. Consequently, the time borders in the present FIXVAR frame are located at 7, 13, and 15, and the trailing frame border is at 19. The grid contents of the VARFIX portion are stored for the next frame.
tranFlag = 1, tranPos = 13
|<- Frame n: FIXVAR ->|<- Frame n+1: VARFIX ->|
QMF slots
I-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-Io|o|o|-|-|-|-|-|-|-|-|-|-|-|-|-I
SBR slots 0 7 13 15 19 25 32
I: nominal frame boundaries, o: frame overlap region slots
Figure 19: An example for T/F grid generator [15].
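The placement of the three main time borders follows directly from the rule stated above. The sketch covers only the main borders; the rule for the additional borders is not specified precisely enough in the text to reproduce, so it is left out. The function name is hypothetical.

```python
def main_time_borders(tran_pos):
    """Three main time borders that accompany a transient at tran_pos."""
    first = tran_pos        # onset of the transient
    second = first + 2      # two time slots after the first border
    third = second + 4      # four time slots after the second border
    return [first, second, third]
```

For the example of Figure 19, `main_time_borders(13)` gives the borders 13, 15, and 19.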
4.2 Summary
In conclusion, the key to the 3GPP T/F grid is the transient detector. The time borders depend strongly on the onset position of the transient. If a transient is detected in the present frame, at least three time borders are created, not counting the additional time borders. The number of time borders can be regarded as a measure of the required bits. Therefore, the objective of the T/F grid is to use the fewest time borders that achieve satisfactory audio quality. In the 3GPP T/F grid, some time borders can be removed to save bits and improve the coding efficiency.
However, most existing SBR encoders, including Coding Technology [18] and US 2006/0031065 A1 [19], adopt the concept of a transient detector for the T/F grid.
Instead of detecting the position of a transient, this thesis analyzes the design issues of
Chapter 5
Efficient Design of Time/Frequency Grid
The time/frequency grid decides the format of the reconstruction unit in both the time and the frequency domain. The resolution of the reconstruction unit determines both the accuracy of the reconstructed signal and the required bits. For a stable signal, the T/F grid should be simple to reduce the required bits. Conversely, to reduce the distortion of the reconstructed signal, the T/F grid should be finer. In addition, if more bits are available, the resolution of the T/F grid can be higher. Consequently, the T/F grid decision depends strongly on bit rate and audio content. The efficient T/F grid design proposed in this thesis emphasizes these two main factors.
The first issue is how to judge whether a T/F grid assignment is good or poor. This thesis introduces a method to measure the reconstruction error of the signal objectively. By analyzing this error in combination with different bit rates, the most suitable form of T/F grid can be selected.
The T/F grid comprises the frequency table decision, the time border distribution, the envelope resolution decision, and the frame class decision. The frequency tables and the envelope resolution jointly determine the frequency resolution of the T/F grid, while the time border distribution and the frame class are responsible for the time resolution. The reconstruction error measurement is described first, followed by the designs of the other sub-modules.
5.1 Analysis of Reconstructed Error
The process of the SBR decoder has been described in Chapter 2. By simulating the reconstruction performed in the decoder, the corresponding reconstruction error can be estimated. The reconstruction error used in this thesis is defined as the error of the power spectrum. The estimation proceeds as follows:
First, the rescaling gain value is assumed to be the power spectrum of the original high frequency content divided by that of the corresponding low frequency content. It is given by
$$ G_g = \frac{\sum_{i \in g} H_{i,g}}{\sum_{i \in g} L_{i,g}} $$
where H_{i,g} represents the power spectrum of the high frequency band samples in one grid unit g, L_{i,g} stands for the corresponding power spectrum of the low frequency band samples, and G denotes the gain. Consequently, the reconstructed envelope error of the T/F grid is estimated by
And (13) is expressed as follows.
( )
[ ( ) ]
Furthermore, it can be expressed as follows
⋅
It is clear to see that the error is related to the energy of original high frequency contents and the correlation between high frequency bands and replicated low