Chapter 2 MPEG-2/4 Advanced Audio Coding and Transcoding Techniques
2.2 Overview of Audio Bitrate Transcoders
2.2.3 Single Layer AAC Transcoder (SLAT)
In [18], the fast single layer AAC transcoder manipulates the transcoded bitstream in the “quantized spectrum domain”. The bitstream manipulation is based on the linear relationship between the required bits and the percentage of nonzero quantized spectral coefficients.
By modeling this linear relationship and reusing the information within the original bitstream, SLAT speeds up the cascaded transcoder by replacing the nested loop of the bit reservoir control with a linear prediction. SLAT thus saves the time taken by the forward and inverse filter bank transforms, the psychoacoustic model and the R-D control process, which are the most computationally expensive modules of an AAC coding system. In addition, SLAT retains coding performance close to that of the cascaded method.
[Figure 2-14 blocks: AAC Bitstream; Bitstream De-multiplexer; Noiseless Decoding; Bandwidth Limiter w/ Lower Bound; Quant. & Scalefactor Modifier w/ Scalefactor grouping; Noiseless Coding; Bitstream Formatter; Transcoded AAC Bitstream. Signals between blocks: quantized coefficients and scalefactors.]
Figure 2-14. Block diagram of SLAT
Figure 2-14 shows the architecture of SLAT, which manipulates the transcoded bitstream in the “quantized spectrum domain” in order to maximize the overall transcoding throughput.
After the AAC input bitstream is decoded by the Noiseless Decoding module, the new bitstream is generated from the quantized spectral coefficients. The quantized spectral coefficients and scalefactors (sf_decoder in Eq. (2)) are modified by three bitrate reduction techniques: bandwidth reduction with the ρ-domain model, bit reduction for encoding the lower frequency coefficients, and side information reduction. Figure 2-15 shows how the three reduction methods are combined in the SLAT architecture.
[Figure 2-15 flow: inputs Quant. and Scalefactor; Bandwidth Limiter (ρ-domain model) w/ Lower Bound; Quant. & Scalefactor Modifier w/ Side Information Reduction; test R_target ≧ R_transcoded (Y: Done, N: repeat); outputs Quant.new and Scalefactor.new.]
Figure 2-15. Algorithm flow chart of SLAT
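The control flow of Figure 2-15 can be sketched in Python as below. This is only an illustration under stated assumptions: the function name transcode_frame, the three callables passed in, and the iteration cap max_iters are all hypothetical, and the "N" branch is assumed to return to the bandwidth limiter.

```python
def transcode_frame(quant, scalefactors, r_target,
                    limit_bandwidth, modify, count_bits, max_iters=16):
    """Repeat the two reduction stages until the frame fits the target bits.

    limit_bandwidth(quant) -> quant                   (hypothetical stage A)
    modify(quant, sf)      -> (quant, sf)             (hypothetical stages B and C)
    count_bits(quant, sf)  -> bits of the transcoded frame (hypothetical)
    """
    for _ in range(max_iters):  # assumption: a cap that guards against endless looping
        quant = limit_bandwidth(quant)
        quant, scalefactors = modify(quant, scalefactors)
        # Exit test of the flow chart: R_target >= R_transcoded
        if r_target >= count_bits(quant, scalefactors):
            break
    return quant, scalefactors
```

Passing the stages in as callables keeps the sketch independent of how each reduction technique is actually implemented.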
A. Bandwidth reduction with the ρ-domain model
The approximated linear model between the coding bitrate and percentage (ρ) of non-zero quantized coefficients is used to limit the bandwidth for lower bitrate bitstreams.
[Figure 2-16 plot axes: number of non-zero coefficients; coding bits per frame.]
Figure 2-16. Linear model between the coding bitrate and the percentage (ρ) of non-zero quantized coefficients.
Figure 2-16 illustrates the ρ-domain model for an AAC coding frame. In Eq. (5), the approximately linear relationship is used to estimate how many non-zero coefficients remain at the target bitrate, and hence how many must be discarded from the higher frequency bands.
$$N_t \cong \frac{R_t}{R_o} N_o , \qquad (5)$$
where $R_o$ is the original bitrate, $R_t$ is the target bitrate, $N_o$ is the number of non-zero coefficients in the original bitstream and $N_t$ is the predicted number of non-zero coefficients at the target bitrate.
The difference between $N_o$ and $N_t$ is the total number of non-zero coefficients to zero out. Coefficients are set to zero starting from the highest frequency band. Because removing high frequency bands may degrade the listening quality, a lower bound on the bandwidth that may be removed should be set for each target bitrate.
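As an illustration of the bandwidth reduction step, the following Python sketch applies Eq. (5) and zeroes coefficients from the top of the spectrum. It rests on several assumptions: the function name limit_bandwidth and the parameter lower_bound_index are hypothetical, the lower bound is expressed as a coefficient index, and removal is done per coefficient rather than per scalefactor band for brevity.

```python
def limit_bandwidth(quantized, r_original, r_target, lower_bound_index):
    """Zero non-zero coefficients from the highest index downwards.

    quantized         : quantized spectral coefficients of one frame
    r_original        : original bitrate R_o
    r_target          : target bitrate R_t
    lower_bound_index : coefficients below this index are never removed
                        (hypothetical form of the lower bound)
    """
    n_original = sum(1 for q in quantized if q != 0)
    # Eq. (5): N_t ~= (R_t / R_o) * N_o
    n_target = int(n_original * r_target / r_original)
    to_remove = max(n_original - n_target, 0)

    out = list(quantized)
    # Walk down from the highest frequency, stopping at the lower bound.
    for i in range(len(out) - 1, lower_bound_index - 1, -1):
        if to_remove == 0:
            break
        if out[i] != 0:
            out[i] = 0
            to_remove -= 1
    return out
```

For example, halving the bitrate of a frame with six non-zero coefficients gives a target of three, so the three highest non-zero coefficients are cleared unless the lower bound stops the walk earlier.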
B. Bit reduction for encoding the lower frequency coefficients
To save bits when encoding the lower frequency coefficients, the R-D relationship in the quantized spectrum domain is analyzed. We proved that both the bitrate and the distortion of the transcoded bitstream can be formulated as functions of the “quantized coefficient of original bitstream” and the “increase of sf_decoder”. In addition, observations on the rate-distortion curve showed that the maximum distortion occurs when quantized coefficients are reduced from unity to zero. The bit reduction lowers the magnitude of the quantized coefficients by increasing sf_decoder according to the average quantized value. To retain the listening quality, the nonzero quantized coefficients of the original bitstream should not fall below unity.
At the beginning of bit reduction, the quantized coefficients are averaged within each scalefactor band, as in Eq. (6):
$$q_{avg,b} = \frac{1}{sf\_length_b} \sum_{i \in b} |q_i| , \qquad (6)$$
where $q_i$ is the i-th quantized coefficient, and $q_{avg,b}$ and $sf\_length_b$ denote the averaged value and the length of the b-th scalefactor band, respectively.
The scalefactor difference $sfd_{one,b}$, which diminishes the average quantized value to unity, is then calculated by Eq. (7).
Given $sfd_{one,b}$, the re-quantized value $q_{new,i}$ is calculated by Eq. (8).
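The averaging and re-quantization steps can be sketched in Python as follows. The formulas used here are assumptions, not taken from [18]: the standard AAC quantizer is assumed, for which raising the scalefactor by Δ steps scales a quantized magnitude by 2^(−3Δ/16); magnitudes are averaged; the step count is rounded down so that the band average does not drop below unity; and re-quantized values are rounded to the nearest integer. The function name requantize_band is hypothetical.

```python
import math

def requantize_band(band):
    """Return (sfd_one, new_band) for one scalefactor band.

    band : list of signed integer quantized coefficients.
    """
    q_avg = sum(abs(q) for q in band) / len(band)
    if q_avg <= 1.0:
        # Average already at or below unity: leave the band unchanged.
        return 0, list(band)
    # Assumed relation: q_avg * 2^(-3 * sfd / 16) = 1  =>  sfd = (16/3) * log2(q_avg)
    sfd_one = math.floor(16.0 / 3.0 * math.log2(q_avg))
    scale = 2.0 ** (-3.0 * sfd_one / 16.0)
    # Round magnitudes to nearest and restore the sign (assumed rounding rule).
    new_band = [int(math.copysign(math.floor(abs(q) * scale + 0.5), q))
                for q in band]
    return sfd_one, new_band
```

Rounding the step count down is one way to respect the requirement that the averaged quantized value stays lower-bounded by unity; a different rounding rule would change the result slightly.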
Based on the re-quantized values, the number of bits $Bit_{new,b}$ needed for scalefactor band b is obtained by Huffman coding. $Bd_{one,b}$ is the difference between the original bits $Bit_{ori,b}$ and the current bits $Bit_{new,b}$:
$$Bd_{one,b} = Bit_{ori,b} - Bit_{new,b} . \qquad (9)$$
A ratio $ratio_b$ is calculated in Eq. (10). It represents the number of bits that can be saved by increasing the scalefactor by one step while the averaged quantized value is lower-bounded by unity.
$$ratio_b = \frac{Bd_{one,b}}{sfd_{one,b}} . \qquad (10)$$
The estimated scalefactor increase for the entire frame, $sfd_{frame}$, is calculated in Eq. (11), where $Bd_{frame}$ is the number of bits to be removed from the current frame.
$$sfd_{frame} = \frac{Bd_{frame}}{\sum_{b \in frame} ratio_b} . \qquad (11)$$
Thus, the scalefactors are updated by adding $sfd_{frame}$ to the original scalefactors, and the quantized coefficients are updated by replacing $sfd_{one,b}$ with $sfd_{frame}$ in Eq. (8).
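A short Python sketch of Eqs. (10) and (11) and the scalefactor update is given below. The function names are hypothetical, and two details are assumptions rather than part of the text: bands with a zero scalefactor difference are skipped to avoid division by zero, and the frame-level increase is rounded up to an integer because AAC scalefactors are integers.

```python
import math

def frame_scalefactor_increase(bd_one, sfd_one, bd_frame):
    """Estimate the scalefactor increase for a whole frame.

    bd_one   : per-band bit savings Bd_one,b
    sfd_one  : per-band scalefactor differences sfd_one,b
    bd_frame : number of bits to remove from the frame, Bd_frame
    """
    # Eq. (10): ratio_b = Bd_one,b / sfd_one,b (bands with sfd_one,b == 0 skipped)
    ratios = [bd / sfd for bd, sfd in zip(bd_one, sfd_one) if sfd > 0]
    total = sum(ratios)
    if total <= 0:
        return 0
    # Eq. (11): sfd_frame = Bd_frame / sum_b ratio_b, rounded up (assumption)
    return math.ceil(bd_frame / total)

def update_scalefactors(scalefactors, sfd_frame):
    """Add the frame-level increase to every original scalefactor."""
    return [sf + sfd_frame for sf in scalefactors]
```

With per-band savings of 12, 8 and 20 bits for differences of 4, 4 and 5 steps, the ratios are 3, 2 and 4 bits per step, so removing 27 bits calls for an increase of 3 steps.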
C. Side information reduction
The side information in AAC occupies a high percentage of coding bits at low bitrates.
To reduce the bits of side information, SLAT decreases the difference between successive scalefactors and assigns the zero codebook to bands whose quantized coefficients are all zero. The experimental results showed that the noise-to-masking ratio (NMR) degradation of SLAT is less than 1.0 dB compared to the cascaded transcoder, and that SLAT speeds up the cascaded transcoder by a factor of 5.
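The two side information measures can be illustrated with the Python sketch below. How SLAT actually chooses the new scalefactors is not described above, so the clamping rule and the parameter max_delta are stand-in assumptions, and the function name reduce_side_info is hypothetical. The sketch only shows the bookkeeping; a real transcoder would also re-quantize the coefficients of any band whose scalefactor changes.

```python
ZERO_HCB = 0  # AAC codebook index signalling a band whose coefficients are all zero

def reduce_side_info(scalefactors, bands, codebooks, max_delta=2):
    """Limit scalefactor jumps and mark all-zero bands with the zero codebook.

    scalefactors : one scalefactor per band
    bands        : list of per-band quantized coefficient lists
    codebooks    : original Huffman codebook index per band
    max_delta    : largest allowed step between successive coded
                   scalefactors (hypothetical parameter)
    """
    new_sf, new_cb = [], []
    prev = None  # scalefactor of the last band that still carries data
    for sf, band, cb in zip(scalefactors, bands, codebooks):
        if all(q == 0 for q in band):
            # No scalefactor is coded for a zero band; carry the previous value.
            new_cb.append(ZERO_HCB)
            new_sf.append(sf if prev is None else prev)
            continue
        if prev is not None:
            delta = max(-max_delta, min(max_delta, sf - prev))
            sf = prev + delta
        new_sf.append(sf)
        new_cb.append(cb)
        prev = sf
    return new_sf, new_cb
```

Smaller differences between successive scalefactors map to shorter codewords in the differential scalefactor coding, which is where the side information saving comes from.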