Chapter 3 Fast Rate-Distortion Optimized Transcoder
3.2 NMR Optimized Transcoding Algorithm
3.2.2 NMR-Based Rate-Distortion Optimization
The NMR-based rate-distortion optimization relies on an NMR-based search for the best scalefactor increment. To search for scalefactor increments under the NMR criterion, the transcoder must derive the masking thresholds from a psychoacoustic model built on the uncompressed audio signals. Masking thresholds derived from the compressed audio signals of the archived bitstreams cannot be used for perceptual audio coding. In addition, deriving the masking thresholds takes many computation cycles, which conflicts with the real-time transcoding requirement. Thus, we speed up the NMR optimized transcoding by using the information embedded in the input bitstreams.
[Diagram: Input bitstream (NMRi) → Transcoder → Output bitstream (NMRo)]
Figure 3-8. A sketch of the transcoder
Figure 3-8 illustrates a transcoder with an input bitstream and an output bitstream. The noise-to-masking ratios (NMR) of the input and output bitstreams are denoted as NMRi and NMRo, respectively. Since converting the input bitstream to an output bitstream at a lower bitrate degrades the audio quality, NMRi bounds the quality of the output: NMRo cannot be better (smaller) than NMRi. The NMR degradation caused by the transcoding process can be formulated as
$$\Delta NMR = NMR_o - NMR_i = (SMR_o - SMR_i) + (SNR_i - SNR_o) = \Delta SMR + \Delta SNR \tag{19}$$
where SMRo and SMRi denote the signal-to-masking ratios (SMR) of the audio signals in the output and input bitstreams, respectively, and SNRo and SNRi denote the corresponding signal-to-noise ratios (SNR).
The NMR value quantifies the energy of audible noise. For the same audio source, reconstructed audio with a smaller NMR value has better quality than reconstructed audio with a larger NMR value. Therefore, Eq. (19) defines the NMR degradation as NMRo minus NMRi. By definition, NMR equals SMR minus SNR. Thus, the NMR degradation equals the difference of the SMR values plus the difference of the SNR values. The SNR value is measured between the audio source waveform and the reconstructed audio waveform. The SMR value is derived by the psychoacoustic model from the audio source signals. Thus, for the same audio source and the same psychoacoustic model, the SMR difference (ΔSMR) is zero, and the NMR degradation in bitrate adaptation transcoding becomes a function of the SNR difference alone. As SNRo approaches SNRi, the audible quality of the signals reconstructed from the output bitstream approaches that of the input bitstream. Thus, minimizing the NMR degradation can be replaced by minimizing the SNR degradation at a given bitrate in bitrate conversion.
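The reduction from NMR to SNR can be illustrated with a short numeric sketch. The dB values below are hypothetical and chosen only for illustration, and the function names are not taken from the source; the point is that with a shared SMR the NMR degradation equals the SNR drop, whatever the SMR is.

```python
def nmr_db(smr_db, snr_db):
    """NMR in dB, using the definition NMR = SMR - SNR."""
    return smr_db - snr_db


def nmr_degradation_db(smr_db, snr_in_db, snr_out_db):
    """NMRo - NMRi when input and output share the same SMR (delta SMR = 0)."""
    return nmr_db(smr_db, snr_out_db) - nmr_db(smr_db, snr_in_db)


# Hypothetical values: the same band at two different SMR levels.
for smr in (20.0, 5.0):
    degradation = nmr_degradation_db(smr, snr_in_db=30.0, snr_out_db=24.0)
    print(f"SMR={smr} dB -> NMR degradation={degradation} dB")
```

Both iterations report the same degradation, which is why the search can be driven by SNR alone.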
Having reduced the optimization criterion from NMR to SNR, the NMR optimized algorithm processes each audio frame as follows.
1. Given the difference between the original and target bitrates, the number of bits to remove from a scalefactor band of the current frame is estimated. The bits allocated to each scalefactor band must be sufficient to keep the average value of the quantized coefficients greater than 1.
2. The scalefactor increment is estimated from the difference between the bitrate of the original bitstream and the target bitrate of the transcoded bitstream.
3. A search range for fine-tuning the scalefactor increment is defined.
4. To optimize the SNR value of the scalefactor band, the optimal scalefactor increment is obtained by a full search within the predefined search range.
5. The best scalefactor for coding the current scalefactor band equals the sum of the scalefactor increment and the original scalefactor.
6. With the final scalefactor, the reconstructed coefficients are re-quantized and encoded into the transcoded bitstream.
7. Steps 1 to 6 are applied to the scalefactor bands of the current frame, from high to low frequency.
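Steps 3 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the initial increment estimates of steps 1 and 2 are passed in as given, bit counting is not modeled, re-quantization assumes the standard AAC rounding constant 0.4054, and SNR is measured against the reconstruction of the input bitstream. All function names are hypothetical.

```python
import math

ROUNDING = 0.4054  # assumed: rounding constant of the standard AAC quantizer


def requantize(q_in, sfd):
    """Apply a scalefactor increment sfd to quantized coefficients, keeping signs."""
    scale = 2.0 ** (-3.0 * sfd / 16.0)
    return [(-1 if q < 0 else 1) * int(abs(q) * scale + ROUNDING) for q in q_in]


def band_snr_db(q_in, q_out, sfd):
    """SNR of the output band relative to the input reconstruction.

    The common factor 2^(sf_i/4) cancels, so only q_in, q_out and sfd remain.
    """
    gain = 2.0 ** (sfd / 4.0)
    signal = sum(abs(a) ** (8.0 / 3.0) for a in q_in)
    noise = sum((abs(a) ** (4.0 / 3.0) - abs(b) ** (4.0 / 3.0) * gain) ** 2
                for a, b in zip(q_in, q_out))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def search_increment(q_in, sfd_estimate, radius):
    """Steps 3-4: full search for the SNR-maximizing increment around the estimate."""
    candidates = range(max(0, sfd_estimate - radius), sfd_estimate + radius + 1)
    return max(candidates,
               key=lambda sfd: band_snr_db(q_in, requantize(q_in, sfd), sfd))


def transcode_frame(bands, sf_in, sfd_estimates, radius=2):
    """Steps 3-7 for one frame; bands are visited from high to low frequency."""
    sf_out = list(sf_in)
    q_out = [list(band) for band in bands]
    for b in reversed(range(len(bands))):
        sfd = search_increment(bands[b], sfd_estimates[b], radius)
        sf_out[b] = sf_in[b] + sfd              # step 5
        q_out[b] = requantize(bands[b], sfd)    # step 6
    return sf_out, q_out


# Hypothetical frame with two scalefactor bands.
bands = [[10, -7, 3, 0], [5, 4, -2, 1]]
sf_out, q_out = transcode_frame(bands, sf_in=[100, 104], sfd_estimates=[4, 4])
print(sf_out, q_out)
```

Because the search is confined to a window around the bitrate-driven estimate, the window width is what trades search cost against how far the final increment may drift from the bit budget.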
For an AAC coding system, the SNR of the output bitstream can be derived from the quantization formula. To further save coding time, the quantized coefficients are re-quantized without inverse quantization. Referring to Eq. (3) and Eq. (4), the quantization formula is defined as
where x is the compressed MDCT coefficient, qi and sfi represent the quantized coefficient and the scalefactor of the input bitstream, respectively, and mdct_line is the MDCT coefficient. The operator ‘int’ discards the fractional part.
When qi is inversely quantized, a reconstructed value of the MDCT coefficient xRi is obtained by
When the bitrate of the input bitstream is reduced, the quantized coefficient decreases from qi to qo and the scalefactor increases from sfi to sfo. The scalefactor increment, sfo minus sfi, is denoted as sfd. In Eq. (22), qo is obtained by re-quantizing xRi, the reconstructed MDCT coefficient. Eq. (22) shows that re-quantization reduces to applying the scalefactor increment to the original quantized coefficient, without inverse quantization.
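$$q_o = \operatorname{int}\!\left(\left(|x_{Ri}| \cdot 2^{-sf_o/4}\right)^{3/4} + 0.4054\right) = \operatorname{int}\!\left(|q_i| \cdot 2^{-\frac{3}{16}\,sfd} + 0.4054\right) \tag{22}$$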
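The shortcut behind Eq. (22) can be checked numerically. The sketch below assumes the standard AAC quantizer form, q = int((|x| · 2^(-sf/4))^(3/4) + 0.4054), with inverse |x_R| = |q|^(4/3) · 2^(sf/4) and the scalefactor offset omitted; it works on magnitudes only, and the function names are illustrative rather than taken from any codec.

```python
ROUNDING = 0.4054  # assumed: rounding constant of the standard AAC quantizer


def quantize(x_abs, sf):
    """Quantize a coefficient magnitude with scalefactor sf."""
    return int((x_abs * 2.0 ** (-sf / 4.0)) ** 0.75 + ROUNDING)


def dequantize(q_abs, sf):
    """Inverse quantization: reconstructed coefficient magnitude."""
    return q_abs ** (4.0 / 3.0) * 2.0 ** (sf / 4.0)


def requantize_full(q_in, sf_in, sfd):
    """Reference path: inverse quantization followed by re-quantization."""
    return quantize(dequantize(q_in, sf_in), sf_in + sfd)


def requantize_shortcut(q_in, sfd):
    """Shortcut path: scale the quantized value directly by 2^(-3*sfd/16)."""
    return int(q_in * 2.0 ** (-3.0 * sfd / 16.0) + ROUNDING)


# Compare both paths on a hypothetical grid of inputs.
mismatches = [
    (q, sf, sfd)
    for q in range(0, 40)
    for sf in (0, 60, 100)
    for sfd in range(0, 17)
    if requantize_full(q, sf, sfd) != requantize_shortcut(q, sfd)
]
print("mismatches:", len(mismatches))
```

The shortcut needs neither sf_i nor the 4/3 power, which is where the saving in coding time comes from.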
Given the scalefactor increment, the SNR of the audio signal in the transcoded bitstream is analyzed with Eq. (23). The suffixes Ri and Ro denote the reconstructed signals of the input and output bitstreams, respectively.
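$$SNR_o = 10\log_{10}\frac{x_{Ri}^{2}}{\left(x_{Ri} - x_{Ro}\right)^{2}} = 10\log_{10}\frac{|q_i|^{8/3}}{\left(|q_i|^{4/3} - |q_o|^{4/3} \cdot 2^{sfd/4}\right)^{2}} \tag{23}$$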
By substituting the variables of Eq. (20) into Eq. (22), Eq. (23) shows that the SNR value of the output signal can be expressed as a function of the quantized input coefficient qi and the scalefactor increment sfd. Given an input coefficient qi, we can observe the relation between SNR and sfd according to Eq. (23). Figure 3-9 demonstrates that, in most cases, the SNR value decreases as sfd increases.
Figure 3-9. SNR value with increasing sfd and a constant qi
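The trend in Figure 3-9 can be reproduced with a small sketch that evaluates the SNR for a fixed qi over a range of sfd. It assumes the standard AAC rounding constant 0.4054 and measures noise against the reconstruction of the input bitstream for a single coefficient; the function name and the sample value qi = 10 are illustrative.

```python
import math

ROUNDING = 0.4054  # assumed: rounding constant of the standard AAC quantizer


def snr_db(q_in, sfd):
    """SNR (dB) of one re-quantized coefficient as a function of q_in and sfd."""
    q_out = int(abs(q_in) * 2.0 ** (-3.0 * sfd / 16.0) + ROUNDING)
    x_in = abs(q_in) ** (4.0 / 3.0)                      # input reconstruction
    x_out = q_out ** (4.0 / 3.0) * 2.0 ** (sfd / 4.0)    # output reconstruction
    noise = (x_in - x_out) ** 2
    if noise == 0.0:
        return math.inf  # no re-quantization error
    return 10.0 * math.log10(x_in ** 2 / noise)


# SNR for a constant q_in and an increasing scalefactor increment.
for sfd in range(1, 17):
    print(sfd, round(snr_db(10, sfd), 1))
```

The curve is not strictly monotonic, because rounding can land closer to the input value for some increments than for their neighbors; this is what makes a full search within the range worthwhile rather than simply taking the smallest increment.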