Organization - A Study of Compression Artifacts in Perceptual Audio Coding (知覺式音訊編碼壓縮瑕疵之探討)

CHAPTER 1 INTRODUCTION

1.3. Organization

This dissertation is organized as follows. Chapter 2 considers the two zero-quantization artifacts, the “band-limited” and “birdie” artifacts, and develops a fast radix-q algorithm for the DCT-IV that handles the tradeoff between parallelism and numerical distortion. Chapter 3 derives the theoretical foundation of TNS, examines the artifacts of TNS, and proposes a method for reducing them. Chapters 4 and 5 address the artifacts of SBR and PS, respectively. Chapter 6 concludes this dissertation.

CHAPTER 2 COMMON ARTIFACTS BY ZERO-QUANTIZATION AND NUMERICAL DISTORTION

In this chapter, we first address two common zero-quantization artifacts, which lead to the loss of high- or middle-frequency content. In addition, many audio applications use sequence lengths other than a power of two to achieve the best system efficiency. We therefore develop a fast odd-radix algorithm for computing the DCT-IV of composite lengths with low numerical distortion and high parallelism.

2.1. Band-Limited and Birdie Artifacts

The two most common compression artifacts in audio coding are the “band-limited” and the “birdie” artifacts [18], [35]. Under the bit-rate constraint, these artifacts arise on critical audio segments and show up in the spectrum as “spectral clipping” and “spectral valleys”, respectively. A spectral valley, as shown in Figure 2.1 (b), is a band in which all frequency lines are quantized to zero.

The spectral valley phenomenon is mainly due to unsuitable bit-allocation policies or to an excessive masking energy estimated by the psychoacoustic model in the audio encoder. Spectral valleys may appear and disappear from frame to frame because the demand for bits is unsteady. This changes the timbre and causes energy variations in the spectrum, resulting in the birdie effect, to which human hearing is very sensitive.

Spectral clipping, as shown in Figure 2.2 (b), results from cutting the high-frequency (HF) content during audio compression. The loss of HF content may lead to “muffled” audio. Because of the limited bit rate, most audio coding approaches save the bits required for the HF spectra and spend all available bits on the low-frequency (LF) spectra, which are more relevant to human hearing.

Figure 2.1. Spectral valley phenomenon and its concealment: (a) original audio signal spectrum; (b) compressed spectrum with two zero bands in low and middle frequency parts; (c) compressed spectrum enhanced by ZBD.

For instance, the bandwidth in MP3 is generally restricted to 16 kHz due to the protocol constraint, and a speech signal can even be limited to 7–8 kHz while retaining good clarity. However, the HF loss significantly degrades signals with rich HF components. Handling the two artifacts is a tradeoff in encoder design owing to the limited number of available bits. A coding method that aggressively retains HF content increases the risk of spectral valleys, and human hearing is more sensitive to spectral valleys than to spectral clipping. Therefore, the HF content is generally cut to reduce the risk of spectral valleys.

Many attempts [36]-[44] have been made to reduce the two artifacts. For instance, our work [36]-[38] presented an audio patch method comprising two schemes, Zero Band Dithering (ZBD) and High Frequency Reconstruction (HFR), to handle the artifacts in the decoder.

Figure 2.1 (c) and Figure 2.2 (c) illustrate the enhanced spectra resulting from the patch method.

The method can be included in frequency-domain decoders, such as MP3, AAC, and HE-AAC, to conceal the artifacts without prior information. The ZBD module is applied to the frequency lines after dequantization and dithers the zero lines with random noise. The HFR module is applied to the transform coefficients before the inverse transform, or to the QMF subbands before the synthesis filterbank, and regenerates the clipped HF spectrum by linear extrapolation. For instance, Figure 2.3 and Figure 2.4 illustrate the incorporation of the two modules into AAC and HE-AAC decoders, respectively. Figure 2.5 illustrates the application of the audio patch method to HE-AAC audio. As can be seen, the ZBD method patches the zero bands, including those carried over when the LF part of the HE-AAC audio spectrum is duplicated to the middle and HF part; moreover, the HFR method extends the bandwidth.
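The two modules can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of [36]-[38]: the function names, the `noise_scale` and `fit_width` parameters, the choice of noise level from neighbouring bands, and the log-magnitude line fit used for the extrapolation are all hypothetical.

```python
import numpy as np

def zero_band_dither(spec, band_edges, noise_scale=0.1, rng=None):
    """Fill all-zero bands with low-level random noise (ZBD-style sketch).

    noise_scale is a hypothetical parameter: the noise amplitude relative
    to the weaker of the non-zero neighbouring bands.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(spec, dtype=float).copy()
    bands = list(zip(band_edges[:-1], band_edges[1:]))
    rms = [np.sqrt(np.mean(out[lo:hi] ** 2)) for lo, hi in bands]
    for i, (lo, hi) in enumerate(bands):
        if rms[i] > 0.0:
            continue  # band survived quantization, leave it untouched
        neighbours = [rms[j] for j in (i - 1, i + 1)
                      if 0 <= j < len(bands) and rms[j] > 0.0]
        if not neighbours:
            continue  # no reference level available
        level = noise_scale * min(neighbours)
        out[lo:hi] = level * rng.uniform(-1.0, 1.0, hi - lo)
    return out

def high_frequency_reconstruct(spec, fit_width=16, rng=None):
    """Extend the spectrum above the cutoff (HFR-style sketch).

    Assumption: a straight line is fitted to the log-magnitude of the last
    fit_width lines below the cutoff and extrapolated upward; the new lines
    get random signs.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.asarray(spec, dtype=float).copy()
    nonzero = np.flatnonzero(out)
    if nonzero.size == 0:
        return out
    cutoff = nonzero[-1] + 1
    if cutoff >= out.size or cutoff < fit_width:
        return out
    k = np.arange(cutoff - fit_width, cutoff)
    slope, intercept = np.polyfit(k, np.log(np.abs(out[k]) + 1e-12), 1)
    hf = np.arange(cutoff, out.size)
    envelope = np.exp(intercept + slope * hf)
    # Never let the extrapolated envelope exceed the level it was fitted on.
    envelope = np.minimum(envelope, np.mean(np.abs(out[k])))
    out[hf] = envelope * rng.choice([-1.0, 1.0], size=hf.size)
    return out
```

In a decoder, `zero_band_dither` would run on the dequantized lines and `high_frequency_reconstruct` on the coefficients just before the inverse transform, following the placement described above.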

Figure 2.2. Spectral clipping phenomenon and its concealment: (a) original audio signal spectrum; (b) compressed spectrum with narrow bandwidth; (c) compressed spectrum with bandwidth extension by HFR.

Figure 2.3. The incorporation of ZBD and HFR into AAC decoder.

[Block diagram. Blocks shown: Bitstream Demultiplex, Noiseless Decoding, Inverse Quantizer, Scale Factor, ZBD, M/S, PNS, Prediction, Intensity/Coupling, Long Term Prediction, TNS, Filter Bank, Gain Control, HFR. Input: AAC Bitstream. Output: PCM Data. Legend: Data, Control.]

Figure 2.4. The incorporation of ZBD and HFR into HE-AAC decoder.

[Block diagram. Blocks shown: Bitstream Parser/Demultiplexer; AAC Decoder (Spectral Line Decoding, ZBD, Other Processing, Filter Bank); SBR Decoder (Analysis QMF Bank, SBR Processing, HFR, Synthesis QMF Bank). Input: HE-AAC Bitstream. Output: PCM Data.]

Figure 2.5. Enhancement of HE-AAC audio by the audio patch method: (a) the HE-AAC audio spectrum; (b) the enhanced spectrum.

2.2. Fast Radix-q Algorithm for DCT-IV with Low Numerical Distortion Artifact and High Parallelism

The DCT-IV as defined in (1) is the fundamental module in the efficient computation of lapped orthogonal transforms and cosine-modulated filter banks such as the MDCT.

\[ X_k = \sum_{n=0}^{N-1} x_n \cos\frac{(2n+1)(2k+1)\pi}{4N}, \quad k = 0, 1, \ldots, N-1. \tag{1} \]

Power-of-two sequence lengths are the most popular because of the computational efficiency and structural simplicity of the existing radix-2 algorithms. However, various sequence lengths other than a power of two have been used to achieve the best efficiency in audio coding and processing, such as the 12/36-point MDCT in MP3 audio coding.

In the literature, there exist various fast radix-2 algorithms for computing the DCT-II and DCT-III [45], [46]. Fast radix-q algorithms for the DCT-II/DCT-III computation have also been developed and extended to fast mixed-radix algorithms for composite lengths [47], [48]. For the computation of the DCT-IV, there are four existing approaches that convert the DCT-IV into a DCT-II or DCT-III [26]. The four fast algorithms are represented in matrix form as

where, in addition to the DCT-II, DCT-III, and DCT-IV matrices, the triangular matrix L is defined by the serial computation [y_0, y_1, …, y_{N−1}]^T = L[x_0, x_1, …, x_{N−1}]^T = [x_0/2, x_1 − y_0, x_2 − y_1, …, x_{N−1} − y_{N−2}]^T. However, as depicted in Figure 2.6, the four methods indicated in (2) involve either serial computations or reciprocal cosine coefficients, which result in large numerical distortion artifacts due to large dynamic ranges. In other words, these DCT-II/DCT-III-based fast algorithms trade off numerical distortion against parallelism. In this section, we propose a fast radix-q algorithm for the DCT-IV computation, where q is an odd positive integer, with merits in parallelism, numerical stability, and computational complexity. The proposed radix-q algorithm can be extended to the fast mixed-radix DCT-II/DCT-IV computation for composite lengths.
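The tradeoff can be made concrete with a small numerical sketch. It assumes the unnormalized DCT-II and DCT-IV kernels and uses two conversions that follow from the product-to-sum identity 2cos(a)cos(b) = cos(a + b) + cos(a − b); they are meant as representatives of the serial and reciprocal-cosine families and are not claimed to be the exact forms in (2). The function names are hypothetical.

```python
import numpy as np

def dct2_direct(x):
    """Unnormalized DCT-II: Z_k = sum_n x_n cos((2n+1) k pi / (2N))."""
    N = x.size
    n = np.arange(N)
    return np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * N)) @ x

def dct4_direct(x):
    """Unnormalized DCT-IV: X_k = sum_n x_n cos((2n+1)(2k+1) pi / (4N))."""
    N = x.size
    n = np.arange(N)
    return np.cos(np.pi * np.outer(2 * n + 1, 2 * n + 1) / (4 * N)) @ x

def dct4_serial(x):
    """DCT-IV via DCT-II followed by the serial recursion of the matrix L."""
    N = x.size
    d = 2.0 * np.cos((2 * np.arange(N) + 1) * np.pi / (4 * N))
    z = dct2_direct(d * x)          # cosine pre-multiplication, then DCT-II
    y = np.empty(N)
    y[0] = z[0] / 2.0               # y_0 = x_0 / 2 in the notation of L
    for k in range(1, N):           # y_k = x_k - y_{k-1}: inherently serial
        y[k] = z[k] - y[k - 1]
    return y

def dct4_reciprocal(x):
    """DCT-IV via DCT-II with reciprocal cosines and a parallel butterfly."""
    N = x.size
    d = 2.0 * np.cos((2 * np.arange(N) + 1) * np.pi / (4 * N))
    z = dct2_direct(x / d)          # 1/d grows roughly like 2N/pi
    return z + np.append(z[1:], 0.0)
```

The serial variant needs N − 1 dependent subtractions, while the parallel variant divides by cosines that approach zero as N grows, which is the dynamic-range problem described above.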

Figure 2.6. Signal flow graphs of the four DCT-IV algorithms indicated in (2).

2.2.1. Fast Radix-q Algorithm for DCT-IV Computation

We begin with the scaled DCT-IV (SDCT-IV) defined as

superpositions by grouping the terms with the same indices modulo q as

Combining the second and the third terms of (4) and using the trigonometric identity cos(a + b) = cos(a)cos(b) − sin(a)sin(b), we obtain

Equation (5) consists of q length-N/q SDCTs-IV defined by (7)-(10). Further, it can be shown that for any integer p,

In order to save multiplications, by using properties (11)-(13), we form the two sequences U_k^p and V_k^p that are 1/2 of the sum and difference of _k

Similar to the strategy utilized in [47], for each k and each p, (q − 1)/2 multiplications can be saved by moving the cosine coefficients outside the brackets in (14) and (15), respectively.

However, the angles Θ^N_{m,k} range from 0 to π/2, and thus the dynamic range of the tangent values is large. To control numerical stability, (14) and (15) are rewritten as

Figure 2.7. Signal flow graphs of length-27 SDCT-IV.

[Signal flow graph. Block label shown: Length-9.]

In (18), the dynamic range of the tangent and cotangent values is controlled within the interval [0, 1]. The final SDCT-IV outputs are obtained from
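The idea of keeping the multiplier within [0, 1] can be illustrated generically. The sketch below is not the form of (18); it only shows, for a term a·cos(θ) + b·sin(θ) with 0 < θ < π/2, how factoring out cos(θ) for θ ≤ π/4 and sin(θ) otherwise bounds the remaining tangent or cotangent coefficient by one. The function names are hypothetical.

```python
import math

def bounded_factor(theta):
    """Return (kind, t, scale) with 0 <= t <= 1 for 0 < theta < pi/2.

    kind is "tan" when cos(theta) is factored out and "cot" when
    sin(theta) is factored out.
    """
    if theta <= math.pi / 4:
        return "tan", math.tan(theta), math.cos(theta)
    return "cot", 1.0 / math.tan(theta), math.sin(theta)

def combine(a, b, theta):
    """Evaluate a*cos(theta) + b*sin(theta) with a bounded inner multiplier."""
    kind, t, scale = bounded_factor(theta)
    inner = a + b * t if kind == "tan" else a * t + b
    return scale * inner
```

The outer factor `scale` is what can be moved outside the brackets and merged with neighbouring scalings, while the inner multiplier `t` never exceeds one.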

0 length-q SDCT-IV, the decomposition must be repeated until the lengths of subsequences are

operations. Absorbing the scaling factors into (16) and (17) yields (7)-(10) and (16)-(22). Figure 2.7 shows the signal flow graph for a length-27 SDCT-IV after the first stage decomposition.

2.2.1.1 Parallelism and Numerical Stability

Each DCT-II-based algorithm for the DCT-IV computation indicated in (2) involves either serial computations or reciprocal cosine coefficients. In contrast, the proposed radix-q algorithm avoids reciprocal cosine coefficients, chiefly through the mechanism in (18), and thus has good numerical stability. Regarding the latency of a hardware implementation, the critical path of the DCT-II-based algorithm involving the serial computation has length N because of the recursive computation for the matrix L, where the unit of length is one multiplication or addition operation. For the proposed radix-q algorithm, the length of the critical path is ceiling{log2[(q − 3)/2]} because of the summation in (21). The critical path of the proposed radix-q algorithm is therefore significantly shorter than that of the DCT-II-based algorithm involving the serial computation.

2.2.1.2 Computational Complexity

The recursive forms of the cost functions for the proposed algorithm are shown in Appendix A. Let N be a power of q; the non-recursive forms are given by

\[ M_{IV}(N) = \frac{(q-1)(q+2)}{2q}\, N \log_q N + \frac{N}{q}, \tag{23} \]

\[ A_{IV}(N) = \frac{(q-1)(q+5)}{2q}\, N \log_q N. \tag{24} \]

In general, a lower computational cost than that given by (23) and (24) can be achieved by rearranging the operation factors. Furthermore, optimizing the initial case, the small length-q SDCT-IV, can reduce the overall complexity. In Appendix A.2-A.4, we derive and tune the fast algorithms for radix-3, radix-5, and radix-9 DCT-IV computation.
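As a quick way to evaluate (23) and (24), the following sketch computes the untuned multiplication and addition counts exactly with rational arithmetic. The helper names `mults` and `adds` are hypothetical, and N is assumed to be a power of q.

```python
from fractions import Fraction

def _log_q(N, q):
    """Return m such that N == q**m, or raise if N is not a power of q."""
    m = 0
    while N > 1:
        if N % q:
            raise ValueError("N must be a power of q")
        N //= q
        m += 1
    return m

def mults(N, q):
    """Multiplication count of (23)."""
    m = _log_q(N, q)
    return Fraction((q - 1) * (q + 2), 2 * q) * N * m + Fraction(N, q)

def adds(N, q):
    """Addition count of (24)."""
    m = _log_q(N, q)
    return Fraction((q - 1) * (q + 5), 2 * q) * N * m
```

For q = 7, `mults(7, 7)` and `adds(7, 7)` evaluate to 28 and 36, which agree with the q = 7 entries of Table 2.1; the radix-3, radix-5, and radix-9 entries of the table are lower because those cases are tuned separately.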

The arithmetic complexity of the DCT-II-based algorithm indicated in (2) is given by

MIV(N) = MII(N) + N , (25)

AIV(N) = AII(N) + N – 1. (26)

Table 2.1 compares the arithmetic complexity of the proposed DCT-IV algorithm with that of the DCT-II-based algorithm, where the fast algorithm of [47] is adopted for computing the DCT-II of length N, a power of q. The comparison shows that the proposed algorithm is not only free from serial computation and numerical instability but also achieves a lower arithmetic complexity than the DCT-II-based algorithm for q = 3 and 9.

Table 2.1. Arithmetic Complexity Comparison for DCT-IV of Length N (a Power of q)

Multiplications:

| q | Proposed: M_IV(N), N > q | Proposed: M_IV(q) | DCT-II-based: M_IV(N), N > q | DCT-II-based: M_IV(q) |
| 3 | (4/3) N log_3 N − 7N/6 + 5/2 | 3 | (4/3) N log_3 N − 17N/18 + 3/2 | 4 |
| 5 | (11/5) N log_5 N | 11 | (11/5) N log_5 N − 7N/10 + 3/2 | 9 |
| 7 | (27/7) N log_7 N + N/7 | 28 | (27/7) N log_7 N − N/2 + 3/2 | 25 |
| 9 | (20/9) N log_9 N − 177N/216 + 11/8 | 16 | (23/9) N log_9 N − 7N/8 + 15/8 | 17 |

Additions:

| q | Proposed: A_IV(N), N > q | Proposed: A_IV(q) | DCT-II-based: A_IV(N), N > q | DCT-II-based: A_IV(q) |
| 3 | (8/3) N log_3 N − N + 1 | 6 | (8/3) N log_3 N − 7N/9 + 1 | 6 |
| 5 | (21/5) N log_5 N | 21 | (21/5) N log_5 N − N + 1 | 17 |
| 7 | (36/7) N log_7 N | 36 | (36/7) N log_7 N − N + 1 | 30 |
| 9 | (53/9) N log_9 N − 103N/72 − 1/8 | 40 | (50/9) N log_9 N − N + 1 | 42 |

2.2.2. Fast Mixed-Radix DCT-II /DCT-IV Algorithm

For composite lengths, i.e., $N = 2^{\lambda_0} q_1^{\lambda_1} q_2^{\lambda_2} \cdots q_n^{\lambda_n}$ for odd integers 0 < q_1 < q_2 < … < q_n and any non-negative integers λ_0, λ_1, …, λ_n, the proposed radix-q algorithm can be flexibly combined with the existing fast DCT-II/DCT-IV algorithms for composite lengths. The illustrated radix-2 DCT-II/DCT-IV algorithm, consisting of Wang’s [49, eq. (50)] and Britanak’s [50, eq. (16)] algorithms, is described in Appendix A.5. As depicted in Figure 2.8 and Figure 2.9, the radix-2 DCT-II/DCT-IV algorithm decomposes a length-N DCT-II into a length-N/2 DCT-II and a length-N/2 DCT-IV, and decomposes a length-N DCT-IV into two length-N/2 DCTs-II, without involving serial computations or reciprocal cosine coefficients. The proposed radix-q DCT-IV algorithm can be combined with the radix-2 DCT-II/DCT-IV algorithm and the radix-q DCT-II algorithm [47] to constitute a mixed-radix algorithm for DCT-II/DCT-IV computation that meets the demands of parallelism and numerical stability. Furthermore, as shown in Appendix A.6, the mixed-radix algorithm also has a merit in computational complexity.
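The first of these decompositions can be checked numerically. The sketch below uses the standard even/odd output split of an unnormalized DCT-II, which has the structure described above (a half-length DCT-II plus a half-length DCT-IV); it is an illustration under that assumption and may differ in scaling and ordering from the formulation in Appendix A.5. The function names are hypothetical.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II: Z_k = sum_n x_n cos((2n+1) k pi / (2N))."""
    N = x.size
    n = np.arange(N)
    return np.cos(np.pi * np.outer(n, 2 * n + 1) / (2 * N)) @ x

def dct4(x):
    """Unnormalized DCT-IV: X_k = sum_n x_n cos((2n+1)(2k+1) pi / (4N))."""
    N = x.size
    n = np.arange(N)
    return np.cos(np.pi * np.outer(2 * n + 1, 2 * n + 1) / (4 * N)) @ x

def dct2_radix2_step(x):
    """One radix-2 stage: length-N DCT-II from two length-N/2 transforms."""
    N = x.size
    if N % 2:
        raise ValueError("N must be even for a radix-2 stage")
    half = N // 2
    front = x[:half]
    back = x[::-1][:half]            # x_{N-1-n} for n = 0 .. N/2 - 1
    out = np.empty(N)
    out[0::2] = dct2(front + back)   # even outputs: length-N/2 DCT-II
    out[1::2] = dct4(front - back)   # odd outputs: length-N/2 DCT-IV
    return out
```

For a composite length such as N = 18, one such stage leaves a length-9 DCT-II and a length-9 DCT-IV, which is where odd-radix algorithms would take over.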

Figure 2.8. Signal flow graph of the length-N DCT-II decomposition.

Figure 2.9. Signal flow graph of the length-N DCT-IV decomposition.

2.3. Concluding Remarks

In this chapter, we have considered the two common zero-quantization artifacts, the “band-limited” and “birdie” artifacts. An audio patch method comprising two schemes, ZBD and HFR, has been proposed to reduce the two artifacts. The patch method can be incorporated into transform- or subband-based audio decoders, such as MP3, AAC, and HE-AAC. In addition, for the computation of the cosine-modulated filterbank, we have proposed a fast radix-q DCT-IV algorithm that handles the conflict between parallelism and numerical distortion in the existing algorithms. The radix-q algorithm can be extended into a mixed-radix algorithm for the DCT-IV computation of composite lengths with merits in parallelism, numerical stability, and computational complexity.

CHAPTER 3 ARTIFACTS IN TEMPORAL NOISE SHAPING

The TNS method [27]-[30] is used in MPEG-2/4 AAC to attenuate the quantization noise that precedes an attack signal, known as the pre-echo artifact [18], [19]. As illustrated in Figure 3.1, the quantization noise spreads throughout the entire signal block in the time domain. The TNS module shapes and controls the spread of the quantization noise to improve audio quality.

Since the TNS in AAC is applied to MDCT coefficients, which are closely related to the even DCT-IV, we use the theory of spectral AR modeling in the discrete trigonometric transform (DTT) domain to establish the compact form of the TNS in the DTT domain and to explain the “time-domain aliasing noise” [30], an unusual noise that appears around the attack segment. We also examine how this artifact depends on the TNS filter order. Finally, we compare the TNS based on the Hilbert envelope with the TNS based on the power envelope.

Figure 3.1. Pre-echo artifact (dashed line: original waveform; solid line: quantization noise).

3.1. TNS Formulation in DTT Domain

TNS aims to shape the temporal envelope of the quantization noise by incorporating an open-loop predictive coding [31] across frequency lines in audio encoders/decoders. In terms of z-transform, the concept of TNS can be explained as follows. As depicted in Figure 3.2, x(k) and d(k) denote the input and the predictive residual signals in the frequency domain in the analysis part, whereas xr(k) and dr(k) denote the reconstructed signals related to x(k) and d(k) in the synthesis part. The relation between the reconstruction error r(k), i.e., x(k) − xr(k), and the quantization noise q(k), i.e., d(k) − dr(k), is expressed in z-transform as

\[ R(z) = \frac{Q(z)}{1 - H(z)}, \tag{27} \]

where R(z) and Q(z) are the z-transforms of r(k) and q(k). If the magnitude response of the inverse of the whitening filter, 1/(1 − H(z)), approximates the temporal envelope of the frequency-domain input signal x(k), the quantization noise Q(e^{−jω}) is amplified or attenuated in the time domain according to that temporal shape. Figure 3.3 illustrates the shaping effect of the TNS applied in the MDCT domain.
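Relation (27) can be verified numerically with a minimal open-loop prediction sketch. The predictor coefficients, the noise model, and the function names below are hypothetical; the point is only that the reconstruction error equals the quantization noise passed through 1/(1 − H(z)), with H(z) = Σ_i h_i z^{−i} acting along the frequency index k.

```python
import numpy as np

def predict_residual(x, h):
    """Analysis side: d(k) = x(k) - sum_i h[i-1] * x(k-i)."""
    x = np.asarray(x, dtype=float)
    d = x.copy()
    for i, hi in enumerate(h, start=1):
        d[i:] -= hi * x[:-i]
    return d

def synthesize(d, h):
    """Synthesis side: y(k) = d(k) + sum_i h[i-1] * y(k-i), i.e. 1/(1-H(z))."""
    y = np.zeros(len(d))
    for k in range(len(d)):
        acc = d[k]
        for i, hi in enumerate(h, start=1):
            if k - i >= 0:
                acc += hi * y[k - i]
        y[k] = acc
    return y

def reconstruction_error(x, h, q):
    """Return r = x - x_r when the residual is disturbed by the noise q."""
    d = predict_residual(x, h)
    d_r = d - q                      # q(k) = d(k) - d_r(k)
    x_r = synthesize(d_r, h)
    return np.asarray(x, dtype=float) - x_r
```

Because the predictor runs open loop, the noise q is filtered by the same synthesis filter as the residual, which is what gives the noise the envelope of 1/(1 − H(z)).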

In [27]-[30], Herre and Johnston have proposed the TNS predictive filter by exploiting the duality between the squared temporal Hilbert envelope and the power spectrum for continuous-time signals. Since, in the literature, there is no derivation for the finite discrete sequences in the DTT domain, this section derives the compact form for the TNS in the DTT domain through the theory of the AR modeling in the DTT domain.

Figure 3.2. Open-loop predictive coding scheme in TNS

[Block diagram. Blocks shown: Forward T-F Mapping, Linear Prediction, Quantization & Dequantization, Backward T-F Mapping. Input: Input Signal. Output: Reconstructed Signal.]

Figure 3.3. TNS effect. (a) original signal in the time domain; (b) decoded signal without TNS;

3.1.1. Autoregressive Modeling in DTT Domain

AR modeling [53], [64], also known as linear prediction (LP), has found increasingly wide application in audio coding. The theoretical foundation for AR modeling of temporal/spectral envelopes with various DTTs is established in Appendix B. Here, we summarize the key results related to the TNS formulation.

Throughout this chapter, we consider all transforms as matrices that left-multiply the input sequence represented as a column vector.

3.1.1.1 Generalized Discrete Fourier Transform

The N × N generalized DFT (GDFT) [77] matrix is defined by

\[ [G_{a,b}]_{k,n} = \exp\!\left(-j\,\frac{2\pi (k+a)(n+b)}{N}\right), \quad k, n = 0, 1, \ldots, N-1. \tag{28} \]

Four special forms of the GDFT arise when a and b take on the values 0 or 1/2. They are classified and named as follows [76]:

(i) DFT (Discrete Fourier transform): a = 0 and b = 0.

(ii) OTDFT (Odd-Time DFT): a = 0 and b = 1/2.

(iii) OFDFT (Odd-Frequency DFT): a = 1/2 and b = 0.

(iv) O²DFT (Odd-Time Odd-Frequency DFT): a = 1/2 and b = 1/2.

The last three transforms can be regarded as the modified versions of the DFT with a 1/2-sample delay in the time domain and/or a 1/2-sample advance in the frequency domain.

The inverse GDFT (IGDFT) matrix is the scaled Hermitian transpose of the forward GDFT matrix:

\[ G_{a,b}^{-1} = \frac{1}{N} G_{a,b}^{H} = \frac{1}{N} G_{b,a}^{*}, \tag{29} \]

where superscripts (H) and (*) denote the Hermitian transpose and conjugate operations, respectively.
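The inverse relation can be checked numerically for the four special cases. The sketch below assumes the GDFT kernel exp(−j2π(k + a)(n + b)/N), with a as the frequency offset and b as the time offset, consistent with the naming in (i)-(iv); the function names are hypothetical.

```python
import numpy as np

def gdft_matrix(N, a=0.0, b=0.0):
    """N x N GDFT matrix with frequency offset a and time offset b."""
    k = np.arange(N).reshape(-1, 1)   # row index: frequency
    n = np.arange(N).reshape(1, -1)   # column index: time
    return np.exp(-2j * np.pi * (k + a) * (n + b) / N)

def igdft_matrix(N, a=0.0, b=0.0):
    """Inverse GDFT as the scaled Hermitian transpose of the forward matrix."""
    return gdft_matrix(N, a, b).conj().T / N
```

Calling `igdft_matrix(N, 0.5, 0.5) @ gdft_matrix(N, 0.5, 0.5)` should return the identity up to rounding, and the same matrix is obtained from `gdft_matrix(N, b, a).conj() / N` with the two offsets swapped.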

3.1.1.2 Convolution-Multiplication Property of GDFT

The DFT has the convolution-multiplication property that the inverse transformation after entry-wise multiplication gives the same result as the circular convolution of the original sequences. Vernet [78] and Martucci [76] derived such properties for other GDFTs. We summarize the results in matrix form as follows.

Let u = x ⓒ y and w = x ⓢ y denote the circular and skew-circular convolutions of x and y, respectively; then the following hold:
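As a numerical illustration of this kind of property, the sketch below checks two well-known special cases: the DFT turns circular convolution into entry-wise multiplication, and the OFDFT (a = 1/2, b = 0) does the same for skew-circular convolution. These two cases are stated here as an assumption-free sanity check and are not a reproduction of the full set of relations; the function names and the sign convention for the wrapped terms are the sketch's own.

```python
import numpy as np

def gdft_matrix(N, a=0.0, b=0.0):
    """GDFT matrix with kernel exp(-j 2 pi (k + a)(n + b) / N)."""
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.exp(-2j * np.pi * (k + a) * (n + b) / N)

def circular_conv(x, y):
    """u_n = sum_m x_m * y_{(n - m) mod N}."""
    N = len(x)
    out = np.zeros(N)
    for n in range(N):
        for m in range(N):
            out[n] += x[m] * y[(n - m) % N]
    return out

def skew_circular_conv(x, y):
    """Like circular_conv, but terms that wrap around change sign."""
    N = len(x)
    out = np.zeros(N)
    for n in range(N):
        for m in range(N):
            sign = 1.0 if m <= n else -1.0
            out[n] += sign * x[m] * y[(n - m) % N]
    return out
```

The half-sample frequency offset is what absorbs the sign change of the wrapped terms, which is why the skew-circular case pairs with an odd-frequency transform.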

3.1.1.3 Discrete Trigonometric Transform

The family of DTTs comprises eight versions of the discrete cosine transform (DCT) and eight versions of the discrete sine transform (DST). Martucci formulated the DTTs through the convolution forms as defined in [76, Appendix]. The orthogonal-like relations between the inverse and forward DTTs are

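\[ T_{I}^{-1} = \frac{1}{M} T_{I}, \quad T_{II}^{-1} = \frac{1}{M} T_{III}, \quad T_{III}^{-1} = \frac{1}{M} T_{II}, \quad \text{and} \quad T_{IV}^{-1} = \frac{1}{M} T_{IV}, \tag{36} \]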

where the DTTs on both sides of each equality must belong to the same category, cosine or sine and even or odd, and M is 2N for the even case and 2N − 1 for the odd case.
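For the even cosine transforms, these relations can be checked with a short sketch. It assumes the convolution-form kernels commonly attributed to [76], namely 2cos(πm(n + 1/2)/N) for the DCT-II, 2k_n cos(π(m + 1/2)n/N) with k_0 = 1/2 and k_n = 1 otherwise for the DCT-III, and 2cos(π(m + 1/2)(n + 1/2)/N) for the DCT-IV; if the definitions in the cited appendix differ, the scaling below would change accordingly. The function name is hypothetical.

```python
import numpy as np

def even_dct_matrices(N):
    """Return assumed convolution-form even DCT-II, DCT-III, DCT-IV matrices."""
    m = np.arange(N).reshape(-1, 1)        # output index
    n = np.arange(N).reshape(1, -1)        # input index
    kn = np.where(n == 0, 0.5, 1.0)        # weight k_n on the first input
    T2 = 2.0 * np.cos(np.pi * m * (n + 0.5) / N)
    T3 = 2.0 * kn * np.cos(np.pi * (m + 0.5) * n / N)
    T4 = 2.0 * np.cos(np.pi * (m + 0.5) * (n + 0.5) / N)
    return T2, T3, T4
```

With M = 2N, the products `T3 @ T2` and `T4 @ T4` both come out as M times the identity, matching the even-case form of the relations above.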

3.1.1.4 Analytic Transform based on GDFT and IGDFT

Marple proposed a DFT-based method for computing the analytic signal corresponding to a real-valued finite sequence of an even length [79]. We extend the result to the GDFTs as described in the following.

Via each GDFT, we can define the generic form for the analytic transform matrix:

In particular, let x denote a real-valued column vector of length M and let a = A_q^+ x; then the analytic vector a has two important properties. First, the real part of a exactly equals the original vector:

\[ \mathrm{Re}(a_n) = x_n, \quad n = 0, 1, \ldots, M-1. \tag{39} \]

Second, the real and imaginary parts of a are orthogonal:

For example, let x = [1, −2, −3, 7, 11]^T, then
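The two properties can be checked numerically for this example vector. The sketch below uses only the ordinary DFT (a = b = 0) and the usual one-sided spectrum weighting, with the odd-length case handled by doubling bins 1 to (N − 1)/2; this is an assumption standing in for the general GDFT-based transform A_q^+, and the function name is hypothetical.

```python
import numpy as np

def analytic_dft(x):
    """DFT-based analytic vector of a real sequence (even or odd length)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    X = np.fft.fft(x)
    w = np.zeros(N)
    w[0] = 1.0                       # keep DC once
    if N % 2 == 0:
        w[N // 2] = 1.0              # keep the Nyquist bin once
        w[1:N // 2] = 2.0            # double the positive frequencies
    else:
        w[1:(N + 1) // 2] = 2.0      # odd length: no Nyquist bin
    return np.fft.ifft(X * w)

x = np.array([1.0, -2.0, -3.0, 7.0, 11.0])
a = analytic_dft(x)
real_part_matches = np.allclose(a.real, x)
inner_product = float(np.dot(a.real, a.imag))
```

Here `real_part_matches` confirms the first property and `inner_product` vanishes up to rounding, which is the orthogonality of the real and imaginary parts.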

Figure 3.4. Reconstruction of analytic transform based on GDFT.

[Block diagram. Blocks shown: GDFT, Entry Selection, Entry Scaling, Zero Padding, IGDFT, together forming the Analytic Transform. Input: Real vector. Output: Analytic vector.]

Table 3.1. Definitions of Related Matrices for Analytic Transforms

3.1.1.5 DTT and Analytic Transform

The DTT spectra can be interpreted as the GDFT spectra of analytic vectors in the following way. Given a temporal column vector x and its DTT vector y = T_q x, the IGDFT of the zero-padded, scaled DTT vector equals the analytic transform of the symmetrized temporal vector, that is

The details are given in Appendix B. The relation in (41) is depicted pictorially in Figure 3.5. Take the even DCT-IV as an example. For a real-valued column vector x of length N, the specific expression of (41) is given by

Figure 3.5. A pictorial representation of (41).

[Block diagram. Blocks shown: DTT, Scaling, Zero Padding, IGDFT. Signals labeled: Temporal vector, DTT vector, Scaled and zero-padded DTT vector, Analytic vector.]

3.1.1.6 Autocorrelation and Temporal Envelope

The circular and skew-circular autocorrelations of a vector x of length N are defined as

⋅

Just like the time-frequency duality between circular autocorrelations and DFT power spectra, we can have dualities between the GDFT-domain circular or skew-circular autocorrelations and the temporal (IGDFT-domain) envelopes as follows.

Consider a column vector y of length N.

(i) The relation between its skew-circular autocorrelation and IOTDFT/IO²DFT power spectra is given by

]

(ii) The relation between its circular autocorrelation and IDFT/IOFDFT power spectra is given by
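Both dualities can be checked numerically. The sketch below assumes the autocorrelation convention r_l = Σ_k y_k · conj(y_{(k−l) mod N}), with wrapped terms negated in the skew-circular case, and the IGDFT kernel (1/N)·exp(j2π(k + a)(n + b)/N); under these assumptions the squared magnitude of the IGDFT of y equals 1/N times the IGDFT with a = 0 and the same b of the autocorrelation. The conventions in the original definitions may differ by a conjugate or a scale factor, and the function names are hypothetical.

```python
import numpy as np

def igdft_matrix(N, a=0.0, b=0.0):
    """Inverse GDFT matrix: rows indexed by time n, columns by frequency k."""
    n = np.arange(N).reshape(-1, 1)
    k = np.arange(N).reshape(1, -1)
    return np.exp(2j * np.pi * (k + a) * (n + b) / N) / N

def autocorr(y, skew):
    """Circular (skew=False) or skew-circular (skew=True) autocorrelation."""
    N = len(y)
    r = np.zeros(N, dtype=complex)
    for lag in range(N):
        for k in range(N):
            sign = -1.0 if (skew and k < lag) else 1.0   # wrapped term
            r[lag] += sign * y[k] * np.conj(y[(k - lag) % N])
    return r

def power_from_autocorr(y, b):
    """Temporal power envelope predicted from the autocorrelation of y."""
    N = len(y)
    r = autocorr(y, skew=(b == 0.5))
    return igdft_matrix(N, 0.0, b) @ r / N
```

The frequency offset a does not affect the envelope because it only contributes a unit-modulus factor per time index, which is why the IOTDFT and IO²DFT share the skew-circular autocorrelation and the IDFT and IOFDFT share the circular one.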

By substituting (41) to (45) and (46), we immediately obtain two dualities between the
