
SPATIAL AUDIO ANALYSIS BASED ON PERCEPTUALLY EMPIRICAL MODE DECOMPOSITION


The resulting spectrum at each time index is then convolved with the critical-band masking curve [7]:

Ψ(Ω) =

Figure 2: Microphone setting for multichannel audio analysis. ORTF microphones near conductor (reference microphone signal); ORTF microphones near tympani (desired microphone signal).

After the convolution, the resolution of the spectral features is lower than that of the original spectrum because of the relatively broad critical-band masking curves Ψ(Ω). To avoid any spike at the beginning resulting from the DC offset in each band, we transform the spectral features with a compressing static non-linear transformation [2]. In this application, the filtering is performed in the logarithmic domain, and the output is processed with an IIR filter along the time axis:

$$H(z) = \frac{0.2z^{4} + 0.1z^{3} - 0.1z - 0.2}{1 - 0.95z^{-1}} \tag{4}$$

The exponential of the filter output is then taken to expand the signal back to the original domain. As in conventional perceptual linear prediction, the critical-band spectrum is multiplied by the equal-loudness curve and then compressed with a cube root to simulate the power law of hearing [2]. With these procedures, as illustrated in Fig. 3, we are able to generate the spectral features extracted from different locations of a concert hall. RASTA-PLP makes audio analysis less sensitive to the slowly varying or steady-state factors in the audio signal, and results in a progressive change of the spectral features along the frequency axis. This implies that the colors of the spectral images vary steadily from red to blue, with green in between. Most RASTA-PLP-based spectral features have this property, which spectral features generated by other approaches do not possess. These highly distinguishable spectral features not only highlight the important information buried in the signals, but also facilitate the feature analysis in the subsequent processing.
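The log-domain filtering step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rasta_filter`, the floor value, and the use of `scipy.signal.lfilter` are choices made here, and the causal realization drops the z⁴ advance in Eq. (4), so the output is delayed by four frames.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(band_energy, pole=0.95, floor=1e-10):
    """Filter each critical band along time in the log domain.

    band_energy: array of shape (n_bands, n_frames) with non-negative values.
    The name, the floor, and the causal realization are assumptions.
    """
    # Compressing static non-linearity: natural log (floor avoids log(0)).
    log_spec = np.log(np.maximum(band_energy, floor))
    # Numerator of Eq. (4) with the z^4 advance removed, i.e.
    # 0.2 + 0.1 z^-1 - 0.1 z^-3 - 0.2 z^-4 (output delayed by 4 frames).
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])
    # Denominator of Eq. (4): 1 - 0.95 z^-1.
    a = np.array([1.0, -pole])
    filtered = lfilter(b, a, log_spec, axis=1)
    # Expanding non-linearity: back to the original domain.
    return np.exp(filtered)
```

Because the numerator coefficients sum to zero, a constant (DC) component in any band is suppressed once the start-up transient has decayed, which matches the stated aim of reducing sensitivity to steady-state factors.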

2. EMPIRICAL MODE DECOMPOSITION

2.1. One-dimensional EMD

Empirical mode decomposition (EMD) is an effective approach for processing nonstationary signals. EMD has attracted extensive attention and has been applied successfully in many fields, such as tide analysis, earthquake prediction, speaker identification, distortion detection, and cardiac arrhythmia measurement [8].

AES 40th International Conference, Tokyo, Japan, 2010 October 8-10

Figure 3: Segmented RASTA-PLP spectral features extracted from different locations of a concert hall. Panels: RASTA-PLP spectral features of microphone near conductor; RASTA-PLP spectral features of microphone near tympani. Horizontal axis: time (10 ms frames); vertical axis: frequency (Bark).

The components resulting from EMD, called intrinsic mode functions (IMFs), are characterized by two properties:

1. The number of extrema and the number of zero crossings should differ by no more than one.

2. The local average, defined as the average of the maximum and minimum envelopes, is zero, i.e., both envelopes are locally symmetric around the envelope mean.
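The first property can be checked numerically with a few lines of Python. The sketch below is illustrative only; the function names are hypothetical, and extrema and zero crossings are counted from sign changes of the sampled signal, which assumes the signal has no flat segments or exact zeros.

```python
import numpy as np

def count_extrema_and_zero_crossings(x):
    """Count interior extrema and zero crossings from sign changes."""
    x = np.asarray(x, dtype=float)
    slope_sign = np.sign(np.diff(x))
    # An extremum is a sign change between consecutive slopes.
    n_extrema = int(np.sum(slope_sign[:-1] * slope_sign[1:] < 0))
    value_sign = np.sign(x)
    # A zero crossing is a sign change between consecutive samples.
    n_crossings = int(np.sum(value_sign[:-1] * value_sign[1:] < 0))
    return n_extrema, n_crossings

def satisfies_first_imf_property(x):
    """True if the two counts differ by no more than one."""
    n_extrema, n_crossings = count_extrema_and_zero_crossings(x)
    return abs(n_extrema - n_crossings) <= 1

t = np.linspace(0.0, 1.0, 1000, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t + 0.3)
print(satisfies_first_imf_property(tone))        # a pure tone passes
print(satisfies_first_imf_property(tone + 2.0))  # an offset tone never crosses zero
```

A pure tone passes the check, whereas the same tone riding on a constant offset has many extrema but no zero crossings and therefore fails.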

Based on these rules, a signal may be decomposed into a number of IMFs. Consider a real and stable sequence s(t), which may be divided into fine-scale details and a residue. The IMFs of the signal s(t) are found by iterating the following sifting process:

1. Initialize $r_0(t) \leftarrow s(t)$ and $i \leftarrow 1$.

2. Find the $i$-th IMF.

(a) Initialize $g_{i,j}(t) \leftarrow r_{i-1}(t)$, with the number of sifts $j \leftarrow 0$.

(b) Find the local maxima and minima of $g_{i,j}(t)$.

(c) Estimate the maximum envelope $u_{i,j}(t)$ of $g_{i,j}(t)$ by passing a cubic spline through the local maxima. Similarly, find the minimum envelope $l_{i,j}(t)$ from the local minima.

(d) Compute an approximation to the local average: $m_{i,j}(t) \leftarrow 0.5\,(u_{i,j}(t) + l_{i,j}(t))$.

(e) Extract the detail $g_{i,j+1}(t) \leftarrow g_{i,j}(t) - m_{i,j}(t)$.

(f) Check whether the stopping criterion SD, defined as the standard deviation computed from two consecutive sifting results over the length $T$ of the original signal, is smaller than a given value $\epsilon$ [8]. If SD $\geq \epsilon$, let $j \leftarrow j + 1$ and return to step (b). Otherwise, let the $i$-th IMF be $x_i(t) \leftarrow g_{i,j+1}(t)$.

3. Update $r_i(t) \leftarrow r_{i-1}(t) - x_i(t)$.

4. If the residue $r_i(t)$ has at most one extremum or is constant, stop. Otherwise, let $i \leftarrow i + 1$ and return to step 2.

Figure 4: Segmented spectrograms extracted from different locations of a concert hall. Panels: spectrogram of microphone near conductor; spectrogram of microphone near tympani. Horizontal axis: time (10 ms frames).
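The sifting steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names are hypothetical; the stopping measure is a normalized ratio of energies between consecutive sifts, a common variant chosen here because the exact formula is not reproduced in this section; the end samples of each envelope reuse the nearest extremum value; and the outer loop stops as soon as fewer than two maxima or two minima remain, so that the spline envelopes stay defined.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def local_extrema(g):
    """Indices of strict interior local maxima and minima (step b)."""
    return argrelextrema(g, np.greater)[0], argrelextrema(g, np.less)[0]

def envelope(g, idx):
    """Cubic-spline envelope through g[idx] (step c). The two end samples
    reuse the nearest extremum value, a simple boundary assumption."""
    n = len(g)
    knots = np.r_[0, idx, n - 1]
    values = np.r_[g[idx[0]], g[idx], g[idx[-1]]]
    return CubicSpline(knots, values)(np.arange(n))

def sift_imf(r, eps=0.05, max_sifts=100):
    """Extract one IMF from the current residue r (steps a to f)."""
    g = np.array(r, dtype=float)
    for _ in range(max_sifts):
        max_idx, min_idx = local_extrema(g)
        if len(max_idx) < 2 or len(min_idx) < 2:
            break
        m = 0.5 * (envelope(g, max_idx) + envelope(g, min_idx))  # step d
        g_next = g - m                                           # step e
        # Assumed stopping measure: energy of the change between two
        # consecutive sifts (g - g_next = m), normalized by the energy of g.
        sd = np.sum(m ** 2) / (np.sum(g ** 2) + 1e-12)
        g = g_next
        if sd < eps:                                             # step f
            break
    return g

def emd(s, max_imfs=10, eps=0.05):
    """Return a list of IMFs and the final residue (steps 1 to 4)."""
    r = np.array(s, dtype=float)
    imfs = []
    for _ in range(max_imfs):
        max_idx, min_idx = local_extrema(r)
        # Stricter than "at most one extremum" so that splines stay defined.
        if len(max_idx) < 2 or len(min_idx) < 2:
            break
        x = sift_imf(r, eps=eps)
        imfs.append(x)
        r = r - x   # step 3
    return imfs, r
```

By construction, summing the returned IMFs and the residue gives back the input up to floating-point rounding, which is the reconstruction property the decomposition relies on.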

The goal of sifting is to repeatedly subtract the large-scale features of the signal from the fine-scale ones. Finally, the signal s(t) is represented as a sum of IMFs and the residue:

$$s(t) = \sum_{i=1}^{n} x_i(t) + r_n(t) \tag{6}$$

The sifting process removes the low-frequency information in each step of EMD until the highest-frequency component remains. Theoretically, adding all the IMFs together with the residue reconstructs the signal without distortion. Cubic spline interpolation is commonly used to approximate the upper and lower envelopes in EMD; however, it usually fails to model boundaries, especially those with abrupt changes. Although one approach to circumvent this problem is to use a longer signal and omit the ends of the processed signal, this approach does not work for a short signal [9].

Figure 5: From top to bottom: The first four 2D-IMFs that correspond to the spectral features. Panels: spectral features near conductor; spectral features near tympani.

After obtaining the IMFs, we apply the Hilbert transform to each IMF $x_i(t)$:

$$y_i(t) = \frac{1}{\pi}\, P \int_{-\infty}^{\infty} \frac{x_i(\tau)}{t - \tau}\, d\tau \tag{7}$$

where P denotes the Cauchy principal value of the integral. The analytic signal of $x_i(t)$ can then be defined as [10]:

$$z_i(t) = x_i(t) + j\,y_i(t) = a_i(t)\, e^{j\varphi_i(t)} \tag{8}$$

where $a_i(t) = \sqrt{x_i(t)^2 + y_i(t)^2}$ and $\varphi_i(t) = \tan^{-1}\left(y_i(t)/x_i(t)\right)$ are the instantaneous amplitude and the instantaneous phase at time t, respectively. Then the instantaneous frequency may be derived from:

$$f_i(t) = \frac{\omega_i(t)}{2\pi} = \frac{\dot{\varphi}_i(t)}{2\pi} \tag{9}$$

and the original signal s(t) being analyzed can be represented as:

$$s(t) = \mathrm{Re}\left\{ \sum_{i=1}^{n} a_i(t)\, e^{\,j \int \omega_i(t)\, dt} \right\} \tag{10}$$

Notice that the residue $r_n(t)$ is excluded because it is either monotonic or constant. Since both the instantaneous amplitude and the instantaneous frequency are functions of time, the time-frequency representation of the Hilbert amplitude spectrum may be expressed as:

$$H(\omega, t) = \mathrm{Re}\left\{ \sum_{i=1}^{n} a_i(t)\, e^{\,j \int \omega_i(t)\, dt} \right\} \tag{11}$$

Based on the Hilbert-Huang spectrum, the marginal spectrum h(ω) can be formulated as:

$$h(\omega) = \int_{0}^{T} H(\omega, t)\, dt \tag{12}$$

where T denotes the length of the original signal. The marginal spectrum h(ω) provides a measure of the total amplitude (or energy) at every frequency.
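The instantaneous quantities and the marginal spectrum can be approximated numerically as sketched below. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, the analytic signal comes from `scipy.signal.hilbert`, the phase derivative is a first difference, and the integral over time is replaced by a sum of amplitudes in uniform frequency bins (the bin count is an arbitrary choice).

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_attributes(imf, fs):
    """Instantaneous amplitude, phase, and frequency (Hz) of one IMF."""
    z = hilbert(imf)                    # analytic signal x_i + j*y_i
    amp = np.abs(z)                     # a_i(t)
    phase = np.unwrap(np.angle(z))      # phi_i(t), unwrapped
    # Discrete derivative of the phase; the result has one sample fewer.
    freq = np.diff(phase) * fs / (2.0 * np.pi)
    return amp, phase, freq

def marginal_spectrum(imfs, fs, n_bins=64):
    """Accumulate amplitude over time in uniform frequency bins."""
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    h = np.zeros(n_bins)
    for imf in imfs:
        amp, _, freq = instantaneous_attributes(imf, fs)
        # Out-of-range frequencies are clipped into the first or last bin.
        idx = np.clip(np.digitize(freq, edges) - 1, 0, n_bins - 1)
        # Each sample contributes amplitude times dt = 1/fs.
        np.add.at(h, idx, amp[:-1] / fs)
    return edges, h

fs = 1000.0
t = np.arange(1000) / fs
edges, h = marginal_spectrum([np.cos(2 * np.pi * 50 * t)], fs)
k = int(np.argmax(h))
print(edges[k], edges[k + 1])           # bin that holds most of the amplitude
```

For a single 50 Hz tone, the accumulated amplitude concentrates in the bin that contains 50 Hz, which is the behavior expected of h(ω) for a stationary component.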

2.2. Two-dimensional EMD

Two-dimensional EMD works in the same way as one-dimensional EMD except for the extrema detection, which is performed using morphological reconstruction during the sifting process. In addition, to compute the surface interpolation, a radial basis function (RBF) is defined as follows:

$$R(p) = \sum_{i=1}^{N} w_i\, \Phi(\lVert p - c_i \rVert) \tag{13}$$

where $p \in \mathbb{R}^d$, and $\lVert \cdot \rVert$ denotes the Euclidean norm.

The approximating function R(p) is represented as a sum of N real-valued functions Φ(·), each of which is associated with a different center $c_i$ and weighted by a coefficient $w_i$ obtained by linear least-squares estimation.

Figure 6: (Continued from Fig. 5) From top to bottom: The fifth to the eighth 2D-IMFs that correspond to the spectral features. Panels: spectral features near conductor; spectral features near tympani.

Similarly, the 2D sifting process can be expressed as [11], [12]:

1. Find the maxima and minima of the spectral feature S(t, f) by morphological reconstruction.

2. Estimate the maximum envelope $u_{i,j}(t, f)$ and the minimum envelope $l_{i,j}(t, f)$ by connecting the local maxima and the local minima with RBFs, respectively.

3. Compute an approximation to the local average: $m_{i,j}(t, f) \leftarrow 0.5\,(u_{i,j}(t, f) + l_{i,j}(t, f))$.

4. Extract the detail $S_{i,j+1}(t, f) \leftarrow S_{i,j}(t, f) - m_{i,j}(t, f)$.

5. Check whether the stopping criterion, predetermined as a fixed number of iterations, is satisfied. If so, processing for the current IMF is finished; otherwise, repeat steps 2-4.

As in one-dimensional EMD, the above process must be continued until no further 2D-IMF can be extracted.
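The RBF surface of Eq. (13) and one run of the 2D sifting loop can be sketched in Python as follows. Several choices here are assumptions made for illustration and are not taken from the paper: the function names are hypothetical, Φ is a multiquadric kernel, the weights are solved with `numpy.linalg.lstsq`, and extrema are located with a simple 3×3 neighbourhood test (`scipy.ndimage` maximum and minimum filters) in place of morphological reconstruction.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def rbf_surface(centers, values, shape, width=3.0):
    """Fit R(p) = sum_i w_i * Phi(||p - c_i||) and evaluate it on a grid.

    centers: (N, 2) integer grid positions, values: (N,) samples.
    Phi is a multiquadric kernel here (an assumption).
    """
    phi = lambda d: np.sqrt(d ** 2 + width ** 2)
    d_cc = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # Linear least-squares estimate of the weights w_i.
    w, *_ = np.linalg.lstsq(phi(d_cc), values, rcond=None)
    rows, cols = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]),
                             indexing="ij")
    grid = np.stack([rows, cols], axis=-1).reshape(-1, 2)
    d_gc = np.linalg.norm(grid[:, None, :] - centers[None, :, :], axis=-1)
    return (phi(d_gc) @ w).reshape(shape)

def local_extrema_2d(S, size=3):
    """Neighbourhood test for extrema (stand-in for morphological
    reconstruction, which is what the text describes)."""
    max_pts = np.argwhere(S == maximum_filter(S, size=size))
    min_pts = np.argwhere(S == minimum_filter(S, size=size))
    return max_pts, min_pts

def sift_2d(S, n_sifts=3):
    """Extract one 2D-IMF using a fixed number of sifting iterations."""
    g = np.array(S, dtype=float)
    for _ in range(n_sifts):
        max_pts, min_pts = local_extrema_2d(g)                     # step 1
        upper = rbf_surface(max_pts, g[tuple(max_pts.T)], g.shape)  # step 2
        lower = rbf_surface(min_pts, g[tuple(min_pts.T)], g.shape)
        g = g - 0.5 * (upper + lower)                          # steps 3-4
    return g
```

The sketch re-detects extrema on every iteration and builds dense distance matrices, so it is only practical for small spectral-feature images; a larger image would need a local or sparse interpolation scheme.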
