Acoustic Measurement

Chapter 2 Framework of the Acoustic-Phonetics and SONFIN Based

2.4 Acoustic Characteristic Checking

2.4.2 Acoustic Measurement

In this section, we explain the signal processing techniques used to compute the two sets of acoustic characteristics: the fundamental frequency and the formant frequencies.

Fundamental Frequency

The general problem of fundamental frequency estimation is to take a portion of a signal and find the dominant frequency of repetition. Difficulties arise because (1) not all signals are periodic; (2) signals that are periodic may change in fundamental frequency over the time interval of interest; (3) signals may be contaminated with noise, or even with periodic signals of other fundamental frequencies; (4) a signal that is periodic with period T is also periodic with periods 2T, 3T, etc., so we need to find the smallest period, i.e., the highest fundamental frequency; and (5) even signals of constant fundamental frequency may change in other ways over the interval of interest. A pitch determination algorithm (PDA) based on the subharmonic-to-harmonic ratio (SHR), which alleviates these problems, is adopted to estimate the fundamental frequency [30].

The SHR-based PDA operates in the frequency domain.

First, let A(f) denote the short-term spectrum, obtained by applying the Fourier transform to windowed short-term speech frames.

The FFT length varies with the sampling rate and the frame length.
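A minimal NumPy sketch of this first step follows. It assumes a Hamming window and an FFT length equal to the next power of two at or above the frame length; the text only says the length depends on the sampling rate and frame length. The function name `short_term_spectrum` is hypothetical.

```python
import numpy as np

def short_term_spectrum(frame, fs):
    """Return (freqs, A): magnitude spectrum of one Hamming-windowed frame.

    Assumption: FFT length = next power of two >= frame length.
    """
    frame = np.asarray(frame, dtype=float)
    n_fft = 1 << int(np.ceil(np.log2(len(frame))))
    windowed = frame * np.hamming(len(frame))
    A = np.abs(np.fft.rfft(windowed, n=n_fft))     # one-sided amplitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)      # bin centre frequencies in Hz
    return freqs, A

# Usage: a 40 ms frame of a 200 Hz tone sampled at 16 kHz (640 samples -> 1024-point FFT)
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
freqs, A = short_term_spectrum(np.sin(2 * np.pi * 200 * t), fs)
peak_hz = freqs[np.argmax(A)]   # within one bin (15.625 Hz) of 200 Hz
```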

Suppose that the fundamental frequency is f0; the sum of harmonic amplitudes is then defined as:

$$SH = \sum_{n=1}^{N} A(n f_0)$$

where N is the maximum number of harmonics considered. If we consider only the sub-harmonics located at one half of the fundamental frequency (and its odd multiples), the sum of sub-harmonic amplitudes is defined as:

$$SS = \sum_{n=1}^{N} A\left(\left(n - \tfrac{1}{2}\right) f_0\right)$$

Consequently, SHR is obtained by dividing SS by SH:

$$SHR = \frac{SS}{SH} \tag{2.24}$$
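For a given candidate f0, SH, SS and SHR can be read directly off a sampled spectrum. In this sketch the hypothetical helper `shr_linear` evaluates A(f) between FFT bins by linear interpolation, which is an assumption; the toy spectrum is synthetic, not measured data.

```python
import numpy as np

def shr_linear(freqs, A, f0, N):
    """SH, SS and SHR of (2.24) for a candidate f0, on a linear frequency axis."""
    n = np.arange(1, N + 1)
    SH = np.interp(n * f0, freqs, A).sum()          # harmonics at n * f0
    SS = np.interp((n - 0.5) * f0, freqs, A).sum()  # sub-harmonics at (n - 1/2) * f0
    return SH, SS, SS / SH

# Toy spectrum on a 1 Hz grid: unit peaks at multiples of 200 Hz,
# peaks of 0.3 halfway between them.
freqs = np.arange(0.0, 4001.0)
A = np.zeros_like(freqs)
A[200::200] = 1.0
A[100::200] = 0.3
SH, SS, shr = shr_linear(freqs, A, f0=200.0, N=10)   # SH = 10, SS = 3, SHR = 0.3 (up to rounding)
```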

SS and SH could be obtained with the direct spectrum compression technique on a linear frequency scale, as in the Harmonic Product Spectrum (HPS) algorithm. However, because of numerical problems, a logarithmic transformation of the frequency scale is preferable. In developing the current algorithm, we adopted this basic approach.

Nevertheless, the rationale and the detailed implementation are quite different, which affects performance significantly. To facilitate working in the log domain, we reformulate the above definitions. Let LOGA(log f) denote the short-term spectrum on the logarithmic frequency scale, so that LOGA(log f) = A(f), and let log(f0) denote the fundamental frequency on the log scale. We then have:

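$$SH = \sum_{n=1}^{N} \mathrm{LOGA}(\log(n f_0)) = \sum_{n=1}^{N} \mathrm{LOGA}(\log f_0 + \log n)$$

$$SS = \sum_{n=1}^{N} \mathrm{LOGA}\left(\log\left(\left(n - \tfrac{1}{2}\right) f_0\right)\right) = \sum_{n=1}^{N} \mathrm{LOGA}(\log(0.5 f_0) + \log(2n-1))$$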

The spectrum on the log frequency scale is then linearly interpolated. To obtain SH, the spectrum is shifted leftward along the logarithmic frequency axis by the even orders log(2), log(4), …, log(2N), and these shifted spectra are added together:

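$$\mathrm{SUMA}(\log f)_{\mathrm{even}} = \sum_{n=1}^{N} \mathrm{LOGA}(\log f + \log(2n)), \qquad SH = \mathrm{SUMA}(\log(0.5 f_0))_{\mathrm{even}}$$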

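Similarly, by shifting the spectrum leftward by log(1), log(3), log(5), …, log(2N−1) and adding the shifted spectra, we obtain SUMA(log f)_odd, from which SS is also read off at log(0.5f0):

$$\mathrm{SUMA}(\log f)_{\mathrm{odd}} = \sum_{n=1}^{N} \mathrm{LOGA}(\log f + \log(2n-1)), \qquad SS = \mathrm{SUMA}(\log(0.5 f_0))_{\mathrm{odd}}$$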

Next, we obtain the difference function, which is defined as:

$$DA(\log f) = \mathrm{SUMA}(\log f)_{\mathrm{even}} - \mathrm{SUMA}(\log f)_{\mathrm{odd}} \tag{2.31}$$
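The sketch below evaluates DA(log f) from a sampled spectrum. It assumes base-2 logarithms (any base works, since only differences of logarithms matter), realises each leftward shift by log(k) as a linearly interpolated lookup LOGA(log f + log k), and treats points shifted past the top of the spectrum as zero. The helper name `difference_function` and the toy spectrum are hypothetical. For the toy spectrum, the value at log(0.5f0) reproduces SH − SS from (2.32).

```python
import numpy as np

def difference_function(freqs, A, f, N):
    """Evaluate DA(log f) of (2.31) at the frequencies f (Hz).

    freqs, A: sampled amplitude spectrum with freqs[0] == 0 (e.g. from rfftfreq);
    the DC bin is skipped because log(0) is undefined.
    """
    log_axis = np.log2(np.asarray(freqs, dtype=float)[1:])
    loga = np.asarray(A, dtype=float)[1:]
    log_f = np.log2(np.atleast_1d(np.asarray(f, dtype=float)))

    def suma(orders):
        # Sum of spectra shifted leftward by log2(k); beyond the top -> 0
        return sum(np.interp(log_f + np.log2(k), log_axis, loga, right=0.0)
                   for k in orders)

    n = np.arange(1, N + 1)
    return suma(2 * n) - suma(2 * n - 1)      # SUMA_even - SUMA_odd

# Toy spectrum: harmonics of f0 = 200 Hz (1.0) and sub-harmonics (0.3)
freqs = np.arange(0.0, 4001.0)
A = np.zeros_like(freqs)
A[200::200] = 1.0
A[100::200] = 0.3
grid = np.geomspace(25.0, 400.0, 512)           # uniform in log frequency
da = difference_function(freqs, A, grid, N=10)  # DA over the log grid
da_half = difference_function(freqs, A, 100.0, N=10)[0]  # approximately SH - SS = 10 - 3 = 7
```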

Taking this difference removes the contribution of the points around the true peaks, which is equivalent to peak enhancement. Moreover, the DA(·) function has some useful properties. In the ideal case, if sub-harmonics do not exist and the contribution from the points at log(nf0) ± log(0.25f0) is ignored, (2.31) yields two maxima, at log(0.5f0) and log(0.25f0), respectively. Their values are:

$$DA(\log(0.5 f_0)) = SH - SS \tag{2.32}$$

$$DA(\log(0.25 f_0)) = SH + SS \tag{2.33}$$

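Therefore, SHR can be approximated by the following simple formula:

$$SHR = \frac{SS}{SH} \approx \frac{DA(\log(0.25 f_0)) - DA(\log(0.5 f_0))}{DA(\log(0.25 f_0)) + DA(\log(0.5 f_0))} \tag{2.34}$$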
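Equation (2.34) follows from solving (2.32) and (2.33) for SH and SS under the same idealized assumptions; the intermediate step is spelled out here for reference.

```latex
\begin{aligned}
SH &= \tfrac{1}{2}\left[DA(\log(0.25 f_0)) + DA(\log(0.5 f_0))\right] \\
SS &= \tfrac{1}{2}\left[DA(\log(0.25 f_0)) - DA(\log(0.5 f_0))\right] \\
SHR = \frac{SS}{SH} &= \frac{DA(\log(0.25 f_0)) - DA(\log(0.5 f_0))}{DA(\log(0.25 f_0)) + DA(\log(0.5 f_0))}
\end{aligned}
```

For example, with SH = 10 and SS = 3, (2.32) and (2.33) give DA values of 7 and 13, and the ratio evaluates to (13 − 7)/(13 + 7) = 0.3 = SS/SH.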

Based on the above analysis, we use the following procedure to compute SHR and then determine the pitch. First, we locate the position of the global maximum of DA, denoted log f1. Then, starting from this point, the position of the next local maximum, denoted log f2, is selected in the range [log(1.75f1), log(2.25f1)]. Following (2.34), SHR can then be computed from the two peak values DA(log f1) and DA(log f2).

If SHR is less than a threshold value (0.6 in the current implementation), f2 is chosen as the final pitch; otherwise, f1 is selected.
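The peak-picking and decision rule above can be sketched as follows, assuming DA has already been sampled on a grid that is uniform in log frequency. Two simplifications are assumptions of this sketch: the "next local maximum" is taken as the largest DA value inside [1.75f1, 2.25f1], and the hypothetical `pick_pitch` returns the chosen peak position directly, as the text describes. The 0.6 threshold is the value quoted above; the DA curve in the usage lines is synthetic.

```python
import numpy as np

def pick_pitch(grid_hz, da, threshold=0.6):
    """Apply the SHR decision rule to DA sampled at grid_hz (log-uniform grid).

    Returns (chosen_position_hz, shr).
    """
    grid_hz = np.asarray(grid_hz, dtype=float)
    da = np.asarray(da, dtype=float)
    i1 = int(np.argmax(da))                                  # global maximum: log f1
    f1 = grid_hz[i1]
    in_range = np.flatnonzero((grid_hz >= 1.75 * f1) & (grid_hz <= 2.25 * f1))
    if in_range.size == 0:                                   # grid ends before 1.75 f1: keep f1
        return f1, np.nan
    i2 = in_range[np.argmax(da[in_range])]                   # simplification: largest value in range
    shr = (da[i1] - da[i2]) / (da[i1] + da[i2])             # (2.34)
    return (grid_hz[i2] if shr < threshold else f1), shr

# Usage with a synthetic DA: a peak of 13 at 100 Hz and a peak of 7 at 200 Hz
grid = np.geomspace(50.0, 800.0, 401)

def bump(fc, height):
    return height * np.exp(-((np.log2(grid) - np.log2(fc)) / 0.02) ** 2)

pitch, shr = pick_pitch(grid, bump(100.0, 13.0) + bump(200.0, 7.0))
# shr is about 0.3 < 0.6, so the position near 200 Hz (f2) is returned
```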

Formant Frequency

The algorithm adopted in the system to evaluate the formant candidates is based on linear prediction (LP) analysis. As shown in Fig. 11, each frame of speech to be analyzed is first preprocessed by pre-emphasis and Hamming windowing. The preprocessed speech is used to design the inverse filter A(z). Then, the LPC-based spectrum is evaluated from 1/A(z), and the peaks of this spectrum are chosen as the formant candidates.

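Fig. 11 Procedure for formant estimation using linear prediction (speech → pre-processing → A(z) computation → LPC-based spectrum → peak picking → formant candidates)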
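The Fig. 11 pipeline can be sketched end to end as below. Assumptions not taken from the text: a pre-emphasis coefficient of 0.97, the autocorrelation method, LPC order 10, and, for brevity, solving the normal equations with `numpy.linalg.solve` instead of the Levinson-Durbin recursion cited later in this section. `formant_candidates` is a hypothetical name.

```python
import numpy as np

def formant_candidates(frame, fs, order=10, pre_emph=0.97, n_freq=1024):
    """Formant candidates = local maxima of the LPC spectrum |1/A(e^jw)|."""
    x = np.asarray(frame, dtype=float)
    # 1) Pre-processing: pre-emphasis y(n) = x(n) - pre_emph * x(n-1), then Hamming window
    x = np.append(x[0], x[1:] - pre_emph * x[:-1])
    x = x * np.hamming(len(x))
    # 2) A(z) computation: autocorrelation r(0..p), normal equations R alpha = r(1..p)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    alpha = np.linalg.solve(R, r[1 : order + 1])
    a = np.concatenate(([1.0], -alpha))              # A(z) = 1 - sum alpha_k z^-k
    # 3) LPC-based spectrum 1/|A(e^jw)| on n_freq points in [0, pi)
    w = np.pi * np.arange(n_freq) / n_freq
    A_w = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a
    spectrum = 1.0 / np.abs(A_w)
    # 4) Peak picking: interior local maxima, converted from rad/sample to Hz
    mid = spectrum[1:-1]
    peaks = np.flatnonzero((mid > spectrum[:-2]) & (mid > spectrum[2:])) + 1
    return w[peaks] * fs / (2 * np.pi)
```

Applied to a steady voiced frame, the returned frequencies include the resonances of the vocal-tract model together with possible spurious peaks, which is why the text treats them only as candidates.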

Linear prediction analysis is based on modelling the speech signal as if it were generated by a particular kind of source and filter, as shown in Fig. 12.

Fig. 12 Simplified model for speech production (blocks: impulse train generator controlled by the fundamental frequency, random noise generator, gain G, time-varying digital filter, lip radiation model; excitation u(n), output speech s(n))

In this model, the composite spectral effects of radiation, vocal tract and glottal excitation are represented by a time-varying digital filter whose steady-state system function has the form

$$H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$

The system is excited by an impulse train for voiced speech or by a random noise sequence for unvoiced speech. The pitch period and voiced/unvoiced parameters can be estimated using linear predictive analysis. The speech samples s(n) are related to the excitation u(n) by the simple difference equation

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)$$
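A minimal sketch of the source-filter model follows. It runs the difference equation sample by sample with zero initial state. The filter coefficients, gains and lengths in the usage lines are hypothetical values chosen for illustration, and `synthesize`, `impulse_train` and `noise` are illustrative helper names.

```python
import numpy as np

def synthesize(a, G, u):
    """All-pole model: s(n) = sum_{k=1}^{p} a_k s(n-k) + G u(n), zero initial state."""
    p = len(a)
    s = np.zeros(len(u))
    for n in range(len(u)):
        past = sum(a[k - 1] * s[n - k] for k in range(1, p + 1) if n >= k)
        s[n] = past + G * u[n]
    return s

def impulse_train(n_samples, period):
    """Voiced excitation: unit impulses every `period` samples."""
    u = np.zeros(n_samples)
    u[::period] = 1.0
    return u

def noise(n_samples, seed=0):
    """Unvoiced excitation: zero-mean white Gaussian noise."""
    return np.random.default_rng(seed).standard_normal(n_samples)

# Hypothetical stable 2nd-order filter (poles at radius sqrt(0.8))
a = [1.3, -0.8]
voiced = synthesize(a, G=1.0, u=impulse_train(800, 80))   # period of 80 samples
unvoiced = synthesize(a, G=0.1, u=noise(800))
```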

The linear predictor with predictor coefficients αk and order p is defined as a system whose output is

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

The system function of this linear predictor is

$$P(z) = \sum_{k=1}^{p} \alpha_k z^{-k} \tag{2.39}$$

The prediction error is defined as

$$e(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

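Thus the prediction error sequence is the output of a system whose transfer function is

$$A(z) = 1 - \sum_{k=1}^{p} \alpha_k z^{-k}$$

If the speech signal obeys the model exactly and αk = ak, then e(n) = G u(n). Under this condition, the prediction error filter A(z) is an inverse filter for the system H(z):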

$$H(z) = \frac{G}{A(z)} \tag{2.42}$$
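The inverse-filter relation (2.42) can be checked numerically: synthesise s(n) from a known all-pole model, then filter it with A(z) using αk = ak; the prediction error should reproduce G u(n). The second-order model below is a hypothetical example, and `prediction_error` is an illustrative helper.

```python
import numpy as np

def prediction_error(s, alpha):
    """e(n) = s(n) - sum_{k=1}^{p} alpha_k s(n-k), i.e. s(n) filtered by A(z).

    Samples before n = 0 are taken as zero.
    """
    s = np.asarray(s, dtype=float)
    e = s.copy()
    for k, a_k in enumerate(alpha, start=1):
        e[k:] -= a_k * s[:-k]
    return e

# Synthesise s(n) = a1 s(n-1) + a2 s(n-2) + G u(n), then inverse-filter it.
a = [1.3, -0.8]          # hypothetical stable model
G = 0.5
u = np.zeros(200)
u[::50] = 1.0            # impulse-train excitation, period of 50 samples
s = np.zeros_like(u)
for n in range(len(u)):
    s[n] = G * u[n] + sum(a[k - 1] * s[n - k] for k in range(1, len(a) + 1) if n >= k)
e = prediction_error(s, a)   # with alpha_k = a_k this recovers G * u(n)
```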

The basic problem of linear prediction here is to determine a set of predictor coefficients {αk} directly from the speech signal, in such a manner that (2.42) yields a good estimate of the spectral properties of the speech signal. The predictor coefficients can be computed efficiently using the Levinson-Durbin recursion [1].
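A sketch of the Levinson-Durbin recursion for the autocorrelation normal equations follows, written from the standard textbook form rather than from the thesis implementation, which is not given. It returns the predictor coefficients α1…αp of (2.39) and the final prediction-error energy; `levinson_durbin` is an illustrative name.

```python
import numpy as np

def levinson_durbin(r, p):
    """Solve sum_k alpha_k r(|i-k|) = r(i), i = 1..p, from autocorrelation r(0..p).

    Returns (alpha, E): alpha[k-1] is alpha_k, E is the final prediction-error energy.
    """
    r = np.asarray(r, dtype=float)
    alpha = np.zeros(p)
    E = r[0]
    for i in range(1, p + 1):
        # Reflection coefficient k_i = (r(i) - sum_{j<i} alpha_j r(i-j)) / E
        k = (r[i] - np.dot(alpha[: i - 1], r[i - 1 : 0 : -1])) / E
        new = alpha.copy()
        new[i - 1] = k
        # alpha_j <- alpha_j - k_i * alpha_{i-j}, j = 1..i-1
        new[: i - 1] = alpha[: i - 1] - k * alpha[: i - 1][::-1]
        alpha = new
        E *= 1.0 - k * k
    return alpha, E

alpha1, E1 = levinson_durbin([1.0, 0.5], 1)   # alpha1 = [0.5], E1 = 0.75
```

For larger p, the result should agree with a direct solve of the Toeplitz system, at O(p²) rather than O(p³) cost.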

From the thesis 基於聽覺語言學與模糊類神經網路之英文母音辨識技術 (English Vowel Recognition Based on Acoustic Phonetics and Fuzzy Neural Networks), pp. 38-46
