Physical Model of Vocal Tract - Speaker Identification System and Verification Techniques Based on a Simulated Glottal Source Waveform

A model of the vocal-tract system under consideration is shown in Fig. 2-3; it includes the pharyngeal, oral, and nasal cavities. The innermost end of the pharyngeal tube is connected to a pressure source through a narrow constriction representing the glottal orifice. The tracheal tube is omitted in this simulation, since its acoustic effect on the speech spectrum appears to be minor, except for unvoiced sounds where the glottal opening is fairly large [24],[25]. It is widely accepted that the acoustic waves inside the vocal tract can be regarded as plane, i.e., one-dimensional, for frequencies below 4 kHz. Consequently, only the cross-sectional area and the perimeter along the length of the vocal tract determine the acoustic characteristics. Furthermore, if the cross-sectional shape can be assumed to be uniform, for example circular, the area function $A(x,t)$, as shown in Fig. 2-4, completely determines the acoustic properties of the vocal tract. The area function $A'(x,t)$ specifies that of the nasal tract. Quantities related to the nasal tract are marked by a prime ('). In this thesis, all the assumptions introduced in Chapter 3 are based on a uniform vocal tract with a circular cross-section.

The pressure $p(x,t)$ and the volume velocity $u_g(x,t)$ inside an acoustic tube with non-rigid walls are governed, to a first-order approximation, by three partial differential equations: the equation of motion (EQM), the equation of continuity (EQC), and the equation of wall vibration (EQW), given as eqs. (2.1), (2.2), and (2.3), respectively.

In eqs. (2.1) and (2.2), $\rho_0$ and $c$ denote the density of air at equilibrium and the sound velocity, respectively. The area function is denoted by $A_0(x,t)$, which is related to the previously defined area function $A(x,t)$ by

$$A(x,t) = A_0(x,t) + y(x,t)\,S_0(x,t) \qquad (2.4)$$

where $S_0(x,t)$ indicates a given perimeter of the vocal tract, and $y(x,t)$ the amplitude of the yielding of the walls due to the sound pressure inside the tube.
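As a minimal sketch of eq. (2.4), the following Python snippet evaluates the effective area of one tube section. It assumes a circular cross-section, so that the perimeter follows from the rest area as $S_0 = 2\sqrt{\pi A_0}$; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def circular_perimeter(area_cm2: float) -> float:
    """Perimeter (cm) of a circle whose area is area_cm2 (cm^2)."""
    return 2.0 * math.sqrt(math.pi * area_cm2)


def effective_area(a0_cm2: float, y_cm: float) -> float:
    """Eq. (2.4): A = A0 + y * S0, with S0 taken as the circular perimeter.

    a0_cm2 : given area A0 (cm^2)
    y_cm   : wall displacement y (cm), positive when the wall yields outward
    """
    s0 = circular_perimeter(a0_cm2)
    return a0_cm2 + y_cm * s0


if __name__ == "__main__":
    a0 = 4.0   # hypothetical given area, cm^2
    y = 0.01   # hypothetical wall displacement, cm
    print(f"S0 = {circular_perimeter(a0):.4f} cm")
    print(f"A  = {effective_area(a0, y):.4f} cm^2")
```

With zero wall displacement the function returns $A_0$ unchanged, which corresponds to the rigid-wall case.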

Fig. 2-3 : A schematized vocal-tract model. Labeled components: pressure source $P_{sub}$, glottis, pharyngeal cavity, velum, oral cavity, lips, nasal cavity, nostrils.

The wall equation, eq. (2.3), has been derived assuming that the walls are locally reacting, i.e., the motion of one portion of the walls, normal to the surface, depends only on the acoustic pressure on that portion and is independent of the motion of any other part of the walls. The coefficients m, b, and k in eq. (2.3) represent the mass, the mechanical resistance, and the stiffness of the wall per unit length of the tube, respectively.

For simplicity, these coefficients are assumed to be constant and uniform along the vocal tract, even though the actual values vary with the location and with the tenseness of the muscles beneath the wall surface [26],[27]. In the literature, these constants have often been specified per unit surface area. In such cases, the total mass of the walls may vary unrealistically depending on the vocal-tract configuration. In contrast, with a specification per unit length, the total mass stays relatively constant, since the length variation of the tract is small, especially in comparison with the surface-area variation. Considering that the total mass of the vocal-tract system is constant, the specification of mass per unit length seems more reasonable. Only the mass coefficient is discussed here, since the mass is the dominant component of the non-rigid walls in terms of its acoustic consequences. The values of these constants are estimated from the data reported in [26], assuming a cross-sectional area of 4 cm².
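The argument for a per-unit-length specification can be illustrated numerically. The sketch below compares the total wall mass of a uniform circular tube under the two specifications. The per-area mass value, the tract length, and all function names are hypothetical placeholders, not the values taken from [26]; only the 4 cm² reference area comes from the text.

```python
import math


def perimeter(area_cm2: float) -> float:
    """Perimeter (cm) of a circular cross-section with the given area (cm^2)."""
    return 2.0 * math.sqrt(math.pi * area_cm2)


def total_mass_area_spec(m_area: float, area_cm2: float, length_cm: float) -> float:
    """Total wall mass when the mass is specified per unit surface area."""
    return m_area * perimeter(area_cm2) * length_cm


def total_mass_length_spec(m_len: float, length_cm: float) -> float:
    """Total wall mass when the mass is specified per unit length."""
    return m_len * length_cm


M_AREA = 1.5     # hypothetical wall mass per unit area, g/cm^2
REF_AREA = 4.0   # reference cross-sectional area from the text, cm^2
LENGTH = 17.0    # hypothetical tract length, cm

# Convert the per-area value to a per-length value at the reference area.
M_LEN = M_AREA * perimeter(REF_AREA)

if __name__ == "__main__":
    for area in (1.0, 4.0, 9.0):
        by_area = total_mass_area_spec(M_AREA, area, LENGTH)
        by_len = total_mass_length_spec(M_LEN, LENGTH)
        print(f"A = {area:4.1f} cm^2: per-area spec {by_area:7.1f} g, "
              f"per-length spec {by_len:7.1f} g")
```

Under the per-area specification the total mass scales with the perimeter, so it doubles when the area goes from 1 cm² to 4 cm², whereas the per-length specification gives the same total for every configuration of equal length.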

This is a rather crude representation of the non-rigidity of the vocal-tract walls. Nevertheless, this approximation should account for the dispersive propagation of acoustic waves and for the increase in formant bandwidth caused by energy loss through the mechanical resistance at low frequencies.

The flow resistance becomes relevant only at an extremely narrow constriction. Such a constriction is formed at the glottis during the production of voiced or aspirated sounds, and at the place of articulation along the vocal tract during the production of certain consonants. The resistance at the glottal orifice was investigated in [28], where the total resistance is formulated as eq. (2.5).

The geometric parameters in eq. (2.5) are the cross-sectional area, the length, and the thickness of a rectangular duct representing the glottal orifice, and $k_c$ is a coefficient with a typical value of 1.38, determined for a normal condition of the larynx about 3 mm thick. In eq. (2.5), the first term represents a laminar resistance due to the viscosity of the air, and the second term represents a kinetic loss, which depends on the volume velocity $u_g$. Because of the abrupt contraction and expansion in the passage of airflow at the glottis, eddies are formed at its inlet and outlet. In fact, the value of the coefficient $k_c$ in eq. (2.5) varies from 0.05 to 0.5 at the inlet, and from 0.2 to 1.0 at the outlet, depending on the shapes.

In the case of a constriction along the vocal tract, the shape of the constriction may be so different from that of the larynx that eq. (2.5) is no longer valid. In addition, the shape of the constriction varies significantly, depending on the manner of articulation. In this implementation, eq. (2.5) is therefore not applied to such constrictions. Instead, a formula for the laminar resistance in a circular duct is used. The resistance per unit length is given by

$$R = \frac{8\pi\mu}{A^2} \qquad (2.6)$$

where $A$ indicates the cross-sectional area of the circular duct and $\mu$ the viscosity of the air.
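The laminar resistance of eq. (2.6), $R = 8\pi\mu/A^2$, can be sketched in a few lines of Python. The viscosity value below is an assumed typical figure for air in CGS units, and the function name is hypothetical; neither is taken from the thesis.

```python
import math

MU_AIR = 1.86e-4  # assumed viscosity of air, dyne*s/cm^2 (CGS units)


def laminar_resistance_per_length(area_cm2: float, mu: float = MU_AIR) -> float:
    """Eq. (2.6): laminar flow resistance per unit length of a circular duct.

    R = 8 * pi * mu / A^2, which is Poiseuille's law 8*mu/(pi*r^4)
    rewritten in terms of the cross-sectional area A = pi*r^2.
    """
    if area_cm2 <= 0.0:
        raise ValueError("cross-sectional area must be positive")
    return 8.0 * math.pi * mu / area_cm2 ** 2


if __name__ == "__main__":
    # The resistance grows rapidly as the constriction narrows.
    for area in (4.0, 1.0, 0.2, 0.05):
        r = laminar_resistance_per_length(area)
        print(f"A = {area:5.2f} cm^2 -> R = {r:.4e} (CGS) per cm")
```

Because the resistance scales with $1/A^2$, it is negligible for open sections and matters only at narrow constrictions, in line with the remark above.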

In Fig. 2-3, the glottal end of the pharyngeal tube is directly connected to the pressure source. The boundary condition is represented by

$$P_{sub}(t) = p(x_0, t) \qquad (2.7)$$

where $P_{sub}$ indicates a given sub-glottal air pressure, and $x_0$ is the coordinate value of that end.

Fig. 2-4 : The ideal function of the time-varying vocal tract.

The location of the nasal coupling point is defined as $x = x_k$ in Fig. 2-4. The volume velocity and the pressure must satisfy boundary conditions at this point; for the volume velocity,

$$u(x_k^-, t) = u(x_k^+, t) + u'(t)$$

where the superscript '-' indicates the pharyngeal end, '+' indicates the inlet of the oral cavity, and the prime marks the nasal tract.

The outlets of the oral and nasal tracts are connected to a space into which sound is radiated. Since this thesis is not concerned with the propagation of sound in the radiation field, it suffices to characterize the space as an acoustic load specifying the velocity-pressure relationship at the mouth opening and at the nostrils. Morse and Ingard [29] have formulated the radiation load as an impedance composed of a resistance and an inductance in series. The resistance is proportional to $\omega^2$, which is difficult to implement in a time-domain simulation. Fortunately, Flanagan [30] has suggested a parallel-circuit approximation, in which both the conductance, $G_{rad}$, and the susceptance, $S_{rad}$, are independent of frequency. Thus, at the lip opening and at the nostrils, we obtain the following boundary condition

$$u(x_M, t) = \int S_{rad}(t)\, p(x_M, t)\, dt + G_{rad}(t)\, p(x_M, t) \qquad (2.10)$$

where

$$G_{rad}(t) = \frac{9\pi^2 A_0(x_M, t)}{128\,\rho_0 c}, \qquad S_{rad}(t) = \frac{3\pi\sqrt{\pi A_0(x_M, t)}}{8\,\rho_0}$$
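The radiation boundary condition of eq. (2.10) can be sketched in discrete time as follows. The sketch uses the parallel-circuit values of a circular piston in an infinite baffle, $G_{rad} = 9\pi^2 A_0/(128\rho_0 c)$ and $S_{rad} = 3\pi\sqrt{\pi A_0}/(8\rho_0)$, which make both terms of eq. (2.10) dimensionally a volume velocity. The trapezoidal integration rule, the air constants, and the function names are assumptions made for illustration and are not taken from the thesis.

```python
import math

RHO0 = 1.14e-3  # assumed air density, g/cm^3
C = 3.5e4       # assumed sound velocity, cm/s


def g_rad(area_cm2: float) -> float:
    """Radiation conductance: 9*pi^2*A / (128*rho0*c)."""
    return 9.0 * math.pi ** 2 * area_cm2 / (128.0 * RHO0 * C)


def s_rad(area_cm2: float) -> float:
    """Radiation susceptance: 3*pi*sqrt(pi*A) / (8*rho0)."""
    return 3.0 * math.pi * math.sqrt(math.pi * area_cm2) / (8.0 * RHO0)


def radiated_flow(pressure, areas, dt):
    """Evaluate eq. (2.10) sample by sample.

    pressure : sequence of p(x_M, t_n) at the opening
    areas    : sequence of A0(x_M, t_n), same length as pressure
    dt       : sampling interval in seconds
    The integral of S_rad * p is accumulated with the trapezoidal rule,
    starting from zero at the first sample.
    """
    integral = 0.0
    previous = None  # previous value of S_rad * p
    flow = []
    for p, a in zip(pressure, areas):
        integrand = s_rad(a) * p
        if previous is not None:
            integral += 0.5 * (previous + integrand) * dt
        previous = integrand
        flow.append(integral + g_rad(a) * p)
    return flow


if __name__ == "__main__":
    n, dt = 5, 1.0e-4
    u = radiated_flow([1.0] * n, [1.0] * n, dt)
    print(u)
```

For a constant pressure and a constant opening area, the output grows linearly in time through the susceptance term, on top of the constant offset contributed by the conductance term.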

Since neither the real lip-opening shapes nor their acoustic effects are known exactly, further elaboration of the radiation load on the basis of the equivalent circular vibrating piston may not be justified.

