Goals and Organization of the Thesis - Pipa Sound Synthesis Using Physical Modeling

1. Introduction

1.3 Goals and Organization of the Thesis

The playing techniques of Chinese and other Asian instruments are versatile, and they must be investigated thoroughly if a complete model of such an instrument is to be constructed. A few important playing techniques were included in the guqin tone synthesis [11], but playing techniques are not studied in [12] and [21]. The pipa is a good research subject because it combines a distinctive structure with many different playing techniques. Through computational and physical analysis, the acoustical characteristics of pipa timbres can be identified. With the digital waveguide (DWG) approach, a realistic and flexible pipa model can then be constructed. To save memory, the model parameters and input excitations should be kept to a minimum. Moreover, the model should be efficient enough for real-time synthesis. Finally, a pipa synthesizer can be created for easier playing and for other applications.

The remainder of this thesis is organized as follows. Chapter 2 presents an overview of digital waveguide modeling and digital filter design related to plucked string sound synthesis. The spectrum-based (rather than model-based) method, which is adopted only for the body response, is explained as well. The structure, tuning, and playing techniques of the pipa are discussed in Chapter 3. Chapter 4 presents the results of signal analysis of recorded pipa tones to illustrate the characteristics of the timbre of the instrument. Chapter 5 introduces the waveguide synthesis algorithm, including fixed and variable delay lines. Besides the string model, the generation of the body model and of the input excitation signal is also covered. Chapter 6 compares the synthetic results with recorded ones, reports the statistics of the listening tests, and discusses the findings. The last chapter contains the conclusions and future work.

Chapter 2 Model-Based Plucked String Sound Synthesis Overview

2.1 Digital Waveguide Modeling

The solution of the one-dimensional lossless wave equation can be expressed as a combination of a right-going and a left-going traveling wave [2]. After the waves are sampled, the vibration of a string can be modeled simply by a digital waveguide with two delay lines.

A digital simulation diagram for a rigidly terminated ideal string is shown in Fig. 2.1. The value of N is defined as 2L/X, where L is the string length and X = cT is the spatial sampling interval, with traveling speed c and time sampling interval T. The reflection of the waves at either end of the string, caused by the rigid terminations, can be modeled by negating each sample after it reaches the end of a delay line and before it is fed into the other delay line. Summing the values of the two delay lines at a given location along the lines gives the total output displacement.

Fig. 2.2 shows the initial excitation of an ideal plucked string with two rigid terminations. The delay elements are initialized with a shape corresponding to the initial displacement of the string. A smooth triangular shape is used instead of a sharp corner in order to avoid aliasing when the shape is sampled. The length of the delay lines controls the frequency of oscillation and consequently the pitch of the output signal: changing the length changes the wavelength of the traveling waves, which in turn changes the pitch of the sound. Therefore, if the desired frequency of the output is f and the sampling frequency is f_s, the value N mentioned above equals f_s/f.
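The two-delay-line model described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names `triangular_pluck` and `simulate_ideal_string`, the pickup index, and the use of plain Python lists are assumptions made here. Each delay line is loaded with half of the initial displacement, and each rigid termination is modeled by negating the sample that leaves one line before it enters the other.

```python
def triangular_pluck(n_half, pluck_index, peak=1.0):
    """Triangular initial displacement, zero at both ends.

    Assumes 0 < pluck_index < n_half - 1 (hypothetical helper).
    """
    shape = []
    for i in range(n_half):
        if i <= pluck_index:
            shape.append(peak * i / pluck_index)
        else:
            shape.append(peak * (n_half - 1 - i) / (n_half - 1 - pluck_index))
    return shape


def simulate_ideal_string(displacement, pickup_index, n_steps):
    """Lossless string with rigid ends, modeled by two delay lines."""
    # Each traveling wave starts with half of the initial displacement.
    right = [0.5 * d for d in displacement]  # right-going wave
    left = [0.5 * d for d in displacement]   # left-going wave
    out = []
    for _ in range(n_steps):
        # Physical displacement is the sum of both waves at the pickup.
        out.append(right[pickup_index] + left[pickup_index])
        right_end = right[-1]  # sample arriving at the right termination
        left_end = left[0]     # sample arriving at the left termination
        # Shift both lines by one cell; reflections are sign inversions.
        right = [-left_end] + right[:-1]
        left = left[1:] + [-right_end]
    return out
```

With `n_half` cells per delay line, the state repeats every N = 2 * n_half samples, so at a sampling rate f_s the pitch is f_s/N, in line with the relation N = f_s/f.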

Figure 2.1: The rigidly terminated ideal string with a displacement output indicated at position x=ξ [1].

Figure 2.2: Digital waveguide with initial conditions of delay lines set to triangular waves. The sum of the upper and lower delay lines gives the actual initial string displacement [1].

2.2 Physical Modeling Implementation Using Digital Filtering Techniques

To simplify the implementation of the waveguide, the two delay lines can be combined into one, as shown in Fig. 2.3. The two negative multipliers cancel each other out, leaving a single delay line of length N. In the real world, friction and air resistance make the amplitude of the string vibration decay over time, so it is important to model this effect in the digital waveguide. To attenuate the output, a simple damping factor g (|g| ≤ 1) is added to each delay element in Fig. 2.1, so that the values are damped before being fed into the other delay line. The N damping factors can also be lumped together into one damping factor g^N to construct a simple loop with a delay of N samples.
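The lumping of the damping factors can be checked numerically with a small sketch. The function name `damped_loop` and the numeric values in the usage note are illustrative assumptions; the code implements only the single loop with one length-N delay and one gain g^N.

```python
def damped_loop(excitation, n_delay, g, n_samples):
    """Single delay loop: y(n) = x(n) + g**n_delay * y(n - n_delay)."""
    loop_gain = g ** n_delay  # the N per-sample factors lumped into one
    y = []
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        feedback = loop_gain * y[n - n_delay] if n >= n_delay else 0.0
        y.append(x + feedback)
    return y
```

For example, feeding a unit impulse into `damped_loop([1.0], 10, 0.999, 31)` returns the impulse every 10 samples, scaled by 0.999**10 on each round trip, which is the decay that ten separate factors of 0.999 would produce.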

Figure 2.3: Simplified digital waveguide after combining delay lines and damping factors [1].

Moreover, the damping of real vibrating strings typically increases with frequency, for a variety of physical reasons. Therefore, for further realism, the lumped damping factor is replaced by a filter that damps each frequency differently. This loop filter always has a lowpass characteristic. Fig. 2.4 shows the simplest frequency-dependent loss filter, proposed in the Karplus-Strong algorithm [10]. This loop filter is a single-zero FIR filter that averages two adjacent samples at the output of the delay line. The difference equation between the input and the output is

$$y(n) = x(n) + \tfrac{1}{2}\left[y(n-N) + y(n-N-1)\right] \qquad (2.1)$$

The algorithm grew out of wavetable synthesis, in which a stored wavetable is read repeatedly from the beginning of the table to the end and back to the beginning, which creates a periodic sound. This approach is very simple, but unfortunately the sound does not vary over time. Most synthesis techniques remedy this by modifying the sound after it has been synthesized [23]. The Karplus-Strong algorithm differs from those techniques because it directly modifies the wavetable at each iteration, so the wavetable can be seen as a delay line.

Figure 2.4: The simplest frequency-dependent loss filter, as used in the Karplus-Strong algorithm [1].

In the Karplus-Strong algorithm, the pluck, which on a real string can be considered to contain energy at all frequencies, is simulated by filling the delay line with random noise at the beginning of each note. The output eventually becomes an almost periodic waveform corresponding to the fundamental frequency of the string. The non-harmonic components are strongly suppressed because the average of the last two outputs is fed back into the loop again and again, which also makes higher frequencies decay faster than lower frequencies. As with a real string, the contents of the delay line finally decay to a small value and the sound falls silent.
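The behaviour described above can be reproduced with a short sketch of the basic Karplus-Strong recursion. It assumes the usual two-point average form of the loop filter; the function name `karplus_strong`, the uniform noise distribution, and the `seed` parameter are choices made here for illustration.

```python
import random


def karplus_strong(n_delay, n_samples, seed=0):
    """Basic Karplus-Strong tone: noise burst plus averaging feedback loop."""
    rng = random.Random(seed)
    # The "pluck": fill the delay line with random noise. One extra sample
    # is generated so that y[n - n_delay - 1] is defined from the start.
    y = [rng.uniform(-1.0, 1.0) for _ in range(n_delay + 1)]
    for n in range(n_delay + 1, n_samples):
        # Average two adjacent delayed samples and feed the result back.
        y.append(0.5 * (y[n - n_delay] + y[n - n_delay - 1]))
    return y
```

Because the two-point average adds half a sample of delay, the effective period of the loop is about `n_delay + 0.5` samples rather than exactly `n_delay`, which is one reason the integer delay length becomes a tuning issue.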

One problem with implementing this system is that the length of a delay line must be an integer number of samples. If a fixed sampling frequency is always used, the delay line length required for a given pitch will not always be an integer. In addition, string stiffness disperses the higher partials, which can affect both musical timbre and pitch perception; for example, a moderate amount of inharmonicity provides a sense of warmth. Prior research [22] has proposed a systematic method for measuring the threshold at which inharmonicity affects perceived pitch. To address these issues, Fig. 2.5 shows a block diagram of the extended Karplus-Strong algorithm, and Table 2.1 describes the filters of the model [3]. The fractional delay filter and the stiffness dispersion filter of the model are discussed in the following sections. The extended model also includes the pluck direction and position.

Another disadvantage of the Karplus-Strong model is also addressed: the output has no dynamic variation, because the amplitude of the initial noise is fixed. On a real string, the plucking force changes not only the amplitude but also the energy of the high-frequency content. A hard pluck usually creates a sound with more energy in the higher frequency range than a soft pluck, which is a nonlinear phenomenon [23]. Varying only the input noise amplitude creates an effect closer to a change in the source location than to a change in the plucking force. Therefore, a variable-bandwidth lowpass filter is used to adjust the dynamic level. However, once the source has been adjusted, the output level differs from pitch to pitch. A level control filter is therefore placed at the end of the string model to balance the dynamics of tones at different pitches.
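One simple way to realise a variable-bandwidth lowpass filter is a one-pole section, sketched below. The text does not specify the filter actually used, so the filter form, the function name `dynamic_level_lowpass`, and the parameter `r` are assumptions for illustration only. A larger `r` narrows the bandwidth, which would correspond to a softer pluck, while the gain at DC stays at one.

```python
def dynamic_level_lowpass(x, r):
    """One-pole lowpass: y(n) = (1 - r) x(n) + r y(n - 1), with 0 <= r < 1."""
    y = []
    prev = 0.0
    for sample in x:
        prev = (1.0 - r) * sample + r * prev
        y.append(prev)
    return y
```

At the highest representable frequency (a signal alternating between +1 and -1) the steady-state gain of this filter is (1 - r)/(1 + r), so `r = 0.5` leaves one third of the amplitude and `r = 0.9` leaves about one nineteenth.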

Figure 2.5: Block diagram of the extended Karplus-Strong algorithm.

Table 2.1: Filter description of the extended Karplus-Strong model

Filter    Description
Hp(z)     Pick-direction lowpass filter
Hβ(z)     Pick-position comb filter
Hd(z)     String-damping filter (one/two poles/zeros typical)
Hf(z)     String-tuning (fractional delay) filter
Hs(z)     String-stiffness allpass filter (several poles and zeros)
HL(z)     Dynamic-level lowpass filter

2.3 Fractional Delay Filter Implementation

Loop delays that are a non-integer number of samples long are very important for modeling a string system. Fortunately, several filtering techniques can implement a fractional delay in a time-sampled environment [24]. Linear interpolation is the simplest and least expensive finite impulse response (FIR) form: it effectively draws a straight line between two neighboring samples and returns the appropriate point along that line. It is straightforward, needs no feedback of its own output, and gives very good results when the signal bandwidth is small compared with half the sampling rate. The difference equation is

$$y(n) = (1 - D)\,x(n) + D\,x(n-1) \qquad (2.2)$$

The magnitude and phase delay responses of the linear interpolator for eleven different fractional delay values (D = 0, 0.1, 0.2, ..., 1.0) are shown in Fig. 2.6. The phase delay gives the time delay, in sample intervals, experienced by each sinusoidal component of the input signal [25]. For all fractional delays, the approximation is more accurate at low frequencies, and the error is zero at DC. Note that there are only six different curves in the upper panel (not eleven), because the magnitude responses for fractional delays D and 1 - D are the same [15]. When D = 0.5, the gain degrades seriously at high frequencies, which is the same effect as that of the loop filter in the Karplus-Strong model. Therefore, it would cause extra loss in the output if used in a string model that already has a loop damping filter.
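The properties listed above are easy to verify numerically. The sketch below assumes the standard linear interpolator y(n) = (1 - D) x(n) + D x(n - 1); the function names `linear_interp_response` and `phase_delay` are introduced here for illustration.

```python
import cmath
import math


def linear_interp_response(d, omega):
    """Frequency response of y(n) = (1 - d) x(n) + d x(n - 1).

    omega is the normalized frequency in radians per sample (0 to pi).
    """
    return (1.0 - d) + d * cmath.exp(-1j * omega)


def phase_delay(h, omega):
    """Phase delay in samples of the response value h at omega > 0."""
    return -cmath.phase(h) / omega
```

Evaluating `abs(linear_interp_response(0.5, math.pi))` gives zero (up to rounding), which shows the complete loss of gain at half the sampling rate for D = 0.5, while the responses for D = 0.3 and D = 0.7 have identical magnitudes at every frequency.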

A first-order infinite impulse response (IIR) allpass interpolator is sometimes a better choice, since in the first-order case it costs almost the same as linear interpolation and has no gain distortion. However, it operates on both input and output samples, as shown in (2.3), so it needs a longer response time than the FIR form.

$$y(n) = a\,x(n) + x(n-1) - a\,y(n-1) \qquad (2.3)$$

The coefficient a is approximately (1 - D)/(1 + D). The phase delay of the allpass filter for a variety of desired delays at DC is shown in Fig. 2.7. Since the amplitude response of any allpass filter is 1 at all frequencies, there is no need to plot it.
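A short sketch of the first-order allpass interpolator follows. It assumes the common form H(z) = (a + z^-1)/(1 + a z^-1) with a = (1 - D)/(1 + D); the function names are hypothetical and chosen for this example only.

```python
import cmath


def allpass_coefficient(d):
    """Low-frequency approximation of the coefficient for a delay of d samples."""
    return (1.0 - d) / (1.0 + d)


def allpass_response(a, omega):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1)."""
    z1 = cmath.exp(-1j * omega)
    return (a + z1) / (1.0 + a * z1)


def allpass_filter(x, a):
    """Time-domain form: y(n) = a x(n) + x(n - 1) - a y(n - 1)."""
    y = []
    x_prev = 0.0
    y_prev = 0.0
    for sample in x:
        out = a * sample + x_prev - a * y_prev
        y.append(out)
        x_prev, y_prev = sample, out
    return y
```

The magnitude of `allpass_response` is one at every frequency, and its phase delay near DC approaches the requested D, which is why this filter adds no loss of its own to the string loop.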

Figure 2.6: Frequency and phase responses of the linear interpolator [2].

Figure 2.7: Phase response of a first-order allpass filter [2].

Comparing the phase responses of the linear interpolator and the first-order allpass filter, the characteristics at low frequencies are similar, and both achieve the fractional delay accurately there. However, as the frequency increases, the phase delays diverge in opposite directions.

Since the linear interpolator suffers from degradation of the high-frequency response, a higher-order FIR filter can be used to reduce the problem. A useful FIR approximation of the fractional delay is obtained by setting the error function and its derivatives to zero at zero frequency. This is the maximally flat design at DC. The coefficients of this filter, given in (2.4), correspond to the weighting coefficients of classical Lagrange interpolation:

$$h(n) = \prod_{k=0,\,k \neq n}^{N_f} \frac{D - k}{n - k}, \qquad n = 0, 1, \ldots, N_f \qquad (2.4)$$

where Nf is the order of the filter. The magnitude response of the Lagrange interpolator is guaranteed to be less than or equal to one when the delay is chosen so that (Nf - 1)/2 ≤ D ≤ (Nf + 1)/2 when Nf is odd and (Nf/2) - 1 ≤ D ≤ (Nf/2) + 1 when Nf is even [15]. This property is advantageous because in digital waveguide models the interpolator is normally used inside a feedback loop, where it is extremely important to keep the loop gain below unity; otherwise the system may become unstable. Since a Lagrange interpolator in this range is a passive filter, the interpolation error can only decrease the loop gain and never increase it.
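The Lagrange coefficients and the passivity property can be illustrated with the sketch below. It assumes the classical Lagrange weights h(n) = prod over k != n of (D - k)/(n - k) for n = 0..Nf; the function names `lagrange_coefficients` and `fir_magnitude` are introduced here and are not taken from the thesis.

```python
import cmath


def lagrange_coefficients(order, d):
    """Lagrange fractional delay FIR coefficients h(0..order) for delay d."""
    h = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                c *= (d - k) / (n - k)
        h.append(c)
    return h


def fir_magnitude(h, omega):
    """Magnitude response of an FIR filter at omega radians per sample."""
    return abs(sum(c * cmath.exp(-1j * omega * n) for n, c in enumerate(h)))
```

For order 1 the coefficients reduce to the linear interpolator, for example `[0.7, 0.3]` for D = 0.3. For order 3 with D = 1.5, the middle of the recommended range, the coefficients are -1/16, 9/16, 9/16, -1/16 and the magnitude response stays at or below one across the whole band.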

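Higher-order allpass filters can increase the accuracy of the phase delay at higher frequencies. The transfer function follows the Thiran method, as in (2.5) - (2.7). The filter has a maximally flat group delay (phase slope) at DC [2]. In the Thiran design, an allpass filter of order Na that approximates a delay of D samples has the denominator coefficients

$$a_k = (-1)^k \binom{N_a}{k} \prod_{n=0}^{N_a} \frac{D - N_a + n}{D - N_a + k + n}, \qquad k = 0, 1, \ldots, N_a$$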
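A compact sketch of the Thiran allpass design is given below. It assumes the standard closed-form Thiran coefficients, a_k = (-1)^k C(N, k) prod_{n=0..N} (D - N + n)/(D - N + k + n), and a desired delay D larger than N - 1; the function names are hypothetical.

```python
import cmath
from math import comb


def thiran_coefficients(order, d):
    """Denominator coefficients a(0..order) of a Thiran allpass filter.

    Assumes d > order - 1 so that no denominator term is zero or negative.
    """
    a = [1.0]  # a_0 is one by definition
    for k in range(1, order + 1):
        prod = 1.0
        for n in range(order + 1):
            prod *= (d - order + n) / (d - order + k + n)
        a.append((-1) ** k * comb(order, k) * prod)
    return a


def thiran_response(a, omega):
    """Allpass response: numerator is the denominator with reversed coefficients."""
    order = len(a) - 1
    den = sum(a[k] * cmath.exp(-1j * omega * k) for k in range(order + 1))
    num = sum(a[order - k] * cmath.exp(-1j * omega * k) for k in range(order + 1))
    return num / den
```

For order 1 this gives a_1 = (1 - D)/(1 + D), the same coefficient as the first-order allpass interpolator, and for higher orders the phase delay stays close to D over a wider low-frequency range.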

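String stiffness gives the partials of a real string an inharmonic characteristic. The relationship between the kth partial frequency f_k and the fundamental frequency f_0 becomes:

$$f_k = k f_0 \sqrt{1 + B k^2} \qquad (2.8)$$

where B is the inharmonicity coefficient.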
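The effect of the stiff-string relation f_k = k f_0 sqrt(1 + B k^2) can be seen with a small numerical sketch. The function names and the value B = 1e-4 used in the usage note are illustrative assumptions, not measured pipa data.

```python
import math


def partial_frequency(k, f0, b):
    """Frequency of the kth partial of a stiff string with inharmonicity b."""
    return k * f0 * math.sqrt(1.0 + b * k * k)


def inharmonicity_cents(k, b):
    """Deviation of the kth partial from the harmonic k * f0, in cents."""
    return 600.0 * math.log2(1.0 + b * k * k)
```

With f0 = 110 Hz and the assumed B = 1e-4, the 10th partial moves from 1100 Hz to about 1105.5 Hz, and the deviation grows with the partial index, which is the dispersion the stiffness filter has to reproduce.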

Since this function adjusts delay only, the Thiran allpass filter method described in the last section is a good way to meet the requirement without adding any gain loss. This filter generates the extra phase delay (i.e., the frequency shift) added to each partial of the tone. If more than one order is needed, cascading low-order filters achieves accuracy and stability at the same time. Moreover, this filter differs from the fractional delay filter because the required delay D1, formulated in (2.9), is calculated from the pitch frequency and the inharmonicity coefficient B, which is described in section 4.2 [7].

$$D_1 = \exp\left(C_d - k_d\, I_{key}(f_0)\right) \qquad (2.9)$$

where I_key is a logarithmic representation of the desired fundamental frequency, as in (2.10), so that ln(D1) as a function of I_key is approximately a straight line in the lower frequency range. The other values, Cd and kd, are two predefined constants [26].

$$I_{key}(f_0) = \log_{\sqrt[12]{2}}\left(\frac{f_0 \sqrt[12]{2}}{27.5}\right) \qquad (2.10)$$

Fig. 2.8 shows the phase delay characteristic of a dispersion filter made of four cascaded first-order sections, fitted to the A2 (110 Hz) tone. To fit the dispersion curve, the extra delay at DC generated by this filter can be very large according to the D1 formula.
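The key-index mapping can be sketched as follows. The code assumes that (2.10) has the form I_key = log(f0 * 2^(1/12) / 27.5) / log(2^(1/12)), which assigns index 1 to 27.5 Hz, and that D1 follows the assumed form exp(Cd - kd * I_key). The constants passed as `cd` and `kd` in the usage note are placeholders, not the values of [26].

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)  # frequency ratio of one equal-tempered semitone


def key_index(f0):
    """Logarithmic key index of a fundamental f0 in Hz (27.5 Hz maps to 1)."""
    return math.log(f0 * SEMITONE / 27.5) / math.log(SEMITONE)


def dispersion_delay(f0, cd, kd):
    """Assumed form D1 = exp(cd - kd * I_key); cd and kd are placeholders."""
    return math.exp(cd - kd * key_index(f0))
```

Under these assumptions `key_index(110.0)` is 25 (two octaves above 27.5 Hz), and ln(D1) drops by exactly `kd` for every semitone step, which is the straight-line behaviour mentioned in the text.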

Figure 2.8: Phase delay of an allpass dispersion filter for a 110 Hz pipa tone.

2.5 Damping Filter Design

The damping filter (Hd(z) in Fig. 2.5) implements the frequency-dependent decay caused by losses in the string. The filter coefficients are determined with the method described in [4]. The algorithm fits a straight line to the temporal envelope of each of several, mainly lower, harmonics and then uses the slopes of the lines as the attenuation factors of those harmonics. However, the attenuation factors of the remaining partials cannot be set to zero; they are held at gradually decreasing values so that the sound remains natural [27]. The damping filter is then designed to fit the resulting magnitude spectrum, as in the example of Fig. 2.9. The filter order should not be raised to obtain a perfect fit, because of stability and computation speed considerations. After the filter is designed, it should go through a calibration process to make sure not only that the resulting magnitude response does not exceed unity but also that the match is best for the lowest harmonics, whose attenuation rates are most easily heard. As long as the filter is designed properly, the synthesized sound will be satisfactory, because the string model includes the most essential blocks.
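The envelope-fitting step can be sketched as follows. This is only an illustration under stated assumptions, not the detailed procedure of [4]: the envelope of one harmonic is assumed to be available as amplitude samples over time, a least-squares line is fitted to it in decibels, and the slope is converted into the gain that harmonic should see on each trip around the string loop (one trip lasts 1/f0 seconds). All function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loop_gain_from_envelope(times, envelope, f0):
    """Per-period loop gain of one harmonic from its amplitude envelope.

    times are in seconds, envelope values must be positive, f0 is in Hz.
    """
    env_db = [20.0 * math.log10(e) for e in envelope]
    slope_db_per_s, _ = fit_line(times, env_db)
    # Decay in dB over one loop period, converted back to a linear gain.
    return 10.0 ** (slope_db_per_s / f0 / 20.0)
```

Repeating this for several harmonics gives target gains at the harmonic frequencies, and a low-order loop filter can then be fitted to those points while keeping its magnitude at or below one.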

Figure 2.9: An example of estimating magnitude spectrum (circles) and magnitude response of a first-order IIR filter [4].

2.6 Excitation Signals and Body Model

Although the pluck of every plucked instrument resembles an impulse with a very wide frequency band, there are differences among instruments that simple random noise cannot represent. Besides the string material, the playing technique also affects the sound of the initial pluck. Even when the same string is plucked with the same technique, the timbre can vary slightly depending on whether the string is open or not. Therefore, once the string model has been constructed, excitation signals extracted from recorded tones of different strings and techniques replace the noise [4]. The extraction process is introduced in detail in section 5.2.

Besides the string and the input excitation, another important part of a plucked string instrument is the resonating body. An instrument body can be modeled with a physically based 2-D or even 3-D digital waveguide mesh [28], but the computational complexity then becomes an issue. A spectrum-based body model [11], used in place of the physical model, both simplifies the computation and keeps the required flexibility. The idea is to design a set of filters that fit the spectral shape of the body impulse response. Using cascaded biquad filters [25] gives a more accurate fit and better stability at the same time.
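A cascade of biquad sections can be sketched as below. The two-pole resonator design, the function names, and the centre frequencies and bandwidths in the usage note are assumptions for illustration; they are not the fitted body filters of the thesis.

```python
import cmath
import math


def resonator(fc, bandwidth, fs):
    """Two-pole resonator as a biquad (b, a); fc and bandwidth in Hz."""
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius, below one
    theta = 2.0 * math.pi * fc / fs          # pole angle
    return ([1.0 - r, 0.0, 0.0], [1.0, -2.0 * r * math.cos(theta), r * r])


def biquad_filter(x, b, a):
    """Direct form I biquad with a[0] assumed to be one."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for s in x:
        out = b[0] * s + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(out)
        x2, x1 = x1, s
        y2, y1 = y1, out
    return y


def cascade_filter(x, sections):
    """Run the signal through each biquad section in turn."""
    for b, a in sections:
        x = biquad_filter(x, b, a)
    return x


def cascade_magnitude(sections, f, fs):
    """Magnitude response of the whole cascade at frequency f in Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = 1.0
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return abs(h)
```

For example, `sections = [resonator(500.0, 50.0, 44100.0), resonator(600.0, 80.0, 44100.0)]` emphasises the region between the two assumed resonances far more than 2 kHz, and each second-order section can be checked for stability on its own, which is the practical advantage of the cascade form.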

Chapter 3 Description of the Pipa Instrument

3.1 Construction and Tuning

The pipa and its typical playing position are shown in Fig. 3.1. Unlike the guitar, the pipa is held vertically when played, which makes the left-hand techniques more convenient to perform. Fig. 3.2 shows the front and back views of a pipa. A standard pipa is 102 cm tall, and the effective string length is 72.5 cm. The body is made of wood and shaped like a halved pear (round in back, flat in front). The front plate combines two or three wood plates of different thickness and hardness, chosen with the string decay time in mind, especially for the A3 string, which usually carries the solo melody. An older paper [29] shows that the wood resonance peaks of the pipa body are concentrated in the range of 450 Hz to 650 Hz, which lies in the higher register of the pipa. This may indicate the kind of sound the ancient Chinese favored. The neck carries six deep, triangular frets called ledges, and a tuning peg head extends from the neck. The tuning pegs are quite large to match the body.

Besides the neck frets, 24 strips of bamboo on the soundboard of the pipa also function as frets. Each of the 30 frets in total is spaced according to well-tempered tuning. Today, the open strings are typically tuned A2-D3-E3-A3 (110.5 Hz-147.33 Hz-165.75 Hz-221 Hz), with the highest A below middle C [30]. The traditional pipa, with silk strings and pentatonic tuning, developed into the modern pipa, with steel-core strings and chromatic tuning, during the first half of the last century. As a result, playing with real fingernails has become almost impossible. Instead, a false nail made of turtle shell or special plastic is usually attached to each finger of the right hand for plucking the harder strings [31].

Figure 3.1: Photo of the soloist Luo playing the Chinese instrument pipa.

Figure 3.2: Photo of the pipa front (a) and back (b) views.

3.2 Playing Techniques

The name pipa comes from the most basic playing technique: plucking the strings forward (pi) and backward (pa) with the fingernails turned outward. In general, the right-hand fingers pluck the strings while the left-hand fingers touch the strings in a variety of ways to create melodies. More than 60 different techniques have been developed through the centuries [31]; Table 3.1 summarizes them. They demand exceptional finger dexterity. In a typical pluck, the left hand touches the string beside a fret. In addition, because the frets are quite high, the strings can be pushed or pulled, like string bending on the electric guitar, and also twisted or pressed. The wheel (also called finger ring), in which the right hand rotates all fingers one by one across the strings, is a unique technique that can sustain a note indefinitely, like a tremolo. Playing it requires attention to the dynamic balance of the fingers. This technique may be harder on the pipa than the similar one on the guitar, because the tones must be played by wheeling all five fingers, which differ in length and strength, at high speed. Other techniques, such as rolls (similar to the wheel but using the typical plucking method), slaps, harmonics and noises, are often used as well.

There used to be a large repertoire of pipa music describing exciting scenes such as battles, as well as lyrical themes inspired by poetry, landscapes and historical stories. Bai Juyi (772-846 AD), one of the great poets of the Tang dynasty, wrote the most famous poem about pipa playing, named Pipa Song. It describes the shower of pipa notes as follows [31]: "... The thicker strings rattled
